Detection and removal of texts burned in medical images using deep learning

Supervisors: Jodogne, Sébastien
Degree label: Master [120] in Data Science Engineering (ingénieur civil en science des données), professional focus
Abstract: This master's thesis addresses a critical yet under-explored aspect of medical imaging: the removal of burned-in text, which can bias AI algorithms and threaten patient confidentiality. Anonymizing DICOM tags is a well-established practice. Removing patient identifiers burned into the pixels of medical images, however, remains difficult because accessible, open-source tools are lacking. To address this, this study presents a new approach that applies a scene text detection algorithm, TextBoxes. We trained several models on a synthetic dataset derived from The Cancer Imaging Archive (TCIA) and obtained strong detection results. This work led to the creation of MedTextCleaner (MTC), an open-source, user-friendly plugin for Orthanc. MTC embeds our deep learning model to automate de-identification. It suggests candidate text regions within an image for removal, and the user can then validate and adjust them manually. Integrating MedTextCleaner into Orthanc is a valuable development for DICOM image anonymization, especially for teaching and research applications.
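The final step described above, erasing the text regions that the user has validated, can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not MedTextCleaner's actual code or API. The function name `blank_regions`, the `(x_min, y_min, x_max, y_max)` box format with exclusive upper bounds, and the choice to fill with the image minimum are all hypothetical. The sketch also assumes the pixel data has already been decoded into an array, for example from a DICOM file.

```python
import numpy as np


def blank_regions(pixels: np.ndarray, boxes, fill_value=None) -> np.ndarray:
    """Return a copy of `pixels` with every validated text box overwritten.

    Hypothetical helper (not MTC's API). Each box is (x_min, y_min, x_max, y_max)
    in pixel coordinates, with x_max and y_max exclusive. Boxes are clipped to
    the image bounds, and the original array is left unchanged.
    """
    out = pixels.copy()
    if fill_value is None:
        # Assumption: the darkest value in the image is a neutral background fill.
        fill_value = pixels.min()
    height, width = pixels.shape[:2]
    for x_min, y_min, x_max, y_max in boxes:
        # Clip each box so that partially out-of-bounds detections stay valid.
        x0, x1 = max(0, int(x_min)), min(width, int(x_max))
        y0, y1 = max(0, int(y_min)), min(height, int(y_max))
        if x0 < x1 and y0 < y1:
            # Rows index y and columns index x; `...` also covers RGB images.
            out[y0:y1, x0:x1, ...] = fill_value
    return out


if __name__ == "__main__":
    image = np.full((4, 6), 100, dtype=np.uint16)
    image[0, 0] = 5
    # One box fully inside the image, one that extends past the right and bottom edges.
    cleaned = blank_regions(image, [(1, 1, 3, 3), (5, 2, 10, 10)])
    print(cleaned)
```

Returning a copy rather than editing in place keeps the original pixels available. An interactive tool could use them to let the user undo or adjust a suggested region before committing the redaction.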