New Search Engine Capabilities Can Expose Patient Identifiers Thought to Be Anonymized
Search engines can now index patient identifiers in slide presentations that were previously believed to have been de-identified. The American College of Radiology® (ACR®), Radiological Society of North America (RSNA) and Society for Imaging Informatics in Medicine (SIIM) urge radiologists and allied medical professionals to use the guidance below to help ensure that no protected health information (PHI) exists in their presentations. Organizations that may host such content should ensure that it is appropriately de-identified.
Healthcare providers frequently create presentations containing medical imaging for many worthwhile purposes. Patient privacy regulations, including the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR), may extend to these situations. Providers may be responsible for protecting their patients’ privacy in this context just as they are in routine clinical operations.
Advances in web-crawling and content processing technology employed by search engine vendors (e.g., Google, Bing and others) increasingly enable large-scale information extraction from previously stored files. Among other things, this technology can extract source images contained in PowerPoint™ presentations and Adobe® PDF files, and recognize alphanumeric characters embedded in the image pixels. An image with patient information embedded in its pixels can therefore be indexed by this process. Once explicit patient information becomes associated with images in the search engine database, those images can be found in subsequent internet searches for the patient’s personal information.
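To illustrate why source images are so easy to extract, the following is a minimal sketch in Python. It relies on the fact that a .pptx file is a ZIP archive in which embedded pictures are stored as separate files under the ppt/media/ folder. The function name list_embedded_media is hypothetical and chosen for illustration only; this is not a description of how any particular search engine works.

```python
import zipfile


def list_embedded_media(pptx_path):
    """Return the names of the media files stored inside a .pptx archive.

    `pptx_path` may be a file path or a binary file-like object.
    Hypothetical helper for illustration; it only lists files and does not
    inspect their pixels.
    """
    with zipfile.ZipFile(pptx_path) as archive:
        # Embedded pictures are kept as whole files under ppt/media/,
        # independent of how they are displayed on a slide.
        return sorted(
            name for name in archive.namelist()
            if name.startswith("ppt/media/")
        )
```

Any file listed this way can be read out of the archive in full, including any patient information burned into its pixels. How the image is displayed on a slide has no effect on the stored file.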
For example, a patient searches her name in a search engine and finds images from a diagnostic imaging study performed four years earlier. When she clicks on the images, she is directed to the website of a professional imaging association that stored an Adobe® PDF file as part of an educational presentation. The association was unaware that the file contained PHI. The author of the file was unaware that the images had not been sufficiently de-identified before the original presentation was created in PowerPoint™ format, and that saving the presentation in Adobe® PDF format had not removed the PHI either.
Only images without PHI should be included in presentations of any kind. To ensure that no PHI is included, use one of the following approaches. Screen capture software can be used to capture the image pixels for the region of interest only. The user can also disable patient information overlays, or use an anonymization algorithm embedded in the PACS, before saving a screen or active window representation. Alternatively, the creator of the presentation can use third-party image processing software (e.g., Adobe® Photoshop or IrfanView) to crop out or obscure PHI before inserting the resulting image into a presentation.
Simply cropping out PHI with the image formatting tools available in presentation software (e.g., PowerPoint™, Google Slides™, Keynote®) does NOT permanently remove the PHI. Placing “black bars” or similar visual aids to obscure PHI within the presentation software also does not represent a safe and compliant practice for de-identification.
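The reason in-application cropping is not permanent can be seen in the file format itself. In a .pptx file, a crop is typically recorded as an a:srcRect element in the slide XML, with offsets l, t, r and b given in thousandths of a percent, while the full, uncropped image stays in the archive. The sketch below flags slides that contain such a crop. The function name find_cropped_pictures is hypothetical, and the check is a rough screening aid under these assumptions. It is not a complete PHI detector: it does not detect “black bar” shapes drawn over an image, for example.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used by the a:srcRect element.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
SLIDE_RE = re.compile(r"ppt/slides/slide\d+\.xml$")


def find_cropped_pictures(pptx_path):
    """Return (slide_name, crop) pairs for pictures cropped inside the file.

    `crop` maps each side (l, t, r, b) to the raw attribute value from the
    a:srcRect element. Hypothetical helper for illustration only.
    """
    findings = []
    with zipfile.ZipFile(pptx_path) as archive:
        for name in sorted(archive.namelist()):
            if not SLIDE_RE.match(name):
                continue
            root = ET.fromstring(archive.read(name))
            for src_rect in root.iter(f"{{{A_NS}}}srcRect"):
                crop = {side: src_rect.get(side, "0") for side in "ltrb"}
                # An element whose offsets are all zero hides nothing.
                if any(value != "0" for value in crop.values()):
                    findings.append((name, crop))
    return findings
```

A non-empty result means that at least one picture is only visually cropped and that the hidden portion is still stored in the file. An empty result does not, on its own, show that a presentation is free of PHI.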
Specific functions are available in some software to permanently delete cropped, obscured or hidden information in presentation files. As a final quality control check, it is recommended that these "sanitization" functions be run on all presentations prior to being made public.
The ACR has created a resource page which collates best practices, highlights some available software functions designed to permanently remove sensitive content and provides additional assistance with safe creation of presentations.