Memorization Risk in Clinical AI

Key Takeaways

  • Patient privacy is a crucial aspect of medical ethics, and confidentiality remains central to practice, enabling patients to trust their physicians with sensitive information.
  • Artificial intelligence models trained on de-identified electronic health records (EHRs) can memorize patient-specific information, potentially violating patient privacy.
  • Foundation models trained on EHRs are meant to generalize from their training data to make better predictions, but they can also memorize private details, putting patient privacy at risk.
  • Researchers have developed a series of tests to evaluate the potential risk EHR foundation models could pose in medicine, assessing various types of uncertainty and practical risk to patients.
  • Patients with unique conditions are especially vulnerable to data breaches, and higher levels of protection may be required to safeguard their information.

Introduction to Patient Privacy
Patient privacy is a fundamental aspect of medical ethics, and the Hippocratic Oath emphasizes the importance of confidentiality in the physician-patient relationship. As stated in the oath, "Whatever I see or hear in the lives of my patients, whether in connection with my professional practice or not, which ought not to be spoken of outside, I will keep secret, as considering all such things to be private." This commitment to confidentiality enables patients to trust their physicians with sensitive information, which is essential for effective healthcare.

The Risk of Data Leakage
However, with the increasing use of artificial intelligence models in healthcare, there is a growing concern about the potential risk of data leakage. Foundation models trained on EHRs can memorize patient-specific information, potentially violating patient privacy. As Sana Tonekaboni, a postdoc at the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard, notes, "Knowledge in these high-capacity models can be a resource for many communities, but adversarial attackers can prompt a model to extract information on training data." This risk is particularly concerning, as it could compromise the confidentiality of patient information and undermine trust in the healthcare system.

Evaluating the Risk of EHR Foundation Models
To assess the risk EHR foundation models could pose in medicine, researchers developed a series of tests that measure different types of uncertainty and gauge the practical risk to patients across tiers of attack feasibility. As Marzyeh Ghassemi, an MIT Associate Professor, explains, "We really tried to emphasize practicality here; if an attacker has to know the date and value of a dozen laboratory tests from your record in order to extract information, there is very little risk of harm. If I already have access to that level of protected source data, why would I need to attack a large foundation model for more?" The researchers found that the more information an attacker already has about a particular patient, the more likely the model is to leak further information.
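The article does not describe the researchers' tests in detail, but a common way to probe whether a model has memorized a training record is a loss-based membership-inference check: if the model assigns a much lower loss to an exact record than a reference model does, that record was likely seen in training. The function names, the reference-model comparison, and the threshold below are illustrative assumptions, not the study's actual method.

```python
import math

def sequence_loss(token_prob, record):
    """Average negative log-likelihood a model assigns to a record's tokens.
    `token_prob` maps a token to the model's probability for it."""
    return -sum(math.log(token_prob(tok)) for tok in record) / len(record)

def looks_memorized(model_prob, reference_prob, record, threshold=0.5):
    """Illustrative loss-based membership-inference heuristic: flag a record
    when the target model's loss on it is much lower than a reference
    model's loss on the same record (threshold is an arbitrary choice)."""
    gap = sequence_loss(reference_prob, record) - sequence_loss(model_prob, record)
    return gap > threshold

# Toy check: a model that is unusually confident on one record looks memorized.
record = ["hba1c", "9.2", "2021-03-04"]
confident = lambda tok: 0.9   # target model: near-certain on every token
baseline = lambda tok: 0.5    # reference model: generic confidence
print(looks_memorized(confident, baseline, record))  # → True
print(looks_memorized(baseline, baseline, record))   # → False
```

Real audits use calibrated reference models and many records to estimate leak rates rather than a single fixed threshold.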

Vulnerability of Patients with Unique Conditions
Patients with unique conditions are especially vulnerable to data breaches, as they can be easily identified and their information compromised. As Tonekaboni notes, "Even with de-identified data, it depends on what sort of information you leak about the individual. Once you identify them, you know a lot more." The researchers demonstrated how to distinguish model generalization cases from patient-level memorization, to properly assess privacy risk. They also emphasized that some leaks are more harmful than others, such as revealing a patient’s age or demographics versus more sensitive information like an HIV diagnosis or alcohol abuse.
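One intuition behind separating generalization from patient-level memorization is to compare the model's behavior on the exact record against its behavior on nearby, slightly perturbed records: a model that is only correct on the exact record has likely memorized that patient, while one that is also correct on similar records is generalizing. The sketch below is a hypothetical heuristic in this spirit; the scoring function and interface are assumptions, not the researchers' published procedure.

```python
def memorization_gap(predict_correct, record, perturbed_records):
    """Illustrative heuristic: accuracy on the exact record minus accuracy
    on perturbed neighbors. A gap near 1.0 suggests patient-level
    memorization; near 0.0 suggests generalization."""
    exact = 1.0 if predict_correct(record) else 0.0
    nearby = sum(bool(predict_correct(r)) for r in perturbed_records) / len(perturbed_records)
    return exact - nearby

# Toy check with stand-in predictors (hypothetical):
target = ("age=47", "dx=rare_condition")
neighbors = [("age=48", "dx=rare_condition"), ("age=47", "dx=common_condition")]
memorizer = lambda r: r == target      # right only on the exact record
generalizer = lambda r: True           # right on the record and its neighbors
print(memorization_gap(memorizer, target, neighbors))    # → 1.0
print(memorization_gap(generalizer, target, neighbors))  # → 0.0
```

The same comparison also speaks to why unique patients are more exposed: for a truly one-of-a-kind record there are no close neighbors, so correct predictions are harder to attribute to generalization.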

Future Directions
The researchers plan to take the work in a more interdisciplinary direction, bringing in clinicians, privacy experts, and legal experts. As Tonekaboni notes, "There’s a reason our health data is private. There’s no reason for others to know about it." The study highlights the importance of keeping patient information confidential and the need for rigorous testing and evaluation of EHR foundation models to prevent data leakage. By developing and implementing effective safeguards, healthcare providers can protect patient information and maintain trust in the healthcare system.

https://news.mit.edu/2026/mit-scientists-investigate-memorization-risk-clinical-ai-0105
