Big Data analysis does not create a patient privacy nightmare

Health data needs to be shared in anonymised datasets to help more patients, but unfounded worries about privacy may be holding this back, say researchers

Concerns that anonymised patients could be “re-identified” without their consent when their data are analysed by artificial intelligence may be holding the entire global health industry back from exploiting new opportunities, according to new research.

Existing healthcare algorithms rely on huge amounts of data that have been stripped of personal information. A team of MIT researchers has quantified the potential risk of patient re-identification and found that between 2016 and 2021 - the period examined in the study led by MIT Principal Research Scientist Leo Anthony Celi - there were no reports of patient re-identification through publicly available health data.

MIT’s findings suggest the potential risk to patient privacy is outweighed by gains for patients, says Celi, who hopes these datasets will include a more diverse group of patients and become more widely available.

“We agree that there is some risk to patient privacy, but there is also a risk of not sharing data,” he says. “There is harm when data is not shared, and that needs to be factored into the equation.”

Celi is the senior author of the new study, and Kenneth Seastedt, a thoracic surgery fellow at Beth Israel Deaconess Medical Center, is the lead author of the corresponding paper published in PLOS Digital Health. The research was funded by the National Institutes of Health through the National Institute of Biomedical Imaging and Bioengineering.

When patient data is entered into large health record databases created by hospitals and other institutions, certain types of identifying information are typically removed, including patients’ names, addresses, and phone numbers. This is intended to prevent patients from being re-identified and having information about their medical conditions made public.
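The removal of direct identifiers described above can be illustrated with a minimal sketch. The field names and the `deidentify` helper below are hypothetical examples, not drawn from the MIT study or any real hospital system; real de-identification pipelines handle many more identifier types (dates, record numbers, free text, and so on).

```python
# Illustrative sketch of stripping direct identifiers from a record
# before it enters a shared dataset. Field names are hypothetical.

DIRECT_IDENTIFIERS = {"name", "address", "phone_number"}

def deidentify(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

patient = {
    "name": "Jane Doe",
    "address": "1 Example St",
    "phone_number": "555-0100",
    "diagnosis": "hypertension",
    "age": 58,
}

shared = deidentify(patient)
# Only the clinical fields remain: {"diagnosis": "hypertension", "age": 58}
```

In practice this is only the first step: the remaining clinical fields can still, in combination, be unique to one person, which is why the re-identification risk studied here is non-zero even after identifiers are removed.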

However, concerns about privacy have slowed the development of more publicly available databases with this kind of information, Celi says. In the new study, he and his colleagues set out to ask what the actual risk of patient re-identification is. 

Patient privacy is important, but cyber security is the biggest threat

Researchers searched PubMed, a database of scientific papers, for any reports of patient re-identification from publicly available health data, but found none. They also examined media reports from September 2016 to September 2021 and say they could not find a single instance of patient re-identification from publicly available health data. During the same time period, the health records of nearly 100 million people were stolen through data breaches, the research team noted.

“Of course, it’s good to be concerned about patient privacy and the risk of re-identification, but that risk, although it’s not zero, is minuscule compared to the issue of cyber security,” Celi says.

More widespread sharing of de-identified health data is necessary, Celi says, to help expand the representation of minority groups in the United States, who have traditionally been underrepresented in medical studies. He is also working to encourage the development of more such databases in low- and middle-income countries.

“We cannot move forward with AI unless we address the biases that lurk in our datasets,” he says. “When we have this debate over privacy, no one hears the voice of the people who are not represented. People are deciding for them that their data need to be protected and should not be shared. But they are the ones whose health is at stake; they’re the ones who would most likely benefit from data-sharing.”

Instead of asking for patient consent to share data, which he says may exacerbate the exclusion of many people who are now underrepresented in publicly available health data, Celi recommends enhancing the existing safeguards that are in place to protect such datasets.

“What we are advocating for is performing data analysis in a very secure environment so that we weed out any nefarious players trying to use the data for some other reasons apart from improving population health,” he says. “We’re not saying that we should disregard patient privacy. What we’re saying is that we have to also balance that with the value of data sharing.”
