The Devil is in the Data

Machine Learning Review of Millions of Patient Visits Reveals Many Undetected Cases of Self-Harm

When it comes to making sense of big data, it's sometimes hard to see the forest for the trees.

But Christophe Lambert, PhD, and his colleagues in UNM's Center for Global Health recently used a machine learning method to detect a disturbing pattern hidden in millions of medical insurance billing records.

In a paper published last month in the Journal of the American Medical Informatics Association, the team reported that instances of self-harm among people with major mental illness seeking medical care might be as much as 19 times more frequent than the billing records indicate.

The finding suggests that physicians and other care providers often assign standardized billing codes that obscure the possibility a patient's injury was due to self-harm rather than an accident.

That under-coding could have a bearing on patient care.

"Forthcoming studies of ours suggest that a person faces a more than three-fold risk of self-harm if he or she has done it once before," says Lambert, an associate professor in the Department of Internal Medicine. So in seeking to prevent further self-harm or suicide, "If you're not coding it, it means the future treatment of the patient may be compromised by not having that important information in their history," he says.

Lambert and his team started their study with an anonymized database containing the medical billing records for more than 130 million Americans from 2003 to 2016. They narrowed their research to a subset of about 10 million patients with diagnoses of major mental illness, including major depressive disorder, bipolar disorder, schizophrenia and schizoaffective disorder - people who are already considered at higher risk for self-harm.

Machine learning, in which a computer applies an algorithm to rapidly analyze a large data set, can identify patterns that aren't readily apparent to humans. In this case, the researchers supplied the computer with 185,000 variables to apply to each patient's inpatient and emergency room visits.

"We actually threw in the kitchen sink," Lambert says. "It was basically anything that happened in those visits - including all procedure and diagnostic codes." Among the findings that emerged was that cases of likely self-harm were drastically under-reported.
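The "kitchen sink" idea of turning every code recorded during a visit into a variable can be sketched roughly as follows. This is only an illustration of the general technique, not the study's actual pipeline: the code labels are made-up placeholders rather than real billing codes, and the function names `build_vocabulary` and `encode_visit` are hypothetical.

```python
def build_vocabulary(visits):
    """Assign a column index to every distinct code seen in any visit."""
    codes = sorted({code for visit in visits for code in visit})
    return {code: index for index, code in enumerate(codes)}


def encode_visit(visit, vocabulary):
    """Return the sorted column indices of the codes present in one visit.

    Storing only the indices that are "on" keeps each visit small even
    when the vocabulary holds a very large number of variables.
    """
    return sorted(vocabulary[code] for code in set(visit) if code in vocabulary)


# Hypothetical placeholder labels, not real procedure or diagnostic codes.
visits = [
    ["DX_POISONING", "PROC_GASTRIC_LAVAGE"],
    ["DX_FALL", "PROC_XRAY"],
    ["DX_POISONING", "DX_DEPRESSION"],
]

vocabulary = build_vocabulary(visits)
encoded = [encode_visit(visit, vocabulary) for visit in visits]

for visit, indices in zip(visits, encoded):
    print(visit, "->", indices)
```

A model trained on visits encoded this way could then, in principle, be used to flag visits that resemble coded self-harm cases but were not coded as such. How the researchers actually did this is not described here.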

There were also unexpected discrepancies between cases that were evaluated as self-harm and those that were not.

People who were treated for intoxication and poisoning, accidents, asphyxiation, chest and head surgical repair, wrist wound, self-harming thoughts, depression and psychotherapy were more likely to be coded for self-harm than those presenting with substance use disorder, heroin poisoning, neurological disorder, vehicle accidents or falls.

That suggests that some of the discrepancy may be due to what motivation providers ascribe to a particular behavior, Lambert says.

"We see on average when someone's hurt themselves through an opioid overdose or drugs that have pleasurable effects - they're less likely to code it as self-harm," Lambert says. But an assessment of self-harm is more likely when someone has overdosed on aspirin or sleeping pills, presumably with self-harming intent.

"Males are also more likely to have self-harm be under-coded than females," Lambert adds, "and stereotypes that men are less likely to disclose or get help than women were contradicted by the data - it appears likely to be a bias in provider coding based on the sex of their patients."

Once the under-coded cases were accounted for, detailed estimates of self-harm risk by age, mental illness diagnosis, sex and U.S. state emerged. Risk peaks at age 15 for females and 17 for males, and declines after the mid-20s.

Self-harm rates have steadily risen nationally since 2006, and people with more than one major mental illness diagnosis have an 18-25% chance per year of harming themselves between the ages of 15 and 26, where risk is highest.

The study was part of a larger body of research Lambert has been conducting with a $2.4 million Patient-Centered Outcomes Research Institute award to compare the effectiveness of various treatments for bipolar disorder, particularly as they relate to instances of self-harm, hospitalization and the risk of side effects.

While the study focused on how patient care is classified, Lambert believes the method could potentially be used in a predictive framework.

"One could use machine learning in another way, based on your history, including cases of prior imputed self-harm," he says. "Are you in a high-risk category because of that and/or other factors where proactive treatment could help?"

Lambert is also optimistic that large-scale data analysis can reveal useful insights to inform medical decision-making.

"Can we learn something from these data sets?" he asks. "Coding is imperfect, humans are imperfect, but in the aggregate when we have very large datasets a lot of that noise can average out and we can get meaningful answers and evidence."

Contact for Members of the Press

Chris Ramirez
(505) 313-3429
cramirez@salud.unm.edu
