Our expert answers 3 Questions
Modern analytic capabilities are highlighting the impact of how medications, lifestyle choices, and both physician and patient behaviors influence health outcomes. However, analytic tools including machine learning, deep learning, and adaptive modeling have outpaced the data sources available to investigators. Our team’s work focuses on methods for data acquisition from source text records and imaging studies to better capture and quantify an individual’s health status in longitudinal datasets. My clinical focus is in the management of high-expenditure, low prevalence diseases, specifically the inflammatory bowel diseases.
The excitement of data science rests in the appreciation of the overwhelming quantity of digitized information that is available. The frustration of data science is that these rich data sources have proven difficult to refine into formats easily analyzed. Techniques for extracting discrete data elements that reliability reflect severity and context are now available. The development of unsupervised automated methods health data extraction is a very stimulating endeavor. Context-sensitive understanding of free text and imaging data requires collaboration between data management engineers, linguistics experts to highlight techniques for defining intention and situational lexicons, and bio-informatics sciences.
Our immediate goal is to develop methods for the automated extraction of quantitative organ-specific data from imaging studies and clinical text reports to capture an individual patient’s condition at a single point in time. At scale, these methods will allow health care researchers to improve characterization of disease status based in much larger populations.