Our expert answers 3 Questions
In recent years, challenges and opportunities associated with the representation, modeling, analysis and visualization of Big Data have proliferated in all scientific disciplines and evidence-based investigations. For centuries, mankind has collected data, developed models and predicted the course of various natural phenomena. For instance, in 1687, Sir Isaac Newton published his law of universal gravitation, which allows us to predict the motion of celestial bodies, and in 1797-98, Henry Cavendish, who had earlier discovered hydrogen, measured the mean density of the Earth, allowing us to estimate the gravitational forces between Earth and other planets. The transition from observational to experimental, theoretical, and computational sciences between the 17th and the 20th centuries continues today in the 21st century in the form of Big Data Discovery science.
Big Data is exotic because of the challenges it presents as well as its potential to revolutionize our understanding of the world around us. In my research, I have identified six dimensions of Big Data that make it unique: (1) Size, (2) Incongruency, (3) Incompleteness, (4) Complexity, (5) Multi-scale, and (6) Multi-source. These “6Ds” give us a starting point for developing the models, algorithms and tools needed to cope with the specific challenges that make “standard scientific methods” impractical for Big Data. Big Data science is team-based and highly trans-disciplinary. It requires diverse, broad, and complementary expertise, specialized training, continuous development efforts, high-throughput analytics, and powerful techniques for data interrogation. The scope of Big Data challenges is illustrated by one of my research projects: analyzing observational data from thousands of Parkinson’s disease patients based on tens of thousands of signature biomarkers derived from multi-source imaging, genetic, clinical, physiologic, phenomic and demographic data elements. Similar data-driven exploratory and confirmatory analytic examples can be found in other disciplines. Software development, student training, service platforms and methodological advances associated with Big Data Discovery science all present exciting opportunities for learners, educators, researchers, practitioners and policy makers.
The impact of Big Data on healthcare research and practice is twofold. First, the volume and value of Big Data follow exponential models: volume is increasing exponentially, while the value of any given dataset decreases exponentially from the point of its acquisition. Thus, embracing the complexities and harnessing the power of large and heterogeneous health and biomedical datasets will be critical to advancing and improving clinical practice. Second, there are enormous policy implications related to research funding and resource allocation, protection of personal information, and management of national security risks that may be associated with the collection, storage and analysis of sensitive data. Policy decisions that are either extremely liberal or extremely conservative are likely to stifle creativity, negatively impact stakeholders in the short term, or suppress knowledge discovery in the long run. Open-science principles should be embraced to engage the entire community in study design, methods development, broad validation, collaborative knowledge discovery and sharing of best practices.