Dr. Andrew Weil of the Arizona Center for Integrative Medicine states, “If we can make the correct diagnosis, the healing can begin. If we can’t, both our personal health and our economy are doomed.” [1] Accurate, traceable, and informed diagnoses are paramount to the health of patients. Medicine within the United States is an enormous $2.5 trillion opportunity, and after decades of proliferating research, more new medical findings are being published than any doctor could ever absorb unassisted [2]. Recent advances in data mining aim to ease the burden of the sheer volume of information available to doctors. Medical data mining offers doctors the tools to distinguish anomalies from trending epidemics and to make evidence-based decisions, ultimately leading to more personalized medicine [2].

Hays describes a tower of Data Mining [2]. The base of this tower is the raw Data of the medical field: the images, structured records, and unstructured doctors’ notes. This Data is processed and filtered into patterns and clusters, which form the Information tier of Hays’ tower. Information is a more refined form of the Data, yet it remains fully traceable back to the original source. The Information is then processed again to extract Knowledge. Knowledge leads to the top tier of Hays’ model: Decisions. Decisions are the actions that follow from a diagnosis. Once a doctor arrives at a diagnosis, as Dr. Weil states, “healing can begin”.

How can this theoretical model be realized? And once prototypes are available, how can such a system be rolled out to doctors without losing the valuable Knowledge tier? One such system, described by IBM, is a predictive analytics engine that offers probable diagnoses with cited reasoning [3], with its models represented in the Predictive Model Markup Language [4]. Predictive analytics is a branch of data mining concerned with making predictions about future events [5].

Predictive analytics is typified by its use of a scorecard. One of the more common experiences with such a scorecard, at least in the United States, is the FICO credit score, a single number describing a person’s likelihood of defaulting on credit [3]. The scorecard is predictive analytics’ greatest strength as a model. While more powerful non-linear models such as neural networks are capable of spectacular predictive results, they do not provide a traceable artifact to substantiate their predictions [6]. In the face of malpractice claims and other fault-finding behavior, a traceable scorecard is invaluable. With such powerful tools available, how do we assemble the initial Data tier of Hays’ tower? The answer begins with digital records.
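
The traceability of a scorecard can be made concrete with a small sketch. The Python below is only an illustration: the rules, thresholds, point values, and field names (age, systolic_bp, smoker) are hypothetical and carry no clinical meaning. It shows the general shape of an additive scorecard, in which every point added to the score is paired with a stated reason.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    reason: str                      # human-readable reason recorded in the trace
    applies: Callable[[dict], bool]  # predicate over a patient record
    points: int                      # partial score added when the rule applies


# Hypothetical rules and point values, invented for illustration only.
RULES = [
    Rule("age over 60", lambda p: p["age"] > 60, 20),
    Rule("systolic blood pressure over 140", lambda p: p["systolic_bp"] > 140, 25),
    Rule("current smoker", lambda p: p["smoker"], 30),
]

BASE_SCORE = 10  # hypothetical starting score


def score(patient: dict) -> tuple[int, list[str]]:
    """Return the total score and the list of reasons that produced it."""
    total = BASE_SCORE
    reasons = []
    for rule in RULES:
        if rule.applies(patient):
            total += rule.points
            reasons.append(f"{rule.reason} (+{rule.points})")
    return total, reasons


if __name__ == "__main__":
    total, reasons = score({"age": 67, "systolic_bp": 150, "smoker": False})
    print("score:", total)
    for reason in reasons:
        print(" -", reason)
```

Because the score is a plain sum, each prediction can be audited line by line against the record that produced it, which is the property a neural network does not offer.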

The medical industry has been slow to incorporate digital record keeping, but recent legislation has incentivized the practice. Digital records now emanate from every facet of medicine, and their uptake is quickly enabling Big Data technologies of every kind to exercise their might. Yet the biggest impact these models can make is in rural areas, where doctors might not see enough patients to judge trends or to conduct the research that truly unique cases require. Predictive models, plugged into the global economy of data, can offer even small-town doctors far more accurate and better-cited diagnoses [3].

Despite the potential power of these tools, few systems are available for production use. Hays notes that the translation from the “scientists’ desk” to the doctor’s hand is fraught with technical challenges ranging from privacy concerns to IT rollout. Amid these challenges, the Knowledge of these systems is sometimes lost. One solution, described by IBM [3], preserves Knowledge by providing a standard schema for predictive models: the Predictive Model Markup Language (PMML). This XML-based language is a de facto standard for encoding the Knowledge leveraged by these predictive systems [4]. Independent of the IT processes, database vendors, or operating systems that give doctors access to these platforms, PMML preserves Knowledge during both storage and exchange.
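
To illustrate how a model stored as a document can outlive the system that produced it, the sketch below reads a small scorecard from XML and evaluates it using only Python’s standard library. The fragment is modeled loosely on PMML’s Scorecard element but is heavily simplified: the XML namespace is omitted, it has not been validated against the PMML schema, and the field names and scores are hypothetical. The rule that the first matching attribute within a characteristic wins is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Simplified, PMML-style fragment. Field names and scores are hypothetical,
# the namespace is omitted, and the fragment is not schema-validated.
PMML_TEXT = """
<PMML version="4.1">
  <DataDictionary>
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="systolic_bp" optype="continuous" dataType="double"/>
  </DataDictionary>
  <Scorecard modelName="illustrative_risk" functionName="regression" initialScore="10">
    <Characteristics>
      <Characteristic name="age_band">
        <Attribute partialScore="20">
          <SimplePredicate field="age" operator="greaterThan" value="60"/>
        </Attribute>
        <Attribute partialScore="0">
          <SimplePredicate field="age" operator="lessOrEqual" value="60"/>
        </Attribute>
      </Characteristic>
      <Characteristic name="bp_band">
        <Attribute partialScore="25">
          <SimplePredicate field="systolic_bp" operator="greaterThan" value="140"/>
        </Attribute>
        <Attribute partialScore="0">
          <SimplePredicate field="systolic_bp" operator="lessOrEqual" value="140"/>
        </Attribute>
      </Characteristic>
    </Characteristics>
  </Scorecard>
</PMML>
"""

# Only the two comparison operators used by the fragment above are supported.
OPERATORS = {
    "greaterThan": lambda actual, limit: actual > limit,
    "lessOrEqual": lambda actual, limit: actual <= limit,
}


def evaluate(pmml_text, record):
    """Return (total score, [(characteristic name, partial score), ...])."""
    root = ET.fromstring(pmml_text.strip())
    card = root.find("Scorecard")
    total = float(card.get("initialScore"))
    trace = []
    for characteristic in card.iter("Characteristic"):
        for attribute in characteristic.findall("Attribute"):
            predicate = attribute.find("SimplePredicate")
            test = OPERATORS[predicate.get("operator")]
            actual = record[predicate.get("field")]
            if test(actual, float(predicate.get("value"))):
                points = float(attribute.get("partialScore"))
                total += points
                trace.append((characteristic.get("name"), points))
                break  # assumption: the first matching attribute wins
    return total, trace


if __name__ == "__main__":
    total, trace = evaluate(PMML_TEXT, {"age": 67, "systolic_bp": 150})
    print("score:", total)
    for name, points in trace:
        print(" -", name, points)
```

The model here lives entirely in the XML text, so any system that can parse the document can reproduce the same score and the same trace, regardless of the database or operating system underneath.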

Data mining has already made a profound impact on medicine by allowing doctors to make informed, cited decisions through highly processed and filtered information, yet the best of the technology is yet to come. Digital records are still coming into being, and even in a global economy, medicine remains largely closed within state boundaries. Beyond the international privacy and standardization hurdles lies a nirvana of global medical knowledge: Knowledge that allows doctors to improve the health of humanity.

References

[1] A. Weil. (Aug. 2009). The wrong diagnosis, The Arizona Center for Integrative Medicine, [Online]. Available: http://www.huffingtonpost.com/andrew-weil-md/the-wrong-diagnosis_b_254227.html.
[2] T. P. Hays. (Dec. 2012). Medical data mining, National Institute of Standards and Technology, [Online]. Available: http://www.nist.gov/healthcare/upload/Hays-Medical-Data-Mining-slides-for-web.pdf.
[3] A. Guazzelli. (Nov. 2011). Predictive analytics in healthcare, IBM, [Online]. Available: http://www.ibm.com/developerworks/library/ba-ind-PMML3/.
[4] Wikipedia. (Aug. 2013). Predictive model markup language, Wikipedia, [Online]. Available: http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language.
[5] Wikipedia. (Aug. 2013). Predictive analytics, Wikipedia, [Online]. Available: http://en.wikipedia.org/wiki/Predictive_analytics.
[6] T. Segaran, Programming Collective Intelligence, 1st ed. O’Reilly, 2008.