AI Is All Hype If We Don’t Have Access to the Right Data
Summary:
- AI could potentially speed drug discovery and save time in rejecting treatments that are unlikely to yield worthwhile results.
- AI has evolved to parse heterogeneous, unstructured data, with programs that combine machine learning, natural language processing, and advanced text analytics.
- AI has evolved from first-wave optimization programs or “knowledge engineering,” to second-wave statistical learning programs or “machine learning,” finally arriving at third-wave hypothesis generation programs or “contextual normalization.”
As the pioneering neuroscientist Vilayanur S. Ramachandran says, “The fact that hype exists doesn’t prove that something is not important.”
It is true that artificial intelligence is a hot topic. Currently, a lot of buzzwords and marketing dollars are going into “data science” and “machine learning.” However, throughout the short modern history of technological innovation, disruptive technologies have tended to undergo substantial revision as they evolve from what could happen to what does happen.
In this article, we will examine an underappreciated limitation of AI’s real-world utility: access to real-world datasets.
Artificial Intelligence in the Life Sciences
By definition, a disruptive technology is an innovation that generates a novel industry. In this context, the new AI programs constitute a significant future disruption. AI has evolved from first-wave optimization programs or “knowledge engineering,” to second-wave statistical learning programs or “machine learning,” finally arriving at third-wave hypothesis generation programs or “contextual normalization.” Now third-wave AI programs have the potential to look at big data, find the statistical patterns in it, then generate novel algorithms that explain why these patterns exist.
The potential of this technology to discover novel treatments has prompted pharmaceutical giants such as GlaxoSmithKline, Merck & Co, Johnson & Johnson, and Sanofi to invest in it as a potential competitive edge. Loose associations in the literature have already proven capable of pointing to novel treatments, as with the well-publicized, manually generated link between Raynaud’s disease and fish oil. Automating this process with the extended capacity of AI could potentially speed drug discovery and save time in rejecting treatments that are unlikely to yield worthwhile results.
Computers have certainly demonstrated superiority, and even produced novel insights, in analyzing patterns in complex datasets. In 2011, IBM Watson beat two human champions on Jeopardy!, a game that requires mastery of general knowledge and natural language processing. Pattern recognition programs are routinely used in ECG interpretation, although as an adjunct or “clinical decision support.” However, without context, and specifically vertical context, this analysis might have little transferability and even less real-world impact.
Indeed, AI also has several well-publicized failures. MD Anderson’s recent problems with IBM Watson highlight a vital issue in the field, namely that of dataset integrity. According to a University of Texas audit, when MD Anderson changed its electronic medical record (EMR) provider, the Watson software couldn’t access the data, and its conclusions became out-of-date.
Perhaps looking at this data issue in more depth can guide us in defending AI for the future. It’s possible the baby is fine, and it’s the bathwater that’s the problem.
How Data-Access Limits AI
AI and machine learning on their own are not enough. While there has been a lot of progress on advanced algorithms, if there is not enough access to all the data that’s out there, the algorithms can’t genuinely do their job.
Biological data is deep, dense, and diverse. In the past, most life-sciences datasets were insufficient to represent biological data accurately. Biological research relied on manually curated datasets collected and cleaned specifically to test a preconceived hypothesis. Curators offset the expense of generating these datasets through proprietary control over access or an incentive to market the results. Dissemination of results through academic journals meant significant delays, which made conclusions obsolete, and limited access to profit-based portals that were industry- and discipline-specific.
The results of the legacy system for creating and analyzing biological datasets have included system-wide publication bias and inaccuracies in medical science.
Even the recent open-science movement — which attempts to democratize access to unpublished clinical research datasets and raw data from clinical trials — relies on narrow, manually curated datasets, often created by companies with proprietary interests.
While first-wave AI might be able to parse biased datasets, second-wave AI is heavily dependent on properly coded datasets for training. However, the real limitation is for third-wave AI software, which takes observations from seemingly unrelated contexts and normalizes them.
A classic example is abbreviations in medical terminology, where the same abbreviation can stand for different terms and must be interpreted by context: does “Ca” mean “cancer” or “calcium”?
Third-wave AI needs complex contextual information in order to optimize its function, and manually curated datasets inherently reduce its utility.
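To make the “Ca” example concrete, here is a minimal sketch of context-based disambiguation in Python. It is an illustration only, not how any particular product works: the cue-word lists and the function name `disambiguate_ca` are assumptions made for this example, and a real system would learn context from data rather than rely on a hand-written keyword list.

```python
import re

# Hypothetical cue words for each sense of "Ca" (illustrative only;
# a production system would learn these associations from data).
CONTEXT_CUES = {
    "calcium": {"serum", "mg/dl", "level", "levels", "electrolytes", "supplement"},
    "cancer": {"tumor", "biopsy", "metastatic", "stage", "oncology", "carcinoma"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word-like tokens."""
    return re.findall(r"[a-z0-9/]+", text.lower())


def disambiguate_ca(sentence):
    """Guess the sense of "Ca" by counting cue words found in the sentence.

    Returns "calcium", "cancer", or "ambiguous" when no cue is present
    or when both senses score equally.
    """
    tokens = set(tokenize(sentence))
    scores = {sense: len(tokens & cues) for sense, cues in CONTEXT_CUES.items()}
    best = max(scores, key=scores.get)
    top = scores[best]
    if top == 0 or list(scores.values()).count(top) > 1:
        return "ambiguous"
    return best


if __name__ == "__main__":
    for text in (
        "Serum Ca level was 9.2 mg/dL",
        "Biopsy confirmed stage II breast Ca",
        "Patient has Ca",
    ):
        print(text, "->", disambiguate_ca(text))
```

The third sentence carries no surrounding cues, so the sketch reports it as ambiguous. That is the point of the paragraph above: strip away the context and the abbreviation cannot be resolved.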
A Change In Data
With the 2009 HITECH Act, medicine began the legislated introduction of EMR systems. The result has been pooled datasets of real-time, comprehensive biological information. This is in addition to datasets from elsewhere in the innovation ecosystem: data from biological patents, clinical trials, congresses, theses, and more.
Previously this unstructured data was inaccessible to computing systems without human collation, checkboxes, drop-down menus, or diagnosis codes. Now, AI has evolved to parse this heterogeneous data, with programs that combine machine learning, natural language processing, and advanced text analytics.
Previously we had outdated, incomplete, and inaccessible structured data. Now for the first time, we can structure previously unstructured data, enabling real-time analysis from a wider pool of loosely-associated contexts.
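As a rough illustration of what “structuring unstructured data” can mean, the sketch below turns a free-text clinical note into a small record. It deliberately uses plain regular expressions as a stand-in for the machine learning and natural language processing described above; the note text, the field names, and the function `structure_note` are all hypothetical.

```python
import re


def structure_note(note):
    """Extract a few structured fields from a free-text note.

    Regex-only sketch: real systems would use NLP models and handle far
    more variation in phrasing than these three patterns do.
    """
    record = {}

    # Age written as "54 yo", "54 y/o", or "54 year old" / "54 year-old".
    age = re.search(r"(\d{1,3})\s*(?:yo|y/o|year[- ]old)", note, re.IGNORECASE)
    if age:
        record["age_years"] = int(age.group(1))

    # Blood pressure written as "BP 130/85".
    bp = re.search(r"\bBP\s*(\d{2,3})/(\d{2,3})", note, re.IGNORECASE)
    if bp:
        record["systolic_mmHg"] = int(bp.group(1))
        record["diastolic_mmHg"] = int(bp.group(2))

    # First "<word> <number> mg" pattern, taken as a drug name and dose.
    dose = re.search(r"([A-Za-z]+)\s+(\d+(?:\.\d+)?)\s*mg\b", note)
    if dose:
        record["drug"] = dose.group(1).lower()
        record["dose_mg"] = float(dose.group(2))

    return record


if __name__ == "__main__":
    note = "Pt is a 54 yo female. BP 130/85. Started metformin 500 mg daily."
    print(structure_note(note))
```

Fields that cannot be found are simply left out of the record, so downstream analysis can tell “not mentioned” apart from a wrong guess.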
With third-generation AI, we can get clean data, all in one place, that mirrors the complexity of true biological systems. Parsing this data gives us quick, crisp, and summarized snapshots of the current biomedical landscape in any given context.
About the author:
Gunjan Bhardwaj is the founder and CEO of Innoplexus, a leader in AI and analytics as a service for life science industries. With a background at Boston Consulting Group and Ernst & Young, he bridges the worlds of AI, consulting, and life science to drive innovation.