AWS leader talks about technologies needed to take precision medicine to the next level
One of the most significant challenges to the advancement of precision medicine has been the lack of an infrastructure to support translational bioinformatics, supporting organizations as they work to uncover unique datasets to find novel associations and signals.
By supporting greater interoperability and collaboration, data scientists, developers, clinicians and pharmaceutical partners have the opportunity to leverage machine learning to reduce the time it takes to move from insight to discovery, ultimately leading to the right patients receiving the right care, with the right therapeutic at the right time.
To get a better understanding of the challenges surrounding precision medicine and its future, Healthcare IT News sat down with Dr. Taha Kass-Hout, director of machine learning at AWS.
Q: You’ve said that one of the most significant challenges to the advancement of precision medicine has been the lack of an infrastructure to support translational bioinformatics. Please explain this challenge in detail.
A: One of the challenges in developing and utilizing storage, analytics and interpretive methods is the sheer volume of biomedical data that needs to be transformed and that often resides on multiple systems and in multiple formats. The future of healthcare is so vibrant and dynamic, and there is an opportunity for cloud and big data to take on a larger role to help the industry address these areas.
For example, datasets used to perform tasks such as computational chemistry and molecular simulations that help de-risk and advance molecules into development contain millions of data points and require billions of calculations to produce an experimental output. In order to bring new therapeutics to market faster, scientists need to move targets through development faster and find more efficient ways to collaborate both inside and outside of their organizations.
Another challenge is that large volumes of data acquired by legacy research equipment, such as microscopes and spectrometers, are usually stored locally. This creates a barrier to securely archiving, processing and sharing the data with collaborating researchers globally. Improving access to data, securely and compliantly, while increasing usability is critical to maximizing the opportunities to leverage analytics and machine learning.
For instance, Dotmatics’ cloud-based software provides simple, unified, real-time access to all research data in Dotmatics and third-party databases, coupled with integrated, scientifically aware informatics solutions for small molecule and biologics discovery that expedite laboratory workflows and capture experiments, entities, samples and test data so that in-house or multi-organizational research teams become more efficient.
Today we are seeing a rising wave of healthcare organizations moving to the cloud, which is enabling researchers to unite R&D data with information from across the value chain, while benefiting from compute and storage options that are more cost-effective than on-premises infrastructure.
For large datasets in the R&D phase, large-scale, cloud-based data transfer services can transfer hundreds of terabytes and millions of files at speeds up to 10 times faster than open-source tools. Storage gateways ensure experimental data is securely stored, archived and available to other permissioned collaborators. Uniting data in a data lake improves access and helps to eliminate silos.
Cloud-based hyperscale computing and machine learning enable organizations to collaborate across datasets, create and leverage global infrastructures to maintain data integrity, and more easily perform machine learning-based analyses to accelerate discoveries and de-risk candidates faster.
For example, six years ago Moderna started building databases and information-based activities to support all of their programs. Today, they are fully cloud-based, and their scientists don’t go to the lab to pipette their messenger RNA and proteins. They go to their web portal, the Drug Design Studio, which runs on the cloud.
Through the portal, scientists can access public and private libraries that contain all the messenger RNA that exists and the thousands of proteins they can produce. Then, they only need to press a button and the sequence goes to a fully automated, central lab where data is collected at every step.
Over the years, data from the portal and lab has helped Moderna improve their sequence design and production processes and improve the way their scientists gather feedback. In terms of research, all of Moderna’s algorithms rely on computational power from the cloud to further their science.
Q: You contend that by supporting greater interoperability and collaboration, data scientists, developers, clinicians and pharmaceutical partners have the opportunity to leverage machine learning to reduce the time it takes to move from insight to discovery. Please elaborate on machine learning’s role here in precision medicine.
A: For the last decade, organizations have focused on digitizing healthcare. In the next decade, making sense of all this data will provide the biggest opportunity to transform care. However, this transformation will primarily depend on data flowing where it needs to, at the right time, and supporting this process in a way that is secure and protects patients’ health data.
It comes down to interoperability. It may not be the most exciting topic, but it’s by far one of the most important, and one the industry needs to prioritize. By focusing on interoperability of information systems today, we can ensure that we end up in a better place in 10 years than where we are now. And so, everything around interoperability – around security, around identity management, differential privacy – is likely to be part of this future.
Machine learning models trained to support healthcare and life sciences organizations can help automatically normalize, index and structure data. This approach has the potential to bring data together in a way that creates a more complete view of a patient’s medical history, making it easier for providers to understand relationships in the data and compare this to the rest of the population, drive increased operational efficiency, and have the ability to use data to support better patient health outcomes.
For example, AstraZeneca has been experimenting with machine learning across all stages of research and development, most recently in pathology to speed up the review of tissue samples. Labeling the data is a time-consuming step, especially in this case, where it can take many thousands of tissue-sample images to train an accurate model.
AstraZeneca uses a machine learning-powered, human-in-the-loop data-labeling and annotation service to automate some of the most tedious portions of this work, resulting in at least 50% less time spent cataloging samples.
It also helps analysts spot trends and anomalies in the health data and derive actionable insights to improve the quality of patient care, make predictions for medical events such as stroke or congestive heart failure, modernize care infrastructure, increase operational efficiency and scale specialist expertise.
Numerate, a discovery-stage pharmaceutical company, uses machine learning technologies to more quickly and cost-effectively identify novel molecules that are most likely to progress through the research pipeline and become good candidates for new drug development.
The company recently used its cloud-based platform to rapidly discover and optimize ryanodine receptor 2 (RYR2) modulators, which are being advanced as new drugs to treat life-threatening cardiovascular diseases.
Ryanodine receptor 2 is a difficult protein to target, but the cloud made that process easier for the company. Traditional methods could not have attacked the problem, as the complexity of the biology makes the testing laborious and slow, independent of the industry’s low 0.1% screening hit rate for much simpler biology.
In Numerate’s case, using the cloud enabled the company to effectively decouple the trial-and-error process from the laboratory and discover and optimize candidate drugs five times faster than the industry average.
Machine learning also is helping power the entire clinical development process. Biopharma researchers use machine learning to design the most productive trial protocols, study locations, recruitment and patient cohorts to enroll. Researchers not trained as programmers can use cloud-based machine learning services to build, train and deploy machine learning algorithms to help with pre-clinical studies, complex simulations and predictive workflow optimization.
Machine learning can also help accelerate the regulatory submission process, as the massive amounts of data generated during clinical trials can be captured effectively and shared to collaborate between investigators, contract research organizations (CROs) and sponsor organizations.
For example, the Intelligent Trial Planner (ITP) from Knowledgent, now part of Accenture, uses machine learning services to determine the feasibility of trial studies and forecast recruitment timelines. The ITP platform enables study design teams at pharma organizations to run prediction analysis in minutes, not weeks, allowing them to iterate faster and more frequently.
With machine learning, real-time scenario planning helps to facilitate smarter trial planning by enabling researchers to determine the optimal sites, countries and/or protocol combinations.
By eliminating poor-performing sites, trial teams have the potential to reduce their trial cost by 20%. And by making data-driven decisions that are significantly more accurate, they can plan and execute clinical trials faster, leading to hundreds of thousands in cost savings for every month saved in a trial.
Additionally, purpose-built machine learning is supported by cost-effective cloud-based compute options. For example, high-performance computing (HPC) can quickly scale to accommodate large R&D datasets, orchestrating services and simplifying the use and management of HPC environments.
Data transformation tools can also help to simplify and accelerate data profiling, preparation and feature engineering, as well as enable reusable algorithms both for new model discovery and inference.
The healthcare and life sciences industry has come a long way in the last year. However, for progress and transformation to continue, interoperability needs to be prioritized.
Q: The ultimate goal of precision medicine is the right patients receiving the right care, with the right therapeutic, at the right time. What do healthcare provider organization CIOs and other health IT leaders need to be doing with machine learning and other technologies today to be moving toward this goal?
A: The first things IT leaders need to ask themselves are: 1) If they are not investing yet in machine learning, do they plan to this year? And 2) What are the largest blockers to machine learning in their teams?
Our philosophy is to make machine learning available to every data scientist and developer without the need to have a specific background in machine learning, and then have the ability to use machine learning at scale with cost efficiencies.
Designing a personalized care pathway using therapeutics tuned for particular biomarkers relies on a combination of different data sources such as health records and genomics to deliver a more complete assessment of a patient’s condition. By sequencing the genomes of entire populations, researchers can unlock answers to genetic diseases that historically haven’t been possible in smaller studies and pave the way for a baseline understanding of wellness.
Population genomics can improve the prevention, diagnosis and treatment of a range of illnesses, including cancer and genetic diseases, and produce the information doctors and researchers need to arrive at a more complete picture of how an individual’s genes influence their health.
Advanced analytics and machine learning capabilities can use an individual’s or entire population’s medical history to better understand relationships in data and in turn deliver more personalized and curated treatment.
Second, healthcare and life sciences organizations need to be open to experimenting, learning about and embracing both cloud and technology – many organizations across the industry are already doing this.
Leaders in precision medicine research such as UK Biobank, DNAnexus, Genomics England, Lifebit, Munich Leukemia Lab, Illumina, Fabric Genomics, CoFactor Genomics and Emedgene all leverage cloud technology to speed genomic interpretation.
Third, supporting open collaboration and data sharing needs to be a business priority. The COVID-19 Open Research Dataset (CORD-19), created last year by a coalition of research groups, provided open access to the full body of available global COVID-19 research data.
This was one of the primary factors that enabled the discovery, clinical trial and delivery of the mRNA-based COVID-19 vaccines in an unprecedented timeframe. Additionally, our Open Data Program makes more than 40 openly available genomics datasets accessible, providing the research community with a single documented source of truth.
Commercial solutions that have leveraged machine learning to enable large-scale genomic sequencing include organizations such as Munich Leukemia Lab, which has been able to use Field Programmable Gate Array (FPGA)-based compute instances to greatly speed up the process of whole genome sequencing.
As a result, what used to take 20 hours of compute time can now be achieved in only three hours. Another example is Illumina, which is using cloud solutions to offer its customers a lower-cost, high-performance genomic analysis platform, which can help them speed their time to insights as well as discoveries.