Among the most promising and important applications of machine learning is finding better ways to diagnose and treat life-threatening conditions, including diseases such as cancer that cut far too many lives short. In the United States, cancer is the second most common cause of death and accounts for nearly one in four deaths. Prevention and early detection are critical to improving survival, but there remains much that medical professionals do not understand about lifestyle factors, diagnosis, and treatment of specific subtypes of cancer.
The American Cancer Society (ACS) is using Google Cloud to reinvent the way data is analyzed so it can save more lives. For the past few decades, ACS has conducted the Cancer Prevention Study-II (CPS-II) Nutrition cohort, a prospective study of more than 184,000 American men and women, to explore how factors such as height, weight, demographic characteristics, and personal and family history can affect cancer etiology and prognosis.
Mia M. Gaudet, PhD, Scientific Director of Epidemiology Research at ACS, used an end-to-end ML pipeline built on Google Cloud to perform deep analysis of tissue samples from breast cancer, the most commonly diagnosed type of cancer among women and the second leading cause of cancer death.
After obtaining medical records and surgical tissue samples from hundreds of hospitals throughout the U.S. for 1,700 CPS-II study participants who were diagnosed with breast cancer, Dr. Gaudet studied high-resolution images of the tumor tissue in an effort to determine what lifestyle, medical, and genetic factors are related to molecular subtypes of breast cancer, and whether different features in the breast cancer tissue translate to a better prognosis.
She faced a few technical challenges in analyzing the 1,700 images of breast tumor tissue:
They were captured in a high-resolution, uncompressed, proprietary format, at up to 10 GB each. Image conversion would be exceedingly costly and time-consuming.
Even if the images were converted to a usable format, it would take a team of highly trained pathologists up to three years to analyze all 1,700, and at significant cost.
Analysis would be subject to human fatigue and bias, and some patterns might not be detectable by humans at all.
How Slalom used Cloud ML Engine to help Dr. Gaudet complete her research
To overcome these challenges, Dr. Gaudet and ACS teamed up with Slalom, a Cloud premier partner, to facilitate deep learning at scale. Consistent preprocessing was critical: every image had to be converted the same way, with its colors normalized.
Reducing color variance standardized how colors were interpreted across images, and every image was broken into evenly sized tiles to distribute the workload and optimize the data structure required to train the models.
Slalom used GCP to build an end-to-end machine learning pipeline, including preprocessing, feature engineering, and clustering:
Cloud Machine Learning Engine (Cloud ML Engine) handled preprocessing, model training, and batch prediction.
Images were stored using Cloud Storage.
Compute Engine orchestrated image conversion and initiated Cloud ML Engine training and prediction jobs in the correct sequence.
Using Keras with a TensorFlow backend for prototyping, Slalom created an auto-encoder model. It then trained the model with distributed training on Cloud ML Engine and used it to convert the images into feature vectors, which represent patterns in the images as sequences of numbers.
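To make the auto-encoder idea concrete, here is a minimal sketch in plain NumPy, so it runs without a deep learning framework. It is not Slalom's Keras model: the single hidden layer, the tanh activation, the hyperparameters, and the names `train_autoencoder`, `encode`, and `code_size` are assumptions chosen for illustration. The network learns to reconstruct each flattened tile from a short code, and that code then serves as the tile's feature vector.

```python
import numpy as np


def train_autoencoder(tiles, code_size=8, epochs=200, learning_rate=0.1, seed=0):
    """Train a one-hidden-layer auto-encoder with plain gradient descent.

    `tiles` has shape (num_tiles, num_pixels), one flattened tile per row.
    Returns the encoder weights, the encoder bias, and the mean squared
    reconstruction error recorded at each epoch.
    """
    rng = np.random.default_rng(seed)
    num_pixels = tiles.shape[1]
    w_enc = rng.normal(0.0, 0.1, (num_pixels, code_size))
    b_enc = np.zeros(code_size)
    w_dec = rng.normal(0.0, 0.1, (code_size, num_pixels))
    b_dec = np.zeros(num_pixels)
    losses = []
    for _ in range(epochs):
        code = np.tanh(tiles @ w_enc + b_enc)    # encode tiles to short codes
        recon = code @ w_dec + b_dec             # decode codes back to pixels
        error = recon - tiles
        losses.append(float(np.mean(error ** 2)))
        # Backpropagate the mean squared reconstruction error.
        grad_recon = 2.0 * error / error.size
        grad_w_dec = code.T @ grad_recon
        grad_b_dec = grad_recon.sum(axis=0)
        grad_pre = (grad_recon @ w_dec.T) * (1.0 - code ** 2)  # tanh derivative
        grad_w_enc = tiles.T @ grad_pre
        grad_b_enc = grad_pre.sum(axis=0)
        w_enc -= learning_rate * grad_w_enc
        b_enc -= learning_rate * grad_b_enc
        w_dec -= learning_rate * grad_w_dec
        b_dec -= learning_rate * grad_b_dec
    return w_enc, b_enc, losses


def encode(tiles, w_enc, b_enc):
    """Map flattened tiles to feature vectors using the trained encoder."""
    return np.tanh(tiles @ w_enc + b_enc)
```

After training, only the encoder half is needed: `encode` turns each tile into `code_size` numbers that later steps can compare or cluster.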
The features were then clustered with TensorFlow, once again using Cloud ML Engine. The result was a set of cluster assignments, one for each tile in the image, that the American Cancer Society plans to use in follow-up analyses.
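The clustering step can be sketched as follows. The article does not name the clustering algorithm, so this example assumes k-means and uses plain NumPy rather than TensorFlow; the function names `init_centers`, `nearest_center`, and `kmeans`, and the farthest-point initialization, are illustrative choices rather than details of the ACS pipeline.

```python
import numpy as np


def init_centers(features, num_clusters):
    """Pick starting centers: the first vector, then repeatedly the vector
    farthest from every center chosen so far (deterministic)."""
    centers = [features[0]]
    for _ in range(1, num_clusters):
        dists = np.min(
            [((features - c) ** 2).sum(axis=1) for c in centers], axis=0
        )
        centers.append(features[dists.argmax()])
    return np.array(centers, dtype=float)


def nearest_center(features, centers):
    """Return, for each feature vector, the index of its closest center."""
    dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(features, num_clusters, iterations=20):
    """Cluster feature vectors of shape (num_tiles, num_features).

    Returns one cluster assignment per tile and the final cluster centers.
    """
    centers = init_centers(features, num_clusters)
    for _ in range(iterations):
        assignments = nearest_center(features, centers)
        for k in range(num_clusters):
            members = features[assignments == k]
            if len(members) > 0:  # leave a center in place if it has no members
                centers[k] = members.mean(axis=0)
    return nearest_center(features, centers), centers
```

The returned assignment array has one entry per tile, which matches the shape of the output described above.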
With this approach, analysis was completed in only three months, twelve times faster than the projected three years, and with a higher degree of accuracy and consistency. It isolated potentially significant patterns in the cancer tissue that might help inform risk factors and prognosis in the future.
“By leveraging Cloud ML Engine to analyze cancer images, we’re gaining more understanding of the complexity of breast tumor tissue and how known risk factors lead to certain patterns,” said Gaudet.
ACS now has established processes and a cloud infrastructure that will be reusable on similar projects to come. We’re enormously proud that our technology is helping medical professionals who are working tirelessly to prevent cancer deaths and improve outcomes.
For more information on Cloud ML Engine, visit our website.