Prescriptions for healthcare data management systems on GCP


As the document notes, this is not a final, production-ready system; it’s a reference architecture that’s designed to illustrate the components you need and how they fit together. The expectation is that you’d use this as a base for creating an architecture that incorporates your own requirements and usage.

The document also covers the other facets of a full healthcare solution. It describes security and permissions, connectivity with your on-premises system, and logging and monitoring, all from the perspective of what a system needs in order to align with healthcare concerns.

From there, you can turn to the related solution Setting up a HIPAA-aligned project. This document provides detailed instructions for using the toolkit to build out an instance of the reference architecture. The tutorial walks you through every step, from creating a new project all the way through querying BigQuery logs for suspicious activity. When you’re done, you’ll not only have exercised the toolkit, but you’ll also have a system that you can extend to meet your own needs.

Ingesting medical records

In the last few years, the Fast Healthcare Interoperability Resources (FHIR) standard has emerged as a way to store and share medical records. FHIR defines both a way to represent data (JSON, XML, RDF) and a protocol for sharing records (a RESTful API over HTTP).
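To make the two halves of FHIR concrete, here is a minimal sketch of a resource in its JSON representation, plus the URL that a RESTful read of that resource would use. The patient values and the `fhir.example.org` base URL are invented for illustration, and the helper names are hypothetical; only the `resourceType` field and the `[base]/[type]/[id]` URL pattern come from the FHIR standard itself.

```python
import json

# A minimal FHIR Patient resource as a Python dict. All values are made up.
patient = {
    "resourceType": "Patient",
    "id": "example-patient",
    "name": [{"family": "Example", "given": ["Pat"]}],
    "gender": "other",
    "birthDate": "1970-01-01",
}


def to_fhir_json(resource: dict) -> str:
    """Serialize a resource dict to FHIR's JSON representation."""
    if "resourceType" not in resource:
        raise ValueError("a FHIR resource must declare its resourceType")
    return json.dumps(resource, indent=2, sort_keys=True)


def read_url(base_url: str, resource: dict) -> str:
    """Build the RESTful 'read' URL for a resource: [base]/[type]/[id]."""
    return f"{base_url.rstrip('/')}/{resource['resourceType']}/{resource['id']}"


if __name__ == "__main__":
    print(to_fhir_json(patient))
    print("GET", read_url("https://fhir.example.org/base", patient))
```

The same record could equally be expressed in XML or RDF; JSON is used here only because it maps directly onto Python dictionaries.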

We recently published a solution that explains in detail how you can use the Cloud Healthcare API to work with FHIR in GCP. In Importing FHIR clinical data into the cloud using the Cloud Healthcare API, we lay out the benefits of a Cloud Healthcare API FHIR store. For example, the store can become a source for other GCP-based apps, for analysis in BigQuery, and for machine learning. The API can also help with de-identifying data if you want to use it for apps that require anonymized data.

The solution then gets into the details of how to load (ingest) data into GCP, covering the following scenarios:

  • Near real-time ingestion, which loads one record at a time.

  • Bundled ingestion, in which you pass a set of records to be ingested. A bundle can be either a simple batch of independent records or a set of related records that you ingest as a transaction, so that the import is all-or-nothing.

  • Batch ingestion, in which the import process reads a series of prepared files from Cloud Storage.
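The difference between the two kinds of bundled ingestion can be sketched with plain FHIR Bundle resources. In the FHIR standard, a Bundle of type `batch` has its entries processed independently, while a Bundle of type `transaction` is applied as a unit. The helper name `make_bundle` and the sample records below are hypothetical; the sketch only builds the payloads and does not send them anywhere.

```python
from typing import Iterable


def make_bundle(resources: Iterable[dict], atomic: bool) -> dict:
    """Wrap resources in a FHIR Bundle.

    atomic=True  -> type "transaction": all entries succeed or none do.
    atomic=False -> type "batch": entries are processed independently.
    """
    entries = []
    for resource in resources:
        entries.append({
            "resource": resource,
            # Each entry states how the server should apply it; here, a create.
            "request": {"method": "POST", "url": resource["resourceType"]},
        })
    return {
        "resourceType": "Bundle",
        "type": "transaction" if atomic else "batch",
        "entry": entries,
    }


# Invented sample records.
patient = {"resourceType": "Patient", "name": [{"family": "Example"}]}
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"text": "Heart rate"},
}

independent = make_bundle([patient, observation], atomic=False)
all_or_nothing = make_bundle([patient, observation], atomic=True)
```

In a real transaction, related entries would typically also refer to each other (for example, the observation pointing at the patient), which is the main reason to import them together.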

The solution explores additional options, such as automating the ingestion process, using Cloud Functions to pre- or post-process records, using Cloud Pub/Sub to create subscriptions that watch events on buckets or other data stores, and using Cloud Dataflow to work with streaming data.
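As one possible shape for that automation, the sketch below turns a Cloud Storage notification delivered through Cloud Pub/Sub into a request for the FHIR store's import method. It is a sketch under assumptions, not the solution's own code: the `.ndjson` naming convention and the function name are hypothetical, and the request fields (`contentStructure`, `gcsSource`) reflect the Cloud Healthcare API reference as I understand it, so check them against the current documentation. The function only builds the request; sending it with credentials is left out.

```python
from typing import Optional

API_ROOT = "https://healthcare.googleapis.com/v1"


def import_request_for(attributes: dict, fhir_store: str) -> Optional[dict]:
    """Turn a bucket notification into a FHIR import request, or None to skip.

    `attributes` are the Pub/Sub message attributes of a Cloud Storage
    notification. `fhir_store` is the full resource name of the target store
    (projects/.../locations/.../datasets/.../fhirStores/...).
    """
    # React only to newly written objects.
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None
    # Assumed convention: prepared files are newline-delimited JSON.
    object_id = attributes.get("objectId", "")
    if not object_id.endswith(".ndjson"):
        return None
    return {
        "method": "POST",
        "url": f"{API_ROOT}/{fhir_store}:import",
        "body": {
            # One resource per line in the source file.
            "contentStructure": "RESOURCE",
            "gcsSource": {"uri": f"gs://{attributes['bucketId']}/{object_id}"},
        },
    }
```

A Cloud Function subscribed to the notification topic could call a helper like this and then issue the request, which keeps the decision logic easy to test separately from the network call.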

The solution explicitly notes where you should be careful with security and permissions. For example, it lays out the Cloud IAM roles that the Cloud Healthcare API uses for creating and managing the data, and what permissions those roles need.
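To illustrate what granting narrowly scoped roles looks like, the sketch below adds members to an IAM policy document one role at a time. The helper `grant` and the member addresses are hypothetical. The role names are predefined Cloud Healthcare API roles as I understand them; treat them as assumptions and confirm the exact roles against the solution and the IAM documentation.

```python
def grant(policy: dict, role: str, member: str) -> dict:
    """Return a copy of an IAM policy with `member` added to `role`."""
    # Copy the bindings so the caller's policy is not modified in place.
    bindings = [
        dict(binding, members=list(binding["members"]))
        for binding in policy.get("bindings", [])
    ]
    for binding in bindings:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            break
    else:
        # No binding for this role yet, so create one.
        bindings.append({"role": role, "members": [member]})
    return {**policy, "bindings": bindings}


policy = {"bindings": []}
# The ingestion job may write FHIR resources but not administer the store.
policy = grant(
    policy,
    "roles/healthcare.fhirResourceEditor",
    "serviceAccount:ingest@example-project.iam.gserviceaccount.com",
)
# Analysts may only read.
policy = grant(
    policy,
    "roles/healthcare.fhirResourceReader",
    "group:analysts@example.com",
)
```

Keeping the write role and the read role on separate principals is the kind of separation the solution's security notes are about.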

De-identifying medical data

Finally, healthcare information isn’t used only for patient care. For example, medical images that help medical professionals diagnose patients are also valuable to researchers as data. But privacy concerns mean that researchers must not share or publish their research in a way that exposes patient information. Therefore, any personally identifiable information (PII) and protected health information (PHI) should be removed from the data before publication.

In GCP, researchers can perform this process, known as de-identification, by using the Cloud Healthcare API. The following image shows an x-ray after it’s been de-identified:

[Image: an x-ray after de-identification]
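For a sense of what such a de-identification call involves, the sketch below builds the request for de-identifying a dataset of DICOM images: it removes the contents of identifying tags and redacts text burned into the image pixels. The function name and dataset names are hypothetical, and the field and enum names follow the Cloud Healthcare API reference as I understand it, so verify them against the current documentation before relying on them. The sketch builds the request only; it does not send it.

```python
API_ROOT = "https://healthcare.googleapis.com/v1"


def deidentify_request(source_dataset: str, destination_dataset: str) -> dict:
    """Build a dataset de-identification request for DICOM images.

    Both arguments are full dataset resource names
    (projects/.../locations/.../datasets/...).
    """
    if source_dataset == destination_dataset:
        raise ValueError("write de-identified data to a separate dataset")
    return {
        "method": "POST",
        "url": f"{API_ROOT}/{source_dataset}:deidentify",
        "body": {
            "destinationDataset": destination_dataset,
            "config": {
                # Remove the contents of tags that can identify a patient.
                "dicom": {"filterProfile": "DEIDENTIFY_TAG_CONTENTS"},
                # Redact text that is burned into the image itself.
                "image": {"textRedactionMode": "REDACT_ALL_TEXT"},
            },
        },
    }
```

Writing the result to a separate dataset keeps the original records intact, so access to the identified and de-identified copies can be controlled independently.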
