Modernizing financial risk computation with Hitachi Consulting and GCP

Ingestion

The first step is to ingest data into the pipeline. Here, the inputs take the form of aggregated portfolio and trade data. One key design goal was the ability to handle both batch and stream inputs. In the batch case, CSV files are uploaded to Google Cloud Storage, and the file upload triggers a message onto a Cloud Pub/Sub topic. For the streaming case, information is published directly onto a Cloud Pub/Sub topic. Cloud Pub/Sub is a fully managed service that provides scalable, reliable, at-least-once delivery of messages for event-driven architectures. Cloud Pub/Sub enables loose coupling of application components and supports both push and pull message delivery.
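To make the streaming path concrete, here is a minimal sketch of publishing a trade message to Cloud Pub/Sub with the Python client library. The project ID, topic name and trade fields are illustrative assumptions, not details taken from the solution itself.

```python
# Minimal sketch of the streaming ingestion path, assuming a hypothetical
# project "risk-poc" and topic "trade-input"; all field names are illustrative.
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("risk-poc", "trade-input")

trade = {
    "portfolio_id": "PF-1001",   # illustrative identifiers
    "trade_id": "TR-42",
    "instrument": "IRS",
    "notional": 10_000_000,
    "currency": "USD",
}

# Pub/Sub messages are opaque bytes; JSON keeps the payload self-describing.
future = publisher.publish(topic_path, data=json.dumps(trade).encode("utf-8"))
print("Published message", future.result())
```

For the batch path, a Cloud Storage Pub/Sub notification on the upload bucket can publish an equivalent message when a CSV file lands, so both entry points converge on the same topic.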

Preprocessing

Those Cloud Pub/Sub messages feed a Cloud Dataflow pipeline for trade data preprocessing. Cloud Dataflow is a fully managed, auto-scaling service for transforming and enriching data in both stream and batch modes, based on open-source Apache Beam. The portfolio inputs are cleansed and split into individual trade elements, at which point the required risk calculations are determined. The individual trade elements are published to downstream Cloud Pub/Sub topics to be consumed by the risk calculation engine.
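The sketch below shows roughly what that preprocessing stage looks like as an Apache Beam pipeline: read aggregated portfolio messages from Pub/Sub, split them into individual trade elements, and publish each trade downstream. The topic names and portfolio/trade schema are assumptions for illustration, not the actual pipeline code.

```python
# Illustrative Apache Beam sketch of the preprocessing stage.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions


def split_portfolio(message: bytes):
    """Yield one serialized record per trade in an aggregated portfolio message."""
    portfolio = json.loads(message.decode("utf-8"))
    for trade in portfolio.get("trades", []):
        trade["portfolio_id"] = portfolio["portfolio_id"]
        yield json.dumps(trade).encode("utf-8")


options = PipelineOptions()
options.view_as(StandardOptions).streaming = True  # Pub/Sub sources run in stream mode

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadPortfolios" >> beam.io.ReadFromPubSub(
            topic="projects/risk-poc/topics/trade-input")
        | "SplitIntoTrades" >> beam.FlatMap(split_portfolio)
        | "PublishTrades" >> beam.io.WriteToPubSub(
            topic="projects/risk-poc/topics/trade-elements")
    )
```

Because Beam uses the same programming model for batch and streaming, the same transforms can serve both the CSV-driven and the direct-publish inputs.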

Intermediate results from the preprocessing steps are persisted to Cloud Datastore, a fully managed, serverless NoSQL document database. This pattern of checkpointing intermediate results to Cloud Datastore is repeated throughout the architecture. We chose Cloud Datastore for its flexibility, as it brings the scalability and availability of a NoSQL database alongside capabilities such as ACID transactions, indexes and SQL-like queries.
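As a rough illustration of that checkpointing pattern, the snippet below persists an intermediate result with the Cloud Datastore Python client. The entity kind, key and stored fields are assumptions made for the example.

```python
# Minimal sketch of checkpointing an intermediate result to Cloud Datastore.
# The kind "TradeCheckpoint" and the stored fields are illustrative only.
from datetime import datetime, timezone
from google.cloud import datastore

client = datastore.Client(project="risk-poc")  # hypothetical project ID

key = client.key("TradeCheckpoint", "TR-42")   # trade ID used as the entity name
entity = datastore.Entity(key=key)
entity.update({
    "stage": "preprocessing",
    "status": "split-complete",
    "updated_at": datetime.now(timezone.utc),
})
client.put(entity)  # upsert keyed on the trade ID, so retries are safe
```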

Calculation

At the heart of the architecture sits the risk calculation engine, deployed on Google Kubernetes Engine (GKE). GKE is a managed, production-ready environment for deploying containerized applications. We knew we wanted to evaluate GKE, and Kubernetes more broadly, as a platform for risk computation for the following reasons:

  • Existing risk models can often be containerized without significant refactoring
  • Kubernetes is open source, minimizing vendor lock-in
  • Kubernetes abstracts away the underlying compute infrastructure, promoting portability
  • Kubernetes provides sophisticated orchestration capabilities, reducing dependency on expensive third-party tools
  • GKE is a fully managed service, freeing operations teams to focus on managing applications rather than infrastructure

The risk engine is a set of Kubernetes services designed to handle data enrichment, perform the required calculations, and output results. Pods are independently auto-scaled via Stackdriver metrics on Cloud Pub/Sub queue depths, and the cluster itself is scaled based on the overall CPU load. As in the preprocessing step, intermediate results are persisted to Cloud Datastore and pods publish messages to Cloud Pub/Sub to move data through the pipeline. The pods can run inside a private cluster that is isolated from the internet but can still interact with other GCP services via Private Google Access.
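The sketch below gives a feel for what a single risk-engine worker pod might do: pull trade messages from Pub/Sub, run a calculation, checkpoint the result to Datastore, and publish it downstream. The compute_risk function, topic and subscription names, and fields are all placeholders; the actual containerized risk models are not shown in the source.

```python
# Hedged sketch of one risk-engine worker pod.
import json
from google.cloud import datastore, pubsub_v1

PROJECT = "risk-poc"  # hypothetical project ID
subscriber = pubsub_v1.SubscriberClient()
publisher = pubsub_v1.PublisherClient()
ds = datastore.Client(project=PROJECT)

sub_path = subscriber.subscription_path(PROJECT, "trade-elements-sub")
out_topic = publisher.topic_path(PROJECT, "risk-results")


def compute_risk(trade: dict) -> dict:
    # Placeholder: a real pod would invoke the existing, containerized risk model here.
    return {"trade_id": trade["trade_id"], "pv": 0.0}


def handle(message: pubsub_v1.subscriber.message.Message) -> None:
    trade = json.loads(message.data.decode("utf-8"))
    result = compute_risk(trade)

    entity = datastore.Entity(key=ds.key("RiskResult", trade["trade_id"]))
    entity.update(result)
    ds.put(entity)                                         # checkpoint the result

    publisher.publish(out_topic, json.dumps(result).encode("utf-8"))
    message.ack()                                          # at-least-once delivery


streaming_pull = subscriber.subscribe(sub_path, callback=handle)
streaming_pull.result()  # block forever; Kubernetes restarts the pod if it exits
```

Because each pod just pulls from a subscription, adding or removing replicas (for example, via an autoscaler driven by the queue-depth metric) changes throughput without any coordination between workers.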

Output

Final calculation results output by the risk engine are published to a Cloud Pub/Sub topic, which feeds a Cloud Dataflow pipeline. Cloud Dataflow enriches the results with the portfolio and market data used for the calculations, creating full-featured snapshots. These snapshots are persisted to BigQuery, GCP’s serverless, highly scalable enterprise data warehouse. BigQuery allows analysis of the risk exposures at scale, using SQL and industry-standard tooling, driving customer use cases like regulatory reporting.
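To illustrate the kind of SQL-based analysis this enables, the query below aggregates exposures from a hypothetical snapshot table using the BigQuery Python client. The dataset, table and column names are assumptions about the snapshot schema, not the actual reporting queries.

```python
# Illustrative sketch of analysing persisted risk snapshots in BigQuery.
import datetime
from google.cloud import bigquery

client = bigquery.Client(project="risk-poc")  # hypothetical project ID

query = """
    SELECT
      portfolio_id,
      currency,
      SUM(pv) AS total_present_value,
      COUNT(DISTINCT trade_id) AS trade_count
    FROM `risk-poc.risk.snapshots`
    WHERE snapshot_date = @snapshot_date
    GROUP BY portfolio_id, currency
    ORDER BY total_present_value DESC
"""
job = client.query(
    query,
    job_config=bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(
                "snapshot_date", "DATE", datetime.date(2019, 3, 1)),
        ]
    ),
)
for row in job.result():
    print(row.portfolio_id, row.currency, row.total_present_value)
```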

Lessons learned building a proof-of-concept data platform

We learned some valuable lessons while building out this platform:

  • Choosing managed and serverless options greatly improved team velocity
  • Be aware of quotas and limits: during testing we hit BigQuery streaming-insert limits, and worked around them with a blended streaming and micro-batch strategy in Cloud Dataflow (see the sketch after this list)
  • Tuning the auto-scaling of the Kubernetes pods took some testing and investigation to get right
  • The system scaled well under load without warm-up or additional configuration
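One way to express the micro-batch side of that blended strategy is to configure Beam's BigQuery sink to flush periodic load jobs instead of issuing streaming inserts. The sketch below shows that configuration; the source does not describe the exact mechanism used in the proof of concept, and the topic, table and schema here are illustrative.

```python
# Sketch: write results to BigQuery via periodic load jobs (micro-batches)
# rather than streaming inserts, to stay clear of streaming-insert limits.
import json
import apache_beam as beam
from apache_beam.io.gcp.bigquery import BigQueryDisposition, WriteToBigQuery
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadResults" >> beam.io.ReadFromPubSub(
            topic="projects/risk-poc/topics/risk-results")
        | "Parse" >> beam.Map(lambda b: json.loads(b.decode("utf-8")))
        | "WriteSnapshots" >> WriteToBigQuery(
            table="risk-poc:risk.snapshots",
            schema="trade_id:STRING,portfolio_id:STRING,currency:STRING,pv:FLOAT",
            method=WriteToBigQuery.Method.FILE_LOADS,  # load jobs, not streaming inserts
            triggering_frequency=300,                  # flush a micro-batch every 5 minutes
            write_disposition=BigQueryDisposition.WRITE_APPEND,
            create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
        )
    )
```

Load jobs have their own, much higher quotas, so routing high-volume writes through them while keeping streaming inserts for latency-sensitive data keeps the pipeline inside BigQuery's limits.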

What’s next for our risk solution

We built a modernized, cloud-native risk computation platform that offers several advantages over traditional grid-based architectures. The architecture is largely serverless, using managed services such as Cloud Dataflow, Cloud Pub/Sub and Cloud Datastore. The solution is open source at its core, using Kubernetes and Apache Beam via GKE and Cloud Dataflow, respectively. BigQuery provides an easy way to store and analyze financial data at scale. The architecture handles both batch and stream inputs, and scales up and down to match load.

Using GCP, we addressed some of the key challenges associated with traditional risk approaches, namely inflexibility, high management overhead and reliance on expensive third-party tools. As our VP of financial services, Suranjan Som, put it: “The GCP risk analytics solution provides a scalable, open and cost-efficient platform to meet increasing risk and regulatory requirements.” We’re now planning further work to test the solution at production scale.

Read more about financial services solutions on GCP, and learn about Hitachi Consulting’s financial services solutions.
