Editor’s note: Today we’re hearing from the founder of Quantum Metric, a digital intelligence platform that analyzes huge amounts of digital customer data to improve the customer experience, enhance sales, and increase loyalty. The company credits a huge leap in innovation, along with a 10-fold increase in business, to its decision to adopt Google Cloud. Here’s more detail on how Quantum Metric uses Google Cloud’s BigQuery.
At Quantum Metric, we provide mid-market and Fortune 500 companies with business insights based on customer experience data and analytics. Our software, powered by big data, machine intelligence, and Google Cloud, helps our customers identify, quantify, prioritize, and measure opportunities to improve digital experiences. As companies move to a more agile product lifecycle, including continuous integration and continuous deployment, they’re finding it critical to receive constant, quantified feedback and insights from their data in real time so they can see where the largest opportunities exist.
Each year, billions of customer interactions are captured through browsers and mobile apps on PCs, tablets, and mobile devices. Fed into the Quantum Metric platform, this data can show whether a customer had a password problem they couldn’t solve, or struggled while trying to purchase something and abandoned their cart. It can also show whether a customer tried in vain to change a subscription online or to reach tech support, or couldn’t find the size or color they were looking for while shopping. Most importantly, the Quantum Metric platform quantifies the business value of each issue, helping organizations prioritize where they can make the largest impact on their business.
Success overwhelms our initial architecture
Initially, the Quantum Metric experience analytics software ran on MySQL, an open source relational database management system (RDBMS). MySQL worked well for simple queries, when there was a specific question to ask of the data. Soon, though, we realized we needed to offer more advanced data science capabilities. Our bigger customers wanted to ask questions across very large data sets spanning days, weeks, months, or even years of data. They wanted to pose iterative questions using complex filters to answer their most challenging business questions.
With more complex queries across more data, response times from our RDBMS grew from a typical 100 to 500 milliseconds to as long as 20 minutes. That delay lengthened our time to insight and reduced the value we could provide to our customers, because iterative exploration and analysis depend on near-instant query responses. Some questions we simply couldn’t ask of the data, because the answers wouldn’t come back fast enough. It became clear that we needed a much more robust data warehouse solution.
MySQL also posed operational challenges with massive-scale data ingestion. We spent many late nights handling errors and recovering databases. We tried to address these challenges by sharding, partitioning, and indexing the data to optimize for the types of questions customers were asking. But the problems kept escalating: incidents went from about once a month across our entire customer base to once a month for each of at least 20 different customers. We could tune the platform for today’s and tomorrow’s workloads by making educated guesses about where indexes would help, but we simply couldn’t keep scaling MySQL horizontally in a way that was cost-efficient and operationally manageable.
Speed breeds innovation
Once we started exploring options that could scale with our business, we looked at Cassandra (a NoSQL partitioned row store), MySQL’s Column Store (a columnar database), and Vertica (a columnar database), each with its own approach to data storage and access. But under high volumes of complex queries across large data stores, all of these solutions began to fail, bogging down when multiple users queried them simultaneously. We could have solved the problems with more raw compute and storage, but that would have been prohibitively expensive to run and would have required a large team to operate.
We then decided to try BigQuery, and it was transformative. We connected our front end to BigQuery via APIs. Once data is 15 minutes old, it is automatically extracted, transformed, and loaded (ETL) into BigQuery. We continue to update the legacy MySQL RDBMS with the most recent data, and when a query requires real-time data, results from MySQL are combined with results from BigQuery. Most queries return within 100-200 milliseconds, matching what we originally experienced with MySQL for simple queries. When traffic from our customers spikes, we can now scale on demand to accommodate it, thanks to BigQuery’s hundreds of thousands of CPUs. Our customers no longer run into slow response times, and we’re confident that we can offer them, and their users, advanced insights and better experiences without delay. More importantly, this scale of query power let us build data science algorithms into the platform that query BigQuery iteratively, refining each query based on the previous results, to quantify the impact of a specific issue on a specific segment of users. Those capabilities were possible only because of BigQuery’s massive scale.
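To make the hybrid design more concrete, here is a minimal Python sketch of one way a query’s time range could be split at a 15-minute freshness cutoff: the older part is served from the warehouse, the most recent part from the operational database, and the rows are combined. The names `split_range`, `run_hybrid_query`, and `WAREHOUSE_LAG`, and the callback interface, are hypothetical illustrations, not Quantum Metric’s code or a BigQuery API; the two callbacks stand in for whatever actually queries BigQuery and MySQL.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Optional, Tuple

# Assumed freshness lag: rows older than this are already in the warehouse
# (the post cites 15 minutes). Hypothetical constant name.
WAREHOUSE_LAG = timedelta(minutes=15)

TimeRange = Tuple[datetime, datetime]  # half-open interval [start, end)
QueryFn = Callable[[datetime, datetime], List[tuple]]

def split_range(start: datetime, end: datetime,
                now: datetime) -> Tuple[Optional[TimeRange], Optional[TimeRange]]:
    """Split [start, end) at now - WAREHOUSE_LAG into (warehouse, recent) parts."""
    cutoff = now - WAREHOUSE_LAG
    warehouse = (start, min(end, cutoff)) if start < cutoff else None
    recent = (max(start, cutoff), end) if end > cutoff else None
    return warehouse, recent

def run_hybrid_query(start: datetime, end: datetime, now: datetime,
                     query_warehouse: QueryFn, query_recent: QueryFn) -> List[tuple]:
    """Query each store only for its share of the range and concatenate the rows."""
    warehouse, recent = split_range(start, end, now)
    rows: List[tuple] = []
    if warehouse is not None:
        rows.extend(query_warehouse(*warehouse))  # hypothetical stand-in for a BigQuery job
    if recent is not None:
        rows.extend(query_recent(*recent))        # hypothetical stand-in for a MySQL query
    return rows
```

Using half-open ranges means a row stamped exactly at the cutoff is read once, from the recent side, and never counted twice.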
In addition to new insights and fast response times, we wanted our customers to be able to ask complex questions in very simple language. For example: “Show me high-loyalty customers in specific geographic areas who visited the website at least five times via specific campaigns and never booked a seat on a flight.” A major U.S. airline used exactly this kind of query to understand the multi-million-dollar impact of a failure affecting its most valuable customers: its high-loyalty members. All of this was done while maintaining the highest standard of care for customer data and privacy by default, using multiple layers of encryption for data in transit and at rest, plus a unique military-grade encryption approach. That approach encrypts PII, even session cookies, with an RSA-2048 key that is available only to a select few and reserved for use cases such as fraud analysis.
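As a rough illustration of how a plain-language question like this breaks down into filters, the following Python sketch applies each condition to per-visitor aggregates. The `Visitor` fields and the `segment` function are hypothetical; in the product these filters would presumably run as a query over BigQuery rather than over an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set

@dataclass
class Visitor:
    visitor_id: str
    loyalty_tier: str                 # hypothetical tier label, e.g. "gold"
    region: str                       # hypothetical region code
    visit_count: int                  # site visits in the analysis window
    campaigns: Set[str] = field(default_factory=set)  # campaigns that brought the visitor in
    booked: bool = False              # whether the visitor ever completed a booking

def segment(visitors: Iterable[Visitor], tiers: Set[str], regions: Set[str],
            campaigns: Set[str], min_visits: int = 5) -> List[Visitor]:
    """Keep only visitors that satisfy every condition in the example question."""
    return [
        v for v in visitors
        if v.loyalty_tier in tiers                 # high-loyalty customers
        and v.region in regions                    # in specific geographic areas
        and v.visit_count >= min_visits            # visited at least five times
        and not v.campaigns.isdisjoint(campaigns)  # arrived via a target campaign
        and not v.booked                           # never booked a flight
    ]

visitors = [
    Visitor("a", "gold", "CO", 6, {"spring_sale"}),
    Visitor("b", "gold", "CO", 6, {"spring_sale"}, booked=True),
    Visitor("c", "basic", "CO", 9, {"spring_sale"}),
]
matches = segment(visitors, tiers={"gold"}, regions={"CO"}, campaigns={"spring_sale"})
```

Each clause maps to one phrase of the question, which is what lets a simple-language request be translated into a precise, repeatable filter.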
It’s no exaggeration to say that BigQuery has transformed our business. It provides the petabyte scale and speed we were missing, and it takes care of operational maintenance, a task that was burying our team when we ran MySQL. We’re now able to support some of the largest companies in the world, which require real-time, petabyte-scale analytics. That lets them serve more customers faster and with higher quality, and take advantage of BigQuery’s power and scale to innovate. Other cloud solutions can handle petabyte-scale analytics, but BigQuery’s most distinctive value was its on-demand scaling and managed operations, combined with extremely cost-effective pay-as-you-go billing. Today we operate at a scale where we have round-the-clock querying needs, but in our early days query loads were very sporadic: we needed instant scale, followed by long lulls with almost no queries. BigQuery’s pay-per-bytes-scanned pricing gave us access to a massive-scale querying platform without breaking the bank.
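The arithmetic behind that pricing argument can be sketched in a few lines of Python. Under pay-per-scan billing, cost depends only on bytes scanned, so idle periods cost nothing; an always-on cluster accrues cost every hour whether or not anyone queries it. The rates below are hypothetical placeholders, not actual BigQuery prices, and the sketch ignores any rounding or per-query minimums that real billing may apply.

```python
from typing import List

TIB = 2 ** 40  # bytes in one tebibyte

def on_demand_cost(bytes_scanned: List[int], price_per_tib: float) -> float:
    """Pay-per-scan: cost depends only on total bytes scanned; idle time is free."""
    return sum(bytes_scanned) / TIB * price_per_tib

def provisioned_cost(hours_running: float, hourly_rate: float) -> float:
    """Always-on capacity: cost accrues for every hour, busy or idle."""
    return hours_running * hourly_rate

# Hypothetical sporadic month: three bursts that each scan 2 TiB, then quiet.
bursty_scans = [2 * TIB] * 3
PRICE_PER_TIB = 5.0   # hypothetical placeholder rate
HOURLY_RATE = 2.0     # hypothetical placeholder rate for a dedicated cluster
print(on_demand_cost(bursty_scans, PRICE_PER_TIB))
print(provisioned_cost(24 * 30, HOURLY_RATE))
```

The point is structural rather than numeric: with bursty workloads, the on-demand total tracks actual usage, while the provisioned total is fixed by wall-clock time.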
Using BigQuery powers better customer experience and reduces purchasing friction
Among Quantum Metric’s many features is the ability to replay online customer sessions. In the example below, from a mobile e-commerce site, each action is displayed chronologically. Why did this customer’s transaction fail? Diving deeper, the session replay shows that the user tried to change the item quantity in the checkout cart, which resulted in a failed API call. Powered by BigQuery, Quantum Metric can then show how many other end users had the same issue with a single click on “Show More Errors Like This.” With BigQuery’s massive scale, Quantum Metric then quantifies the impact of that issue, so companies can prioritize which issues need immediate attention. If this issue is having the biggest impact on the business, our customer can open a Jira ticket with a single click, forwarding the discovery to their product and engineering teams. Those teams can then fix the experience in near real time, addressing the failed API call without the frustrating delay of reproducing the issue themselves.
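A minimal Python sketch of the idea behind “Show More Errors Like This” and impact ranking: group error events by a signature (here, the failing endpoint plus HTTP status), count distinct affected sessions, and total the cart value at risk. The event fields, the signature choice, and the use of cart value as the impact measure are all assumptions for illustration; the post does not describe how Quantum Metric actually defines or quantifies an issue.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

Signature = Tuple[str, int]  # hypothetical: (failing endpoint, HTTP status)

@dataclass(frozen=True)
class ErrorEvent:
    session_id: str
    endpoint: str       # hypothetical: API path that failed
    status: int         # HTTP status returned
    cart_value: float   # hypothetical: cart value in the session when the error hit

def errors_like_this(events: Iterable[ErrorEvent], signature: Signature) -> Set[str]:
    """Return the distinct sessions that hit the same (endpoint, status) error."""
    return {e.session_id for e in events if (e.endpoint, e.status) == signature}

def rank_by_impact(events: Iterable[ErrorEvent]) -> List[Tuple[Signature, int, float]]:
    """Rank signatures by total cart value at risk, counting each session once."""
    per_session: Dict[Tuple[Signature, str], float] = {}
    for e in events:
        key = ((e.endpoint, e.status), e.session_id)
        per_session[key] = max(per_session.get(key, 0.0), e.cart_value)
    impact = defaultdict(lambda: [0, 0.0])  # signature -> [sessions, value at risk]
    for (sig, _session), value in per_session.items():
        impact[sig][0] += 1
        impact[sig][1] += value
    return sorted(((sig, n, v) for sig, (n, v) in impact.items()),
                  key=lambda row: row[2], reverse=True)

events = [
    ErrorEvent("s1", "/cart/quantity", 500, 120.0),
    ErrorEvent("s1", "/cart/quantity", 500, 120.0),  # repeat in the same session
    ErrorEvent("s2", "/cart/quantity", 500, 80.0),
    ErrorEvent("s3", "/login", 401, 0.0),
]
ranking = rank_by_impact(events)
```

Deduplicating by session before summing keeps one frustrated user who retries repeatedly from inflating the impact of an issue.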