Privacy regulations place strict controls on how sensitive data can be examined and shared. At the same time, you can’t let your business come to a standstill. De-identification techniques can help you strike a balance between the utility and the privacy of your data. In previous “Take charge of your data” posts, we showed you how to gain visibility into your data using Cloud Data Loss Prevention (DLP) and how to protect sensitive data by incorporating data de-identification, obfuscation, and minimization techniques. In this post, we’ll dive a bit deeper into one of these de-identification techniques: tokenization.
Tokenization substitutes sensitive data with surrogate values called tokens, which can then be used to represent the original (or raw) sensitive value. It is sometimes referred to as pseudonymization or surrogate replacement. Tokenization is widely used in industries like finance and healthcare to reduce the risk to data in use, shrink compliance scope, and minimize the exposure of sensitive data to systems that do not need it. It’s important to understand how tokenization can help protect sensitive data while still giving your business operations and analytical workflows the information they need. With Cloud DLP, customers can perform tokenization at scale with minimal setup and overhead.
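To make the idea concrete, here is a minimal local sketch of deterministic tokenization. It is not how Cloud DLP implements tokenization; it assumes a keyed hash (HMAC-SHA-256) as the surrogate generator, and the `tokenize` helper, the `TOKEN_` prefix, and the key are hypothetical. Because a keyed hash is one-way, this sketch cannot recover the raw value from a token.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Replace a sensitive value with a deterministic surrogate token.

    Hypothetical helper: HMAC-SHA-256 acts as a keyed, one-way hash, so the
    same value and key always yield the same token, but the raw value cannot
    be recovered from the token alone.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # The prefix signals to downstream systems that this field holds a token.
    return "TOKEN_" + digest[:32]


# Hypothetical key; in practice it would be kept in a key management system.
KEY = b"example-secret-key"

ssn = "123-45-6789"  # fake SSN for illustration only
token = tokenize(ssn, KEY)
print(token)
print(tokenize(ssn, KEY) == token)            # same input and key: same token
print(tokenize(ssn, b"other-key") == token)   # different key: different token
```

Determinism is what lets tokens stand in for the raw value in lookups and joins; the trade-off is that anyone holding the key could tokenize guessed values and compare them, so the key needs the same protection as the data.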
First, let’s look at the following scenario: Casey works as a data scientist at a large financial company that serves businesses and end users. Casey’s primary job is to analyze data and improve the user experience for people using the company’s vast portfolio of financial applications. In the normal course of doing business, the company collects sensitive and regulated data, including personally identifiable information (PII) like Social Security numbers.
To demonstrate the benefits of tokenization, let’s consider a task that Casey might perform as part of her job: the company wants to determine which products it could build to help users in different age ranges improve their credit scores. To answer this question, Casey needs to join user information from the company’s banking app with customers’ credit score data received from a third party.
Casey requests access to both the users table and the table containing the third party’s credit score data.
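The join Casey needs can be sketched locally with made-up rows. This is a hypothetical illustration, not Cloud DLP code: it assumes both tables are tokenized with the same deterministic keyed hash (HMAC-SHA-256) before Casey sees them, so matching Social Security numbers still produce matching tokens. The table shapes, the `tokenize` and `age_range` helpers, the key, and the ten-year age buckets are all assumptions.

```python
import hashlib
import hmac
from collections import defaultdict
from statistics import mean

KEY = b"example-secret-key"  # hypothetical key shared by both tokenization jobs


def tokenize(value: str) -> str:
    """Hypothetical deterministic surrogate for a sensitive value (HMAC-SHA-256)."""
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical rows: banking-app users and third-party credit scores, keyed by SSN.
users = [
    {"ssn": "111-11-1111", "age": 24},
    {"ssn": "222-22-2222", "age": 37},
    {"ssn": "333-33-3333", "age": 29},
]
credit_scores = [
    {"ssn": "111-11-1111", "score": 640},
    {"ssn": "222-22-2222", "score": 720},
    {"ssn": "333-33-3333", "score": 680},
]

# Tokenize the join key in both tables so the analyst never sees raw SSNs.
users_tok = [{"ssn_token": tokenize(u["ssn"]), "age": u["age"]} for u in users]
scores_tok = {tokenize(c["ssn"]): c["score"] for c in credit_scores}


def age_range(age: int) -> str:
    """Bucket an age into a ten-year range such as '20-29'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


# Join on the token instead of the raw SSN, then group scores by age range.
scores_by_range = defaultdict(list)
for user in users_tok:
    score = scores_tok.get(user["ssn_token"])
    if score is not None:
        scores_by_range[age_range(user["age"])].append(score)

average_by_range = {r: mean(s) for r, s in sorted(scores_by_range.items())}
print(average_by_range)
```

Because the tokenization is deterministic and uses the same key for both tables, the join returns the same matches it would on raw SSNs, while the analysis itself only ever handles tokens.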