Sensitive data in Snowflake

Finding sensitive data and PII among the terabytes of data constantly flowing into Snowflake has become one of tech companies' most daunting data challenges.


With over 15% market share in the data warehousing category, Snowflake is now used by more than 5,000 companies for cloud-based data storage and analytics. It is fairly credited with reviving the whole data warehousing industry, and its IPO in September 2020 is well known as the largest software IPO in history. Snowflake’s platform has been available on Amazon Web Services (AWS) since 2014, on Microsoft Azure since 2018 and on Google Cloud Platform since 2019.

Why do people use Snowflake?

Because it is built specifically for the cloud, Snowflake’s elastic nature removes many of the chores of older data warehouses (selecting, installing, configuring and managing hardware or software), and helps load data faster and run a high volume of queries. Not having to dedicate resources to the setup, support and ongoing maintenance of in-house servers makes it a prime choice for tech startups and scaleups, especially those in the FinTech, InsurTech and HealthTech sectors.

More (sensitive) data, more problems

“When in doubt, store as much as you can.” This leitmotif means that startups gradually end up storing a gigantic amount of data for analytics and machine learning purposes. This data moves so rapidly into and within Snowflake that it becomes impossible to make sure it is secure at all times. Yet regulations and standards like GDPR, CCPA, PCI DSS and HIPAA mean that businesses need more sophisticated ways to identify and remove sensitive data such as PII from Snowflake.

So how can you protect sensitive data and PII as you access and store files in Snowflake?

Carefully examining every Snowflake database to search for this data is a daunting task - and we all know that the engineering team is already busy building the product and helping the company scale.

The challenge of discovering sensitive unstructured data in Snowflake

The combination of structured and unstructured data, and the unstoppable growth of the latter, means that even when you go looking, finding data types that are considered sensitive (such as email addresses, financial information, medical information, social security numbers and identifiers like licence plates) is very time-consuming.
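To make the detection problem concrete, here is a minimal Python sketch of pattern-based scanning over rows that have already been fetched from a table. It is an illustration only, not how Metomic or Snowflake classify data: the names `PATTERNS`, `find_sensitive` and `scan_rows` are hypothetical, and the two regular expressions are deliberately simple, so they will miss real cases and produce false positives.

```python
import re

# Hypothetical, deliberately simple patterns (assumptions for illustration).
# Real classifiers need validation, context and many more data types.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_sensitive(text):
    """Return a list of (label, matched_text) pairs found in a string."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


def scan_rows(rows):
    """Map each column name to the set of labels detected in its values.

    `rows` is an iterable of dicts (column name -> value), for example
    rows already fetched from a table. Non-string values are skipped.
    """
    findings = {}
    for row in rows:
        for column, value in row.items():
            if not isinstance(value, str):
                continue
            for label, _ in find_sensitive(value):
                findings.setdefault(column, set()).add(label)
    return findings


if __name__ == "__main__":
    sample = [
        {"id": 1, "notes": "Contact jane.doe@example.com, SSN 123-45-6789"},
        {"id": 2, "notes": "Nothing sensitive here"},
    ]
    print(scan_rows(sample))
```

Even this toy version shows why the task is time-consuming: free-text columns such as `notes` have to be read value by value, and every new data type needs its own pattern and its own tuning.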

Snowflake did announce support for unstructured data (like video, audio, images and PDFs) last year. Customers can now avoid dealing with a large number of systems, or being forced to deploy very granular governance for these unstructured files and metadata. In addition, with improved insights that take into account more data, they are able to find new revenue opportunities.

Controlling sensitive data and PII in a data warehouse starts with knowing where it is. Identifying where all this data is hidden, so that you can secure it and use it, is the first step towards knowing where your risks lie and, above all, managing them on a regular basis.

Stopping the wrong kind of data from landing in Snowflake

Metomic helps you manage and audit your sensitive data by tracking what is sensitive and what isn't. It gives you an overview of all the sensitive data in the environment at any time, allowing you to correctly apply controls such as deletion & redaction to both structured and unstructured sources.

It then becomes easier to comply with protection and privacy regulations for data stored in Snowflake.

By accurately identifying and mapping sensitive data across your entire cloud data warehouse, in addition to your cloud apps and infrastructure, you will know precisely when it was uploaded and who has access to it. Metomic has an extensive list of off-the-shelf classifiers, or you can build your own, and then perform a full scan. You’ll also be able to stop the wrong data from landing in Snowflake in the first place.

We can help you by masking the data automatically in Snowflake while leaving the data itself untouched, so that your team doesn’t have to go and edit all their reports and queries. If needed, you can also implement an automatic data retention period. With our Data Discovery API, you can create your own data-driven workflows from any app using our Webhooks or Query API.
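As an illustration of masking that leaves the stored values untouched, the Python sketch below builds the SQL for Snowflake's own dynamic data masking feature (`CREATE MASKING POLICY` and `ALTER TABLE ... SET MASKING POLICY`). This is an assumption about one way to achieve the effect, not a description of Metomic's implementation. The helper `masking_policy_sql` and the policy, table, column and role names are hypothetical, and the sketch assumes a `STRING` column.

```python
def masking_policy_sql(policy, table, column, allowed_roles, mask="***MASKED***"):
    """Build Snowflake SQL that masks a STRING column at query time.

    Roles listed in `allowed_roles` see the stored value; every other role
    sees `mask`. The underlying data is not modified.

    Identifiers (policy, table, column) are interpolated as-is, so only pass
    trusted names. String literals are escaped by doubling single quotes.
    """
    if not allowed_roles:
        raise ValueError("allowed_roles must contain at least one role")

    def quote(value):
        return "'" + value.replace("'", "''") + "'"

    roles = ", ".join(quote(role) for role in allowed_roles)
    create = (
        f"CREATE OR REPLACE MASKING POLICY {policy} AS (val STRING) RETURNS STRING ->\n"
        f"  CASE WHEN CURRENT_ROLE() IN ({roles}) THEN val ELSE {quote(mask)} END;"
    )
    attach = f"ALTER TABLE {table} MODIFY COLUMN {column} SET MASKING POLICY {policy};"
    return [create, attach]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for statement in masking_policy_sql("email_mask", "customers", "email", ["PII_ADMIN"]):
        print(statement)
```

Because the policy is evaluated when a query runs, existing reports and queries keep selecting the same column and simply receive the masked value unless they run under an allowed role.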

Metomic's secure architecture helps you eliminate your security risks without adding new ones - you can now keep your dev team focussed on building the product instead of living under the sword of Damocles of a data leak in Snowflake.

Get in touch today for a chat with our team and a demo of our product.
