Advanced Data Warehousing
May 24, 2024
Snowflake Data Classification & Tagging
As businesses accumulate vast amounts of data, it becomes increasingly crucial to understand the sensitivity of different data elements and apply appropriate protection measures. Snowflake Data Classification addresses this challenge head-on, providing a streamlined way to:
Identify Sensitive Data: Pinpoint Personally Identifiable Information (PII), financial data, or other confidential information within your tables.
Apply Security Measures: Tailor access controls, masking, or anonymization based on the sensitivity level of each data element.
Meet Compliance Requirements: Ensure adherence to data privacy regulations like GDPR, CCPA, and HIPAA.
How Data Classification Works in Snowflake
Snowflake offers two types of data classifiers:
System Classifiers: Pre-defined classifiers provided by Snowflake to identify common sensitive data types like PII, financial data, and more.
Custom Classifiers: Allow you to define your own rules for identifying specific patterns or types of data that are unique to your organization.
A Practical Example: Sales Data Analysis
Let's explore how to use Snowflake Data Classification in a practical scenario.
⛁ Sample Data Model: salesdb-data-model
Setup the SalesDB sample data model and proceed with the rest of the exercises. This model provides a robust foundation for tracking customers, buyers, clients, and sales opportunities.
>_ SQL
Let's delve into how Snowflake Copilot can supercharge your data workflow:
1. System Classification: A Starting Point
Snowflake's system classifiers are an excellent starting point. They can automatically identify common patterns in your data and apply corresponding tags.
>_ SQL
This will attempt to identify sensitive data in your schema and automatically apply relevant tags like PII.
2. Custom Classifiers
While Snowflake provides built-in classifiers for common data types like PII (Personally Identifiable Information), your organization may have unique data patterns that require custom classification. This chapter will guide you through creating and applying custom classifiers to identify and tag specific data elements within your Snowflake environment, enhancing your data governance and security practices.
Custom classifiers allow you to define your own rules for identifying specific types of sensitive data. This is crucial when you have data patterns that don't fit the mold of Snowflake's built-in classifiers.
Imagine you have a LoyaltyNumber column in your Customer table that follows the format "LTY-XXXXX" (where X's represent digits). Snowflake's built-in classifiers might not recognize this as sensitive data. Custom classifiers let you tailor your data governance strategy to your specific needs.
3. Creating a Custom Classifier
Follow these steps to create a custom classifier in Snowflake:
Create the Classifier:
>_ SQL
This creates an empty custom classifier instance named loyalty_codes.
Verify Creation:
>_ SQL
This command verifies that your custom classifier has been created successfully. The output should show your loyalty_codes classifier.
Add Regular Expression (Regex) Rules:
>_ SQL
This adds a rule to the loyalty_codes classifier. The regular expression \bLTY-\d{5}\b matches the specific format of your loyalty numbers (LTY followed by a dash and five digits).
Verify the Regex:
>_ SQL
Run this query to verify that the regex pattern is working. You may need to replace varnumber with the actual column name in your table.
List the Rules:
>_ SQL
This will display the details of your added rule, including the regex pattern, semantic category, and comment.
4. Classifying Your Data
With your custom classifier in place, you can now use it to classify the relevant columns in your tables. This process involves two methods:
Automated Tagging (Recommended):
>_ SQL
Manual Tagging:
>_ SQL
Review the suggested classifications and manually apply them to your table columns.
❗Note: Repeat this Step using the Snowsight UI Interface.
5. Dropping the Custom Classifier (Optional)
If you no longer need the custom classifier, you can drop it:
>_ SQL
6. Applying Masking Policies to Tags
Now that our data is tagged, we can create masking policies to protect sensitive information: To apply the masking policies to the tagged columns, you would use the following SQL commands:
>_ SQL
⛁ Sample Data Model: salesdb-data-model -> STEP #7 CREATE MASKING AND APPLY TO TAGS/COLUMNS
These masking policies will automatically obscure data in columns tagged with 'PII' or 'Financial', ensuring that sensitive data is hidden from unauthorized viewers.
7. Key Takeaways
Custom classifiers are a powerful way to identify and tag sensitive data that doesn't fit standard patterns.
Snowflake provides a flexible framework to define your own classification rules using regular expressions.
By combining custom classifiers with RBAC, you can implement granular access controls for your sensitive data.
Regular review of your data classification strategy ensures ongoing data protection and compliance.
8. Documentation
Data Classification: https://docs.snowflake.com/en/user-guide/classify-intro
Tagging Data: https://docs.snowflake.com/en/user-guide/object-tagging
Masking Policies: https://docs.snowflake.com/en/sql-reference/sql/create-masking-policy