Advanced Data Warehousing

May 24, 2024

Snowflake Data Classification & Tagging

As businesses accumulate vast amounts of data, it becomes increasingly crucial to understand the sensitivity of different data elements and apply appropriate protection measures. Snowflake Data Classification addresses this challenge head-on, providing a streamlined way to:

  • Identify Sensitive Data: Pinpoint Personally Identifiable Information (PII), financial data, or other confidential information within your tables.

  • Apply Security Measures: Tailor access controls, masking, or anonymization based on the sensitivity level of each data element.

  • Meet Compliance Requirements: Ensure adherence to data privacy regulations like GDPR, CCPA, and HIPAA.

How Data Classification Works in Snowflake

Snowflake offers two types of data classifiers:

  • System Classifiers: Pre-defined classifiers provided by Snowflake to identify common sensitive data types like PII, financial data, and more.

  • Custom Classifiers: Allow you to define your own rules for identifying specific patterns or types of data that are unique to your organization.

A Practical Example: Sales Data Analysis

Let's explore how to use Snowflake Data Classification in a practical scenario.

⛁ Sample Data Model: salesdb-data-model

Setup the SalesDB sample data model and proceed with the rest of the exercises. This model provides a robust foundation for tracking customers, buyers, clients, and sales opportunities. 

>_ SQL

// Assign the database schema
use schema

Let's delve into how Snowflake Copilot can supercharge your data workflow:

1. System Classification: A Starting Point

Snowflake's system classifiers are an excellent starting point. They can automatically identify common patterns in your data and apply corresponding tags.

>_ SQL

-- Classify all tables in the custs schema using built-in classifiers
CALL SYSTEM$CLASSIFY_SCHEMA('SalesDB.custs', {'auto_tag': true})

This will attempt to identify sensitive data in your schema and automatically apply relevant tags like PII.

2. Custom Classifiers

While Snowflake provides built-in classifiers for common data types like PII (Personally Identifiable Information), your organization may have unique data patterns that require custom classification. This chapter will guide you through creating and applying custom classifiers to identify and tag specific data elements within your Snowflake environment, enhancing your data governance and security practices.

Custom classifiers allow you to define your own rules for identifying specific types of sensitive data. This is crucial when you have data patterns that don't fit the mold of Snowflake's built-in classifiers.

Imagine you have a LoyaltyNumber column in your Customer table that follows the format "LTY-XXXXX" (where X's represent digits). Snowflake's built-in classifiers might not recognize this as sensitive data. Custom classifiers let you tailor your data governance strategy to your specific needs.

3. Creating a Custom Classifier

Follow these steps to create a custom classifier in Snowflake:

Create the Classifier:

>_ SQL

CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CUSTOM_CLASSIFIER loyalty_codes()

This creates an empty custom classifier instance named loyalty_codes.

Verify Creation:

>_ SQL

This command verifies that your custom classifier has been created successfully. The output should show your loyalty_codes classifier.

Add Regular Expression (Regex) Rules:

>_ SQL

CALL Loyalty_codes!ADD_REGEX(
    'LTY_CODES',             -- Semantic category name for the tag
    'IDENTIFIER',            -- Privacy category (e.g., 'IDENTIFIER', 'QUASI_IDENTIFIER', 'SENSITIVE')
    '\\bLTY-\\d{5}\\b',      -- Regular expression to match the loyalty number pattern
    'LTY*',                   -- Optional: column name pattern to match 
    'Regex to identify Loyalty codes in a column' -- Comment describing the rule
)

This adds a rule to the loyalty_codes classifier. The regular expression \bLTY-\d{5}\b matches the specific format of your loyalty numbers (LTY followed by a dash and five digits).

Verify the Regex:

>_ SQL

SELECT varnumber -- Assumes the column name is 'varnumber' in this example
FROM customer
WHERE varnumber REGEXP('\\bLTY-\\d{5}\\b')


Run this query to verify that the regex pattern is working. You may need to replace varnumber with the actual column name in your table.

List the Rules:

>_ SQL

SELECT loyalty_codes!LIST()

This will display the details of your added rule, including the regex pattern, semantic category, and comment.

4. Classifying Your Data

With your custom classifier in place, you can now use it to classify the relevant columns in your tables. This process involves two methods:

Automated Tagging (Recommended):

>_ SQL

CALL SYSTEM$CLASSIFY(
    'salesdb.custs.customer',  -- Specify the table to classify
    {'auto_tag': true, 'custom_classifiers': ['Loyalty_codes']} -- Enable auto-tagging and use your custom classifier
)

Manual Tagging:

>_ SQL

SELECT SYSTEM$GET_CLASSIFICATION_RESULT('salesdb.custs.customer')

Review the suggested classifications and manually apply them to your table columns.

Note: Repeat this Step using the Snowsight UI Interface.

5. Dropping the Custom Classifier (Optional)

If you no longer need the custom classifier, you can drop it:

>_ SQL

DROP

6. Applying Masking Policies to Tags

Now that our data is tagged, we can create masking policies to protect sensitive information: To apply the masking policies to the tagged columns, you would use the following SQL commands:

>_ SQL

⛁ Sample Data Model: salesdb-data-model -> STEP #7 CREATE MASKING AND APPLY TO TAGS/COLUMNS     

These masking policies will automatically obscure data in columns tagged with 'PII' or 'Financial', ensuring that sensitive data is hidden from unauthorized viewers.

7. Key Takeaways

  • Custom classifiers are a powerful way to identify and tag sensitive data that doesn't fit standard patterns.

  • Snowflake provides a flexible framework to define your own classification rules using regular expressions.

  • By combining custom classifiers with RBAC, you can implement granular access controls for your sensitive data.

  • Regular review of your data classification strategy ensures ongoing data protection and compliance.

8. Documentation

Join Our Community

Join our newsletter list for occasional updates, products and insights.

Join Our Community

Join our newsletter list for occasional updates, products and insights.

Join Our Community

Join our newsletter list for occasional updates, products and insights.

More from
Advanced Data Warehousing

Snowflake Data Quality Metrics/Functions

Snowflake Data Quality Metrics/Functions

Snowflake Data Quality Metrics/Functions

Snowflake Higher Order Functions (HoF)

Snowflake Higher Order Functions (HoF)

Snowflake Higher Order Functions (HoF)

Snowflake Data Classification & Tagging

Snowflake Data Classification & Tagging

Snowflake Data Classification & Tagging

Snowflake ASOF JOIN

Snowflake ASOF JOIN

Snowflake ASOF JOIN

Snowflake Soundex & Fuzzy Matching

Snowflake Soundex & Fuzzy Matching

Snowflake Soundex & Fuzzy Matching

Snowflake Aggregation Policies

Snowflake Aggregation Policies

Snowflake Aggregation Policies

Snowflake Projection Policies

Snowflake Projection Policies

Snowflake Projection Policies

Snowflake JMeter Load & Concurrency Test

Snowflake JMeter Load & Concurrency Test

Snowflake JMeter Load & Concurrency Test

Snowflake Memoizable Functions

Snowflake Memoizable Functions

Snowflake Memoizable Functions

Snowflake Trail For Obervability (+Logs, +Events, +Traces)

Snowflake Trail For Obervability (+Logs, +Events, +Traces)

Snowflake Trail For Obervability (+Logs, +Events, +Traces)

Copyright 2020-24 - All Right Reserved

Copyright 2020-24 - All Right Reserved

Copyright 2020-24 - All Right Reserved