Advanced Data Warehousing

May 24, 2024

Snowflake Aggregation Policies

In the world of data sharing and analysis, safeguarding sensitive information while enabling meaningful insights is a constant challenge. Snowflake's Aggregation Policies offer a powerful mechanism to strike this balance. By defining rules that enforce minimum levels of aggregation in query results, you can protect individual-level data while still allowing users to extract valuable knowledge from summarized information.

What are Aggregation Policies?

Aggregation policies are schema-level objects in Snowflake that control how data can be accessed and queried from a table or view. They serve as gatekeepers, ensuring that queries adhere to specific aggregation requirements to prevent the exposure of individual records.

Key Concepts

  • Aggregation-Constrained: A table or view with an aggregation policy assigned to it.

  • Minimum Group Size: The minimum number of rows that must be aggregated in a query result.

  • Remainder Group: A group in the query result containing rows that don't meet the minimum group size; their grouping key value is set to NULL.

How It Works

When a query is executed on an aggregation-constrained table, Snowflake performs these checks:

  1. Aggregation Requirement: The query must aggregate data, either through an allowed aggregation function (e.g., SUM, AVG, COUNT) or by using GROUP BY.

  2. Minimum Group Size: Each group created by the query must contain at least the specified minimum number of rows.

If a query violates either of these requirements, Snowflake blocks it to protect the privacy of individual records.

A Practical Example: Sales Data Analysis

Let's explore how to use Snowflake Aggregation Policies in a practical scenario.

⛁ Sample Data Model: salesdb-data-model

Setup the SalesDB sample data model and proceed with the rest of the exercises. This model provides a robust foundation for tracking customers, buyers, clients, and sales opportunities. 

>_ SQL

-- Assign the database schema
  use schema

Let's delve into how this Snowflake feature can supercharge your data workflow:

 Protecting Sales Transactions

Let's consider your sales data model (SalesDB.Custs). Suppose you want to prevent users from accessing individual customer transactions in the Opportunities table. You decide that results should be aggregated to a minimum of 5 opportunities.

1. Creating the Policy:

>_ SQL

CREATE OR REPLACE AGGREGATION POLICY custs.min_opportunities
AS
    () RETURNS AGGREGATION_CONSTRAINT -> AGGREGATION_CONSTRAINT(MIN_GROUP_SIZE => 3)

2. Assigning the Policy:

>_ SQL

ALTER TABLE custs.Opportunities SET

3. Testing the Policy:

>_ SQL

-- This will fail (less than 3 rows)
SELECT * FROM custs.Opportunities WHERE OPPORTUNITYID = 1003;  

-- This will succeed (aggregates to 3 or more rows)
SELECT LeadSource, COUNT(*) AS NumOpportunities
FROM custs.Opportunities
GROUP BY

This will run the test to ensure the aggregation is enforced, and in the first case, it fails due to aggregation policy enforcement.


Advanced Use Cases

Multiple Conditions: You can define more complex policies based on multiple criteria. For example, a policy could require a minimum of 3 rows and an aggregated amount exceeding $50,000.

Here's an example query that aggregates six records from the Opportunities table, complying with the min_opportunities policy:

>_ SQL

SELECT
    LeadSource, 
    COUNT(*) AS NumberOfOpportunities, 
    AVG(Amount) AS AverageOpportunityValue
FROM custs.Opportunities
GROUP BY


This will run the aggregation of the data based on LeadSource. The result will only include the rows where the count of rows is greater than or equal to 3, based on the defined policy.

Important Considerations and Limitations

  • Granularity: Choose the minimum group size carefully, balancing data utility with privacy protection.

  • Not Foolproof: While aggregation policies are effective, determined adversaries might still deduce information with targeted queries.

  • Not for Entity-Level Protection: Policies protect individual records, not the entire entity. If multiple records exist for the same entity, some information might still be revealed.

  • Limited Query Types: Certain query constructs like PIVOT, recursive CTEs, and window functions are not allowed.

  • Joins and Set Operators: Carefully consider the impact of aggregation policies when using joins and set operators like UNION ALL.

Removing Aggregation Policies

>_ SQL

ALTER TABLE

Discovering Aggregation Policies

>_ SQL

-- Show aggregation policies in your account
  SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.AGGREGATION_POLICIES ORDER BY

Conclusion

Snowflake Aggregation Policies are a powerful tool for maintaining data privacy while still enabling insightful analysis. By defining appropriate aggregation rules and incorporating them into your data governance strategy, you can confidently share data while safeguarding sensitive information. This enhances the trust your customers and partners have in your data governance practices.

For Further Information:

Refer to Snowflake's official documentation for detailed information and advanced usage scenarios:

Join Our Community

Join our newsletter list for occasional updates, products and insights.

Join Our Community

Join our newsletter list for occasional updates, products and insights.

Join Our Community

Join our newsletter list for occasional updates, products and insights.

More from
Advanced Data Warehousing

Snowflake Data Quality Metrics/Functions

Snowflake Data Quality Metrics/Functions

Snowflake Data Quality Metrics/Functions

Snowflake Higher Order Functions (HoF)

Snowflake Higher Order Functions (HoF)

Snowflake Higher Order Functions (HoF)

Snowflake Data Classification & Tagging

Snowflake Data Classification & Tagging

Snowflake Data Classification & Tagging

Snowflake ASOF JOIN

Snowflake ASOF JOIN

Snowflake ASOF JOIN

Snowflake Soundex & Fuzzy Matching

Snowflake Soundex & Fuzzy Matching

Snowflake Soundex & Fuzzy Matching

Snowflake Aggregation Policies

Snowflake Aggregation Policies

Snowflake Aggregation Policies

Snowflake Projection Policies

Snowflake Projection Policies

Snowflake Projection Policies

Snowflake JMeter Load & Concurrency Test

Snowflake JMeter Load & Concurrency Test

Snowflake JMeter Load & Concurrency Test

Snowflake Memoizable Functions

Snowflake Memoizable Functions

Snowflake Memoizable Functions

Snowflake Trail For Obervability (+Logs, +Events, +Traces)

Snowflake Trail For Obervability (+Logs, +Events, +Traces)

Snowflake Trail For Obervability (+Logs, +Events, +Traces)

Copyright 2020-24 - All Right Reserved

Copyright 2020-24 - All Right Reserved

Copyright 2020-24 - All Right Reserved