Data Engineering & Lake
Jun 1, 2024
Snowflake Dynamic Tables
In the ever-changing world of data, real-time processing and analysis are paramount. Snowflake's Dynamic Tables (DTs) provide a powerful mechanism to create automated data pipelines without the overhead of complex code or manual scheduling. This chapter will guide you through building a multi-stage data pipeline using dynamic tables, showcasing how they can streamline your workflows and deliver timely insights.
Overview
Snowflake Dynamic Tables offer a declarative approach to building data pipelines. You define your transformation logic as a simple SQL query, and Snowflake takes care of the rest, automatically refreshing the results as your source data changes. This eliminates the need for complex, procedural code and tedious scheduling tasks.
Compared to traditional streams and tasks, dynamic tables provide a more streamlined and intuitive experience. They are easier to set up, maintain, and understand, especially for users who prefer a SQL-centric approach. Streams and tasks offer more granular control and flexibility, but they can be more challenging to implement and manage. Dynamic tables strike a balance between simplicity and power, making them an ideal choice for many data transformation use cases.
A Practical Example: Sales Data Analysis
Set up and run the SalesDB sample data model, then proceed with the rest of the exercises.
⛁ Sample Data Model: salesdb-data-model
This model provides a robust foundation for tracking customers, buyers, clients, and sales opportunities.
>_ SQL
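A minimal sketch of the underlying tables, in case you want to create them by hand instead of loading the packaged model; the data model may define additional columns:

-- Database and schema used throughout this chapter
CREATE DATABASE IF NOT EXISTS SalesDB;
CREATE SCHEMA IF NOT EXISTS SalesDB.custs;

-- Raw customer records (names and emails arrive unstandardized)
CREATE OR REPLACE TABLE SalesDB.custs.Customer (
    CustomerID INT,
    FirstName  STRING,
    LastName   STRING,
    Email      STRING
);

-- Sales opportunities at various stages
CREATE OR REPLACE TABLE SalesDB.custs.Opportunities (
    OpportunityID     INT,
    CustomerID        INT,
    SalesStage        STRING,
    Amount            NUMBER(12,2),
    ExpectedCloseDate DATE
);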
Let's delve into how the Snowflake Dynamic Tables feature can supercharge your data workflow:
Building the Sales Data Pipeline
We'll work with the SalesDB database and the custs schema, which contains the following tables:
Customer: Stores information about potential and existing customers.
Opportunities: Tracks sales opportunities at various stages.
Our goal is to create a pipeline that cleans and enriches customer data, filters out irrelevant opportunities, joins the two datasets, and finally aggregates the results for analysis.
Step 1: Enriched Customer Data (Dynamic Table 1)
>_ SQL
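One way to define it, as a sketch; the TARGET_LAG, warehouse name, and exact column names are assumptions:

CREATE OR REPLACE DYNAMIC TABLE enriched_customers
  TARGET_LAG = '1 minute'
  WAREHOUSE = compute_wh
AS
SELECT
    CustomerID,
    INITCAP(FirstName) AS FirstName,          -- capitalize first letter
    INITCAP(LastName)  AS LastName,
    TRIM(Email)        AS Email,              -- strip stray whitespace
    IFF(TRIM(Email) LIKE '%@%.%', 'Valid', 'Invalid') AS EmailValidity,
    CURRENT_TIMESTAMP() AS LoadDate           -- freshness marker
FROM SalesDB.custs.Customer;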
Purpose: This dynamic table cleans and enriches the raw customer data by:
Trimming whitespace from emails.
Standardizing names by capitalizing the first letter.
Adding an EmailValidity flag to indicate whether an email is valid.
Recording the LoadDate for tracking freshness.
Step 2: Filtered Opportunities (Dynamic Table 2)
>_ SQL
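A sketch of the definition, reusing the same assumed refresh settings:

CREATE OR REPLACE DYNAMIC TABLE filtered_opportunities
  TARGET_LAG = '1 minute'
  WAREHOUSE = compute_wh
AS
SELECT *
FROM SalesDB.custs.Opportunities
WHERE Amount > 0                       -- positive amounts only
  AND SalesStage <> 'InvalidStage'     -- drop the invalid stage
  AND ExpectedCloseDate IS NOT NULL;   -- require a close date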
Purpose: Filters out invalid or irrelevant opportunities based on these criteria:
A positive Amount.
A valid SalesStage (not 'InvalidStage').
A non-null ExpectedCloseDate.
Step 3: Joined Customer Opportunities (Dynamic Table 3)
>_ SQL
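A sketch; the OpportunityID column and the CustomerID join key are assumptions:

CREATE OR REPLACE DYNAMIC TABLE customer_opportunities
  TARGET_LAG = '1 minute'
  WAREHOUSE = compute_wh
AS
SELECT
    c.*,
    o.OpportunityID,
    o.SalesStage,
    o.Amount,
    o.ExpectedCloseDate
FROM enriched_customers c
LEFT JOIN filtered_opportunities o    -- LEFT JOIN keeps customers without opportunities
    ON c.CustomerID = o.CustomerID;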
Purpose: Joins the enriched customer data with the filtered opportunities, using a LEFT JOIN to include customers without any opportunities.
Step 4: Aggregated Sales (Dynamic Table 4)
>_ SQL
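A sketch; the aggregate column names are assumptions:

CREATE OR REPLACE DYNAMIC TABLE aggregated_sales
  TARGET_LAG = '1 minute'
  WAREHOUSE = compute_wh
AS
SELECT
    CustomerID,
    COUNT(OpportunityID) AS OpportunityCount,       -- opportunities per customer
    SUM(Amount)          AS TotalPotentialRevenue   -- total potential revenue
FROM filtered_opportunities
GROUP BY CustomerID;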
Purpose: Aggregates sales data for each customer based on their opportunities.
Step 5: Customer Opportunities with Aggregated Sales (Dynamic Table 5)
>_ SQL
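A sketch that joins the two upstream dynamic tables on the assumed CustomerID key:

CREATE OR REPLACE DYNAMIC TABLE customer_opportunities_with_sales
  TARGET_LAG = '1 minute'
  WAREHOUSE = compute_wh
AS
SELECT
    co.*,
    agg.OpportunityCount,
    agg.TotalPotentialRevenue
FROM customer_opportunities co
LEFT JOIN aggregated_sales agg
    ON co.CustomerID = agg.CustomerID;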
Purpose: This new dynamic table combines the detailed customer opportunity information from customer_opportunities with the aggregated sales metrics from aggregated_sales.
Dynamic Tables In Action
Here are three batches of DML statements designed to demonstrate how your Snowflake dynamic tables update, along with explanations:
Batch 1: New High-Value Customer and Opportunities
>_ SQL
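Illustrative statements; the IDs, amounts, and dates are made-up values, and the messy casing and padding are deliberate so the enrichment logic has something to clean:

-- New high-value customer with unstandardized name and email
INSERT INTO SalesDB.custs.Customer (CustomerID, FirstName, LastName, Email)
VALUES (101, 'sophia', 'miller', '  sophia.miller@example.com  ');

-- Two opportunities: one closed, one still in negotiation
INSERT INTO SalesDB.custs.Opportunities
    (OpportunityID, CustomerID, SalesStage, Amount, ExpectedCloseDate)
VALUES
    (1001, 101, 'Closed',      250000, '2024-06-15'),
    (1002, 101, 'Negotiation', 400000, '2024-08-01');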
This batch introduces a new high-value customer ("Sophia Miller") with two associated opportunities, one closed and one still in negotiation. These additions will trigger updates in the enriched_customers, filtered_opportunities, customer_opportunities, and aggregated_sales dynamic tables, and in turn in customer_opportunities_with_sales.
Batch 2: New Opportunities with Data Issues
>_ SQL
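Illustrative statements along the same lines; each row trips exactly one of the filters:

INSERT INTO SalesDB.custs.Opportunities
    (OpportunityID, CustomerID, SalesStage, Amount, ExpectedCloseDate)
VALUES
    (1003, 101, 'InvalidStage',   50000, '2024-09-01'),  -- invalid SalesStage
    (1004, 101, 'Prospecting',    75000, NULL),          -- missing ExpectedCloseDate
    (1005, 101, 'Qualification', -10000, '2024-09-15');  -- negative Amount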
This batch adds three new opportunities but introduces data quality issues:
An invalid SalesStage value.
A missing ExpectedCloseDate.
A negative Amount.
This demonstrates how the filtered_opportunities table will exclude these records due to the applied filters.
Batch 3: Updates to Existing Records
>_ SQL
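Illustrative updates; the IDs refer to the rows from Batch 1:

-- Re-dirty an email so the enrichment table re-trims and re-validates it
UPDATE SalesDB.custs.Customer
SET Email = '  SOPHIA.MILLER@EXAMPLE.COM  '
WHERE CustomerID = 101;

-- Close the open negotiation at a higher amount
UPDATE SalesDB.custs.Opportunities
SET SalesStage = 'Closed', Amount = 500000
WHERE OpportunityID = 1002;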
These updates modify existing records, showcasing how dynamic tables can react to changes in the source data.
Key Points for Demonstration:
Observe the Pipeline: After each batch insertion, query the dynamic tables to see how the data is transformed and propagated through the pipeline (sample queries follow this list).
Highlight Filtering: Demonstrate how the filtered_opportunities table excludes records with invalid or missing data.
Aggregation Results: Show how the aggregated_sales table updates to reflect the new opportunities and their impact on the total potential revenue.
Data Classification & Masking: If you have implemented masking policies, demonstrate how they protect sensitive information in the enriched_customers table after the updates.
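Sample queries for stepping through the pipeline; the first two follow the table definitions above, and the last uses Snowflake's DYNAMIC_TABLE_REFRESH_HISTORY information-schema table function:
>_ SQL
-- Spot-check how the new records propagated
SELECT * FROM enriched_customers WHERE LastName = 'Miller';
SELECT * FROM aggregated_sales ORDER BY TotalPotentialRevenue DESC;

-- Review recent refreshes across the pipeline
SELECT name, state, data_timestamp
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY())
ORDER BY data_timestamp DESC;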
Together, these batches showcase dynamic updates and illustrate the power of Snowflake Dynamic Tables in creating near real-time data pipelines.
Bonus: Streamlit App
Use this optional Streamlit app to visualize record flow across the different tables. Add more records and watch the downstream tables update.
>_ Python
Key Points
Automation: Snowflake automatically updates these dynamic tables whenever the underlying Customer or Opportunities tables change, keeping your data pipeline fresh.
Modularity: Each dynamic table performs a specific step in the pipeline, making it easier to maintain and modify the process.
Real-Time Insights: The customer_opportunities_with_sales table provides a near real-time view of your customers, their opportunities, and aggregated sales data, facilitating rapid decision-making.
Resources
Dynamic Tables Overview: Get a comprehensive introduction to the concept, benefits, and use cases of dynamic tables: https://docs.snowflake.com/en/user-guide/dynamic-tables-about
Snowflake Dynamic Tables and Declarative Streaming Data Pipelines: A detailed blog post from Snowflake explaining the concept and benefits of dynamic tables.