Common Models
May 26, 2024
OrdersDB
Before diving into advanced Snowflake features, it's essential to have a well-structured data model to work with. Here, we'll create a basic sales-oriented data model within Snowflake, leveraging external data stored in an S3 bucket. This model will serve as the foundation for subsequent demonstrations and explorations.
DemoHub - OrdersDB Data Model - Version 1.2.7 (updated 05/26/2024)
1. Database, File Format, and Stage Creation
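Below is a minimal sketch of the SQL for this step, built from the object names described next; the exact options used in the original demo may differ slightly.

```sql
-- Create and switch to the demo database
CREATE OR REPLACE DATABASE ordersdb;
USE DATABASE ordersdb;

-- CSV file format tuned for schema detection
CREATE OR REPLACE FILE FORMAT CSV_SCHEMA_DETECTION
  TYPE = CSV
  PARSE_HEADER = TRUE                       -- read column names from the header row
  SKIP_BLANK_LINES = TRUE
  TRIM_SPACE = TRUE
  ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE;   -- tolerate rows with extra/missing columns

-- External stage over the public DemoHub bucket
CREATE OR REPLACE STAGE DEMOHUB_S3_INT
  URL = 's3://demohubpublic/data/'
  COMMENT = 'DemoHub public sample data';
```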
Database (ordersdb): We begin by creating a new database named ordersdb to house all the objects related to our sales data. This organizational practice helps keep your Snowflake environment tidy and easy to manage.
File Format (CSV_SCHEMA_DETECTION): Next, we establish a file format that defines how Snowflake should interpret the incoming data. It declares the data as CSV (Comma-Separated Values) and tells Snowflake to parse the header row for column names, skip blank lines, trim surrounding spaces, and tolerate rows whose column count doesn't match the header.
Stage (DEMOHUB_S3_INT): A stage is a named location in cloud storage (in this case, an S3 bucket) where Snowflake can access your data files. This stage points to the demohubpublic/data/ directory on S3, from which we'll load our sample data.
2. Loading Data with Schema Inference
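A sketch of the inference pattern for one table (customer is shown, and the customer/ subdirectory is an assumption); the demo repeats this for each dataset:

```sql
-- Build the customer table from the schema inferred over its staged files
CREATE OR REPLACE TABLE customer
  USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))   -- fold the inferred column rows into one array
    FROM TABLE(
      INFER_SCHEMA(
        LOCATION    => '@DEMOHUB_S3_INT/customer/',
        FILE_FORMAT => 'CSV_SCHEMA_DETECTION'
      )
    )
  );
```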
Schema Inference (INFER_SCHEMA): Snowflake's INFER_SCHEMA table function examines the staged files in your S3 bucket and automatically determines the appropriate column names and data types for your tables. This saves you the manual effort of defining the schema by hand.
USING TEMPLATE: This clause creates a table based on the inferred schema. It's a convenient way to avoid writing explicit CREATE TABLE statements with detailed column definitions.
ARRAY_AGG and OBJECT_CONSTRUCT: These functions package the rows returned by INFER_SCHEMA into the array of column definitions that USING TEMPLATE expects when creating the table.
3. Adding Constraints and Referential Integrity
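A sketch of the constraint DDL, assuming the table names device, customer, sales_order, and order_item implied by the key columns described next; the constraint names and the order_item-to-sales_order link are illustrative:

```sql
-- Primary keys (one per table)
ALTER TABLE device      ADD CONSTRAINT pk_device      PRIMARY KEY (device_id);
ALTER TABLE customer    ADD CONSTRAINT pk_customer    PRIMARY KEY (customer_id);
ALTER TABLE sales_order ADD CONSTRAINT pk_sales_order PRIMARY KEY (order_id);
ALTER TABLE order_item  ADD CONSTRAINT pk_order_item  PRIMARY KEY (order_item_id);

-- Foreign keys describing how the tables relate
ALTER TABLE sales_order ADD CONSTRAINT fk_order_customer
  FOREIGN KEY (customer_id) REFERENCES customer (customer_id);
ALTER TABLE order_item  ADD CONSTRAINT fk_item_order
  FOREIGN KEY (order_id) REFERENCES sales_order (order_id);
```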
Primary Keys: A unique identifier is declared for each table (device_id, customer_id, order_id, order_item_id) to document each row's identity for downstream tools and queries.
Foreign Keys: Relationships between tables are defined to describe how the data fits together. For example, a foreign key constraint on sales_order(customer_id) references customer(customer_id), indicating that every order should link to a valid customer. Note that Snowflake records primary and foreign key constraints as metadata but does not enforce them (only NOT NULL is enforced), so they document intent rather than guarantee it.
4. Copying Data into Tables
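A sketch of the load for one table; the demo repeats this per table, and the customer/ path is again an assumption:

```sql
-- Load the staged CSV files into the matching table, pairing columns by name
COPY INTO customer
  FROM @DEMOHUB_S3_INT/customer/
  FILE_FORMAT = (FORMAT_NAME = 'CSV_SCHEMA_DETECTION')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```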
COPY INTO: This command efficiently loads the data from your S3 bucket into the corresponding tables.
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE: Maps each CSV column to the table column with the same name, ignoring case, so loading succeeds even if column name casing differs between your CSV files and table definitions.
Ready for Exploration
With your data model set up and populated, you now have a playground to explore Snowflake's advanced features like data classification, masking, forecasting, anomaly detection, and much more.
This sales data model enables a wide range of analyses, including:
Customer Insights: Analyze customer demographics, purchase history, and lifetime value.
Sales Performance: Track sales trends over time, identify top-selling products, and evaluate the effectiveness of different sales channels.
Inventory Management: Monitor stock levels, analyze product popularity, and optimize inventory based on sales patterns.
Order Analysis: Examine order details, shipping information, and customer satisfaction metrics.
Predictive Modeling: Develop models to forecast future sales, predict customer churn, or identify potential upsell opportunities.
This data model will be referenced in subsequent demos as we delve deeper into Snowflake's capabilities.