Classic AI / ML (Data Science)

Jun 1, 2024

Snowflake Cortex AI: ML Anomaly Detection

Uncovering anomalies in your data is vital for understanding unusual patterns, identifying potential problems, and ultimately making more informed business decisions. 


Whether it's a sudden spike in sales due to a successful marketing campaign or an unexpected dip caused by supply chain disruptions, anomaly detection provides valuable insights that might otherwise go unnoticed.

Snowflake Cortex simplifies this process, allowing you to pinpoint outliers and investigate the underlying causes.


Note: In addition to scripting provided in this chapter, you can also create Cortex anomaly detection models using Snowsight's visual interface, which streamlines the process by prompting you for input and generating the corresponding SQL script.

Hands-On Anomaly Detection with the OrdersDB Data

Let's use the ordersdb sample data model to illustrate how to detect anomalies in your daily sales figures.

1. Setting Up Your Data

⛁ Sample Data Model: ordersdb-data-model

Make sure you have the ordersdb sample data model loaded into Snowflake. Then, set the context to the ordersdb.public schema:

>_ SQL

-- Assign the database schema
USE SCHEMA ordersdb.public

2. Preparing and Splitting Your Data

Preparing Your Time-Series Data

We'll use the daily_sales view (created in the forecasting section) as our time-series data source for anomaly detection. Remember, this view aggregates sales data by day.

Next, divide your daily_sales data into two equal halves based on the date. The first half will train the model, and the second half will be analyzed for anomalies.

In this example, we're filtering on SHIP_MODE = 'MAIL' and return_flag = 'N' to focus on standard mail shipments without returns. You can customize these filters based on your specific interests.

>_ SQL

-- Create or replace the table `split_info` to store the split date
CREATE OR REPLACE TABLE split_info AS
-- Select the maximum date from the first half of the ranked dates
SELECT MAX(date) AS split_date
FROM (
  -- Rank the dates into two halves using NTILE(2)
  SELECT date, NTILE(2) OVER (ORDER BY date) AS half
  FROM daily_sales
) ranked_dates
-- Filter to only include the first half
WHERE half = 1;

-- This view is used for training the model (50% of the data)
CREATE OR REPLACE VIEW daily_sales_train_anomaly AS
SELECT 
  TO_TIMESTAMP_NTZ(date) AS date,
  total_sales
FROM daily_sales
WHERE SHIP_MODE = 'MAIL' AND return_flag = 'N' 
  AND to_timestamp_ntz(DATE) < (SELECT to_timestamp_ntz(split_date) FROM split_info)
ORDER BY date;

-- This view is used for running anomaly detection (the other 50% of the data)
CREATE OR REPLACE VIEW daily_sales_analyze_anomaly AS
SELECT 
  TO_TIMESTAMP_NTZ(date) AS date,
  total_sales
FROM daily_sales
WHERE SHIP_MODE = 'MAIL' AND return_flag = 'N'
  AND to_timestamp_ntz(DATE) >= (SELECT to_timestamp_ntz(split_date) FROM split_info)
ORDER BY

3. Building the Anomaly Detection Model

Create the anomaly detection model (daily_sales_anomaly_model), providing the training data, timestamp column, and the target column (total_sales):

>_ SQL

-- Create the model using the training view and specify the timestamp and target columns
CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION daily_sales_anomaly_model (
  INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'daily_sales_train_anomaly'),
  TIMESTAMP_COLNAME => 'date',
  TARGET_COLNAME => 'total_sales'
)

4. Detecting and Analyzing Anomalies

Use the DETECT_ANOMALIES function to identify potential outliers in the sales data and store the results. This will produce a table (daily_sales_anomaly_predictions) with details about each day, including whether an anomaly was detected (IS_ANOMALY).

>_ SQL

-- Run anomaly detection on the analysis view and store results
CREATE OR REPLACE TABLE daily_sales_anomaly_predictions AS
SELECT * 
FROM TABLE(
    daily_sales_anomaly_model!DETECT_ANOMALIES(
        INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'daily_sales_analyze_anomaly'),
        TIMESTAMP_COLNAME => 'date',
        TARGET_COLNAME => 'total_sales'
    )
) ORDER BY 1, 2 DESC;

-- Query and visualize the anomaly detection results
SELECT * FROM

Isolating Anomalies

You can create another table (daily_sales_anomalies_only) to specifically store the days that were flagged as anomalies. This table can be used for further investigation or as input for alerts or automated actions.

>_ SQL

-- Save only the anomaly records for further investigation
CREATE OR REPLACE TABLE daily_sales_anomalies_only AS
SELECT * 
FROM daily_sales_anomaly_predictions
WHERE IS_ANOMALY = TRUE 
ORDER BY 1, 2 DESC

5. Visualizing Anomalies in Snowsight

After calling the daily_sales_anomaly_model!DETECT_ANOMALIES method, you can visualize the results as shown below. To do so:



  • Select Charts: Click on the "Charts" tab above the results table in Snowsight.

  • Configure Data:

    • In the "Data" section, select the VALUE (actual sales) column and set Aggregation to "None."

    • Select the TS (timestamp) column and set Bucketing to "None."

    • Add the LOWER_BOUND and UPPER_BOUND columns, also setting Aggregation to "None."

  • Add Anomaly Column: Click "Add Column" on the right side and select IS_ANOMALY.

  • Generate Chart: Click on "Chart" to display the visualization.

You should now see a line chart with TS on the X-axis and VALUE, LOWER_BOUND, and UPPER_BOUND on the Y-axis. Points where IS_ANOMALY is true will be highlighted, indicating anomalies.

Further Exploration

You can use SHOW_EVALUATION_METRICS and SHOW_TRAINING_LOGS to gain deeper insights into the model's performance and training process.

Experiment with incorporating other features from your daily_sales view (e.g., SHIP_MODE, return_flag) as exogenous variables to potentially improve anomaly detection accuracy.

Resources

Join Our Community

Join our newsletter list for occasional updates, products and insights.

Join Our Community

Join our newsletter list for occasional updates, products and insights.

Join Our Community

Join our newsletter list for occasional updates, products and insights.

More from
Classic AI / ML (Data Science)

Snowflake Cortex AI: ML Forecasting

Snowflake Cortex AI: ML Forecasting

Snowflake Cortex AI: ML Forecasting

Snowflake Cortex AI: ML Anomaly Detection

Snowflake Cortex AI: ML Anomaly Detection

Snowflake Cortex AI: ML Anomaly Detection

Snowflake Cortex AI: ML Classification

Snowflake Cortex AI: ML Classification

Snowflake Cortex AI: ML Classification

Snowpark ML (+Notebooks, +Model Registry, +Feature Store)

Snowpark ML (+Notebooks, +Model Registry, +Feature Store)

Snowpark ML (+Notebooks, +Model Registry, +Feature Store)

Copyright 2020-24 - All Right Reserved

Copyright 2020-24 - All Right Reserved

Copyright 2020-24 - All Right Reserved