Classic AI / ML (Data Science)
Jun 1, 2024
Snowflake Cortex AI: ML Anomaly Detection
Uncovering anomalies in your data is vital for understanding unusual patterns, identifying potential problems, and ultimately making more informed business decisions.
Whether it's a sudden spike in sales due to a successful marketing campaign or an unexpected dip caused by supply chain disruptions, anomaly detection provides valuable insights that might otherwise go unnoticed.
Snowflake Cortex simplifies this process, allowing you to pinpoint outliers and investigate the underlying causes.
Note: In addition to scripting provided in this chapter, you can also create Cortex anomaly detection models using Snowsight's visual interface, which streamlines the process by prompting you for input and generating the corresponding SQL script.
Hands-On Anomaly Detection with the OrdersDB Data
Let's use the ordersdb sample data model to illustrate how to detect anomalies in your daily sales figures.
1. Setting Up Your Data
⛁ Sample Data Model: ordersdb-data-model
Make sure you have the ordersdb sample data model loaded into Snowflake. Then, set the context to the ordersdb.public schema:
>_ SQL
2. Preparing and Splitting Your Data
Preparing Your Time-Series Data
We'll use the daily_sales
view (created in the forecasting section) as our time-series data source for anomaly detection. Remember, this view aggregates sales data by day.
Next, divide your daily_sales data into two equal halves based on the date. The first half will train the model, and the second half will be analyzed for anomalies.
In this example, we're filtering on SHIP_MODE = 'MAIL'
and return_flag = 'N'
to focus on standard mail shipments without returns. You can customize these filters based on your specific interests.
>_ SQL
3. Building the Anomaly Detection Model
Create the anomaly detection model (daily_sales_anomaly_model
), providing the training data, timestamp column, and the target column (total_sales
):
>_ SQL
4. Detecting and Analyzing Anomalies
Use the DETECT_ANOMALIES
function to identify potential outliers in the sales data and store the results. This will produce a table (daily_sales_anomaly_predictions
) with details about each day, including whether an anomaly was detected (IS_ANOMALY
).
>_ SQL
Isolating Anomalies
You can create another table (daily_sales_anomalies_only
) to specifically store the days that were flagged as anomalies. This table can be used for further investigation or as input for alerts or automated actions.
>_ SQL
5. Visualizing Anomalies in Snowsight
After calling the daily_sales_anomaly_model!DETECT_ANOMALIES method, you can visualize the results as shown below. To do so:
Select Charts: Click on the "Charts" tab above the results table in Snowsight.
Configure Data:
In the "Data" section, select the VALUE (actual sales) column and set Aggregation to "None."
Select the TS (timestamp) column and set Bucketing to "None."
Add the LOWER_BOUND and UPPER_BOUND columns, also setting Aggregation to "None."
Add Anomaly Column: Click "Add Column" on the right side and select IS_ANOMALY.
Generate Chart: Click on "Chart" to display the visualization.
You should now see a line chart with TS on the X-axis and VALUE, LOWER_BOUND, and UPPER_BOUND on the Y-axis. Points where IS_ANOMALY is true will be highlighted, indicating anomalies.
Further Exploration
You can use SHOW_EVALUATION_METRICS and SHOW_TRAINING_LOGS to gain deeper insights into the model's performance and training process.
Experiment with incorporating other features from your daily_sales view (e.g., SHIP_MODE, return_flag) as exogenous variables to potentially improve anomaly detection accuracy.
Resources
Anomaly Detection (Snowflake Cortex ML Functions): https://docs.snowflake.com/en/user-guide/snowflake-cortex/ml-functions/anomaly-detection
Getting Started with Snowflake Cortex ML-Based Functions: https://quickstarts.snowflake.com/guide/ml_forecasting_ad/index.html