Classic AI / ML (Data Science)
Jun 1, 2024
Snowflake Cortex AI: ML Forecasting
In the world of data-driven business, accurate forecasting is a critical component of success. Snowflake Cortex, with its powerful machine learning capabilities, empowers you to predict future trends with confidence, whether it's sales, inventory, or patient visits to providers.
This guide will walk you through the process of building and evaluating sales forecast models using Snowflake Cortex, incorporating both simple and more sophisticated techniques.
Note: In addition to scripting, as demonstrated here, you can also create Cortex forecast models using the intuitive UI in Snowsight. This visual interface prompts you for input and automatically generates the corresponding SQL script, making it accessible for both technical and non-technical users.
A Practical Example: Sales Data Analysis
If you haven't done so already, set up the OrdersDB sample data model and then proceed with the rest of the exercises.
⛁ Sample Data Model: ordersdb-data-model
>_ SQL
To practice the steps of building sales forecast models using Snowflake Cortex, we'll begin by assuming you have sales data available in tables named Sales_Order and Sales_Order_Item within your ORDERSDB database.
Preparing and Exploring Your Sales Data
Create a view (daily_sales) to consolidate your sales data. This view will aggregate daily sales totals while keeping track of relevant attributes like shipping mode (SHIP_MODE) and return status (RETURN_FLAG). The SQL code for this step is:
>_ SQL
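A minimal sketch of this view, assuming Sales_Order exposes ORDER_KEY and ORDER_DATE and Sales_Order_Item exposes ORDER_KEY, EXTENDED_PRICE, SHIP_MODE, and RETURN_FLAG (adjust the join key and column names to your OrdersDB schema):

-- Daily sales totals by shipping mode and return status.
CREATE OR REPLACE VIEW daily_sales AS
SELECT
    o.order_date::TIMESTAMP_NTZ AS sales_date,   -- cast to a timestamp for the forecast function
    i.ship_mode                 AS ship_mode,
    i.return_flag               AS return_flag,
    SUM(i.extended_price)       AS total_sales
FROM sales_order o
JOIN sales_order_item i
  ON i.order_key = o.order_key
GROUP BY 1, 2, 3;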
Getting Acquainted with Your Data:
Before diving into forecasting, take a moment to understand your data. Calculate key statistics such as the minimum and maximum dates in your dataset, the range of sales values, and the average, median, and standard deviation. This also shows that the actual sales data runs through August 1998, which is the point from which we'll forecast.
This information provides a valuable baseline for evaluating the accuracy of your forecasts. You can achieve this using a simple SELECT query on your daily_sales view:
>_ SQL
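A sketch of such a profiling query against the daily_sales view (MEDIAN and STDDEV are built-in Snowflake aggregates):

-- Baseline statistics for the daily sales series.
SELECT
    MIN(sales_date)     AS first_day,
    MAX(sales_date)     AS last_day,       -- should land around Aug 1998
    MIN(total_sales)    AS min_sales,
    MAX(total_sales)    AS max_sales,
    AVG(total_sales)    AS avg_sales,
    MEDIAN(total_sales) AS median_sales,
    STDDEV(total_sales) AS stddev_sales
FROM daily_sales;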
Splitting Data and Creating a Basic Forecast
To assess the accuracy of your forecast models, you need to split your data into a training set (used for building the model) and a testing set (used for evaluating the model's performance). A common practice is to use 80% of the data for training and 20% for testing. This step involves calculating a split date and creating views for your training (daily_sales_std_noreturn_train) and testing (daily_sales_std_noreturn_test) data:
>_ SQL
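One possible sketch of this split, assuming the "std_noreturn" series is defined by excluding returned items (the exact predicates in your model may differ) and using a cutoff date at 80% of the observed date range:

-- Training view: the filtered series up to the 80% cutoff date.
CREATE OR REPLACE VIEW daily_sales_std_noreturn_train AS
SELECT sales_date, SUM(total_sales) AS total_sales
FROM daily_sales
WHERE return_flag = 'N'   -- assumption: drop returned items
  AND sales_date <= (SELECT DATEADD('day',
                                    CAST(0.8 * DATEDIFF('day', MIN(sales_date), MAX(sales_date)) AS INT),
                                    MIN(sales_date))
                     FROM daily_sales)
GROUP BY sales_date;

-- Testing view: the remaining ~20% of days.
CREATE OR REPLACE VIEW daily_sales_std_noreturn_test AS
SELECT sales_date, SUM(total_sales) AS total_sales
FROM daily_sales
WHERE return_flag = 'N'
  AND sales_date > (SELECT DATEADD('day',
                                   CAST(0.8 * DATEDIFF('day', MIN(sales_date), MAX(sales_date)) AS INT),
                                   MIN(sales_date))
                    FROM daily_sales)
GROUP BY sales_date;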
Building Your First Simple Univariate Forecasting Model:
Start by creating a simple forecast model, daily_sales_std_noreturn_model. This model doesn't utilize any additional variables besides the date and the sales values. The process includes defining the model, generating predictions, and storing them in a table (daily_sales_std_noreturn_forecasts). You can then visualize your forecasts against the actual sales data.
First, build a forecast model (daily_sales_std_noreturn_model) using the training data:
>_ SQL
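A sketch of the model definition using the documented SNOWFLAKE.ML.FORECAST syntax, with the column names from the view sketches above:

-- Univariate model: only a timestamp column and a target column.
CREATE OR REPLACE SNOWFLAKE.ML.FORECAST daily_sales_std_noreturn_model(
    INPUT_DATA        => SYSTEM$REFERENCE('VIEW', 'daily_sales_std_noreturn_train'),
    TIMESTAMP_COLNAME => 'sales_date',
    TARGET_COLNAME    => 'total_sales'
);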
Generate predictions and store them in a table (daily_sales_std_noreturn_forecasts):
>_ SQL
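A sketch of this step: call the model's FORECAST method, then capture its output with RESULT_SCAN (the 60-day horizon is only an example; in practice, forecast roughly as far as your test window extends):

-- Forecast the next 60 days and persist the output.
CALL daily_sales_std_noreturn_model!FORECAST(FORECASTING_PERIODS => 60);

CREATE OR REPLACE TABLE daily_sales_std_noreturn_forecasts AS
SELECT ts, forecast, lower_bound, upper_bound
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));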
Visualizing the Forecast
Combine historical data with predictions to create a visual representation of your forecast, including confidence intervals.
To visualize this forecast in Snowsight:
Run the following query to prepare your data for plotting the forecasts against the actual sales data:
>_ SQL
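A sketch of such a query, stacking actuals and stored forecasts with UNION ALL so they share a single time axis:

-- Historical actuals first, then forecasts with their prediction interval.
SELECT sales_date AS ts, total_sales AS actual,
       NULL AS forecast, NULL AS lower_bound, NULL AS upper_bound
FROM daily_sales_std_noreturn_train
UNION ALL
SELECT ts, NULL AS actual,
       forecast, lower_bound, upper_bound
FROM daily_sales_std_noreturn_forecasts
ORDER BY ts;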
Switch to the "Chart" tab in Snowsight.
In the Data section, select the ACTUAL column and set Aggregation to "None."
Select the DATE (or TS) column and set Bucketing to "None."
Add the FORECAST column, choose "Use as Line", and set Aggregation to "None."
Add the LOWER_BOUND and UPPER_BOUND columns in the same way.
You should now see a line chart displaying your historical sales data (ACTUAL), the forecasted values (FORECAST), and the upper and lower bounds of the prediction interval. This visualization helps you understand the model's predictions over time.
Incorporating More Factors into the Forecast
Exogenous Variables: To make your forecast more comprehensive, you can incorporate exogenous variables like shipping mode and return status. This involves creating new views that include these variables and training a new model that leverages this additional information.
Create new training (daily_sales_train) and testing (daily_sales_test) views with these variables, then train a model (daily_sales_forecasts_model) that utilizes them:
>_ SQL
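One possible sketch, assuming the exogenous variables are brought in as daily shares of returned revenue and air shipments; the feature definitions, the 'R' and 'AIR' values, and the helper view daily_sales_future_features are assumptions introduced here, so adapt them to your data:

-- Training view: target plus exogenous features, one row per day.
CREATE OR REPLACE VIEW daily_sales_train AS
SELECT
    sales_date,
    SUM(total_sales) AS total_sales,
    SUM(IFF(return_flag = 'R', total_sales, 0)) / NULLIF(SUM(total_sales), 0) AS returned_share,  -- exogenous
    SUM(IFF(ship_mode = 'AIR', total_sales, 0)) / NULLIF(SUM(total_sales), 0) AS air_share        -- exogenous
FROM daily_sales
WHERE sales_date <= (SELECT DATEADD('day',
                                    CAST(0.8 * DATEDIFF('day', MIN(sales_date), MAX(sales_date)) AS INT),
                                    MIN(sales_date))
                     FROM daily_sales)
GROUP BY sales_date;

-- Testing view: the held-out days with the same columns, for later comparison.
CREATE OR REPLACE VIEW daily_sales_test AS
SELECT
    sales_date,
    SUM(total_sales) AS total_sales,
    SUM(IFF(return_flag = 'R', total_sales, 0)) / NULLIF(SUM(total_sales), 0) AS returned_share,
    SUM(IFF(ship_mode = 'AIR', total_sales, 0)) / NULLIF(SUM(total_sales), 0) AS air_share
FROM daily_sales
WHERE sales_date > (SELECT DATEADD('day',
                                   CAST(0.8 * DATEDIFF('day', MIN(sales_date), MAX(sales_date)) AS INT),
                                   MIN(sales_date))
                    FROM daily_sales)
GROUP BY sales_date;

-- Every column other than the timestamp and target is treated as an exogenous feature.
CREATE OR REPLACE SNOWFLAKE.ML.FORECAST daily_sales_forecasts_model(
    INPUT_DATA        => SYSTEM$REFERENCE('VIEW', 'daily_sales_train'),
    TIMESTAMP_COLNAME => 'sales_date',
    TARGET_COLNAME    => 'total_sales'
);

-- With exogenous features, the forecast call must be given their future values,
-- so expose just the timestamps and features for the held-out period.
CREATE OR REPLACE VIEW daily_sales_future_features AS
SELECT sales_date, returned_share, air_share
FROM daily_sales_test;

CALL daily_sales_forecasts_model!FORECAST(
    INPUT_DATA        => SYSTEM$REFERENCE('VIEW', 'daily_sales_future_features'),
    TIMESTAMP_COLNAME => 'sales_date'
);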
Evaluating Model Performance:
Inspection and Analysis: Snowflake Cortex provides convenient functions to inspect the accuracy and internal workings of your model. Use SHOW_EVALUATION_METRICS to get performance metrics, EXPLAIN_FEATURE_IMPORTANCE to understand the impact of different variables, and SHOW_TRAINING_LOGS for detailed diagnostics.
Evaluate both models using Snowflake's built-in functions to gain insights into their accuracy, feature importance, and training logs:
>_ SQL
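A sketch of these calls for both models (each is a model method invoked with CALL):

-- Accuracy metrics (e.g., MAE, MAPE, SMAPE) computed on held-out validation data.
CALL daily_sales_std_noreturn_model!SHOW_EVALUATION_METRICS();
CALL daily_sales_forecasts_model!SHOW_EVALUATION_METRICS();

-- Relative importance of each input feature (most informative for the exogenous model).
CALL daily_sales_forecasts_model!EXPLAIN_FEATURE_IMPORTANCE();

-- Detailed training diagnostics.
CALL daily_sales_std_noreturn_model!SHOW_TRAINING_LOGS();
CALL daily_sales_forecasts_model!SHOW_TRAINING_LOGS();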
This allows you to compare the performance of the simple and enhanced models, understand which features contribute most to the predictions, and diagnose potential issues.
Conclusion
By following these steps, you'll have a solid foundation for utilizing Snowflake Cortex to create accurate sales forecasts that can drive your business decisions. Remember to adapt and refine this process based on the specific characteristics of your data and business requirements.
Resources
Time-Series Forecasting (Snowflake Cortex ML Functions)
Getting Started with Snowflake Cortex ML-Based Functions