Snowflake Cortex AI: ML Classification

Classification is a fundamental machine learning task that involves categorizing data points into predefined classes or labels. This powerful capability enables you to gain deeper insights into your structured data, identify patterns, and make informed predictions or recommendations. Snowflake Cortex simplifies the classification process, offering an intuitive interface and robust algorithms to build, evaluate, and deploy models that accurately classify your data.

Why Classification Matters

In a business context, classification can be applied to a wide range of use cases, including:

Customer Churn Prediction: Predict which customers are likely to churn based on their behavior and demographics.
Lead Scoring: Classify leads based on their potential to convert, helping sales teams prioritize their efforts.
Fraud Detection: Identify fraudulent transactions by classifying them as legitimate or suspicious.
Product Recommendation: Recommend products to customers by classifying them into groups with similar preferences.
Sentiment Analysis: Classify customer feedback or social media posts as positive, negative, or neutral to understand brand perception.

These are just a few examples. The ability to categorize data based on patterns and relationships opens up a world of possibilities for data-driven decision-making.

Note: In addition to scripting as you’d see in this chapter, you can also create Cortex classification models using Snowsight's visual interface, which simplifies the process by guiding you through each step.

A Practical Example: Segmenting Customers by Spending with ordersdb

Let's use the ordersdb sample data model to demonstrate how to classify customers into three categories based on their total spending: "High Spenders," "Medium Spenders," and "Low Spenders."

1. Setting Up Your Data

⛁ Sample Data Model: ordersdb-data-model

Make sure you have the ordersdb sample data model loaded into Snowflake. Then, set the context to the ordersdb.public schema:

>_ SQL

-- Assign the database schema
USE SCHEMA ordersdb.public

2. Preparing and Splitting Your Data

First, calculate the total spending for each customer and create a view (customer_spending) summarizing this information along with a spending_category column that will serve as our target for classification:

>_ SQL

-- Create a view to aggregate total spending per customer and assign spending categories using NTILE
CREATE OR REPLACE VIEW customer_spending AS (
    SELECT
        c.customer_id,
        SUM(si.quantity * si.extended_price) AS total_spending,
        0 as spending_category  -- For the classification
    FROM customer c
    JOIN sales_order o ON c.customer_id = o.customer_id
    JOIN sales_order_item si ON o.order_id = si.order_id
    GROUP BY c.customer_id
)

Next, split your customer_spending view into training (80%) and testing (20%) sets based on the customer_id:

>_ SQL

-- Calculate the customer ID at the 80th percentile for splitting
CREATE OR REPLACE TABLE split_info AS
SELECT PERCENTILE_CONT(0.8) WITHIN GROUP (ORDER BY customer_id) AS split_id FROM customer_spending;

-- Create a training view with customers below the 80th percentile
CREATE OR REPLACE VIEW customer_spending_train AS
SELECT * 
FROM customer_spending
WHERE customer_id <= (SELECT split_id FROM split_info);

-- Create a testing view with customers above the 80th percentile
CREATE OR REPLACE VIEW customer_spending_test AS
SELECT * 
FROM customer_spending
WHERE customer_id > (SELECT split_id FROM split_info);

-- Verify the split by counting rows in each view
SELECT COUNT(*) FROM customer_spending_train; -- Should be approximately 80% of the total rows
SELECT COUNT(*) FROM customer_spending_test;  -- Should be approximately 20% of the total rows

3. Building the Classification Model

Create a classification model (customer_segmentation_model), specifying the training data, target column (spending_category), and input features (in this case, just total_spending):

>_ SQL

-- Create a classification model using the training data
CREATE OR REPLACE SNOWFLAKE.ML.CLASSIFICATION customer_segmentation_model (
    INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'customer_spending_train'),
    TARGET_COLNAME => 'spending_category',
    CONFIG_OBJECT => {'on_error': 'skip'}  -- Handle any potential errors during training
)

4. Evaluating Model Performance and Generating Predictions

Use the trained model to generate predictions on your testing data and store the results:

>_ SQL

-- Generate predictions on the testing view and store in a table
CREATE OR REPLACE TABLE customer_spending_predictions AS
SELECT *, customer_segmentation_model!PREDICT(INPUT_DATA => object_construct(*)) AS prediction 
FROM

5. Refining Prediction Results

Create a view (customer_spending_classifications) to interpret the predictions, including additional information for analysis:

>_ SQL

-- Create a view to interpret and categorize the predictions
CREATE OR REPLACE VIEW customer_spending_classifications AS
SELECT Customer_ID, Total_spending,
       prediction:class::string AS class,  -- Raw prediction class
       CASE
           WHEN prediction:class::string = '1' THEN 'High'
           WHEN prediction:class::string = '2' THEN 'Medium'
           WHEN prediction:class::string = '3' THEN 'Low'
           ELSE 'Unknown'                   -- Handle any unexpected classes
       END AS spender_category
FROM

Now, you can visualize the results in Snowsight.

6. Identifying High-Value Customers for Targeted Campaigns

Create a table (customer_campaign_high_spenders) to store the customers identified as "High" spenders:

>_ SQL

-- Create a table for high spenders
CREATE OR REPLACE TABLE customer_campaign_high_spenders AS
SELECT * FROM customer_spending_classifications WHERE spender_category = 'High'

Retrieve the information of high spenders for your marketing campaigns:

>_ SQL

-- Select high spender information from customer dimension table including email for targeting
SELECT hs.customer_id, hs.total_spending, hs.spender_category, c.email
FROM customer_campaign_high_spenders hs 
INNER JOIN customer c ON

Conclusion

By following these steps, you can effectively segment your customers based on spending patterns using Snowflake Cortex's classification capabilities. This information can be invaluable for creating tailored marketing campaigns.

High-value customers identified through this process can be easily exported to your Customer Data Platform (CDP) for sophisticated campaign management. Alternatively, you can leverage Snowflake's native Send email capabilities for direct outreach, ensuring your message reaches the right audience with the right offer.

This targeted approach drives customer engagement, fosters loyalty, and ultimately leads to higher sales and increased revenue for your business.

Resources