Advanced Data Warehousing

May 23, 2024

Snowflake Higher Order Functions (HoF)

Higher-order functions are functions that can take other functions as arguments or return functions as results. In the context of data analysis, higher-order functions are crucial as they enable a more abstract and expressive way to manipulate and process data. They allow for the creation of reusable and composable code, facilitating tasks such as filtering, mapping, reducing, and transforming datasets with concise and readable syntax. By utilizing higher-order functions, data analysts can write code that is more modular, easier to understand, and maintainable, ultimately leading to more efficient and effective data analysis workflows.

For example, when dealing with JSON data, FILTER can be used to extract or filter out specific JSON objects or arrays based on criteria defined within the JSON structure. Similarly, when data is stored in Snowflake arrays, TRANSFORM allows for selective inclusion or exclusion of array elements based on conditions, providing a streamlined approach to data processing.

Benefits of Higher Order Functions over Lateral Flatten

  1. Readability: HOFs offer more concise and expressive syntax than LATERAL FLATTEN. They let you keep the focus on the transformation logic, avoiding the nested structure of LATERAL FLATTEN.

  2. Performance: HOFs can often execute faster than LATERAL FLATTEN, especially with complex transformations. They operate directly on the array data without the need to create a temporary table.

  3. Maintenance: HOFs lead to more maintainable code. The lambda expressions within HOFs clearly define the operation to be performed, making it easier to understand and modify the logic.

Quick Example: 

-- Higher-order function (HoF) example in SQL - Given an Iterable array; FILTER out even numbers and return only odd numbers
SELECT FILTER(ARRAY_CONSTRUCT(36, 21, 87, 14, 59, 42, 73, 5, 68, 92), 
              x -> MOD(x, 2) = 1) AS odd_values;

-- Output: [21, 87, 59, 73, 5]

A Practical Example: Sales Data Analysis

Let's explore how to use Snowflake Higher Order Functions (HoF) in a practical scenario.

⛁ Sample Data Model: salesdb-data-model

Setup the SalesDB sample data model and proceed with the rest of the exercises. This model provides a robust foundation for tracking customers, buyers, clients, and sales opportunities.

>_ SQL

// Assign the databaseschema used co-pilot
use schema

Create and load records into the customer table. 

>_ SQL

CREATE OR REPLACE TABLE customer_orders AS 
SELECT 1 AS customer_id, '2024-01-01' AS order_date, [{'device':'Cardiac Monitor', 'quantity':67, 'price':1500}, {'device':'Patient Monitor', 'quantity':1, 'price':1200}] AS order_detail UNION 
SELECT 2 AS customer_id, '2024-01-02' AS order_date, [{'device':'Insulin Pump', 'quantity':5, 'price':56}, {'device':'Hearing Aid', 'quantity':5, 'price':1000}] UNION 
SELECT 3 AS customer_id, '2024-01-03' AS order_date, [{'device':'MRI Scanner', 'quantity':3, 'price':3000}, {'device':'X-ray Machine', 'quantity':2, 'price':2500}] UNION 
SELECT 4 AS customer_id, '2024-01-04' AS order_date, [{'device':'Ultrasound Machine', 'quantity':4, 'price':4000}, {'device':'Surgical Robot', 'quantity':1, 'price':5000}] UNION 
SELECT 5 AS customer_id, '2024-01-05' AS order_date, [{'device':'Defibrillator', 'quantity':2, 'price':2000}, {'device':'Anesthesia Machine', 'quantity':3, 'price':3500}]

Let's delve into how Snowflake HoFs can supercharge your data workflow:

Task: You need to calculate the total revenue generated by each customer across all their orders and only show customers with revenue GREATER THAN 15000.

Without Higher Order Functions (HOF)

Verify that the records have been loaded into the table. Notice the last column is a Snowflake variant data type holding the Json payload.   

>_ SQL

SELECT * FROM

Let’s dial in and look at one specific record, where Customer ID is 1

>_ SQL

SELECT * FROM customer_orders where customer_id = 1

Notice one sample JSON payload has multiple records. This is an iterable we need to loop through to calculate the Total Revenue (i.e. Price * Quantity), and then FILTER where the value doesn’t meet the predicate (i.e. greater than 15000). 

[  {    "device": "Cardiac Monitor",    "price": 1500,    "quantity": 67  },  {    "device": "Patient Monitor",    "price": 1200,    "quantity": 1  }]

First we Flatten the records. To go from one record in a column, into multiple rows. 

>_ SQL

SELECT 
  customer_id, item.value
FROM 
  CUSTOMER_ORDERS,
  LATERAL FLATTEN(input => order_detail)

Pull out the values of interest from the ARRAY record

>_ SQL

SELECT 
  customer_id, item.value:quantity::NUMBER as Quantity, item.value:price::NUMBER as Price
FROM 
  CUSTOMER_ORDERS,
  LATERAL FLATTEN(input => order_detail)

Calculate the total_revenue with basic math.

>_ SQL

SELECT 
  customer_id, SUM(item.value:quantity::NUMBER * item.value:price::NUMBER) AS total_revenue
FROM 
  CUSTOMER_ORDERS,
  LATERAL FLATTEN(input => order_detail) item
  group by all

FILTER records that don’t meet the predicate of GREATER THAN 15000

>_ SQL

SELECT 
  customer_id, SUM(item.value:quantity::NUMBER * item.value:price::NUMBER) AS total_revenue
FROM 
  CUSTOMER_ORDERS as co,
  LATERAL FLATTEN(input => order_detail) item
  group by all
  having total_revenue >= 15000

Return the result to show the entire ARRAY of  records that weren’t filtered out. Just for readability and formatting purposes. 

>_ SQL

SELECT 
  customer_id, item.value
FROM 
  CUSTOMER_ORDERS as co,
  LATERAL FLATTEN(input => order_detail) item
  group by all
  having SUM(item.value:quantity::NUMBER * item.value:price::NUMBER) >= 15000

This is one way to achieve the task using the existing FLATTEN function. Next we would explore how to do that using the newer Higher Order Functions (HoF) in Snowflake. 

With Higher Order Functions

For this task, we would use HoF to identify high-value customers based on the total purchase amount.

First, Verify the records in the variant column as JSON

>_ SQL

SELECT

Using FILTER HoF to go through the iterable (i.e. order_detail) and apply a function to it that calculates price * quantity and checks the predicate. This only returns records that meet the check criteria (15000) in this case. Records that don’t meet that criteria are filtered or simply ignored in the result set. 

>_ SQL

SELECT customer_id, FILTER(o.order_detail, i -> (i:quantity::NUMBER * i:price::NUMBER) >= 15000)FROM

Notice, Null arrays are returned. Filter those out by adding a predicate in the where clause to check for array size. 

>_ SQL

SELECT 
    customer_id, FILTER(o.order_detail, i -> (i:quantity::NUMBER * i:price::NUMBER) >= 15000) AS filtered_orders
FROM customer_orders o
    WHERE ARRAY_SIZE(filtered_orders) > 0
    group by all

Comparison:

The HOF version is more concise and focuses on the transformation (UPPER) directly. It's also more flexible, as you can easily modify the lambda expression to perform other operations.

When to Use Each Approach:

  • HOFs: Preferred for simple transformations or filtering of array elements.

  • Lateral Flatten: Useful when you need to join the flattened data with other tables or perform complex aggregations that require a relational structure.

Additional Considerations:

  • Nested Arrays: HOFs are primarily designed for simple arrays. Lateral Flatten is more suitable for handling nested arrays.

  • Feature Support: Snowflake continues to enhance its HOF capabilities, so check the latest documentation for the most up-to-date functionality.

Resources

Snowflake Documentation: Querying Semi-structured Data | Snowflake Documentation

Join Our Community

Join our newsletter list for occasional updates, products and insights.

Join Our Community

Join our newsletter list for occasional updates, products and insights.

Join Our Community

Join our newsletter list for occasional updates, products and insights.

More from
Advanced Data Warehousing

Snowflake Data Quality Metrics/Functions

Snowflake Data Quality Metrics/Functions

Snowflake Data Quality Metrics/Functions

Snowflake Higher Order Functions (HoF)

Snowflake Higher Order Functions (HoF)

Snowflake Higher Order Functions (HoF)

Snowflake Data Classification & Tagging

Snowflake Data Classification & Tagging

Snowflake Data Classification & Tagging

Snowflake ASOF JOIN

Snowflake ASOF JOIN

Snowflake ASOF JOIN

Snowflake Soundex & Fuzzy Matching

Snowflake Soundex & Fuzzy Matching

Snowflake Soundex & Fuzzy Matching

Snowflake Aggregation Policies

Snowflake Aggregation Policies

Snowflake Aggregation Policies

Snowflake Projection Policies

Snowflake Projection Policies

Snowflake Projection Policies

Snowflake JMeter Load & Concurrency Test

Snowflake JMeter Load & Concurrency Test

Snowflake JMeter Load & Concurrency Test

Snowflake Memoizable Functions

Snowflake Memoizable Functions

Snowflake Memoizable Functions

Snowflake Trail For Obervability (+Logs, +Events, +Traces)

Snowflake Trail For Obervability (+Logs, +Events, +Traces)

Snowflake Trail For Obervability (+Logs, +Events, +Traces)

Copyright 2020-24 - All Right Reserved

Copyright 2020-24 - All Right Reserved

Copyright 2020-24 - All Right Reserved