Qualify

This is the second post about some Snowflake features, that I use often and find useful to write a better SQL. You can find the first of the series about aliasing here.

What is it?

Have you ever been in the situation where you want to keep all rows of your query but than filter the result set based on a condition you would get from a subset of it? Let me clarify this with some examples:

Let’s stick with this second example. In that case you would have a table looking like this, let’s call it purchases:

customer_id product purchased_date (YYYY-MM-DD)
1 A 2020-04-01
1 B 2020-05-03
2 A 2020-03-01
3 B 2020-04-01


So customer 1 and 3 last purchase was product B, and customer 2 last purchase was product A. Now suppose you want to filter in your data only the customers that purchased B as last product, for an analysis you’re doing.

What you would do, without using the qualify statement, would be something like this:

with product_purchased_order as (
select 
  customer_id
  , product
  , row_number() over (partition by customer_id 
      order by purchased_date desc) as n_last_order_nr
from purchases
)
select customer_id 
from product_purchased_order
where n_last_order_nr = 1 and product ='B'

You see that I have to use at least two queries: first the window statement and the where clause to filter the results I want. There are many other way to get to the same result, but let’s see how to simplify the query.

The qualify expression allows to rewrite the statement above to read nicely and in one single query. Here you go:

select 
  customer_id
  , product 
  , last_value( product ) over (partition by customer_id 
      order by purchased_date) as last_product_purchased
from purchases
qualify last_product_purchased = 'B'

Note the benefits of using this second option:

When analysing something, you often want to filter by a condition on a partition, but keep all the records at a lower granularity – that is where the qualify come in handy. Note also how using the alias in the query above made the statement more clear.

Where does the qualify clause this fit?

From the Snowflake general query syntax reference you get this general outline:

[ WITH ... ]
SELECT
   [ TOP <n> ]
   ...
[ FROM ...
   [ AT | BEFORE ... ]
   [ CHANGES ... ]
   [ CONNECT BY ... ]
   [ JOIN ... ]
   [ MATCH_RECOGNIZE ... ]
   [ PIVOT | UNPIVOT ... ]
   [ VALUES ... ]
   [ SAMPLE ... ] ]
[ WHERE ... ]
[ GROUP BY ...
   [ HAVING ... ] ]
[ QUALIFY ... ]
[ ORDER BY ... ]
[ LIMIT ... ]

So, the qualify sit in the query order right after the having (if present) but before the order by and the limit clauses.

It’s also possible to not include the window function you are filtering by in the select list, which is good in case you don’t want to show the value you are filtering on, for example in the previous query something like:

select 
  customer_id
  , product
from purchases
qualify last_value( product ) over (partition by customer_id 
    order by purchased_date) = 'B'

Which could be good if the additional field is redundant or used just for filtering.

That’s it for today! Hope you’ll start getting rid of some extra CTEs / sub-queries, thanks to the qualify or, why not, have better time writing the logic to get some fancy slice of your data, based on whatever complex condition.

Have you used the qualify statement? What are the pros and cons about it? Any warnings that you want to share? Let me know!