Introduction
Window functions are a powerful SQL feature for performing calculations across a set of rows related to the current row. Unlike aggregate functions used with GROUP BY, window functions do not collapse rows into a single output row; they return a result for every row while keeping the context of the surrounding data set.
In this article, we’ll explore some commonly used SQL window functions (ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), LEAD(), and LAG()) with examples.
Sample table: sales data
We will use the following Sales table to illustrate the window functions:
Sales ID | Customer ID | Product | Region | Amount | Sale date |
---|---|---|---|---|---|
1 | 101 | Laptop | North | 1200 | 2023-01-05 |
2 | 102 | Tablet | North | 800 | 2023-02-15 |
3 | 103 | Phone | North | 800 | 2023-03-10 |
4 | 104 | Tablet | North | 500 | 2023-04-01 |
5 | 105 | Laptop | South | 1300 | 2023-05-05 |
6 | 106 | Tablet | South | 700 | 2023-06-20 |
7 | 107 | Phone | West | 900 | 2023-07-15 |
8 | 108 | Laptop | East | 1300 | 2023-08-10 |
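To try the queries in this article, the sample table has to exist somewhere. The following is a minimal sketch that loads it into an in-memory SQLite database through Python's built-in sqlite3 module. SQLite itself is an assumption (the article does not name a database engine), as are the column names CustomerID and Product and the helper name make_sales_db; SalesID, Region, Amount, and SaleDate follow the queries used in the article. Values are transcribed from the sample table with capitalization normalized and dates written in ISO 8601 form so that they sort chronologically as text.

```python
import sqlite3

# Rows transcribed from the sample table; dates are ISO 8601 text (assumption).
SALES_ROWS = [
    (1, 101, "Laptop", "North", 1200, "2023-01-05"),
    (2, 102, "Tablet", "North", 800, "2023-02-15"),
    (3, 103, "Phone", "North", 800, "2023-03-10"),
    (4, 104, "Tablet", "North", 500, "2023-04-01"),
    (5, 105, "Laptop", "South", 1300, "2023-05-05"),
    (6, 106, "Tablet", "South", 700, "2023-06-20"),
    (7, 107, "Phone", "West", 900, "2023-07-15"),
    (8, 108, "Laptop", "East", 1300, "2023-08-10"),
]


def make_sales_db():
    """Return an in-memory SQLite connection holding the Sales table.

    make_sales_db is a hypothetical helper name used only for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE Sales (
            SalesID    INTEGER PRIMARY KEY,
            CustomerID INTEGER,
            Product    TEXT,
            Region     TEXT,
            Amount     INTEGER,
            SaleDate   TEXT
        )
        """
    )
    conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?)", SALES_ROWS)
    return conn


conn = make_sales_db()
row_count = conn.execute("SELECT COUNT(*) FROM Sales").fetchone()[0]
print(row_count)  # 8
```

Window functions require SQLite 3.25 or later, which recent Python releases bundle; other engines would need their own connection code but can run the same SQL.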
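1. ROW_NUMBER()
The ROW_NUMBER() function assigns a unique sequential number to each row within a partition, ordered by the specified column(s). Rows that tie on the ordering column still receive different numbers, and which tied row comes first is arbitrary unless the ORDER BY includes a tie-breaker.
Task: Assign a unique row number to each sale within a region based on the sale amount (highest to lowest).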
-- SalesID breaks ties between equal amounts so the numbering is repeatable
SELECT SalesID, Region, Amount,
       ROW_NUMBER() OVER (PARTITION BY Region ORDER BY Amount DESC, SalesID) AS RowNum
FROM Sales;
Result:
Sales ID | Region | Amount | Row number |
---|---|---|---|
1 | North | 1200 | 1 |
2 | North | 800 | 2 |
3 | North | 800 | 3 |
4 | North | 500 | 4 |
5 | South | 1300 | 1 |
6 | South | 700 | 2 |
7 | West | 900 | 1 |
8 | East | 1300 | 1 |
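A common use of this per-partition numbering is picking the top row of each group. The sketch below keeps only the highest sale per region. It assumes SQLite via Python's sqlite3 module (the article does not name an engine) and loads only the columns the query needs; the subquery alias numbered is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (SalesID INTEGER, Region TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [(1, "North", 1200), (2, "North", 800), (3, "North", 800), (4, "North", 500),
     (5, "South", 1300), (6, "South", 700), (7, "West", 900), (8, "East", 1300)],
)

# A window function cannot be referenced in WHERE, so the numbering is
# computed in a subquery and filtered in the outer query.
top_per_region = conn.execute(
    """
    SELECT SalesID, Region, Amount
    FROM (
        SELECT SalesID, Region, Amount,
               ROW_NUMBER() OVER (
                   PARTITION BY Region
                   ORDER BY Amount DESC, SalesID  -- SalesID breaks ties
               ) AS RowNum
        FROM Sales
    ) AS numbered
    WHERE RowNum = 1
    ORDER BY Region
    """
).fetchall()

for row in top_per_region:
    print(row)
```

With the sample data this yields one row per region: sale 8 for East, 1 for North, 5 for South, and 7 for West.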
2. RANK()
The RANK() function assigns a rank to each row within a partition. Rows with equal values receive the same rank, and the ranks that the tied rows would otherwise have occupied are skipped.
Task: Rank sales within each region by amount (highest to lowest).
-- RANK is a reserved word in some engines (for example MySQL 8), so the alias avoids it
SELECT SalesID, Region, Amount,
       RANK() OVER (PARTITION BY Region ORDER BY Amount DESC) AS SalesRank
FROM Sales;
Result:
Sales ID | Region | Amount | Rank |
---|---|---|---|
1 | North | 1200 | 1 |
2 | North | 800 | 2 |
3 | North | 800 | 2 |
4 | North | 500 | 4 |
5 | South | 1300 | 1 |
6 | South | 700 | 2 |
7 | West | 900 | 1 |
8 | East | 1300 | 1 |
Key points:
- In the North region, both rows with Amount = 800 share rank 2.
- Rank 3 is skipped, so the following row gets rank 4.
3. DENSE_RANK()
The DENSE_RANK() function assigns ranks like RANK(), but does not skip ranks after ties.
Task: Assign dense ranks to sales within each region by amount (highest to lowest).
SELECT SalesID, Region, Amount,
DENSE_RANK() OVER (PARTITION BY Region ORDER BY Amount DESC) AS DenseRank
FROM Sales;
Result:
Sales ID | Region | Amount | DenseRank |
---|---|---|---|
1 | North | 1200 | 1 |
2 | North | 800 | 2 |
3 | North | 800 | 2 |
4 | North | 500 | 3 |
5 | South | 1300 | 1 |
6 | South | 700 | 2 |
7 | West | 900 | 1 |
8 | East | 1300 | 1 |
Key points:
- In the North region, both rows with Amount = 800 share rank 2.
- The next row gets rank 3; no rank is skipped.
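The difference between the three numbering functions is easiest to see when they run side by side over the same rows. The sketch below does that for the North region only, assuming SQLite via Python's sqlite3 module (an assumption; the article does not name an engine). The aliases Rnk and DenseRnk are illustrative. Note that only ROW_NUMBER() gets the SalesID tie-breaker: adding it to the other two windows would remove the ties they are meant to show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (SalesID INTEGER, Region TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [(1, "North", 1200), (2, "North", 800), (3, "North", 800), (4, "North", 500),
     (5, "South", 1300), (6, "South", 700), (7, "West", 900), (8, "East", 1300)],
)

comparison = conn.execute(
    """
    SELECT SalesID, Amount,
           ROW_NUMBER() OVER (ORDER BY Amount DESC, SalesID) AS RowNum,
           RANK()       OVER (ORDER BY Amount DESC)          AS Rnk,
           DENSE_RANK() OVER (ORDER BY Amount DESC)          AS DenseRnk
    FROM Sales
    WHERE Region = 'North'
    ORDER BY RowNum
    """
).fetchall()

# Columns: SalesID, Amount, RowNum, Rnk, DenseRnk
for row in comparison:
    print(row)
# (1, 1200, 1, 1, 1)
# (2, 800, 2, 2, 2)
# (3, 800, 3, 2, 2)
# (4, 500, 4, 4, 3)
```

For the two tied 800 rows, ROW_NUMBER() gives 2 and 3, while RANK() and DENSE_RANK() both give 2; the last row then gets 4, 4, and 3 respectively.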
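4. NTILE()
The NTILE() function divides the ordered rows into a specified number of groups that are as equal in size as possible. When the rows do not divide evenly, the earlier groups each receive one extra row.
Task: Divide all sales into 4 groups based on amount in descending order.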
SELECT SalesID, Amount,
NTILE(4) OVER (ORDER BY Amount DESC) AS Quartile
FROM Sales;
Result:
Sales ID | Amount | Quartile |
---|---|---|
5 | 1300 | 1 |
8 | 1300 | 1 |
1 | 1200 | 2 |
7 | 900 | 2 |
2 | 800 | 3 |
3 | 800 | 3 |
6 | 700 | 4 |
4 | 500 | 4 |
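Eight rows split evenly into four groups, which hides what NTILE() does when the division is not exact. The sketch below uses NTILE(3) on the same eight sales to show the uneven case. It assumes SQLite via Python's sqlite3 module (an assumption; the article does not name an engine); the alias Bucket and the SalesID tie-breaker are illustrative additions.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (SalesID INTEGER, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [(1, 1200), (2, 800), (3, 800), (4, 500),
     (5, 1300), (6, 700), (7, 900), (8, 1300)],
)

tiles = conn.execute(
    """
    SELECT SalesID, Amount,
           NTILE(3) OVER (ORDER BY Amount DESC, SalesID) AS Bucket
    FROM Sales
    ORDER BY Bucket, Amount DESC, SalesID
    """
).fetchall()

# Count how many rows landed in each bucket.
bucket_sizes = Counter(bucket for _, _, bucket in tiles)
print(dict(bucket_sizes))  # {1: 3, 2: 3, 3: 2}
```

Eight rows cannot be split into three equal groups, so the first two buckets hold three rows each and the last holds two.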
5. LEAD()
LEAD() retrieves a value from the next row within the same partition; for the last row, which has no next row, it returns NULL by default.
Task: Compare each sale amount to the next sale amount, ordered by sale date.
SELECT SalesID, Amount,
LEAD(Amount) OVER (ORDER BY SaleDate) AS NextAmount
FROM Sales;
Result:
Sales ID | Amount | Next amount |
---|---|---|
1 | 1200 | 800 |
2 | 800 | 800 |
3 | 800 | 500 |
4 | 500 | 1300 |
5 | 1300 | 700 |
6 | 700 | 900 |
7 | 900 | 1300 |
8 | 1300 | NULL |
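The query above has no PARTITION BY, so the whole table is one partition. The sketch below adds PARTITION BY Region to show what "within the same partition" means in practice: the last sale of every region has no next row and gets NULL. It assumes SQLite via Python's sqlite3 module and ISO 8601 date strings, which sort chronologically as text (both assumptions, not stated in the article); the alias NextInRegion is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (SalesID INTEGER, Region TEXT, Amount INTEGER, SaleDate TEXT)"
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [(1, "North", 1200, "2023-01-05"), (2, "North", 800, "2023-02-15"),
     (3, "North", 800, "2023-03-10"), (4, "North", 500, "2023-04-01"),
     (5, "South", 1300, "2023-05-05"), (6, "South", 700, "2023-06-20"),
     (7, "West", 900, "2023-07-15"), (8, "East", 1300, "2023-08-10")],
)

next_in_region = conn.execute(
    """
    SELECT SalesID, Region, Amount,
           LEAD(Amount) OVER (PARTITION BY Region ORDER BY SaleDate) AS NextInRegion
    FROM Sales
    ORDER BY SalesID
    """
).fetchall()

for row in next_in_region:
    print(row)  # SQL NULL is shown as None in Python
```

Sales 4, 6, 7, and 8 are each the last sale in their region, so NextInRegion is NULL for all four, whereas the unpartitioned query returns NULL only for sale 8.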
6. LAG()
LAG() retrieves a value from the previous row within the same partition; for the first row, which has no previous row, it returns NULL by default.
Task: Compare each sale amount with the previous sale amount, ordered by sale date.
SELECT SalesID, Amount,
LAG(Amount) OVER (ORDER BY SaleDate) AS PrevAmount
FROM Sales;
Result:
Sales ID | Amount | PrevAmount |
---|---|---|
1 | 1200 | NULL |
2 | 800 | 1200 |
3 | 800 | 800 |
4 | 500 | 800 |
5 | 1300 | 500 |
6 | 700 | 1300 |
7 | 900 | 700 |
8 | 1300 | 900 |
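Once the previous amount is available, the comparison itself can be done in the same query. The sketch below subtracts LAG(Amount) from Amount to get the change from one sale to the next. It assumes SQLite via Python's sqlite3 module and ISO 8601 date strings (both assumptions, not stated in the article); the alias ChangeFromPrev is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (SalesID INTEGER, Amount INTEGER, SaleDate TEXT)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [(1, 1200, "2023-01-05"), (2, 800, "2023-02-15"), (3, 800, "2023-03-10"),
     (4, 500, "2023-04-01"), (5, 1300, "2023-05-05"), (6, 700, "2023-06-20"),
     (7, 900, "2023-07-15"), (8, 1300, "2023-08-10")],
)

changes = conn.execute(
    """
    SELECT SalesID, Amount,
           Amount - LAG(Amount) OVER (ORDER BY SaleDate) AS ChangeFromPrev
    FROM Sales
    ORDER BY SaleDate
    """
).fetchall()

for row in changes:
    print(row)
# (1, 1200, None)
# (2, 800, -400)
# (3, 800, 0)
# (4, 500, -300)
# (5, 1300, 800)
# (6, 700, -600)
# (7, 900, 200)
# (8, 1300, 400)
```

The first row has no previous sale, so LAG() returns NULL and the subtraction is NULL as well. LAG() also accepts an optional offset and default value, for example LAG(Amount, 1, 0), if a NULL result is not wanted.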
Conclusion
SQL window functions such as ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), LEAD(), and LAG() provide powerful ways to analyze data within partitions.
Key takeaways:
- ROW_NUMBER() assigns a unique sequential number to each row.
- RANK() and DENSE_RANK() differ in how they handle ties (skipping vs. not skipping ranks).
- NTILE() is useful for dividing rows into near-equal groups such as quartiles.
- LEAD() and LAG() allow comparisons with adjacent rows.
By mastering these functions, you can handle complex analytics and classification tasks effectively!