Repeat last known value
ETL SQL Window functionsDemo scripts below.
Disclaimer: This technique is by no means something I have come up with, I’ve just needed it enough times that I figured I might as well put it into my own words, for future reference.
Background
Suppose you have a collection of time series data, recording some status or other measurement at a point in time.
In an OLTP system we probably don’t want to repeat the same value every time we take a measurement, or we may only record changes made to a ledger when they occur.
For reporting purposes, however, many things get easier if we have complete time series data - if something hasn’t changed, we just want to repeat the last value we saw.
In this article, we want to build a report for a local sports club that shows the membership status of our club members by date. The member administration system only records membership changes, e.g. when memberships start or end.
Let’s say we have a MemberStatus table that looks something like this:
| Date | Member | Status |
|---|---|---|
| 2025-01-01 | Alice | Active |
| 2025-01-04 | Bob | Active |
| 2025-01-06 | Alice | Passive |
| 2025-01-10 | Alice | Active |
| 2025-01-10 | Bob | Passive |
| 2025-01-13 | Alice | Passive |
| 2025-01-14 | Bob | Inactive |
| 2025-01-17 | Alice | Inactive |
The problem
To support reporting requirements, we want to count how many members are active at any given date. This would be easy if data looks something like this (original data rows are highlighted):
| Date | Member | Status |
|---|---|---|
| 2025-01-01 | Alice | Active |
| 2025-01-02 | Alice | Active |
| 2025-01-03 | Alice | Active |
| 2025-01-04 | Alice | Active |
| 2025-01-05 | Alice | Active |
| 2025-01-06 | Alice | Passive |
| 2025-01-07 | Alice | Passive |
| 2025-01-08 | Alice | Passive |
| 2025-01-09 | Alice | Passive |
| 2025-01-10 | Alice | Active |
| 2025-01-11 | Alice | Active |
| 2025-01-12 | Alice | Active |
| 2025-01-13 | Alice | Passive |
| 2025-01-14 | Alice | Passive |
| 2025-01-15 | Alice | Passive |
| 2025-01-16 | Alice | Passive |
| 2025-01-04 | Bob | Active |
| 2025-01-05 | Bob | Active |
| 2025-01-06 | Bob | Active |
| 2025-01-07 | Bob | Active |
| 2025-01-08 | Bob | Active |
| 2025-01-09 | Bob | Active |
| 2025-01-10 | Bob | Passive |
| 2025-01-11 | Bob | Passive |
| 2025-01-12 | Bob | Passive |
| 2025-01-13 | Bob | Passive |
In this case, we write no rows for inactive members.
How do we fill in the blanks when the data we’re recording can have gaps in it?
The solution
There are several ways to do this: We could just run a loop for every day, check the status of the previous day and check if there is a new event on the current day, but this isn’t very set-based.
We can also use window functions to repeat the last known value.
SQL Server supports the LAST_VALUE analytic window function. However, the default behaviour of this function doesn’t overwrite NULLs.
In SQL Server 2022 and later, an optional IGNORE NULLS/RESPECT NULLS option is added, which actually lets us use this window function for a very simple solution (see Solution 1).
On older versions of SQL Server, or if we are using a SQL flavour that doesn’t support a similar LAST_VALUE window function, we can still get what we want, but we need an extra step (see Solution 2).
Demo scripts
Prepare environment
/* Create a very simple Date dimension */
DROP TABLE IF EXISTS dbo.Dates;
CREATE TABLE dbo.Dates (
Date DATE NOT NULL
);
;WITH CTE_base AS (
SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(n)
), CTE_numbers AS (
SELECT rw = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM CTE_base AS ones
CROSS JOIN CTE_base AS tens
)
INSERT INTO dbo.Dates (Date)
SELECT date = CAST(DATEADD(DAY, rw, '2024-12-31') AS DATE)
FROM CTE_numbers
WHERE rw <= 31;
/* Create a very simple Members dimension */
DROP TABLE IF EXISTS dbo.Members;
CREATE TABLE dbo.Members (
Member NVARCHAR(255) NOT NULL
);
INSERT INTO dbo.Members (Member)
VALUES
('Alice'),
('Bob');
/* Create sample data */
DROP TABLE IF EXISTS dbo.MemberStatus;
CREATE TABLE dbo.MemberStatus(
Date DATE NOT NULL,
Member NVARCHAR(255) NOT NULL,
Status NVARCHAR(50) NULL
);
INSERT INTO dbo.MemberStatus (Date, Member, Status)
VALUES
('2025-01-01', N'Alice', N'Active'),
('2025-01-04', N'Bob', N'Active'),
('2025-01-06', N'Alice', N'Passive'),
('2025-01-10', N'Alice', N'Active'),
('2025-01-10', N'Bob', N'Passive'),
('2025-01-13', N'Alice', N'Passive'),
('2025-01-14', N'Bob', N'Inactive'),
('2025-01-17', N'Alice', N'Inactive');
Solution 1 (SQL Server 2022 or newer)
Depending on your flavour of SQL, for example SQL Server 2022, you may have access to the Window function LAST_VALUE, with the IGNORE NULLS option.
This makes everything significantly easier:
;WITH CTE_last_value AS (
SELECT d.Date,
c.Member,
actual_value = ms.Status,
last_value = LAST_VALUE(ms.Status) IGNORE NULLS OVER(PARTITION BY c.Member ORDER BY d.Date)
FROM dbo.Members AS c
CROSS JOIN dbo.Dates AS d
LEFT OUTER JOIN dbo.MemberStatus AS ms
ON c.Member = ms.Member
AND d.Date = ms.Date
)
SELECT Date,
Member,
actual_value,
last_value
FROM CTE_last_value
/* Include only members with active/passive status */
WHERE last_value IN (N'Active', N'Passive')
ORDER BY Member, Date;
Solution 2 (universal)
If you’re on older versions of SQL Server, or use a different SQL flavour that doesn’t have something like LAST_VALUES, this solution works just as well.
It’s a bit more verbose, but it works with any flavour of SQL that supports window functions:
;WITH CTE_LeftHandSide AS (
SELECT d.Date,
c.Member
FROM dbo.Members AS c
CROSS JOIN dbo.Dates AS d
), CTE_Group AS (
SELECT lhs.Date,
lhs.Member,
ms.Status,
groups = COUNT(ms.Status) OVER(PARTITION BY lhs.Member ORDER BY lhs.Date)
FROM CTE_LeftHandSide AS lhs
LEFT OUTER JOIN dbo.MemberStatus AS ms
ON lhs.Member = ms.Member
AND lhs.Date = ms.Date
), CTE_last_value AS (
SELECT Date,
Member,
actual_value = Status,
last_value = MAX(Status) OVER(PARTITION BY Member, groups ORDER BY Date)
FROM CTE_Group
)
SELECT Date,
Member,
actual_value,
last_value
FROM CTE_last_value
/* Include only members with active/passive status */
WHERE last_value IN (N'Active', N'Passive')
ORDER BY Member, Date;
How does it work? In CTE_Group, using COUNT(Status) as a window function generates an accumulating count of rows where the Status column is not NULL:
| Date | Member | Status | COUNT | last_status |
|---|---|---|---|---|
| 2025-01-01 | Alice | Active | 1 | Active |
| 2025-01-02 | Alice | NULL |
1 | Active |
| 2025-01-03 | Alice | NULL |
1 | Active |
| 2025-01-04 | Alice | NULL |
1 | Active |
| 2025-01-05 | Alice | NULL |
1 | Active |
| 2025-01-06 | Alice | Passive | 2 | Passive |
| 2025-01-07 | Alice | NULL |
2 | Passive |
| 2025-01-08 | Alice | NULL |
2 | Passive |
| 2025-01-09 | Alice | NULL |
2 | Passive |
| 2025-01-10 | Alice | Active | 3 | Active |
| 2025-01-11 | Alice | NULL |
3 | Active |
| 2025-01-12 | Alice | NULL |
3 | Active |
| 2025-01-13 | Alice | Passive | 4 | Passive |
| 2025-01-14 | Alice | NULL |
4 | Passive |
| 2025-01-15 | Alice | NULL |
4 | Passive |
| 2025-01-16 | Alice | NULL |
4 | Passive |
| 2025-01-17 | Alice | Inactive | 5 | Inactive |
| 2025-01-18 | Alice | NULL |
5 | Inactive |
| 2025-01-19 | Alice | NULL |
5 | Inactive |
| … | … | … | … | … |
Since the Status value is unique per COUNT frame, the following CTE_last_value uses the MAX aggregate to apply the value in the frame over any NULLs.
Note that the choice of the MAX aggregate function is arbitrary - you could use MIN if you prefer.