Update SCD Type 2 dimension in one single transaction using only T-SQL

Posted on Posted in Database, Development, SQL 2014, SQL 2016, T-SQL

Merge logo new

Recently I got a request inside my organization to make sure that a dimension would keep track of the changes due to requrementes from the business.

This needed to be done in a single transaction in pure T-SQL code.

So – what to do and how to do it. Here’s one way.

The sourcetable looks like this:

tablesource

The request was to keep track of changes in the ManagerId according to CaseId.

I’ve created a SCD2 table like this:

CREATE TABLE [dbo].[CaseProjectManagerHistory](
	[dwid] [bigint] IDENTITY(1,1) NOT NULL,
	[CaseId] [int] NULL,
	[ManagerId] [int] NULL,
	[dwDateFrom] [date] NULL,
	[dwDateTo] [date] NULL,
	[dwIsCurrent] [bit] NULL,
	[dwChangeDate] [date] NULL
)

The fields are as follows:
dwid: Identifier for the table
CaseId: The caseid for the rows
ManagerId: The managerid for the row
dwDateFrom: The date from where the row is actual
dwDateTo: The date to where the row is actual
dwIsCurrent: Boolean that tells if the row is the current one or not
dwChangeDate: The date of the change (if the row has changed since the first write)

If you need to catch up on the history types in a dimension – then take a look at Kennie’s blogppost HERE.

First of all I started out with a merge statement that would insert all the new values not in the table and update the ones that needed update.

Something like this:

merge dbo.CaseProjectManagerHistory as target
	using (select CaseId, ManagerId, cast(getdate() as date) as startDate, datefromparts(2199,1,1) as endDate, 1 as [current], cast(getdate() as date) as changeDate from dbo.[Case]) as source
	on target.CaseId = source.CaseId
	when not matched by target
		then
			insert (CaseId, ManagerId, dwDateFrom, dwDateTo, dwIsCurrent, dwChangeDate)
			values (source.CaseId, source.ManagerId, source.startDate, source.endDate, source.[current], source.changeDate)
	when matched 
		and target.dwIsCurrent = 1
		and exists (select source.CaseId, source.ManagerId
					except
					select target.CaseId, target.ManagerId)
		and target.dwChangeDate <= source.ChangeDate
		and source.changeDate < target.dwDateTo
		then
			update set dwIsCurrent = 0, target.dwChangeDate = source.changeDate, target.dwDateTo = dateadd(d,-1,source.startDate)

Those of you who haven’t tried and worked with a merge-statement – you can get the 101 from BOL here.

But this merge statement only inserts new rows and updates existing rows. The rows that are updated still needs to be in the table in order to fully apply to the SCD 2 rules.

This can be done by using the cluse ‘output’ from the merge-statement and then use the output rows to insert into the same table.

It will look like this:

insert into dbo.CaseProjectManagerHistory_demo (CaseId, ManagerId, dwDateFrom, dwDateTo, dwIsCurrent, dwChangeDate)
select CaseId, ManagerId, startDate, endDate, [current], changeDate 
from (
	merge dbo.CaseProjectManagerHistory_demo as target
	using (
		select 
			CaseId
			,ManagerId
			,cast(getdate() as date) as startDate
			,datefromparts(2199,1,1) as endDate
			,1 as [current]
			,cast(getdate() as date) as changeDate 
		from 
			dbo.[Case]
		where 1=1
			and caseid in (2005,2013,2015,2016,2019,2021,2023,2025,2027,2028)
			) as source
	on target.CaseId = source.CaseId
	when not matched by target -- indsæt nye rækker
		then
			insert (CaseId, ManagerId, dwDateFrom, dwDateTo, dwIsCurrent, dwChangeDate)
			values (source.CaseId, source.ManagerId, source.startDate, source.endDate, source.[current], source.changeDate)
	when matched -- opdater eksisterende rækker
		and target.dwIsCurrent = 1
		and exists (select source.CaseId, source.ManagerId --filtrer kun på rækker der ikke allerede eksisterer i target
					except
					select target.CaseId, target.ManagerId)
		and target.dwChangeDate <= source.ChangeDate
		and source.changeDate < target.dwDateTo
		then
		update set dwIsCurrent = 0, target.dwChangeDate = source.changeDate, target.dwDateTo = dateadd(d,-1,source.startDate) 
		output $action ActionOut, source.CaseId, source.ManagerId, source.startDate, source.endDate, source.changeDate, source.[current]) as mergeOutput
where mergeOutput.ActionOut = 'UPDATE';

The mergestatement ‘output’ action is used to insert the same rows to the history table once more. The only change is the ‘end date’.

Happy coding!

Note: I did a short presentation with this at my workplace a few weeks ago, and here Kennie (l, b, t) told me that there is a bug in the merge statement that needs to be taken into account. Read more of that here.

Gem

Gem

Gem

Gem

Gem

Gem

Gem

Gem

Gem

Leave a Reply

Your email address will not be published. Required fields are marked *