Dynamic partitioning tabular cube

Posted on Posted in Database, Development, Performance, SQL 2012

new_rubiks_cube_5In every project on Business Intelligence there comes a time when the code needs to be deplyed to the production environment.
No more development, no more manual work.

But what about the partitions on the tabular cube? Do we really need to tell and learn the DBA how to handle that on a periodic plan?

The answer is simple: No!

Thanks to the XMLA language, the DMV’s for SSAS instances (both tabular og multidimensionel) and SSIS we can do the partitioning dynamic based on the current data in the datawarehouse.

In below examble I’ve made partitions for every month, but as always you need to take the current architecture, dataflow, data deliveries etc into account when creating your sollution.

Here we go:

We need a table in order to keep track on the partitions and their metadata. Also a table to hold data from existing partitions in the Tabular cube:

-- Schema to hold tables for partitioning
CREATE SCHEMA [Partition]
GO

-- PartitionLog table
CREATE TABLE [Partition].[PartitionLog](
	[ID] [int] IDENTITY(1,1) NOT NULL,
	[PartitionName] [varchar](100) NOT NULL,
	[FromDateKey] [int] NOT NULL,
	[ToDateKey] [int] NOT NULL,
	[CreatedDate] [datetime] NULL,
	[LastProcessDate] [datetime] NULL,
	[ProcessedCount] [smallint] NULL,
 CONSTRAINT [PK_PartitionLog] PRIMARY KEY CLUSTERED 
(
	[PartitionName] ASC,
	[FromDateKey] ASC,
	[ToDateKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

ALTER TABLE [Partition].[PartitionLog] ADD  CONSTRAINT [DF__Partition__Proce__38996AB5]  DEFAULT ((0)) FOR [ProcessedCount]
GO

-- Table to hold list of existing partitions. 
-- The table is updated with data from the SSIS package
CREATE TABLE

In SSIS I’ve created a dataflow from the SSAS Tabular instance with the list of partitions on the table I want to partition to the table ExitingPartitionsList.
We can do this thanks to the DMV’s for SSAS which also works for Tabular instances – see this link for further information.

The SQL-statement to get list of partitions (reference to msdn):

select distinct 
	PARTITION_NAME 
from 
	$SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS 
where 
	TABLE_ID='Måleraflæsninger_51b67466-b3e0-425a-9111-072e9d814c0b' 
order by 
	PARTITION_NAME

The tableID is found in the tables properties (right-click and choose ‘Properties’):
TableID

The controlflow looks like this. I hope it is somewhat self-explainable:
UpdateExistingPartitions

Now we need to take a look at the naming convention of the partitions. There is a need for the partition-name to have a suffix that tells the span of the partition. I’ve choosen [Tablename]-[fromdate]-[todate]. And in order to get usable data from the previous dataflow, I’ve made a view as below:

create view [Partition].[ExistingPartitions] as 

select
	[PARTITION_NAME]
	,substring(partition_name,CHARINDEX('-',Partition_name,1)+1,CHARINDEX('-',Partition_name,CHARINDEX('-',Partition_name,1)+1)-CHARINDEX('-',Partition_name,1)-1) as FromDateKey
	,substring(partition_name,CHARINDEX('-',Partition_name,CHARINDEX('-',Partition_name,1)+1)+1,len(partition_name)) as ToDateKey
from 
	[Partition].[ExistingPartitionsList]

Giving this output:
ExistingPartitions

My facttable (Måleraflæsninger) has a field that is used to filter the partition. In this case it is ‘datekey’. In order to get all possible partitions that are needed for the project I’ve made this view:

create view [Partition].[PossiblePartitions] as 

select distinct
	cast('Måleraflæsninger-' + cast(int_datekey as varchar(8)) + '-' + convert(varchar(8), dateadd(day,-1,dateadd(month, 1, datekey)), 112) as nvarchar(255)) as Name
	,int_datekey as FromdateKey
	,cast(convert(varchar(8), eomonth(datekey), 112) as int) as ToDateKey
from
	Fact.Måleraflæsninger
where
	day(datekey) = 1

Take notice that I’ve made the dates end at last day of the month. This is going to be used to generate the view for the partitions later on.
Based on these two views we can now generate the list of partitions that needs to be created in the Tabular project:

create view [Partition].[MissingPartitions] as 

select
	pp.Name
	,pp.FromdateKey
	,pp.ToDateKey
from
	Partition.PossiblePartitions pp
where
	pp.Name not in ( select PARTITION_NAME from Partition.ExistingPartitions)

Now I need to generate a set of XMLA’s. One to create a partition and one to process a partition. The easiest way to get these is to script them from the GUI in the SSAS Tabular instance.

1: Right-click the table that I’m working with and selecting ‘Partitions…’

CreatePartitionXMLA_1

2: Click ‘New partition’

CreatePartitionXMLA_2

3: Here specify the name – remember the convention – and the SQL statement. Here it is important to remember the filter criteria for the partitions. In this case it is one partition for every month.

CreatePartitionXMLA_3

4. Finally click the Script buttom and ‘Script Action to New Query Window’

CreatePartitionXMLA_4

Result – the areas I’ve highlighted are the ones that we need to parameterize in SSIS – more to come on that part in a bit.

CreatePartitionXMLA_5

The same way I’ve made a XMLA script to process the partition – the highlighted area is, again, to be parameterized later:

ProcesPartitionXMLA_1

Now we need to go to SSIS and make the logic and steps to accuire the dynamic partitioning.

First of all we need to make sure that all partitions that should be in the model also exists, and if they do not exists, we’øll create them.

The SSIS project now needs some variables as listed below:

VariablesList

The scope varies from project to project.

After all the variables has been defined, we need to make the two XMLA-variables as Expressions.

In the Expression builder add below codes to the respective variable:

CreatePartitionXMLA – replace the meta-data with the one that matches your sollution:

"<Alter AllowCreate="true" ObjectExpansion="ExpandFull" xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>TabularKube</DatabaseID>
    <CubeID>Model</CubeID>
    <MeasureGroupID>Måleraflæsninger_51b67466-b3e0-425a-9111-072e9d814c0b</MeasureGroupID>
    <PartitionID>" + @[User::Partitionname]  + "</PartitionID>
  </Object>
  <ObjectDefinition>
    <Partition xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200" xmlns:ddl300="http://schemas.microsoft.com/analysisservices/2011/engine/300" xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300" xmlns:ddl400="http://schemas.microsoft.com/analysisservices/2012/engine/400" xmlns:ddl400_400="http://schemas.microsoft.com/analysisservices/2012/engine/400/400">
      <ID>" + @[User::Partitionname]  + "</ID>
      <Name>" + @[User::Partitionname]  + "</Name>
      <Source xsi:type="QueryBinding">
        <DataSourceID>f6bb153e-310c-41b4-9a15-9ed45e9dc8b6</DataSourceID>
        <QueryDefinition>SELECT [vFact].[Måleraflæsninger].*
FROM [vFact].[Måleraflæsninger]
WHERE [int_datekey] between " + (DT_WSTR, 8) @[User::FromDatekey] + " and " +  (DT_WSTR, 8) @[User::ToDatekey] + "</QueryDefinition>
      </Source>
      <StorageMode valuens="ddl200_200">InMemory</StorageMode>
      <ProcessingMode>Regular</ProcessingMode>
      <ddl300_300:DirectQueryUsage>InMemoryOnly</ddl300_300:DirectQueryUsage>
    </Partition>
  </ObjectDefinition>
</Alter>"

ProcessPartitionXMLA – replace the meta-data with the one that matches your sollution:

"<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Type>ProcessFull</Type>
  <Object>
    <DatabaseID>TabularKube</DatabaseID>
    <CubeID>Model</CubeID>
    <MeasureGroupID>Måleraflæsninger_51b67466-b3e0-425a-9111-072e9d814c0b</MeasureGroupID>
    <PartitionID>" + @[User::Partitionname]  + "</PartitionID>
  </Object>
</Process>"

Now we are all set and just need to build the package.

The focus is now on a container like this:

ContainerCreateAndProcessMissingPartitions

The steps are to get all the missing partitions from our defined view and loop the result with both Create partition, Process partition and Insert data to PartitionLog.

Step 1:

Define a SQL task with the following statement:

select
	mp.Name as PartitionName
	,mp.FromdateKey
	,mp.ToDateKey
from
	Partition.MissingPartitions mp

Define the output to ‘Full resultset’ and map the result to the variable MissingPartitions.

Add a ForeahLoopContainer and define the container to reference the MissingPartitions object as a ADO source and map the data to the variables as below:

ForeachLoopCreatePartitionParameterMapping

Inside this container add two Analysis Services Execetute DDL Tasks and define a Analysis Services Connectione that matches the environment.

The first DDL (Create Partition) is defined like this:

CreatePartitionDDL

The second DDL (Process Partition) is defined like this:

ProcessPartitionDDL

The last step is to add a record in the PartitionLog table – add a SQL task with this statement:

select
	mp.Name as PartitionName
	,mp.FromdateKey
	,mp.ToDateKey
from
	Partition.MissingPartitions mp

And map the parameters like this:

InsertRecord toPartitionLogParameterMapping

Now, when the package runs, it will get a dataset of missing partitions, loop the dataset and create and process the partitions dynamically. At the end it creates a record in the partitionlog to keep track of this.

The last thing we need to do is to add a container to process the latest partition every time the package executes.

We need to build this:

ContainerProcessCurrentPartition

Again, add a SQL task and define it to get data from the view with current partition we created earlier:

insert into Partition.PartitionLog (
	[PartitionName]
	,[FromDateKey]
	,[ToDateKey]
	,[CreatedDate]
	,[LastProcessDate]
	,[ProcessedCount]
	) 
values (?,?,?,GETDATE(),GETDATE(),1)

Map the ‘Full resulset’ to the variable PartitionName.

The foreach loop container must be defined to use this variable and the data mapped like this:

ForeachLoopProcessLatestPartition

Add a Anaysis Services Exectute DDL task and define it to use the variable ProcesPartionXMLA. We can reuse the expression as it is defined as expression and uses the same logic in the expression.

Finally add a SQL task with below code:

select
	PartitionName
from
	Partition.CurrentPartition

Map the parameters like this:

UpdatePartitionLog

And there it is. All done.

Now every time the package is executed it will see to that missing partitions is created (in this case for every month start) and processed.
And it will make sure that the latest partition is updated with the latest data.

The processing time now takes very short time to do, as the only data that is processed is the latest one. Of course the first time the package is run it will create all the partitions and process them.

The whole code from above can be downloaded here:

Blog-code_DynamicPartitions

 

15 thoughts on “Dynamic partitioning tabular cube

  1. Hi Brian…I am trying to reproduce the above package but with having no success…can you kindly send me the SSIS Dynamic Partitions package…or provide me a path where I can download ….thanks….sandeep

  2. Hi Brian, if I generate an XMLA script via SSMS, it generates a GUID for the partition. Like: Internet Sales_9dda4033-5b57-4512-b413-89f66456cb61. And I thought that this is not changeable at all. I really would like to know, how do you get your own specified partitionname with fromdatekey and todatekey at the end?

    1. Hi Funda,

      The guid for the partition is changed in the create statement in SSIS. I’ve made a variable that builds the partitionname dynamic from the results from the view in the database. In this view I have the expected partition-name and to- and from-dates.

      I hope it helps – or else just write again.

      Best regards
      Brian

  3. Hello Brain,
    thanks a lot for your answer. I get your point. I implement it as you have described and it works fine.

    But there one more thing which I dont understand. Why do we need to process the current partition one more time at the end? Dont we do that already above in the first workflow? Why it is necessary?

    1. Hello again Funda,

      You actually don’t have to either. It is just that we allways need to proces the current partition. So in order to make sure that happens I’ve made that task at the end. You could implement that in only processes the current partition, if it has not allready been processed within a given timespan. That would be an easy fix in the view that generates the record for the processing task.

  4. Hi Brain, finally I have finished the package and understand your logic completely now. Thank you so much for sharing this solution and for answering me!

  5. Hi Brian,
    Thanks for above solution which is great and currently I am trying to develop similar solution for my academic project but I am getting an error at Create and process partition stage:
    Error- Package Validation Error (Package Validation Error)
    Error at Process New partition [Analysis Services Execute DDL Task]: DDL is not valid.
    Error at Process New partition: There were errors during task validation.

    (Microsoft.DataTransformationServices.VsIntegration)

  6. Hi Brian,
    Great work , It suits my exact requirement and I am trying to create but I am getting following error
    Error at Process New partition [Analysis Services Execute DDL Task]: DDL is not valid.
    Error at Process New partition: There were errors during task validation.

    (Microsoft.DataTransformationServices.VsIntegration)

    1. Hi Mazhar

      I sound like your dynamic SSAS DDL is incorrect. Are you closing all open ‘< ' with corresponding '>‘ and are your namingconvention valid (are the name for the partition usable in SSAS?
      I can help you furhter, but I need to look at the DDL (the dynamic and one compiled version) then – if you can provide it?

      Best regards
      Brian

      1. Hi Brian,
        Thanks for your replay, I rally appreciate your help.
        The following is Dynamic DDL code which is used in variable@ CreatePartitionXMLA

        twotables
        Model
        Query_9a029887-0684-4a4e-b072-e2287a13597b
        \” + @[User::Partitionname] + \”

        \” + @[User::Partitionname] + \”
        \” + @[User::Partitionname] + \”

        1c42ea41-53f9-40ac-971a-a1e72a88bd6d
        SELECT
        SALESDETAIL.SalesDTLID
        ,SALESDETAIL.SalesHDRID AS [SALESDETAIL SalesHDRID]
        ,SALESDETAIL.SalesHDRREFID
        ,SALESDETAIL.ItemID
        ,SALESDETAIL.UOMID
        ,SALESDETAIL.TaxID
        ,SALESDETAIL.ItemSellExcl
        ,SALESDETAIL.ItemSellIncl
        ,SALESDETAIL.tmpItemNewSellIncl
        ,SALESDETAIL.ItemCostExcl
        ,SALESDETAIL.ItemCostIncl
        ,SALESDETAIL.ItemNCostExcl
        ,SALESDETAIL.ItemNCostIncl
        ,SALESDETAIL.ItemGrandParentCostExcl
        ,SALESDETAIL.ItemGrandParentCostIncl
        ,SALESDETAIL.ItemGrandParentNCostExcl
        ,SALESDETAIL.ItemGrandParentNCostIncl
        ,SALESDETAIL.Quantity
        ,SALESDETAIL.Discount
        ,SALESDETAIL.ParentDiscount
        ,SALESDETAIL.PackingComment
        ,SALESDETAIL.ChildAccountID
        ,SALESDETAIL.PC
        ,SALESDETAIL.LPC
        ,SALESDETAIL.RPC
        ,SALESDETAIL.PackerID
        ,SALESDETAIL.PackerTime
        ,SALESDETAIL.PackerShort
        ,SALESDETAIL.DataEntryID
        ,SALESDETAIL.CheckerID
        ,SALESDETAIL.CheckerTime
        ,SALESDETAIL.Corrected
        ,SALESDETAIL.PackerIssue
        ,SALESDETAIL.Barcode
        ,SALESDETAIL.OrderIssueDescriptionID
        ,SALESDETAIL.ItemLoadLocationID
        ,SALESDETAIL.SSMA_TimeStamp AS [SALESDETAIL SSMA_TimeStamp]
        ,SALESDETAIL.COLORID
        ,SALESDETAIL.LastUpdateDate
        ,SALESDETAIL.CurrentPackerID
        ,SALESHEADER.SalesHDRID AS [SALESHEADER SalesHDRID]
        ,SALESHEADER.PrivateOrderNumber
        ,SALESHEADER.InvoiceStatusID
        ,SALESHEADER.AccountID
        ,SALESHEADER.ParentAccountID
        ,SALESHEADER.OrderDate
        ,SALESHEADER.OrderComment
        ,SALESHEADER.DeliveryDate
        ,SALESHEADER.InvoiceDate
        ,SALESHEADER.InvoicePrinted
        ,SALESHEADER.BuildFlag
        ,SALESHEADER.DeliveryDriver
        ,SALESHEADER.PackingSlipPrinted
        ,SALESHEADER.PackingFlag
        ,SALESHEADER.PackingBay
        ,SALESHEADER.TailoredOrdering
        ,SALESHEADER.CheckingFlag
        ,SALESHEADER.DeliveryAccountID
        ,SALESHEADER.DespatchDate
        ,SALESHEADER.CustRebateFlag
        ,SALESHEADER.CustRebateMargin
        ,SALESHEADER.SSMA_TimeStamp AS [SALESHEADER SSMA_TimeStamp]
        ,SALESHEADER.ExternalPONumber
        ,SALESHEADER.SSCCNumber
        FROM
        SALESHEADER
        INNER JOIN SALESDETAIL
        ON SALESHEADER.SalesHDRID = SALESDETAIL.SalesHDRID
        WHERE SALESHEADER.[InvoiceDate] between \” + (DT_WSTR, 8) @[User::FromDatekey] + \” and \” + (DT_WSTR, 8) @[User::ToDatekey] + \”

        InMemory
        Regular

        -PT1S
        -PT1S
        -PT1S
        -PT1S

        XMLA CODE generated using SSAS Partition GUI

        twotables
        Model
        Query_9a029887-0684-4a4e-b072-e2287a13597b
        Query_22c10a7d-97bf-4a44-bb60-7c2649d4f6de

        Query_22c10a7d-97bf-4a44-bb60-7c2649d4f6de
        Query

        1c42ea41-53f9-40ac-971a-a1e72a88bd6d
        SELECT
        SALESDETAIL.SalesDTLID
        ,SALESDETAIL.SalesHDRID AS [SALESDETAIL SalesHDRID]
        ,SALESDETAIL.SalesHDRREFID
        ,SALESDETAIL.ItemID
        ,SALESDETAIL.UOMID
        ,SALESDETAIL.TaxID
        ,SALESDETAIL.ItemSellExcl
        ,SALESDETAIL.ItemSellIncl
        ,SALESDETAIL.tmpItemNewSellIncl
        ,SALESDETAIL.ItemCostExcl
        ,SALESDETAIL.ItemCostIncl
        ,SALESDETAIL.ItemNCostExcl
        ,SALESDETAIL.ItemNCostIncl
        ,SALESDETAIL.ItemGrandParentCostExcl
        ,SALESDETAIL.ItemGrandParentCostIncl
        ,SALESDETAIL.ItemGrandParentNCostExcl
        ,SALESDETAIL.ItemGrandParentNCostIncl
        ,SALESDETAIL.Quantity
        ,SALESDETAIL.Discount
        ,SALESDETAIL.ParentDiscount
        ,SALESDETAIL.PackingComment
        ,SALESDETAIL.ChildAccountID
        ,SALESDETAIL.PC
        ,SALESDETAIL.LPC
        ,SALESDETAIL.RPC
        ,SALESDETAIL.PackerID
        ,SALESDETAIL.PackerTime
        ,SALESDETAIL.PackerShort
        ,SALESDETAIL.DataEntryID
        ,SALESDETAIL.CheckerID
        ,SALESDETAIL.CheckerTime
        ,SALESDETAIL.Corrected
        ,SALESDETAIL.PackerIssue
        ,SALESDETAIL.Barcode
        ,SALESDETAIL.OrderIssueDescriptionID
        ,SALESDETAIL.ItemLoadLocationID
        ,SALESDETAIL.SSMA_TimeStamp AS [SALESDETAIL SSMA_TimeStamp]
        ,SALESDETAIL.COLORID
        ,SALESDETAIL.LastUpdateDate
        ,SALESDETAIL.CurrentPackerID
        ,SALESHEADER.SalesHDRID AS [SALESHEADER SalesHDRID]
        ,SALESHEADER.PrivateOrderNumber
        ,SALESHEADER.InvoiceStatusID
        ,SALESHEADER.AccountID
        ,SALESHEADER.ParentAccountID
        ,SALESHEADER.OrderDate
        ,SALESHEADER.OrderComment
        ,SALESHEADER.DeliveryDate
        ,SALESHEADER.InvoiceDate
        ,SALESHEADER.InvoicePrinted
        ,SALESHEADER.BuildFlag
        ,SALESHEADER.DeliveryDriver
        ,SALESHEADER.PackingSlipPrinted
        ,SALESHEADER.PackingFlag
        ,SALESHEADER.PackingBay
        ,SALESHEADER.TailoredOrdering
        ,SALESHEADER.CheckingFlag
        ,SALESHEADER.DeliveryAccountID
        ,SALESHEADER.DespatchDate
        ,SALESHEADER.CustRebateFlag
        ,SALESHEADER.CustRebateMargin
        ,SALESHEADER.SSMA_TimeStamp AS [SALESHEADER SSMA_TimeStamp]
        ,SALESHEADER.ExternalPONumber
        ,SALESHEADER.SSCCNumber
        FROM
        SALESHEADER
        INNER JOIN SALESDETAIL
        ON SALESHEADER.SalesHDRID = SALESDETAIL.SalesHDRID

        InMemory
        Regular
        InMemoryOnly

        1. Hello again

          Sorry for my late reply.

          It seems like the generated XMLA from the GUI is not quite accurate. If you compare the one from you with the one from the blogpost, you’ll see differences in content.
          It seems like you’re missing the XML formating with ‘< ' and '>‘. Also statements like ALLOW CREATE etc.

          Try to see if you can get the correct XMLA from the GUI with the XML tags included…

  7. Hi Brian,

    Dors the solution which you mentioned above works for Azure Tabular model too? As I am not sure how to get partition information from Azure tabular model to my databse .

    Can you please suggest. I am trying to create the dynamic partition in Azure environment and need help on the sane.

    Thanks,
    Sanjay

    1. Hi Sanjay

      It depends – if the tabular datamodel is hosted on a server in azure then I’m sure it could be changed to fit that environment.
      On the other hand, if you are using Azures Tabular model as a service, I’m not sure it could work.

      I hope that you can make it fit – I’ll gladly help you if you need further assistance.

      Best regards
      Brian

Leave a Reply

Your email address will not be published. Required fields are marked *