Analysis ServicesDatabaseDevelopmentSQL 2014

Multidimensional ROLAP partitions and updatable columnstore indexes – the new black?

OLAP CubeI came across a colleague of mine, who asked me if the new updatable columnstore index and ROLAP partitions in a Multidimensional cube is the new trend of fast and no-latency Business Intelligence.

Well – here is my 25 cents.

I’ll start with the updatable columnstore indexes.

With SQL Server 2014 Microsoft introduces updatable columnstore indexes. Which in short terms defines that the columnstore no longer has to be dropped/disabled when loading data to the table. When the new data is loaded to the table, it is therefore first loaded to a temporary deltastore where background processes splits it into rowgroups, does indexing and compression.

This progress can be observed through the DMV sys.column_store_row_groups. Books online documents that every rowgroup can have 3 statuses: OPEN, CLOSED and COMPRESSED.

OPEN: A rowgroup in read/write state that is accepting new records. An OPEN rowgroup is still in rowstore format and has not been compressed to columnstore format yet.

CLOSED: A rowgroup which is filled and therefore locked for read/write. This rowgroup still needs compression

COMPRESSED: A rowgroup that has been locked and compressed and a part of the CS index.

The first two states are not yet part of the CS Index – but resides as internal objects in the database. They are not to be found in sys.objects or sys.partitions. They are there… They can be found trough the sys.system_internals_partitions view.

The latter one above, sys.partitions, only shows row groups that are in COMPRESED state and can therefore give wrong answers if it’s used to for example calculate space used.

Therefore, when a query is executed against a table that contains a CS index that has rowgroups in OPEN or CLOSED state we will not get full benefit of the CS index, as the engine then needs to read from the rowgroups and not the CS index. Even worse is it when data already resides in the CS index and new data is loaded to the table. Then the engine needs to look at both the CS index and the rowgroups. There is no guarantee that the CS index contains all the data that is required for the query, so even if it is so, then a scan of the rowgroups are still executed.

One could wish that the background process of getting the rowgroups from state OPEN to COMPRESSED are blazing fast. But it is not. It is slow slow slow. A quick test on a virtual machine (4 cores 6 GB ram) shows that the background process of getting the rowgroups to COMPRESSED state is averaging with approx. 180.000 rows pr. second.

Next up – the ROLAP partitions in Multidimensional

I will not go through the explanation of building a ROLAP partitions as they are found in very good blogs around the BI universe.

But I’ll take a look at the way the SSAS engine gets the data from a ROLAP partition. Based in the dimensional relationships inside the SSAS cube the engine generates a t-sql statement to get the data.

This statement is not pretty and is not optimized in any way. It cannot be altered or user defined in any way. I have seen these queries in Xevents – and they do not perform very good overall.

Conclusion

If the data is fully loaded every night in the layer just before the cube, and this data is more than a couple of million rows, then the combination is not a good option. It will take way to long time for the background process to get the CS index up and running, resulting in very poor performance at the end users.

If the updated data is small (a couple of 100.000 rows) and is only loaded at nighttime, then there can be arguments to use the combination. But then again, why not just load the data to a normal table and process the corresponding partition.

So – it depends. Partly on the solution build, the environment and the architecture.

Happy coding.

Leave a Reply

Your email address will not be published. Required fields are marked *