Beware of preceding loads!

Intro

For anyone who doesn’t know what preceding loads are, the following blog posts are a great introduction:

https://community.qlik.com/blogs/qlikviewdesignblog/2013/03/04/preceding-load
http://www.quickintelligence.co.uk/preceding-load-qlikview/
http://qlikviewcookbook.com/2014/08/preceding-load-is-elegant/

The authors mention that after they discovered this functionality for the first time, they wondered how they had ever done without it. I totally agree: with the help of preceding loads, many problems in load scripts can be solved elegantly, simply, and with great readability.

Preceding Load Overhead

You have to be aware, though, that the handoff from one load layer to the next comes with a certain overhead. Let’s consider the following fictitious code fragment:

[Code listing in the original post: no preceding load, 2 columns]
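Since the listing is an image in the original post, here is a minimal sketch of the pattern it shows; the table and field names are my assumption, not the original code:

    // Copy two columns from Table1 into a new table, renaming them on the way
    Table2:
    LOAD
        Field1 AS NewField1,
        Field2 AS NewField2
    RESIDENT Table1;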

It simply copies columns from one table to another and renames them at the same time. The time required to do this grows linearly from 1 column on the left to 5 columns on the right. Pretty much what we expect.

[Chart in the original post: result - no preceding load, x columns]

With an additional preceding load level, it would look like this (Table1 only has those two columns in this case):

[Code listing in the original post: preceding load, 2 columns]
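Again the listing is an image, so this is only a sketch of the same copy with one additional preceding load layer on top (field names assumed as above); the upper LOAD reads its input from the LOAD below it:

    // Same copy, but with an extra preceding load layer that just passes the fields through
    Table2:
    LOAD
        NewField1,
        NewField2;
    LOAD
        Field1 AS NewField1,
        Field2 AS NewField2
    RESIDENT Table1;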

For this example, we see linear growth as well, but the fifth column already needs 183% more time, whereas it had only been 67% more in the example above.

[Chart in the original post: result - preceding load, x columns]

But only when we let the two options compete directly against each other does the full extent of the difference become visible.

[Chart in the original post: result - no preceding load vs. preceding load, x columns]

One column with a preceding load already needs 347% more time than without it. At five columns, we have reached a premium of 656% (1,264% against 167%, both relative to the single-column load without a preceding load).

Test Setup

Table1 contains 1,000 rows of randomly generated numbers. This table is then copied 100 times into other tables (Table2_1, Table2_2, etc.), over 100 runs (10,000 copies in total). The time the 100 copies take is measured and then broken down to a single copy. In each of the 100 runs, we randomly decide which of the two options to run. With all this, I hope to have created enough randomness and a long enough measuring period to get robust data.
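The exact test script isn’t shown in the post, so the following is only a rough sketch of how such a measurement loop could look in QlikView script; all names and the timing approach are my assumptions:

    // Table1: 1,000 rows of random numbers
    Table1:
    LOAD
        Rand() AS Field1,
        Rand() AS Field2
    AUTOGENERATE 1000;

    FOR vRun = 1 TO 100

        LET vVariant = Round(Rand());       // randomly pick one of the two options for this run
        LET vStart   = Num(Now(1));         // Now(1) = time at function call

        FOR vCopy = 1 TO 100
            IF vVariant = 0 THEN
                // Option A: no preceding load
                // (NOCONCATENATE keeps the copies as separate tables; a real test would
                // probably also avoid the copies linking to each other, e.g. via QUALIFY)
                [Table2_$(vCopy)]:
                NOCONCATENATE LOAD
                    Field1 AS NewField1,
                    Field2 AS NewField2
                RESIDENT Table1;
            ELSE
                // Option B: with preceding load
                [Table2_$(vCopy)]:
                NOCONCATENATE LOAD
                    NewField1,
                    NewField2;
                LOAD
                    Field1 AS NewField1,
                    Field2 AS NewField2
                RESIDENT Table1;
            END IF
        NEXT vCopy

        // Seconds per single copy (Now() only has second precision, so the real test
        // would need a finer timer or even more iterations)
        LET vSecondsPerCopy = (Num(Now(1)) - vStart) * 86400 / 100;
        TRACE Run $(vRun), variant $(vVariant): $(vSecondsPerCopy) seconds per copy;

        // Clean up before the next run
        FOR vCopy = 1 TO 100
            DROP TABLE [Table2_$(vCopy)];
        NEXT vCopy

    NEXT vRun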

Once the test setup has further stabilized, I’ll publish another post on that.

Conclusion

Without a doubt, preceding loads are extremely handy and elegant, and every developer should have them in her toolkit. You also shouldn’t necessarily lose sleep over the performance impact right away. At 1,000 rows and 5 columns, we are talking about a difference of 0.01 sec on my machine. But as the number of rows rises, so does the absolute impact on performance.

10,000 rows: 0.09 sec
100,000 rows: 0.91 sec
1,000,000 rows: 9.11 sec

And we all know that 1,000,000 rows is the norm rather than the exception in a QlikView application.

Outlook

Next week, we’ll dig further into this and see whether the performance impact depends on the “data type” (integer vs. string; probably not) or on the number of distinct values in the field (more likely).
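As a hedged sketch of how test data for that follow-up could be generated (the field names and value ranges are purely my assumptions):

    // Integer field vs. string field, with a controllable number of distinct values
    Table1:
    LOAD
        Floor(Rand() * 1000)            AS IntField,      // up to 1,000 distinct integers
        Chr(65 + Floor(Rand() * 26))    AS StringField    // only 26 distinct one-letter strings
    AUTOGENERATE 1000;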

If you found this post interesting and don’t want to miss the next one, why not subscribe to my RSS feed on the left?

Sandro
