Preceding loads, data types and distinct values

Intro

In my last post I looked at the overhead that a preceding load causes. In this post, I take a closer look at which has the bigger influence on performance: the data type (or whatever comes closest to a data type in QlikView) or the number of distinct values. The result surprised me…

Number of distinct values

We’ll compare the performance of copying one table into another (with an additional preceding load layer) using fields with different numbers of distinct values. Each run copies 1 column of 1000 rows, containing either 100, 200, 300, 500, 800 or 1000 distinct values. What did I expect? Based on a post on the QlikView Design Blog on Symbol tables and Bit-stuffed pointers, I expected fewer bits to be read and written with fewer distinct values, since the pointer into the symbol table is correspondingly smaller (100 values = 7 bits, 1000 values = 10 bits).
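The expectation can be made concrete with a small model of the symbol-table idea from that Design Blog post: each distinct value is stored once, and every row only holds an index into that table, packed into just enough bits to address all distinct values. The Python sketch below is an illustration under that assumption, not QlikView's actual internals. The function names are hypothetical, and cycling the values row by row is an assumption about how the test data was laid out.

```python
def pointer_bits(distinct_count: int) -> int:
    """Bits needed to index `distinct_count` symbols (at least 1)."""
    # Indices run from 0 to distinct_count - 1, so the bit length of the
    # largest index is the pointer width.
    return max(1, (distinct_count - 1).bit_length())


def build_symbol_table(column):
    """Split a column into a symbol table (distinct values, first-seen order)
    and a list of per-row indices into that table."""
    symbols = {}
    indices = []
    for value in column:
        # setdefault assigns the next free index only the first time a value appears
        idx = symbols.setdefault(value, len(symbols))
        indices.append(idx)
    return list(symbols), indices


def make_float_column(rows: int, distinct: int):
    """Hypothetical test column that cycles through `distinct` float values."""
    return [(i % distinct) + 0.9 for i in range(rows)]


if __name__ == "__main__":
    for distinct in (100, 200, 300, 500, 800, 1000):
        table, idx = build_symbol_table(make_float_column(1000, distinct))
        bits = pointer_bits(len(table))
        print(f"{distinct:>5} distinct -> {bits:>2}-bit pointers, "
              f"{len(idx) * bits} bits of row data")
```

In this model, 100 distinct values need 7-bit pointers and 1000 need 10-bit pointers, matching the figures above, so the row data would grow with the number of distinct values. That growth is what the measurements below did not show.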

Here is the result:

[Chart: preceding load (PL), Float, 1 column, 1000 rows, 100 to 1000 distinct values]

To my surprise, I could hardly find any difference between the variations.

Different data types

Let’s now compare different data types. Again we copy 1 column of 1000 rows with an additional preceding load layer, but this time the number of distinct values is fixed at 100 and the data type varies:

Integer: 2, 4, 6, etc. (deliberately not a contiguous sequence of integers, to avoid QlikView's sequential-integer optimization)

Float: 0.9, 1.9, 2.9, etc.

Text: 1Text, 2Text, 3Text, etc.
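The three test columns can be sketched as follows. This is a Python illustration that assumes the 100 distinct values simply repeat across the 1000 rows (the post does not say how they were ordered); `make_column` is a hypothetical helper, not part of the original QlikView script. It also checks that all three columns need the same pointer width, which is the basis for expecting identical behavior.

```python
def make_column(kind: str, rows: int = 1000, distinct: int = 100) -> list:
    """Hypothetical generator for the three test columns described above."""
    column = []
    for i in range(rows):
        k = i % distinct  # cycle through the distinct values (assumed row order)
        if kind == "integer":
            column.append(2 * (k + 1))     # 2, 4, 6, ... (gaps: not a contiguous sequence)
        elif kind == "float":
            column.append(k + 0.9)         # 0.9, 1.9, 2.9, ...
        elif kind == "text":
            column.append(f"{k + 1}Text")  # 1Text, 2Text, 3Text, ...
        else:
            raise ValueError(f"unknown kind: {kind}")
    return column


def pointer_bits(distinct_count: int) -> int:
    """Bits needed to index `distinct_count` symbol-table entries (at least 1)."""
    return max(1, (distinct_count - 1).bit_length())


if __name__ == "__main__":
    for kind in ("integer", "float", "text"):
        col = make_column(kind)
        n = len(set(col))
        print(f"{kind:<8} {len(col)} rows, {n} distinct, {pointer_bits(n)}-bit pointers")
```

Under this model every column has 100 distinct values and therefore 7-bit pointers, so the data type alone should not change the amount of row data.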

According to the logic above, all three should behave the same, since the pointer into the symbol table has the same width in each case. Here is the result:

[Chart: preceding load (PL), different data types, 1 column, 1000 rows, 100 distinct values]

The Float and Integer runs perform at roughly the same level, but both are about 80% slower than the Text data type. Again, a result I didn’t expect.

Copying without a Preceding Load

Out of curiosity, I repeated the scenarios above without the preceding load layer. First, the variation over the number of distinct values:

[Chart: plain copy, Float, 1 column, 1000 rows, 100 to 1000 distinct values]

And now the variation over the data type:

[Chart: plain copy, different data types, 1 column, 1000 rows, 100 distinct values]

As we can see, the number of distinct values again makes no noticeable difference, just as with the preceding load. What is interesting, however, is that the dependence on the data type has disappeared as well.

Conclusion

So far, it is difficult to draw a conclusion from this. Either I have misunderstood something about the way QlikView processes data, or I have made a mistake somewhere. But what’s the point of experimenting if the results are always what you expect, right? ;-)

If you can help demystify any of the above, I’d be happy to receive a comment, either here on the blog or in the Qlik Community thread I created for the discussion. I’ll update this post once we have shed some light on this.


If you found this post interesting and don’t want to miss the next one, why not subscribe to my RSS feed on the left?

Sandro
