In my last post I looked at the overhead that a preceding load causes. In this post, I am going to look more closely at whether the data type (or what comes closest to a data type in QlikView) or the number of distinct values has the bigger influence on performance. The result surprised me…
Number of distinct values
We’ll compare the performance of copying one table into another (with an additional preceding load layer) using fields with different numbers of distinct values. Each time we copy 1 column and 1000 rows, but the column contains either 100, 200, 300, 500, 800 or 1000 distinct values. Which result did I expect? Based on a post on the QlikView Design Blog on Symbol tables and Bit-stuffed pointers, I expected fewer bits to be read and written the fewer distinct values there are, as the pointer into the symbol table is correspondingly smaller (100 values = 7 bits, 1000 values = 10 bits).
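The 7 and 10 bits quoted above follow from the smallest number of bits that can address every entry of a symbol table. Here is a minimal Python sketch of that arithmetic. The function name `pointer_bits` is made up for illustration, and the sketch only reproduces the calculation; it does not claim to show how QlikView actually lays out its pointers.

```python
def pointer_bits(distinct_values: int) -> int:
    """Smallest number of bits that can index `distinct_values` entries.

    Illustrative helper (hypothetical name), assuming one pointer value
    per distinct symbol and at least 1 bit per pointer.
    """
    if distinct_values < 1:
        raise ValueError("need at least one distinct value")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, without float rounding.
    return max(1, (distinct_values - 1).bit_length())


if __name__ == "__main__":
    # The distinct-value counts used in the experiment above.
    for n in (100, 200, 300, 500, 800, 1000):
        print(f"{n:>5} distinct values -> {pointer_bits(n)} bits")
```

By this count the six variations span only 7 to 10 bits per pointer, so with 1000 rows the expected difference in raw pointer data is small in absolute terms.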
Here is the result:
To my surprise, I could hardly find any difference between the variations.
Different data types
Let’s now compare different data types. Here we again copy 1 column and 1000 rows (with an additional preceding load layer), but this time with a constant 100 distinct values and a different data type in each run.
Integer: 2, 4, 6, etc. (not a contiguous sequence of integers, to avoid the sequential integer optimization)
Float: 0.9, 1.9, 2.9, etc.
Text: 1Text, 2Text, 3Text, etc.
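To make the shape of the three test columns concrete, here is a small Python sketch that builds 1000 rows with 100 distinct values for each type, following the patterns listed above. It is not the load script used for the measurements; the function name `make_column` and the assumption that the values simply cycle through the rows are mine.

```python
def make_column(kind: str, rows: int = 1000, distinct: int = 100) -> list:
    """Build one test column (hypothetical helper, not the original script).

    Assumes the `distinct` values repeat cyclically over `rows` rows.
    """
    values = []
    for i in range(rows):
        k = i % distinct  # 0 .. distinct-1, then wraps around
        if kind == "integer":
            values.append(2 * (k + 1))        # 2, 4, 6, ... (gaps on purpose)
        elif kind == "float":
            values.append((10 * k + 9) / 10)  # 0.9, 1.9, 2.9, ...
        elif kind == "text":
            values.append(f"{k + 1}Text")     # 1Text, 2Text, 3Text, ...
        else:
            raise ValueError(f"unknown kind: {kind}")
    return values


if __name__ == "__main__":
    for kind in ("integer", "float", "text"):
        col = make_column(kind)
        print(kind, len(col), "rows,", len(set(col)), "distinct, first:", col[:3])
```

All three columns have the same row count and the same number of distinct values, so only the kind of value stored in the symbol table differs between runs.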
According to the logic above, they should all behave the same, as the number of bits in the symbol table pointer is the same. Here is the result:
The Float and Integer runs are at roughly the same level, but both are about 80% slower than the Text data type. Again, a result I didn’t expect.
Copying without a Preceding Load
Out of curiosity, I repeated the above scenarios without the preceding load layer. First, the variation over the number of distinct values:
And now the variation of the data type:
As we can see, the result for the number of distinct values is the same as above: there seems to be no dependence. What is interesting here is that we no longer see a dependence on the data type either.
Drawing a conclusion from this is difficult so far. Either I have not understood something about the way QlikView processes the data, or I have made a mistake somewhere. But what’s the use of doing experiments when the results are always what you expect, right?
If you can contribute anything to the demystification of the above, I’d be happy to receive a comment either here in the blog or on the Qlik Community thread I created for the discussion. I’ll update this post, once we have created some clarity on this.