Improving performance with a covering index

To appreciate this post you should first read the post 'When fast is not enough'.

Some weeks ago I was asked, 'Can you explain your PHP structure assembling program?'.
'Why?' I asked.
'It seems we have a problem. It takes about 10 hours to assemble the spare parts structure for the CPD factory!'

Ten hours is definitely too much. This spare BOM tree contains some 8000 structures, or bills of material, and more than 400,000 tree nodes. I expected this to take some 15 minutes at most.
Something must be awfully wrong, so I decided to take a look myself first. Sure enough, it was the tree-assembling PHP program that took some 10 hours. It was set up to distribute the work over 7 parallel threads like this:
This was obviously just cut and pasted from a job with a much larger BOM, where the chunks assigned to the threads were optimized for that particular BOM. As you can see from the forevery iterator, the first chunk takes the first 11000 bills or top nodes. In this example we only have 8000 BOMs to assemble, which means they all go into the first chunk, effectively single-threading the assembly job. In this case it would probably be better to split the BOMs evenly over 7 chunks like this:
Here we do not try to optimize each chunk; it is good enough to split the BOMs evenly over the 7 chunks and process them all in parallel. Said and done: now we were down to about 1 hour 20 minutes, which was expected since we now run seven parallel threads. Still this was not good enough. Since I know the PHP program iterates the same SQL query for every node, I took a look in phpMyAdmin to see what was going on:
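The even split described above can be sketched like this. This is a minimal Python illustration only; the real job uses the shop's PHP forevery iterator, and all names here are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def split_evenly(boms, n_chunks):
    """Split the list of top-node BOM ids into n roughly equal chunks,
    instead of hard-coding chunk boundaries tuned for another BOM."""
    return [boms[i::n_chunks] for i in range(n_chunks)]

def assemble_chunk(chunk):
    # Placeholder for the real per-BOM tree-assembly work.
    return len(chunk)

if __name__ == "__main__":
    boms = list(range(8000))          # some 8000 bills of material
    chunks = split_evenly(boms, 7)    # one chunk per worker thread
    with ThreadPoolExecutor(max_workers=7) as ex:
        done = sum(ex.map(assemble_chunk, chunks))
    print(done)
```

The point of the sketch is only the split: no chunk gets all 8000 BOMs, so no single thread ends up doing all the work.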
We run this query over and over again, and a close examination of the table revealed a missing index. In this case you really want a covering index, as it will drastically improve performance. This is easily fixed:

We just add an ALTER statement to the job that creates the table.
(A covering index is an index that satisfies a query without accessing the table.)
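As a sketch, the fix looks something like this. The table and column names below are made up for illustration; they are not the actual ones from the job:

```sql
-- Add a covering index so the per-node lookup is satisfied
-- entirely from the index, without reading the table rows.
ALTER TABLE bom_nodes
  ADD INDEX ix_bom_cover (bom_id, node_id, part_no, qty);
```

In MySQL you can check that an index covers a query by running EXPLAIN on it: 'Using index' in the Extra column means the query is answered from the index alone.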

And now we are down to about three minutes for the tree assembly job.

The moral of this post: indexes are good, but over-optimizing is not necessarily good.


Good Luck Camilla

This post is probably the hardest I will write in this blog. Normally I do not have problems finding words, but today I do. There are many thoughts but few words...

One year ago I wrote a post on the progress of the Data Warehouse. It was not a happy post, since a highly esteemed member of the BI development team left the company.
Today I have to write yet another sad post - Camilla leaves the company after some five years of brilliant work. Apart from being a brilliant Business Intelligence developer, she took the lead and added much-needed structure, good work processes, workflows, and goodwill to the Data Warehouse.
She will be sorely missed by us all.

But the Data Warehouse must go on. When someone leaves, someone else has to step in. This time it is Henrik who will carry on the good work, together with newcomer Lucas.

Henrik scores on the football pitch.

I'm confident that Henrik and Lucas, together with the users, will run the Data Warehouse successfully and carry Camilla's legacy forward.