2015-01-25

Sunday morning

Stockholm 2015-01-25. Cautious swan on thin ice; the ice squeaked with every step.

This morning I replaced the internal Data Warehouse server switch. The old one started to misbehave last week, so I replaced it with one from my private stash. While I had the network to myself I upgraded the Japanese Data Warehouse server to Ubuntu 12.04.5. I still find it cool to administer a server on the other side of the globe.

2015-01-18

PHP 7



Stockholm January 2015, view from the office.


The other day I found and read the PHP RFCs, which include changes already made for PHP 7, the next major version of PHP. There is nothing really exciting for me. My use of PHP is very unusual: I use PHP mostly for shell scripting and as an interpreter for my Integration Tag Language, the latter probably the most awkward use of PHP imaginable. My biggest concern with every new major version of PHP is whether the SAP RFC extensions, SAPNWRFC and SAPRFC, will still work; for PHP 7 I’m sure some tinkering will be needed.
My own wishes for PHP are better ways to parse and execute PHP code dynamically. eval is what I use, and I just can’t get my head around that instruction; it is a trial-and-error session every time I use it. Better parallel execution would be nice, i.e. simpler parallel invocation: as it is today, it’s fairly complex to fork children or sub-tasks in your PHP code and communicate between them. Tail call optimization is a feature I would like to see in PHP, since it allows for more efficient recursive code, and I like recursion.
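To make the tail call wish concrete, here is a minimal sketch in Python, which, like PHP, does not eliminate tail calls. The function names are made up for illustration. The recursive version needs one stack frame per step; the loop version is what tail call optimization would effectively turn it into, so it runs in constant stack space however deep the "recursion" goes.

```python
def sum_to_recursive(n: int, acc: int = 0) -> int:
    """Tail-recursive sum of 1..n: the recursive call is the last thing evaluated."""
    if n == 0:
        return acc
    # Without tail call optimization every call keeps its own stack frame.
    return sum_to_recursive(n - 1, acc + n)


def sum_to_loop(n: int) -> int:
    """The same computation with the tail call rewritten as a loop (what TCO does for you)."""
    acc = 0
    while n > 0:
        acc, n = acc + n, n - 1
    return acc


if __name__ == "__main__":
    print(sum_to_loop(100_000))  # one frame, however large n is
    try:
        sum_to_recursive(100_000)  # exceeds CPython's default recursion limit of 1000
    except RecursionError:
        print("RecursionError: no tail call optimization")
```

The two functions compute the same result; only the recursive one runs out of stack for large inputs.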


PHP Deprecated:  iconv_set_encoding(): Use of iconv.input_encoding is deprecated in scriptC1.php on line 49

If all deprecated code is removed from PHP 7 I have some work to do, especially in UTF-8 related code. I’m in favour of removing deprecated code: let it live over two major versions, then remove it. I chose PHP for my Data Warehouse project not because it was the best or most stable language, but because it was new and there was a vibrant community pushing the language forward; PHP looked fun. If removing deprecated code means simpler maintenance and a smaller PHP footprint, I do not mind doing some extra work, which will make my code better in the end. You cannot just keep adding new stuff; obsolete code should go away.

2015-01-09

Happy birthday!

On New Year’s Eve I wrote that it seems to be a new year every year; the same is true for birthdays, I grow one year older every year. At my age birthdays are not happy anymore. A quiet dinner with my sons maybe, I do not know yet, we’ll see.
This week religion showed its ugly face again, and once again we see the profiteers of evil. I can follow the hunt for the terrorists more or less live on TV, and I ask myself why. Whatever the reason, it gives the media people a hell of a lot more work and extra income, self-proclaimed spokesmen lead demonstrations for free speech, xenophobes can spread their venom: all profiteers of evil. The other day a lady on the radio, live from Paris, said ‘It is time to ban intolerant opinions’, and yes, she was serious. Today we have profiteers of evil speculating alongside live pictures from Paris on zillions of TV channels. I do not need this.


I used to talk about profiteers of tragedies, you know, all the priests, therapists et al. who show up as soon as there is a tragedy to offer professional help, not so much to those directly affected by the tragedy but to the surrounding masses of vultures. Now we see a more sinister next of kin: the profiteers of evil are coming of age.
My heart does not bleed for those merely remotely affected by calamities, or for the profiteers. My thoughts go to those really affected and to those who try to protect us and catch the bad guys.
What is next? Journalists embedded with terrorists?

I had in mind to write something completely different; this is far from what I usually write. But it is my birthday and my blog, so what the...

2015-01-07

Importing Qlikview logs into MySQL

For a very long time I have tried to come up with ways to measure and compare Business Intelligence activities. So far I have not come up with anything that holds water. It is complicated just to measure the activity in one BI system; comparing two different systems is even more complex. What I have in mind is to capture some figure for all activities in the Data Warehouse, which will at least give an indication of how much the Data Warehouse is used.
The stats I capture today are batch jobs and MySQL queries. This is just an indication of activity; it does not give any hint of the quality or the value of the system.
Qlikview is becoming more and more popular among Data Warehouse users as a viewer, so I felt it appropriate to include Qlikview in the overall activities of the Data Warehouse. That is what this post is about.

When I created The Data Warehouse Movie I had a hard time parsing the QV log: the columns did not seem to be separated, there was just space in between, column entries could contain spaces, and missing entries were simply missing. This time I hex-dumped the log and found that the columns are separated by hex ‘09’, a whitespace (tab) character, which makes the logs much simpler to parse. The next hurdle: the Data Warehouse runs on Linux and Qlikview runs on Windows. I did not want to set up any procedures on the Qlikview server, so I decided to grab the log files via a CIFS mount. I created this ITL procedure:
I’m very happy with this procedure. When I started I thought it would be very hard to import the logs, but it turned out to be a walk in the park, child’s play!
The second <action> tag in the <init> section specifies which logs should be imported:
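The tab discovery is what makes the parsing simple. The same idea can be sketched in Python with the standard csv module, assuming (not verified against every Qlikview version) that the first line of a session log holds the column names. An empty entry then arrives as an empty string between two tabs instead of silently shifting the columns, and entries containing spaces stay intact. The function name is hypothetical.

```python
import csv
import io


def parse_session_log(text: str) -> list[dict[str, str]]:
    """Split a tab-separated Qlikview session log into one dict per session."""
    reader = csv.DictReader(
        io.StringIO(text),
        delimiter="\t",          # hex 09, the separator found in the hex dump
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
    )
    # Missing entries come back as "" because two tabs still delimit an empty field.
    return [dict(row) for row in reader]


if __name__ == "__main__":
    sample = (
        "Document\tExitReason\tQlikViewUser\n"
        "Sales.qvw\tSocket closed by client\t\n"
    )
    for session in parse_session_log(sample):
        print(session)
```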
<action sync='yes' cmd='ls @WINMNT/Sessions_SSCSSEQVS002_2014-*.log > @J_DIR/logs.txt' dir='@J_DIR'/>
I imported all Qlikview session logs from 2014 in one go; it took some 140 seconds.
As it is set up now, I will schedule this job at 01:00:00 to import yesterday’s logs.
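Scheduling the job at 01:00 means it has to work out which file is yesterday’s. A minimal Python sketch, assuming the file names follow a Sessions_<server>_YYYY-MM-DD.log pattern (inferred from the 2014-* wildcard above, not confirmed) and that the caller passes in the directory where the CIFS share is mounted. Both function names are hypothetical.

```python
import glob
import os
from datetime import date, timedelta


def yesterdays_log_name(today: date, server: str = "SSCSSEQVS002") -> str:
    """File name of the previous day's session log (naming pattern is an assumption)."""
    yesterday = today - timedelta(days=1)
    return f"Sessions_{server}_{yesterday.isoformat()}.log"


def logs_matching(mount_dir: str, pattern: str) -> list[str]:
    """Like the ls in the ITL action: list matching logs on the mounted share, sorted."""
    return sorted(glob.glob(os.path.join(mount_dir, pattern)))


if __name__ == "__main__":
    print(yesterdays_log_name(date(2015, 1, 1)))  # Sessions_SSCSSEQVS002_2014-12-31.log
```

Passing the date in, rather than calling date.today() inside the function, makes it easy to re-run the import for a missed day.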

I still like the Integration Tag Language; it’s simple, succinct and does the job. The procedure should not be hard to read and understand.
The task that took the longest was defining the MySQL table:

CREATE  TABLE IF NOT EXISTS qvlog
 (`ExeType` char(5),
 `ExeVersion` char(20),
 `ServerStarted` timestamp,
 `Timestamp` timestamp,
 `Document` varchar(200),
 `DocumentTimestamp` timestamp,
 `QlikViewUser` char(12),
 `ExitReason` varchar(64),
 `SessionStart` timestamp,
 `SessionDuration` time,
 `CPU` int unsigned,
 `BytesReceived` int unsigned,
 `BytesSent` int unsigned,
 `Calls` int unsigned,
 `Selections` int unsigned,
 `AuthenticatedUser` varchar(30),
 `IdentifyingUser` varchar(30),
 `ClientMachine` char(56),
 `SerialNumber` varchar(32),
 `ClientType` varchar(64),
 `ClientVersion` char(10),
 `SecureProtocol` char(3),
 `TunnelProtocol` char(3),
 `ServerPort` int unsigned,
 `ClientAddress` int unsigned,
 `ClientPort` int unsigned,
 `CalType` char(16),
 `CalUsageCount` varchar(25),
 Primary key (`SessionStart` , `AuthenticatedUser`)
 );
I went through a few rounds of trial and error with the table definition until loading was OK. If you happen to know the proper table definition, please drop me a line.
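One likely source of the trial and error is that some log fields do not map straight onto the column types. A minimal sketch of a row-cleaning step in Python, under two assumptions not confirmed by the log itself: empty entries should be stored as NULL, and ClientAddress is logged as a dotted IPv4 address, which an int unsigned column cannot take as text. The conversion yields the same number as MySQL’s INET_ATON(). The function name is hypothetical.

```python
import ipaddress
from typing import Optional, Union

Value = Optional[Union[str, int]]


def to_db_row(row: dict[str, str]) -> dict[str, Value]:
    """Prepare one parsed log row for INSERT into the qvlog table."""
    # Empty strings become None (SQL NULL) instead of '' in int/timestamp columns.
    out: dict[str, Value] = {col: (val if val != "" else None) for col, val in row.items()}
    addr = out.get("ClientAddress")
    if isinstance(addr, str):
        # Dotted IPv4 -> unsigned 32-bit integer, as INET_ATON() does; raises ValueError otherwise.
        out["ClientAddress"] = int(ipaddress.IPv4Address(addr))
    return out


if __name__ == "__main__":
    print(to_db_row({"ClientAddress": "10.0.0.1", "ExitReason": "", "CalType": "Named"}))
```

If the load is done with MySQL’s LOAD DATA INFILE instead, the same conversion can be done on the server side by reading the field into a user variable and applying INET_ATON() to it in the SET clause.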

Now you may say: ‘This seems a bit awkward, why load Qlikview logs into MySQL, why not use Qlikview?’ That is a good question. We already have a Qlikview app for the logs, but I have the Data Warehouse twittering app written in ITL and all the other stats in MySQL, so I thought it would be nice to have the Qlikview stats in MySQL as well.
If and when I have verified the imported data and used it in some app, I will probably write a second post.

2015-01-05

Wanted - Business Intelligence Developer


Petter is leaving the company: https://www.linkedin.com/jobs2/view/25807829

Petter is a highly appreciated colleague, and I wish him the best in his new career. Fortunately he will stay in Stockholm, so I will hopefully see him from time to time, and he still owes me a beer :)



2015-01-03

Meta blogging - promoting my own posts

This morning I had nothing to do, so I amused myself by reading some of my own posts. It’s not entirely pleasant. Not that I think the posts are uninteresting or of low quality; well, a few are, but in general I think the posts are interesting. The posts are written for a ‘narrow’ audience, and some are written as documentation for myself and a few colleagues. For such posts I assume the reader knows the context; if not, the posts can be incomprehensible or very hard to understand. Some posts are also badly written, and my far-from-perfect English doesn’t help. A good understanding of programming and an interest in IT and Business Intelligence are prerequisites for many of my posts, and this is certainly true for the posts I want to promote in this post.

I have written a series of posts on parallel programming with my own Integration Tag Language. These posts describe not only how to parallelize program execution, but also how to right-size parallel ‘business units’ of work, e.g. performant extraction of information about 50,000 parts from a SAP system. This kind of optimization is not something I have found much information about. Normally parallel programming deals with optimization on the micro scale: non-blocking I/O on file systems, I/O channel programming or operating systems programming, but not, e.g., how to assemble a Bill of Materials from simple parent-child relations with good performance. I know there is a huge demand for this, and I have seen real examples in the ‘business world’ where such knowledge would have helped. I have also seen solutions that never came to exist: ‘you cannot do this, it will take forever’.
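The right-sizing idea can be illustrated outside ITL. The Python sketch below is not the ITL implementation: it splits a list of part numbers into batches and runs a hypothetical fetch_batch function (a stand-in for, say, one SAP RFC call per batch) on a thread pool. The two knobs are batch_size and workers. Batches that are too small drown in per-call overhead, batches that are too large leave workers idle, and the sweet spot has to be found by measuring.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batches(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Cut the work into consecutive units of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_batched(
    items: Sequence[T],
    fetch_batch: Callable[[Sequence[T]], List[R]],
    batch_size: int,
    workers: int,
) -> List[R]:
    """Run fetch_batch on every batch in parallel and keep results in input order."""
    results: List[R] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in submission order, whatever order the batches finish in.
        for batch_result in pool.map(fetch_batch, batches(items, batch_size)):
            results.extend(batch_result)
    return results


if __name__ == "__main__":
    parts = [f"P{i:05d}" for i in range(50_000)]

    def fetch_batch(batch: Sequence[str]) -> List[str]:
        # Hypothetical stand-in for one remote call fetching data for a whole batch.
        return [f"{part}:ok" for part in batch]

    out = run_batched(parts, fetch_batch, batch_size=1_000, workers=8)
    print(len(out), out[0], out[-1])  # 50000 P00000:ok P49999:ok
```

Threads suit work that mostly waits on a remote system; for CPU-heavy batches, ProcessPoolExecutor from the same module is the usual swap.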

I hope I’m not presumptuous or preposterous when I claim these posts give insight into a very common problem. Big Data proponents often claim they have the solution (as for any other problem you might have, these guys have seen the light), but I have not seen Big Data solutions for, e.g., extracting information about 50,000 parts from a SAP system (in that particular case there probably exist HANA solutions).

Anyway, the first post:

paves the way for the post that explains right-sizing of business problems with ITL parallel programming:

This is complex; if you found this interesting but incomprehensible, please read all the posts, beginning with Parallel processing of Workflows.

Still interested but not getting it? Drop me a note and I will try to explain better.

Finally, since I make some claims here, please tell me if you find errors, if you find the claims preposterous, or if you can give better advice.

2015-01-02

PhpMetrics 2.

Some days ago I ran my Data Warehouse code through PhpMetrics, which was interesting and at the same time depressing, since my code didn’t score well. When I took a closer look at the result I realised the lowest-scoring code came from other projects I had incorporated into the Data Warehouse. I removed most of the ‘external’ code and evaluated my code again; I do not see this as an improvement.
My code after removal of ‘external’ code


When I had a closer look at the result, I found a lot of test code, often bad code, still in the production code libraries. Unfortunately it will probably take me a weekend of hard, boring work to remove this code, but if I find the time I will purge the test code. I was a bit surprised by the number of test shots in there; the number just grows with time.
Another interesting finding: most of the big red circles are code not touched for a long time (most of the code has not been touched for a long time), so it is stable and fairly complete code! However, two of those red circles I have thought about rewriting, since I’m not happy with the ‘code structure’, but doing that would take me a weekend per circle of extremely boring toil, so it will never happen.


I have put some effort into studying the principles behind PhpMetrics, and that was pretty interesting: Halstead metrics and cyclomatic complexity, things I was (and am) only vaguely aware of, which are tools that can be used to evaluate the ‘quality’ of code. Still, I’m a bit sceptical: if a large piece of complex code that has executed thousands of times a day, year in, year out, never failed and never changed, comes out as bad code, I ask myself what bad code is.
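For reference, the textbook definitions behind these two families of metrics are small enough to write down. A Python sketch, assuming the operator/operand counts and the control-flow graph have already been extracted from the code (the hard part, which PhpMetrics does for you; its exact counting rules may differ from this sketch):

```python
import math
from dataclasses import dataclass


@dataclass
class Halstead:
    n1: int  # distinct operators
    n2: int  # distinct operands
    N1: int  # total operator occurrences
    N2: int  # total operand occurrences

    @property
    def vocabulary(self) -> int:
        return self.n1 + self.n2

    @property
    def length(self) -> int:
        return self.N1 + self.N2

    @property
    def volume(self) -> float:
        # V = N * log2(n)
        return self.length * math.log2(self.vocabulary)

    @property
    def difficulty(self) -> float:
        # D = (n1 / 2) * (N2 / n2)
        return (self.n1 / 2) * (self.N2 / self.n2)

    @property
    def effort(self) -> float:
        # E = D * V
        return self.difficulty * self.volume


def cyclomatic_complexity(edges: int, nodes: int, components: int = 1) -> int:
    """McCabe: M = E - N + 2P for a control-flow graph with P connected components."""
    return edges - nodes + 2 * components


if __name__ == "__main__":
    h = Halstead(n1=4, n2=4, N1=8, N2=8)
    print(h.volume, h.difficulty, h.effort)  # 48.0 4.0 192.0
    print(cyclomatic_complexity(edges=9, nodes=8))  # 3
```

For structured code with one entry and one exit, M works out to the number of decision points plus one: a single if/else gives a graph of 4 nodes and 4 edges, so M = 2. Note that none of these formulas look at how long the code has run without failing, which is exactly the gap the sceptical question above points at.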


I suspect my code suffers from PHP 4 (which was PHP when I wrote the code). PHP 4 was probably not the best language for writing a language interpreter, with lots of dynamic inclusion of code and dynamic interpretation (read: eval). I probably have to adjust PhpMetrics to my PHP code, even though it feels like cheating at solitaire. But my Data Warehouse PHP code is a long way from a 'standard' PHP system.

There is no excuse for not using PhpMetrics; it is a really cool tool, very simple to use, and it has already highlighted problems in my code. But before I can attack my code further with PhpMetrics I need to purge obsolete code; as it is now, I cannot see the forest for the trees! Of course I want my code to turn up as green circles.