From a button in an Excel sheet, a signal is sent to a Linux ETL system, which starts a process that fetches ‘delta data’ from an SAP system and updates the Business Intelligence data store, in this case a MySQL database.
The merge delta load method
Another approach – user triggered Delta Load
User triggered Delta Load – an example.
Under the hood
Appendix B - ETL Delta load times
When I read this amusing story about a 500-mile email problem, I came to think of another email story I witnessed about fifteen years ago. I was doing consulting work for a large company in Sweden. I have made many mistakes and caused some really bad disasters myself, but this one I only observed.
Stockholm Underground Transport (SL) had raised its fares, and a lady working at the company was very upset about the increase. She wrote an email urging her colleagues to protest against it and addressed it to all colleagues in Stockholm, or so she thought. I received her mail at about 09:00 in the morning; after reading it I deleted it and didn’t think more of it. But soon I began to get responses to her email, some in favor of the lady and some against. My network connection started to get sluggish, and then I realized the recipients were the entire company, about 40,000 employees. I do not recall the exact figure, but it was a lot of recipients, and to make things worse, due to a merger with an almost equally big company, the two mail systems had been connected the week before, so the mail reached almost double that number of recipients. Around noon the network was down; I only received a few ‘response mails’ an hour. The network was flooded with these mails. In the afternoon people started to become aware of the problem, and mails started to appear from bosses in both companies telling the lady that her use of the mail system was improper and that she was to blame for the network outage. Of course they replied to all, making things worse. Later in the afternoon mails from network admins started to appear, begging everyone to stop mailing. It took the admin guys two days to get things back to normal.
If I had only one word to define myself as a professional, ‘creative’ is the word I would use. The best quality an IT professional can possess is an analytical mind. I am analytical, but not to the degree that I am creative.
Recently I have had reason to consider the two traits, analyticity and creativity. Here I define creativity as the ability to create computerized models of the real world, or cyber models of reality, and analyticity as the ability to understand such cyber models and find faults in them. I was thinking about the eight best guys I have met in the business. I analyzed their strengths and weaknesses and found that they are all analytical (no surprise), but to my surprise I considered them relatively weak in creativity. One of these guys once asked me for advice; he had a problem with the general ledger in an accounting application. He gave me such a detailed explanation of the problem that I thought he was giving me the solution, but when I outlined the solution he called me a genius (this guy is a genius, with an IQ above 150). You can focus on a problem so much that you do not see the solution. This happens to me too, but more often to my more analytical friends.
I am starting to believe these two traits are XOR: you can be very creative or very analytical, but not both. Of these eight guys, the one I consider least analytical is the most creative of them (and vice versa).
Eight is too few to draw far-reaching conclusions from, but I wonder. I suppose I have assumed that being creative and being analytical go hand in hand. To be successful in IT you must be analytical; a successful creative IT professional may be a compromise, balancing analyticity and creativity. If I’m right, a creative IT professional should be less creative than successful professionals in trades where analytical skills are less important, since the creative IT guy has an analytical mind to drag along.
My selection criterion: ‘I should have worked with them enough to know they are top-quality IT professionals’. I count eight such guys. I will not name or describe them, since some of them would resent being on my list in public.
QlikView the good and the bad.
QlikView parallel processing.
What is in a warehouse storage bin?
Data transformations for viewers.
Accessibility, quality and reuse of data (and definitions).
These pictures are included just to give an idea of what transformation code may look like in QlikView and SQL.
Google Summer of Code is a great event, giving young people the opportunity to work with hopefully good mentors and gain an insight into the part of the IT world that is system design and programming. Programming is an obsolete word, not used much anymore. Who tells you ‘I’m a programmer’ these days? Web designer, systems designer, application developer, project leader, database modeler - yeah, but programmer - no.
Each summer vacation, if I do not have anything better to do, I try to study a subject in some detail, something I can then use in my daily work. Last summer I did some web development with CouchDB to learn NoSQL databases (no, I have no use for CouchDB in my work); two years ago I had better things to do; three years ago I created a Dekiwiki/MindTouch web site for use as a ‘virtual project workplace’, only to learn that we had already ‘standardized’ on the Lotus Notes Quickr product.
I have been following the development of Perl 6 since 2002 or so, and I really like it. I have started to write Perl 6 a few times, but Perl 6 is not ready yet. Creating something genuinely new takes time, and there are some lovely constructs in there. I will definitely learn Perl 6; it’s part of the future. Perl 6 is coming to us this year, of that I’m certain. The guys behind Perl 6 are true heroes of endurance. And Carl Mäsak (one of the heroes) is my favorite blogger. Carl’s posts are often brilliant, full of wit, humor and wisdom about IT and beyond.
But for my summer of code I decided to go for D. The D language has a lot of nice features and it comes with an assembler. It’s not mainstream - I like that, but it has the potential to become mainstream - I like that even more.
In 2001 I needed a simple scripting language in the Linux environment. I wanted to build a simple job controller and I chose PHP (over Perl). PHP was new and fresh and I had never heard of it, so I thought I should give it a try. I can still do whatever I need in PHP; it’s a good scripting language, no matter what programmers who do not use PHP say (and they do). The animosity is interesting; some guys are really explicit when they express their dislike of PHP. And it’s fun to read that it is impossible to do with PHP what you actually do with PHP.
In my previous post about Job scheduling with PHP, I described two entities: the context and the schedule. The context is where all configurations and descriptions of source systems go. The schedule is the entity we schedule for execution. The entity where we describe the work we want done is called a job. Jobs are chained together in schedules and can be included in a schedule from a job library or declared directly in the schedule. Before we look at the job I need to explain return codes.
This is a bit complicated; if you are not interested in the details, go directly to the Summary.
In my previous posts on job scheduling with PHP I explained why I use Boolean return codes (with a few exceptions). Normally you declare a schedule with mustcomplete=’yes’, which means execution stops if any return code is FALSE. This is what you normally want: stop at the point of failure, correct the problem and rerun the schedule. But sometimes you need to clean up or automatically fix the problem by executing error-correcting jobs, or you want to build schedules with logic like ‘if month end run job allocateNewMonth’. You do this with prereqs. A prereq is basically a Boolean gate that is either open or closed; a job prereq determines whether a job should execute or not, and if the prereq is FALSE the job is bypassed. This means that the result of a job is not strictly Boolean: it can execute successfully, fail or be bypassed. The ‘bypassed’ condition defaults to TRUE/success; you can change this with bypassed=’false’ in the schedule.
To allow for failures you turn off normal error checking by stating ‘mustcomplete=no’ in the schedule; then all error checking must be done explicitly in the schedule.
Summary: Job return codes are Boolean. If a FALSE return code is detected, execution of the schedule stops. This default behavior can be changed.
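To make the rules above concrete, here is a minimal sketch of how I read them. It is written in Python for brevity and is not the scheduler's PHP code: the function run_schedule, the job names loadSales, buildReport and mailReport, and the status strings are all assumptions made for illustration. Only mustcomplete, bypassed, the prereq gate and allocateNewMonth come from the text.

```python
from typing import Callable, List, Optional, Tuple

# (job name, action, optional prereq); a hypothetical stand-in for a job declaration
Job = Tuple[str, Callable[[], bool], Optional[Callable[[], bool]]]


def run_schedule(jobs: List[Job], mustcomplete: bool = True,
                 bypassed: bool = True) -> List[Tuple[str, str]]:
    """Run jobs in order and return a log of (job name, status)."""
    log = []
    for name, action, prereq in jobs:
        if prereq is not None and not prereq():
            # Closed gate: the job is skipped and reports the configured value.
            result, status = bypassed, "bypassed"
        else:
            result = bool(action())
            status = "ok" if result else "failed"
        log.append((name, status))
        if mustcomplete and not result:
            break  # stop at the point of failure
    return log


month_end = False
schedule = [
    ("loadSales", lambda: True, None),
    ("allocateNewMonth", lambda: True, lambda: month_end),  # 'if month end'
    ("buildReport", lambda: False, None),                   # simulated failure
    ("mailReport", lambda: True, None),
]
print(run_schedule(schedule))                      # stops after buildReport
print(run_schedule(schedule, mustcomplete=False))  # runs all four jobs
```

With mustcomplete turned off the sketch simply keeps going and leaves it to the caller to inspect the log, which mirrors the rule that error checking then has to be done explicitly.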
How many times must a job execute to become a success? In most job control systems I have seen, the answer is ONCE only. In my job scheduler the answer is zero or more times. By default a job is executed ONCE. The job iterator determines how many times a job executes. The job iterator is a table, and the job is executed once for each row; the job iterator is also a placeholder for symbolic variables, a.k.a. @tags. Job iterators are immensely powerful, but for now it is enough to know that the job iterator determines how many times a job executes and can contain @tags.
The job iterator is declared by the XML tag <forevery> within a job.
The job is declared by the XML <job> tag.
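The idea of a job iterator can be sketched in a few lines. This is a Python illustration under my own assumptions, not the scheduler's implementation: the function names expand_tags and run_for_every are hypothetical, and I assume a tag is written as @ followed by the column name. What comes from the text is only that the iterator is a table, the job runs once per row, and the rows supply the @tags.

```python
import re
from typing import Dict, List


def expand_tags(text: str, row: Dict[str, str]) -> str:
    """Replace each @TAG in text with the matching column of the row.

    Tags without a matching column are left untouched.
    """
    return re.sub(r"@(\w+)",
                  lambda m: str(row.get(m.group(1), m.group(0))),
                  text)


def run_for_every(template: str, iterator: List[Dict[str, str]]) -> List[str]:
    """Expand the job text once per iterator row; zero rows, zero executions."""
    return [expand_tags(template, row) for row in iterator]


# Assumed default: an iterator with one empty row, so the job runs exactly once.
DEFAULT_ITERATOR: List[Dict[str, str]] = [{}]

rows = [{"AREA": "uppsala"}, {"AREA": "stockholm"}]
for sql in run_for_every("select * from sales where area = '@AREA'", rows):
    print(sql)
```

An empty iterator gives zero executions, which is the 'zero or more times' case; the one-row default gives the usual single execution.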
A job is a unit of work consisting of five optional execution elements:
1. init actions; (operating system commands) executed prior to the job type action.
2. the job type action; which is an SQL script or a PHP script or function.
3. nested jobs.
4. exit actions; (operating system commands) executed after the job type action.
5. guard action; a PHP script that executes in case of unsuccessful execution of 1, 2, 3 or 4.
Job prereqs can be used to fine tune execution logic.
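The five elements can be pictured as a small pipeline. The sketch below is Python and only models the order of events; run_job and make_step are hypothetical names, and running nested jobs between the job type action and the exit actions is my assumption based on the numbering of the list, not something the scheduler is confirmed to do.

```python
from typing import Callable, List, Optional, Sequence

Step = Callable[[], bool]


def run_job(init: Sequence[Step] = (), action: Optional[Step] = None,
            nested: Sequence[Step] = (), exit_actions: Sequence[Step] = (),
            guard: Optional[Step] = None) -> bool:
    """Run the optional elements in order; call the guard on the first failure."""
    steps: List[Step] = list(init)
    if action is not None:
        steps.append(action)
    steps.extend(nested)        # assumed position: after the job type action
    steps.extend(exit_actions)
    for step in steps:
        if not step():
            if guard is not None:
                guard()         # runs only when an earlier element failed
            return False
    return True


trace: List[str] = []


def make_step(name: str, ok: bool = True) -> Step:
    """Build a dummy step that records its name and returns ok."""
    def step() -> bool:
        trace.append(name)
        return ok
    return step


run_job(init=[make_step("init")], action=make_step("sql", ok=False),
        exit_actions=[make_step("exit")], guard=make_step("guard"))
print(trace)
```

Every element is optional, so a job with only an SQL script is just run_job(action=...), which matches the common case.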
Now this may seem awfully complicated, but it is not. You only add what you need, and that is almost always just an SQL script or a PHP script. Does this mean I always have to create SQL and PHP scripts? Yes and no. You write SQL scripts, but very seldom PHP scripts; those you need are already written. For example:
I want to create an HTML table report and mail it to Kalle Kula. The data is in the MySQL table mytable.
This schedule consists of 2 jobs.
The first job creates the report formatted as HTML. The second job mails the report with the help of the prewritten PHP script sendmail.
There are two tags in there: THETABLE points to the resulting HTML table produced by job 1, and THECSS points to a prewritten CSS template file.
Now suppose Kalle tells us, ‘Please send the data for sales area ‘uppsala’ to my colleague Peggy Piggelin and please send the reports as Excel sheets’. We have to make some changes to our schedule, and these changes can be made in several ways. I will show you one way to do it:
As you can see, we have added a dummy job with a <forevery> job iterator, which consists of two rows with the columns NAME, EMAIL and SALES area. In this new dummy job we execute the two original jobs, first for row one in the iterator and then for the second and last row. To change the output from an HTML table to an MS Excel sheet we only changed the SQL converter.
In real life you store the recipients in a database table and create the job iterator with an SQL query. The report SQL queries are probably not hard coded in the job but stored in files in a suitable directory.
The post PHP, MySQL, Iterators, Jobs, Templates and Relation sets explains templates and shows alternate iterators.
The post PHP parallel job scheduling - 1 explains how you can parallel execute jobs.
Other examples can be found here.
I end my post about my job scheduler by displaying sqlconverter_GoogleDocs01.php.
This code uploads the SQL result table to Google Docs. I think this is very cool :)