In my previous post, Job Scheduling with PHP - 1, I described the basics of job scheduling in general. Here I write about the background and the design principles behind my job scheduling system written in PHP.
I created my job scheduling system for a Business Intelligence (BI) system. An important process in any BI system is Extract, Transform & Load (ETL), and the ETL process runs as scheduled background jobs, so good scheduling is essential for BI systems. I didn't have any funds for my BI project, so I had to use existing (scrapped) hardware and Free Software. As I mentioned in my previous post, I was (and still am) not very impressed by the job scheduling systems on the market. I am a programmer by trade and I had used scripting languages for controlling processes before. I looked around for software to use; Linux and MySQL were easy picks. A programming language was harder to find. First I looked for ReXX, my scripting language of preference, but I couldn't find ReXX for Linux; the languages I found were PERL and PHP. PERL was the better language; my impression of PHP was that it was a tool for simple Web apps. But I couldn't resist the challenge of using PHP for advanced background processing. I built the first ETL controller as two crude, simple PHP scripts: scriptS.php (S as in start) and scriptF.php (F as in function), where I stored the functions used in scriptS.php. All ETL processes were hard coded and everything was very primitive, but it worked pretty well. But as the BI system grew, it became clear my two scripts were a dead end: the hard-coded scripts did not scale, and I had to go back to the drawing board and begin again from scratch.
By now (2005-2006) object orientation had arrived in PHP. I'm not fond of OO programming, but a logging subsystem is perfect to objectify, so I learned PHP OO by creating a logger class before I started designing version 2 of my job scheduler. Next I separated all configuration and job definitions from the PHP scripts. I wanted a strict but extensible syntax that was easy to parse. This was an easy pick: XML was the obvious choice, and PHP had a very simple and capable enough XML parser, SimpleXML. Now I had to define the environment, the scheduling and the jobs.
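The logger class itself is not shown here. Purely as an illustration of why logging is a natural object, here is a minimal logger sketched in Python rather than PHP; the class name `Logger`, the `log` method and the level names are hypothetical, not the author's actual API.

```python
import sys
from datetime import datetime
from typing import List, TextIO


class Logger:
    """Minimal logger sketch: timestamps each message, keeps it, and echoes it."""

    LEVELS = ("INFO", "WARNING", "ERROR")  # hypothetical level names

    def __init__(self, name: str, stream: TextIO = sys.stdout) -> None:
        self.name = name
        self.stream = stream
        self.lines: List[str] = []  # in-memory copy of everything logged

    def log(self, message: str, level: str = "INFO") -> str:
        # An unrecognized level is treated as noise and falls back to INFO.
        if level not in self.LEVELS:
            level = "INFO"
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        line = f"{stamp} {self.name} {level} {message}"
        self.lines.append(line)
        print(line, file=self.stream)
        return line
```

Keeping state (name, output stream, collected lines) together with the behaviour is what makes a logger easy to objectify.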
The environment defines execution elements like databases, programs, directories etc. Everything is defined in XML scripts; the main context script points to other XML scripts that together define the total execution. Here you see a <sap> tag pointing to an XML script defining a SAP system. The <prereq> tags are Boolean statements that must be true.
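The actual context script is not reproduced here, so the following is only a sketch under assumptions: a `<context>` root whose children (such as `<sap>`) name other XML scripts, and `<prereq>` elements carrying a made-up `type` attribute that selects a check. It uses Python's `xml.etree.ElementTree` in place of PHP's SimpleXML.

```python
import os
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical prerequisite checks, selected by a made-up 'type' attribute.
CHECKS: Dict[str, Callable[[str], bool]] = {
    "file_exists": os.path.isfile,
    "dir_exists": os.path.isdir,
}


def parse_context(xml_text: str) -> dict:
    """Split a context script into references to other scripts and prerequisites."""
    root = ET.fromstring(xml_text)
    # Every child except <prereq> is taken to point at another XML script (e.g. <sap>).
    includes = {child.tag: (child.text or "").strip()
                for child in root if child.tag != "prereq"}
    prereqs = [(p.get("type", ""), (p.text or "").strip())
               for p in root.findall("prereq")]
    return {"includes": includes, "prereqs": prereqs}


def prereqs_hold(context: dict) -> bool:
    """True only if every prerequisite evaluates to true."""
    for kind, argument in context["prereqs"]:
        check = CHECKS.get(kind)
        # An unknown check fails: for prerequisites the non-destructive answer is 'no'.
        if check is None or not check(argument):
            return False
    return True
```

For example, `<context><sap>sap_prod.xml</sap><prereq type="dir_exists">.</prereq></context>` yields one include and a prerequisite that holds.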
The schedule XML script defines a chain of jobs that are scheduled for execution, with or without dependencies. The variant defines startup parameters, and the first job points to an XML script, 'exp8_generate_iterators'.
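A rough sketch of reading such a schedule, assuming hypothetical tag and attribute names (`<variant>`/`<param name=…>` for startup parameters, `<job name=… script=… after=…>` for the chain); the real schema is not shown here. A dependency may only point at a job defined earlier in the chain, otherwise parsing returns False.

```python
import xml.etree.ElementTree as ET


def parse_schedule(xml_text: str):
    """Return the startup parameters and job chain, or False if the chain is invalid."""
    root = ET.fromstring(xml_text)
    params = {p.get("name"): (p.text or "").strip()
              for p in root.findall("variant/param")}
    jobs, seen = [], set()
    for job in root.findall("job"):
        name = job.get("name")
        after = job.get("after")  # optional dependency on an earlier job
        if not name or (after is not None and after not in seen):
            return False  # unnamed job or unknown predecessor
        jobs.append({"name": name, "script": job.get("script"), "after": after})
        seen.add(name)
    return {"params": params, "jobs": jobs}
```

Jobs without an `after` attribute simply run in chain order with no explicit dependency.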
This job XML script defines the execution of a series of SQL statements.
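As a sketch of such a job, assuming a hypothetical `<job>` root with one statement per `<sql>` element: the statements run in order and the job returns False at the first error. Python's built-in `sqlite3` stands in for MySQL so the example is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET


def run_sql_job(xml_text: str, conn: sqlite3.Connection) -> bool:
    """Execute the <sql> elements of a job in order; False on the first error."""
    root = ET.fromstring(xml_text)
    for element in root.findall("sql"):
        statement = (element.text or "").strip()
        if not statement:
            continue  # empty elements are treated as noise
        try:
            conn.execute(statement)  # one statement per <sql> element
        except sqlite3.Error:
            conn.rollback()  # roll back any open transaction
            return False
    conn.commit()
    return True
```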
Having laid a sound foundation with three well-defined entities (context, schedule and job) and a logger, I also needed an execution plan for my scheduling system. I decided to divide execution into three phases:
- Read, parse and syntax-check all XML scripts, and check prerequisites, e.g. access to input files and subsystems like MySQL, predecessor conditions etc.
- Create the execution environment: a directory structure where everything produced by the execution of a schedule is stored, e.g. log files.
- The actual execution of a schedule.
Phase 1 creates an execution tree, which is basically the parsed XML files in a PHP array structure. This tree is passed to phase 2, where more 'things' are added to the execution tree as the execution environment is created. This environment is then passed to phase 3, which executes the schedule job by job and records the outcome of each job in the execution tree.
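The three phases can be sketched as three functions passing one tree along, with a nested dict playing the role of the PHP array. This is a simplified sketch under assumptions: phase 1 only checks the shape of a plain dict (no XML, no prerequisite checks), and the directory layout and `status` values are hypothetical.

```python
import os
import tempfile


def phase1_parse(definition: dict):
    """Phase 1: build the execution tree, or return False on a syntax error."""
    if "name" not in definition or not isinstance(definition.get("jobs"), list):
        return False
    jobs = [{"name": str(name), "status": "pending"} for name in definition["jobs"]]
    return {"schedule": str(definition["name"]), "jobs": jobs}


def phase2_environment(tree: dict, base_dir: str) -> dict:
    """Phase 2: create the run directory structure and add its paths to the tree."""
    run_dir = os.path.join(base_dir, tree["schedule"])
    log_dir = os.path.join(run_dir, "logs")
    os.makedirs(log_dir, exist_ok=True)
    tree["run_dir"] = run_dir
    tree["log_dir"] = log_dir
    return tree


def phase3_execute(tree: dict, actions: dict) -> bool:
    """Phase 3: run jobs one by one and record each outcome in the tree."""
    for job in tree["jobs"]:
        action = actions.get(job["name"])
        ok = action is not None and bool(action())  # a missing action counts as failure
        job["status"] = "ok" if ok else "failed"
        with open(os.path.join(tree["log_dir"], job["name"] + ".log"), "w") as fh:
            fh.write(job["status"] + "\n")
        if not ok:
            return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        tree = phase1_parse({"name": "demo", "jobs": ["extract", "load"]})
        if tree is not False:
            phase2_environment(tree, base)
            print(phase3_execute(tree, {"extract": lambda: True, "load": lambda: True}))
```

Because every phase only adds to the tree, the finished tree doubles as a record of what happened in the run.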
This is a schematic view of a schedule execution (the picture is old and some entities have been renamed).
I often use my own Garbage In - Garbage Out design pattern, which means that everything not recognized is treated as noise and that defaults are non-destructive. This design pattern is both code friendly, since the code does not have to consider unknown parameters etc., and user friendly, since you do not need to know all the details: try the software, and you will not destroy anything by misspelling a parameter or leaving something out.
But you have to give defaults some thought; they should be non-destructive and sensible. This is actually quite hard. I’m sure you have seen idiotic defaults many times.
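A minimal sketch of Garbage In - Garbage Out parameter handling, with hypothetical parameter names: user values are merged over non-destructive defaults, and anything unrecognized, including a misspelled key, is ignored as noise.

```python
from typing import Any, Dict

# Hypothetical parameters; every default is chosen to be non-destructive.
DEFAULTS: Dict[str, Any] = {
    "truncate_target": False,  # never wipe a table unless explicitly asked to
    "max_rows": 0,             # 0 = no limit
    "log_level": "INFO",
}


def effective_params(given: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user parameters over the defaults; unrecognized keys are noise."""
    params = dict(DEFAULTS)
    for key, value in given.items():
        if key in DEFAULTS:
            params[key] = value
        # Anything else (including misspellings) is silently ignored.
    return params
```

A user who types `truncate_targt: True` gets the default `truncate_target = False`, so the typo cannot destroy data; at worst the job does less than intended.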
Another design pattern I often use is something I invented years ago when I built large systems in assembler language. I assume everything goes wrong: I use a Boolean FALSE return code and return as soon as I find something wrong. This gives submodules with many FALSE returns and only one TRUE (or non-FALSE) return at the end. If you are careful and design your submodules or functions with minimal side effects, you can often avoid cleanup code. The caller either deals with the FALSE return code or exits itself with the FALSE return code. This design pattern gives flat, efficient and robust programs.
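A small sketch of the early-FALSE-return pattern, using a hypothetical job definition with `name` and `sql` keys: each check returns False immediately, the single non-FALSE return comes last, and since the function has no side effects no cleanup is needed. The caller passes a False straight up.

```python
def validate_job(job):
    """Return a cleaned job definition, or False at the first problem found."""
    if not isinstance(job, dict):
        return False
    name = job.get("name")
    if not isinstance(name, str) or not name.strip():
        return False
    statements = job.get("sql")
    if not isinstance(statements, list) or not statements:
        return False
    if not all(isinstance(s, str) and s.strip() for s in statements):
        return False
    # The single non-FALSE return, reached only when every check passed.
    return {"name": name.strip(), "sql": [s.strip() for s in statements]}


def prepare(jobs):
    """Validate a list of jobs; pass a False from any job straight up to the caller."""
    prepared = []
    for job in jobs:
        checked = validate_job(job)
        if checked is False:
            return False
        prepared.append(checked)
    return prepared
```

Note the explicit `is False` test: an empty list is a legitimate non-FALSE result, much like PHP code has to use `=== false` rather than a plain truthiness check.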
I already stated that I prefer a simple Boolean return code structure, TRUE or FALSE. Either an action is a success or not; black or white if you wish, no gray zones. You have probably seen other return code schemes. For historical reasons tied to assembler branch instructions, very many return code schemes are based on 0 = success, 4 = remark, 8 = warning, 12 = serious warning etc. This scheme is not only confusing and error prone, it is also out of sync with modern computer languages, where zero is Boolean FALSE and everything else is Boolean TRUE. Multi-value return code schemes may also force you to write code like if not ok then ok else not ok, or even more horrid constructs.
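The mismatch can be shown in a few lines. This sketch collapses a 0/4/8/12-style code into a plain success flag; the `max_ok` threshold is a hypothetical knob, not part of any standard. The second function shows the trap of testing the raw code for truthiness.

```python
def rc_succeeded(rc: int, max_ok: int = 0) -> bool:
    """Collapse a 0/4/8/12-style return code into a plain success flag.

    max_ok is a hypothetical threshold: any rc <= max_ok counts as success.
    """
    return rc <= max_ok


def naive_succeeded(rc: int) -> bool:
    """The trap: 0 is falsy in Python (as in PHP), so this inverts legacy codes."""
    return bool(rc)
```

With `naive_succeeded`, a clean run (rc 0) reads as a failure and a serious warning (rc 8) reads as a success.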
For a job scheduling system, return codes are very important: jobs depend on predecessor jobs, errors must be fixed in successor jobs, and you must be able to set up guards that kick in when things go wrong. A simple return code structure is a boon not only in the code of the job scheduling system itself, but also in the job scheduling. In my system a schedule can only execute successfully or fail, and the same goes for the jobs, or almost.
In job scheduling you have to deal with the situation where a job is bumped over because its preconditions are not met. Is that a failure or a success? IBM’s Job Control Language treats it as a success. This may seem absurd, but the opposite may be equally absurd; it depends entirely on the angle from which you approach the problem of a bumped-over job. Even considering non-executed jobs is a simplification; there are more things to consider when deciding the outcome of a job.
My defaults (all jobs must execute successfully, bumped-over jobs are considered a success, and the schedule execution is halted when a failure is detected) cover more than 95% of all job planning, as long as you run one job after another, single threaded. When you run jobs in parallel, things get more complicated.
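The three defaults can be sketched for a single-threaded schedule as below. Each job is a hypothetical `(name, precondition, action)` tuple of callables returning booleans; a job whose precondition is false is bumped over and counted as a success, and the first failure halts the schedule.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Outcome(Enum):
    OK = "ok"
    FAILED = "failed"
    SKIPPED = "skipped"  # bumped over: precondition not met
    NOT_RUN = "not run"  # never reached because the schedule was halted


Job = Tuple[str, Callable[[], bool], Callable[[], bool]]  # name, precondition, action


def run_schedule(jobs: List[Job]) -> Tuple[bool, Dict[str, Outcome]]:
    outcomes = {name: Outcome.NOT_RUN for name, _, _ in jobs}
    for name, precondition, action in jobs:
        if not precondition():
            outcomes[name] = Outcome.SKIPPED  # default: counts as success
            continue
        if action():
            outcomes[name] = Outcome.OK
        else:
            outcomes[name] = Outcome.FAILED
            return False, outcomes  # default: halt at the first failure
    return True, outcomes
```

Recording SKIPPED separately from OK keeps the information needed if a different rule for bumped-over jobs is wanted later.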
Remember I wrote that job scheduling is important for the BI ETL process? BI systems contain large amounts of data and import large amounts of data via ETL processes, so parallel processing to cut ETL execution time is essential for BI systems. My job scheduler deals with parallel processing in basically two ways: by processing jobs in parallel, and by cutting jobs up into smaller pieces (chunks). I have described this in When fast is not enough, and in Parallel processing of workflows I described parallel execution in detail.
Now, parallel processing and multi-threading are crude and awkward in PHP, but it can be done.
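To illustrate the chunking idea only (this is Python, not the author's PHP mechanism), the sketch below cuts a workload into pieces and runs a hypothetical `work` callable on each piece concurrently; the job succeeds only if every chunk succeeds. Threads are assumed because ETL chunks mostly wait on the database; for CPU-bound work `ProcessPoolExecutor` would be the usual swap.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def chunk(items: Sequence, size: int) -> List[Sequence]:
    """Cut a workload into consecutive pieces of at most `size` items."""
    size = max(1, size)  # a nonsensical size falls back to 1 instead of failing
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_chunks(items: Sequence, size: int,
               work: Callable[[Sequence], bool], workers: int = 4) -> bool:
    """Run `work` on every chunk concurrently; succeed only if all chunks succeed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(work, chunk(items, size)))
    return all(results)
```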
I will try to write more posts about this and more. I end this post with the execution of an empty schedule. This is what the empty.xml schedule file looks like:
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
Remember what I wrote about defaults being non-destructive and sensible? Here we have quite a few defaults to fill in. Here we go:
What could be more appropriate for a default job than to TTY-type the contents of the little red box in the middle of the log display?
Note the first line: the job scheduler still starts with the scriptS.php module.
I hope I will be able to continue to write about my PHP job scheduler.
Some examples can be found here.
Job scheduling is complex.
- First we create a job for Sales Order Intake.
- Then a job for Material Requirements Planning (calculate how many components are missing).
- And finally a job to mail out Purchase Orders for the missing components.
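The dependency chain in the list above can be sketched with Python's standard `graphlib` (Python 3.9+), which orders jobs so that each runs after its predecessors. The job names are hypothetical labels for the three jobs described.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each job maps to the set of jobs it depends on (its predecessors).
DEPENDENCIES = {
    "sales_order_intake": set(),
    "material_requirements_planning": {"sales_order_intake"},
    "mail_purchase_orders": {"material_requirements_planning"},
}


def execution_order(dependencies: dict) -> list:
    """Return an order in which every job comes after all of its predecessors."""
    return list(TopologicalSorter(dependencies).static_order())


print(execution_order(DEPENDENCIES))
# ['sales_order_intake', 'material_requirements_planning', 'mail_purchase_orders']
```

A circular dependency makes `static_order()` raise `graphlib.CycleError`, which is one of the things a scheduler has to catch before running anything.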
In the post Job Scheduling with PHP - 2 I will describe my job scheduling system. Here you can find some examples.
I have had mails in the past asking me to join various people and friends on LinkedIn. I didn't know what LinkedIn was; "it's like Facebook but for old farts", a younger friend told me. Thank you very much, but yeah, then maybe it's for me. I'm not very much for computer social networking, but I'm undeniably an old fart. I was actually pushed in by a colleague, and today I edited my LinkedIn profile with a photo. This is out of respect for LinkedIn and its other members: present yourself decently so others know who you are.
Without quite knowing how, I'm now connected to four other LinkedInners. I'm a bit thrilled by this new adventure in social networking. If you, reader, are on LinkedIn, please feel free to invite me to your 'circle' or whatever it is called. I doubt this will be read by many; so far I suspect the only ones reading my posts here are Google bots and bots from strange Russian sites.
P.S. I still do not know very much about Facebook, except that everyone except me is there. My sons are there, and if I join I might meet them there, and I'm not sure I want that. I think I prefer physical family meetings. But what do I know, I'm just an old fart entering LinkedIn, my next step in social networking.
Later, when we started our SAP migration project, we told our consultants (who did dress in normal business suits and drive ordinary cars) to remove their ties when they went to our factories. The consultant boss tore off his tie and said happily, "We will not wear ties in this project." This was out of respect for the factory workers, so as not to create artificial barriers between them and us. And that is very much what dress codes are about, IT professional or not.
Last week, just before Easter, I went down to Hoeselt in Belgium on a business trip. We had our annual physical Application Steering Committee meeting in Hoeselt this year. We pick a town where one of the committee members lives; these happen to be Essen, Hoeselt, Nantes and Stockholm. Next year it might be Gölshausen in Germany, since we have acquired SCA Schucker, a company specializing in applying glue. This is far more than it sounds; it’s actually a high-tech industry with a future, as more and more assemblies are joined together with glue. Anyway, at the meeting in Hoeselt I learned from a Belgian colleague that all Church Bells had gone to Rome. I had asked him why there were Chocolate Bells for sale together with Easter Eggs and Bunnies. In Sweden we only have Easter Eggs; most of us know of the Easter Hare, but not what he is supposed to do. My colleague told me all Church Bells go to Rome to bring back Easter Eggs, so during the week before Easter Belgian churches are silent because the bells are gone. On their return to Belgium the bells hand over the Easter Eggs to the Easter Bunnies, who distribute and hide the eggs for the children to seek.
This winter we have finalized a Proof of Concept (PoC) for SAP Business Warehouse, or rather we are successfully finalizing the PoC together with two consultants from Evry. The consultants, Thomas and Lars, have done an excellent job. The PoC was about importing SAP COPA data and creating the monthly report for the Group reporting system, which turned out to be excruciatingly hard with our COPA, our cost distribution and our reporting. Now that the job was done and it was Easter time, Thomas and Lars thought it would be a good idea to send us some Easter Eggs to celebrate the good work we had done together, so they ordered 12 Easter Eggs for me and some colleagues who had worked with them. Now, normal-sized Swedish Easter Eggs are big enough to hold 2-3 tennis balls, and they are filled with sweets. I was on my way down to Belgium when Thomas and Lars delivered the eggs; they called me up and told me there had been a slight misunderstanding when they ordered the eggs, but now the eggs were delivered. Coming back to the office, I saw what the misunderstanding was about. The Easter Eggs were huge, each probably containing about two kilos of sweets. Lars and Thomas had supplied almost the entire HQ with sweets, and there were still lots left, so I took one of these gigantic Eggs home with me. On Good Friday I ate about one kilo of sweets myself; together with the rest of the Easter eating, I have probably gained one or two kilos. This is not a good start on beach-2012; I have promised myself to lose about five kilos before summer. Today, Sunday, I still feel drowsy after my excessive sugar intake. I have just been out for a light 10 km jog, but this is not even close to balancing out the Easter Egg. And today my boys and I are going to their Granny's: more food, more Easter Eggs.