Planning the next IT year.

When I started my present employment (in the year 2000) I promised myself never again to take on operational responsibility. Well, I kept my promise for a week or two; then I sat there again, burning the midnight oil fixing a corrupt database in an MRP system I had created in 1980. That system is now scrapped, but these days I'm heavily involved in the operations of our BI system, the Data Warehouse. I'm also involved in some other systems that need some instant tinkering from time to time. Operations is the most important factor when I set up my calendar. Meetings are the second most important factor. To me it seems meetings are getting more and more important, or at least more frequent. I do not know if this is just my professional role changing or a more general trend.

The third most important factor for my calendar is projects. This is definitely a trend: more and more work is done as projects. When I started to work with IT most work was assigned to individuals; only big 'projects' were actually organized as projects. Today no task is too small for a project organisation.

Vacations are something you have to take into consideration when planning the year. In Western Europe most of us take a month off in the summer. (In Sweden five weeks of vacation is the minimum; I think I have six weeks. I remember when I was employed in the US I was given three weeks of vacation since 'you are from Europe, but please do not tell anyone'.)

Basically I use three cycles when I plan my next year's calendar. First we have the year itself, then the month and last the week.

Week cycle.

Monday mornings are reserved for operations, in case anything has gone wrong during the weekend. Monday afternoons are for project meetings, so project members can plan their week; the earlier in the week the better. Do operational changes on Mondays, so you can fix the problems on Tuesday. Wednesday is 'meeting day': experience has taught me operational problems are least likely to occur on Wednesday, so plan for important repeating meetings then. Thursday is for small operational changes and 'unplanned' meetings, seminars etc. Never change operations on Fridays (not even a simple program bug fix, if it can be postponed to next week) unless you want to work during the weekends. Fridays are for those boring, sometimes unnecessary meetings you cannot avoid. Note: the later a Friday meeting is held, the shorter it will be. At 16.00 even a hopeless filibuster often keeps quiet. Weekends, lastly, are for big operational tasks.

Month cycle.

The first day of the month is Business Intelligence day. The BI system must be up and cope with peak load. If the system is of any magnitude and has active users, there are new/changed reports and very likely problems to deal with. Do not plan for anything on the first, no matter what weekday it is; your BI system may need you. The second and third weeks of the month are the quiet weeks: plan for repeating events and important meetings. Last we have the month-end closing, which starts four days before month end and lasts to the third day of the next month. During this time the financial guys become hysterical if they do not get all attention from you (and everyone else for that matter).

Year cycle.

The year starts with what I call the 'Iron day'. In my younger days I took a taxi from the New Year party directly to the office, and after a barrel of coffee I dealt with the year end's problems. Nowadays I try to sleep a few hours before I log in from home and deal with them. The 'Iron day' is the single most important IT operational day and it is an effort of will. Be prepared for anything. (Three years ago our VPN system broke down so I could not log in at 06.00; a software license had expired. I had to take a taxi to the office as in the old days.)

Be cautious with planning too much in the beginning of January; there might be difficult year-end problems to deal with.

February is software license renewal month. (Never ever have software licences expire in December, January or during the vacation period.)

For us the summer vacation period is July and August (in Sweden it is July and in Belgium it is August). Avoid planning anything during the vacation period. You should not start up anything new in the two weeks before the vacation period, or in the week after it.

The most IT-critical period of the year is the year end. Do not start up any major IT-finance projects from mid-November until mid-January; the finance guys will be absorbed by year end. Declare a 'production freeze' from the second week of December to the second week of January, and only allow necessary maintenance in the IT landscape and infrastructure.

When these three cycles are firmly booked in the calendar you only have to mark up holidays like Xmas, Easter and the bank holidays, and your calendar is ready for use; you are prepared for planning the next year.


Office activity peaks and Business Intelligence users

Every Database Administrator knows there are work-related activity peaks during the day in the office. You have a long peak of activity shortly after most workers arrive in the morning and another, shorter one before lunch. After lunch there is an hour of intense activity and a short burst of activity at the end of the day. In between those two peaks not much is happening. How do I measure? In the ERP systems, of course: when we work we tend to use the ERP systems, so to measure the work activity rate just calculate the ERP transaction rate. As you may have noticed I'm vague about the timings of the peaks. If you have observed activity peaks in different countries you know activity differs between countries. I will not say more about that, other than that Swedes tend to come in early in the morning; I myself am a bit extreme, I often start between 05.30 and 06.00.
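As a sketch of the measurement idea (not the actual code; in practice you would read the timestamps from the ERP transaction log), bucketing transaction timestamps by hour of day is enough to make the peaks visible:

```python
from collections import Counter
from datetime import datetime

def hourly_activity(timestamps):
    """Bucket ERP transaction timestamps by hour of day to reveal activity peaks."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(h, 0) for h in range(24)]  # zero-filled 24-hour profile

# Hypothetical sample: a small morning peak and one after-lunch transaction
stamps = [datetime(2012, 5, 7, 8, 15), datetime(2012, 5, 7, 8, 40),
          datetime(2012, 5, 7, 9, 5), datetime(2012, 5, 7, 13, 30)]
profile = hourly_activity(stamps)
```

Plot such a profile per country and the differing arrival habits show up immediately.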
A good friend of mine and ex-colleague, Ulf Davidsson, co-creator of the Data Warehouse, used the Business Intelligence system activity to categorise BI workers into 'doers' and 'thinkers'. The doers start early in the morning and pick up the reports they need for their day's tasks. The thinkers tend to come in later, their activity peak coinciding with the after-lunch peak. The classification into doers and thinkers is quite useful. Doers need precise, detailed but rather limited amounts of information in Excel format, while the thinkers are more for quantity, 'give me all, I want all goods movements in all factories since the days of Eden', and they prefer OLAP presentation of the information; today we use QlikView for OLAP visualization. The doers request 100% quality of data, while thinkers demand performance. (The first thinker using the Data Warehouse, Styrbjörn Horn, forced me to upgrade the hardware; at the time we were operating on scrapped laptops. 'I go bonkers while waiting for my reports.' Styrbjörn later founded his own company, Navetti.)
A common mistake creating BI is to create the BI for thinkers: since they seldom have detailed knowledge of the data, they assume it is correct, and when they finally find bad data quality they can be nasty to deal with. When you create a BI application you should find the doers that benefit from the system and start by creating reports for them; if they find the reports useful they will help you iron out quality issues. (You should base your BI on solving problems rather than visions and grand ideas.) When the application is stable with correct data it is time to invite the thinkers; if they find your application useful they will help you with the performance issues. Thinkers often have a budget, and they can be surprisingly generous if they like your BI system.
This post is based on generalized observations; there are Swedish nighthawks and early birds from other countries. We do not only work in ERP and BI systems, and BI doers can also be thinkers and vice versa.
I have no connections to those behind the OLAP video link; I stumbled across it and I like it, especially the brick wall. To create successful BI you must be in the business.
My only connections to Navetti are professional: we use their software and 'my' systems feed their software with information.


Time flies

Not long ago I was just a kid and now I'm a member of the gold club. It's nice to have the golden needle and the gold watch. And the gold club parties in the mirror hall of the Grand Hotel are cool! History has been written there: the first Nobel Prize award ceremonies were held in the hall, the very same hall where I got my gold club insignia. It's surreal. But it's also saddening; where have all these years gone?


The € operator - supporting the Euro and SAP RFC_READ_TABLE

Recently we have started to use a new technique for extracting data from SAP: we dynamically create predicates for RFC_READ_TABLE. This technique has been very successful and we use it more and more. But there are some problems with our technique and RFC_READ_TABLE. We have to specify the entire predicate in one statement for each search. An example shows you what I mean. We would like to find two parts, A and B, in factory 1100.
A natural predicate for this is:
 “PLANT EQ ’1100’ and (MATNR EQ ‘A’ or MATNR EQ ‘B’)”.
But this is not possible with the current technique, to build the predicate we have to write something like:
“(PLANT EQ ’1100’ and MATNR EQ ‘A’) or (PLANT EQ ’1100’ and MATNR EQ ‘B’)”.
This is not only unnecessarily cumbersome; you want to be as precise and succinct as possible, since that gives the DB optimizer a better chance to walk the database efficiently, and the buffer in SAP for the predicate is limited. To make this even worse, the row length for predicates in RFC_READ_TABLE is limited to 72 chars. Why?? Well, it's the amount of information a good ol' punch card can carry.
Clearly the functionality described in the link above needed some enhancement: better support for creating dynamic predicates for RFC_READ_TABLE. And that's where the € operator comes in. The euro currency can do with some help, and I do what I can by introducing the € operator to popularize the euro. I think I'm the first to introduce the € sign in a programming language (the Integration Tag Language). The € operator syntax:
$ok = €()filename
It is a bit hard to explain the inner workings of the € operator, but some examples will hopefully show how it works. Example:

            ,('NO_DATA',' ')
            ,('OPTIONS', €(MANDT EQ '300' and KAPPL EQ 'M' AND WERKS EQ '1100')/path/file)
Take a look at the OPTIONS row in the example above; here we see the € operator in action. The expression between the parentheses is the 'constant' part of the predicate and is connected to the expressions in the file with an AND, like:
“MANDT EQ '300' and KAPPL EQ 'M' AND WERKS EQ '1100'” AND (filerow1 OR filerow2 OR … last_filerow)
A ‘real’ example:
In this example I extract rows from SAP table A17 for predicates (materials) generated in the first job, genArray1. When running this example schedule the € operator expands the OPTIONS expression into this PHP array:
At last, the € operator conveniently chops the array elements into 72-char strings so RFC_READ_TABLE gets the input in punch card format :)
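To illustrate what the expansion and chopping produce, here is a sketch in Python (the real implementation lives in the ITL/PHP code; the function name and the exact joining rules are my assumptions):

```python
def expand_predicate(constant, file_rows, width=72):
    """Combine a constant predicate with OR'ed rows from a file, then chop
    the result into fixed-width strings, RFC_READ_TABLE punch card style."""
    ored = " OR ".join("(%s)" % row for row in file_rows)
    predicate = "(%s) AND (%s)" % (constant, ored)
    return [predicate[i:i + width] for i in range(0, len(predicate), width)]

rows = expand_predicate("MANDT EQ '300' AND WERKS EQ '1100'",
                        ["MATNR EQ 'A'", "MATNR EQ 'B'"])
```

The constant part is written once instead of being repeated per file row, which keeps the predicate short for SAP's limited buffer.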


Automatic replication of MySQL databases with Rsync

In some posts I have written about replicating Business Intelligence information to Local Area Network satellites.
Instead of using normal database backup procedures that guarantee the integrity of the database, I use rsync and file-copy the database from the source database server over to the target database server. I can do this since I know no updates are done to the database while replicating, and I use the MySQL MyISAM storage engine. My rsync procedure is very simple, fast and self-healing, but
Do not try this at home
The real reason why I replicate this way is: I like to experiment and try new things, and I have not seen anyone replicate databases like this before.

The Setup

This is how I have set it up. From the controlling ETL server I issue commands via ssh  to the source and target systems:
  1. Source system: Flush tables
  2. Target system: Stop MySQL
  3. Source system: Run rsync_repl.sh
  4. Target system: Start MySQL
I use ssh from the ETL server (where my Job scheduler runs) and issue the commands from a job.
First I need to set up SSH (control server):
ssh-copy-id -i ~/.ssh/id_rsa.pub userid@targetDBserver
and then sudo (target server / BI satellite):
visudo  (add)
MYSQLADM ALL = NOPASSWD: /usr/sbin/service
and then test it:
ssh -t userid@targetDBserver sudo service mysql status
from the control server. You should receive the MySQL status from the target database server without any prompts for a password.
On the source database server I did almost the same thing. First SSH (control server):
ssh-copy-id -i ~/.ssh/id_rsa.pub userid@sourceDBserver
and then sudo (Source server/BI Master):
User_Alias MYSQL_REPL = userid
MYSQL_REPL ALL=(ALL) NOPASSWD:/path2/rsync_repl.sh *
rsync_repl.sh is a bash script (appended below) that rsyncs MySQL databases to the target database server. Now I have all things in place and I can system test from my control server:
ssh -t userid@targetDBserver sudo service mysql stop
ssh -t userid@sourceDBserver sudo rsync_repl.sh
ssh -t userid@targetDBserver sudo service mysql start

The automation.

With everything in place and tested, I only have to create a job and schedule it.

<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<job name='replicateDB' type='sql'>
<!-- This job replicates databases from Source Host to Target Host -->
<!-- Note of Warning! This is not according to any safe procedure. Do not try this at home! -->
  <sql>FLUSH TABLES</sql>
    <!--Action 1:  stop mysql in target server -->
    <!--Action 2:  replicate from Source to Target -->
    <!--Action 3:  start mysql in target server -->
    <action wait='yes' cmd='ssh' parm='-t @TargetUser@@TargetHost sudo service mysql stop'/>
    <action wait='yes' cmd='ssh' parm='-t @SourceUser@@SourceHost sudo /path2/rsync_repl.sh'/>
    <action wait='yes' cmd='ssh' parm='-t @TargetUser@@TargetHost sudo service mysql start'/>
</job>
As a safety measure this job first flushes the MySQL tables to disk and then runs the exit actions. And that's it.
If, God forbid, the replicated database is trashed, I just have to run the job again. You can guarantee the integrity of the database by running the job repeatedly until no data is replicated; basically, if this replication is faster than the 'update rate' your database will be fine.
I conclude this series of posts with the rsync_repl.sh script. The comments say it all: 'hopefully the replicated database is OK', no guarantees!
You can make this procedure secure by taking a table lock before the second replication, but in my case it is not necessary.


Having PHP FUN(ctions) with SAP shop calendar

The other day I needed to calculate the number of workdays in a 60-day period for our Tierp Factory.
Since I had to do this in our BI system I needed the shop calendar for Tierp from SAP. I started to look around for some procedure to extract the shop/factory calendar from SAP. To my surprise I didn't find anything. The closest I found was the BAPI DATE_CONVERT_TO_FACTORYDATE. This piece of code takes a date and gives the shop calendar equivalent, i.e. it tells you if the date is a work day or not, and the closest work day if it is a holiday. By exposing a range of dates to DATE_CONVERT_TO_FACTORYDATE you can create a shop calendar. I also found some interesting tables in SAP where I could extract the information I needed, but I decided to have some fun with my PHP job scheduler and call DATE_CONVERT_TO_FACTORYDATE repeatedly for a long date span, assembling the shop calendar from the results. I needed to feed DATE_CONVERT_TO_FACTORYDATE with the id of the Tierp factory calendar, a range of dates, and a flag telling it to look forward or backward for the closest work day.
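The approach, one BAPI call per date assembled into a calendar, can be sketched like this in Python. The SAP call is stubbed out (the real call goes through the PHP scheduler and sap2.php), and the calendar id '91' is made up for the example:

```python
from datetime import date, timedelta

def build_shop_calendar(call_bapi, calendar_id, start, end):
    """Assemble a shop calendar by calling DATE_CONVERT_TO_FACTORYDATE
    once per date in the span (one logon, many calls)."""
    days, cur = [], start
    while cur <= end:
        workday, closest = call_bapi(calendar_id, cur, direction='-')
        days.append({'date': cur, 'workday': workday, 'closest_workday': closest})
        cur += timedelta(days=1)
    return days

def fake_bapi(cal_id, d, direction):
    """Stand-in for the real RFC call: weekends are holidays and a
    backward search returns the preceding Friday."""
    if d.weekday() < 5:
        return True, d
    return False, d - timedelta(days=d.weekday() - 4)

cal = build_shop_calendar(fake_bapi, '91', date(2012, 7, 6), date(2012, 7, 8))
```

With the real RFC wrapper plugged in instead of the stub, the result rows are ready to load into MySQL.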
The id is stored in the SAP TFACD table, the date span could easily be created by a function, and I wanted a backward search. Then I just had to call SAP once for each date in the span. For this I could use a job iterator, which would do the job, but with a nasty side effect: since the job calling SAP would repeat itself, it would log on and log off from SAP once for each date. Instead I pushed the iterator down to the SAP communicator sap2.php, which logs on once, then iterates through the dates and appends the results. After this it's just a simple matter of loading the result into our BI system's MySQL database. OK, here we go:
For those of you who have followed my posts about my PHP job scheduler and the SAP communication examples: the interesting pieces here are the PHP functions that generate the start of the calendar in <tag name='CALSTART'...> and the use of the iterator <mydriver>, which creates a date span that the script sap2.php will iterate through, calling DATE_CONVERT_TO_FACTORYDATE once for each date. The second job in this schedule, createCalendar, just loads the result into MySQL.
Here you see part of the created shop calendar:
Example: the date 2012-07-08 is a holiday and the closest work day is 2012-07-06.
If you take the time to study the first job, getCalendar, you will notice there is actually quite a lot of FUNctionality packed in there.
My first use of this shop calendar is an advanced formatted Excel sheet, which I will create and email in subsequent jobs. That will be fun too.


Language confusion

Yesterday I was setting up a job using the SAP BAPI BAPISDORDER_GETDETAILEDLIST. As always I google a subject before I start, and while googling I found a conversation like this:

Question - How should I specify the parameter I_MEMORY_READ?

Reply       - It doesn’t harm to read the documentation in the program.

The documentation in the program reads:

SPACE = zuerst Pufferzugriff, dann DB-Zugriff, 'A' = nur Datenbankzugriff, 'B' = nur Pufferzugriff


If your only language is English, you may have problems understanding this. If your only knowledge of an Indo-European language is bad English, the above explanation may be incomprehensible. But there is help; feeding the phrases into Google Translate gives:

  1. ‘zuerst Pufferzugriff, dann DB-Zugriff’         -> ‘first buffer access, then DB access’
  2. ‘nur Datenbankzugriff’                                -> ‘only database access’
  3. ‘nur Pufferzugriff’                                -> ‘only buffer access’

Now this should be understandable for most people with a basic understanding of English. But it is a hell of an effort to find the relevant phrases in a text you do not understand and then feed them into Google Translate. On the positive side you are not limited to English: if I feed the first phrase 'zuerst Pufferzugriff, dann DB-Zugriff' into Google Translate I can choose my native Swedish, which gives 'första bufferten åtkomst, då DB åtkomst'. Well, that was maybe bad luck; it's gibberish, it may look like Swedish but it doesn't mean anything. Even if you have access to Google Translate you may need more helpful advice than RTFM. If you do not have better advice you should maybe not answer at all, or at least not give that type of comment; it does no one any good.

These days, when English is the lingua franca of the world, you see a lot of strange English; this post is a good example of that. And I have said many things in 'English' that have amused others. E.g. my tongue easily slips on the vowel in 'cook', making it mean something else.

Even if you know English well as a second language, you will translate into English differently depending on your native tongue, cultural background and nationality. From working with MRP systems I'm well acquainted with the phrase 'Part number was not found'. A Swede would probably say 'Article number missing' while a German would put it like 'Material number was not possible'. I was once talking with an American and an Englishman; the English guy was telling us about a car accident he had had with a lorry. I could tell from the expression on the American guy that he didn't quite understand, so I jumped in and said 'he means a truck'. 'Uhha,' said the American.

I do not know if this is a true story or not: when the Swedish company Electrolux started to market their vacuum cleaners in the US they used the slogan 'Nothing sucks like Electrolux.' I know of a Swedish company who tried to sell a range of environmentally friendly beauty products and perfumes in the US using the brand name 'Nature calling'.


Crawl before you walk, my first lines of node.js code.

Last Saturday I finally had some time for my node.js project. After some clumsy attempts to write a web app with the Flatiron framework, I realized I knew too little about web development to successfully create something usable in a reasonable timeframe without help. I found the book 'The Node Beginner Book' by Manuel Kiessling and it helped me through the process of creating a simple web service. Input is a URL query and the output is a JSON object.
The input query requests products with a SQL 'like' predicate and the output JSON object is a list of products. A very primitive web app and very primitive code. But it gave me the opportunity to study JavaScript and get to grips with how and why functions are passed around. This was a bit confusing at first but I think I got the hang of it. Now I think it would have been a serious mistake to start with a (Flatiron) framework. If you know Node.js and JavaScript you can probably appreciate a framework better. Crawl before you walk.


Walk an extra mile in shoes like this one?

Last Friday I went down to Belgium on a business trip, to discuss the start of a SAP BW project. When I left I found my old Ecco City Walker shoes, which I hadn't used for two years. I put them on and went to the airport, where I realised the rubber soles were sticky; they sort of glued to the ground and unfortunately left black marks wherever I went. Arriving at Brussels airport the soles literally started to disintegrate, and now I not only left black marks but sticky bits of rubber all over. The SAP BW project will run on a very tight schedule, and we will most likely have to walk an extra mile or two to finish on time. I will not be able to walk at all in shoes like this one. After the photo was taken I put the shoes into the garbage can :(


Replicating MySQL with Rsync - Part 2

In some previous posts I have written about replicating Business Intelligence information to Local Area Network satellites.
Now I have had my replication scheme up and running for some weeks, and it surpasses all my expectations. The replication runs through like a clyster. In the last post I feared the network speed would be too slow, but it seems like the network guys have cranked up the speed.
These Rsync statistics are from last night's replication:

Number of files: 294
Number of files transferred: 140
Total file size: 726588950 bytes
Total transferred file size: 641270064 bytes
Literal data: 553799963 bytes
Matched data: 87470101 bytes
File list size: 3898
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 70309506
Total bytes received: 781434
sent 70309506 bytes  received 781434 bytes  437482.71 bytes/sec
total size is 726588950  speedup is 10.22
I'm not sure what all the figures mean, but it's fast. The replication is done twice, plus the replication of an empty database (also done twice); normally this procedure takes less than three minutes wall clock time.
I run this replication against a live MySQL/MyISAM database. I just flush the tables and then rsync. I know there are no ongoing activities against this database during the replication. But that actually doesn't matter that much; you can run a similar scheme against a busy database: if the activity is low, run rsync until no data is transferred (or run it twice), then lock the database and run a final rsync replication. But in that case I would prefer a proper backup scheme. I replicate this way because I know there are no activities against the database, and my rsync procedure is fast and simple.
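The 'run rsync until no data is transferred' idea is a simple convergence loop. Here is a sketch in Python with the sync step abstracted into a callable (in real life that callable would shell out to rsync with --stats and parse the 'Number of files transferred' counter; the function names are mine):

```python
def replicate_until_stable(sync_once, max_rounds=5):
    """Repeat a sync step until a round transfers nothing, i.e. the copy
    has converged with the (quiet) source. Returns the number of rounds."""
    for round_no in range(1, max_rounds + 1):
        if sync_once() == 0:  # e.g. 'Number of files transferred' from rsync --stats
            return round_no
    raise RuntimeError('source still changing after %d rounds' % max_rounds)

# Simulated runs: the first round copies 140 files, the second finds nothing new
results = iter([140, 0])
rounds = replicate_until_stable(lambda: next(results))
```

If the loop never converges, the source is being updated faster than rsync can copy, and a proper backup scheme is the safer choice.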
But there is still a problem: a colleague runs this procedure manually, and this is certainly not the way it should be. I just haven't had time to set up an automatic procedure yet. This procedure is a bit complicated, as it involves taking down the target database system and running the replication from the source system, and this should be controlled by a third server which knows when it is time for replication. Why not run the replication process from the target system? Yes, that is undeniably simpler, but I like to have the process under the supervision of the controlling server which knows when the time is due.
This is how I will set it up. From the controlling server I will issue commands via ssh to the source and target systems:

  1. Target system: Stop MySQL
  2. Source system: Flush tables
  3. Source system: Start rsync
  4. Target system: Check the database (MySQL upgrade script)
  5. Target system: Start the database
I have promised my colleague to replace his manual labor with a shell script next week :)
If you have a simpler/better way of replicating, please comment on this post.


Web Services - Node.js and Flatiron a start.

Things do not always go according to plan. Yesterday I started to make plans for an Ubuntu private cloud, where I will migrate my Business Intelligence application, the Data Warehouse. While thinking about the hardware I opened up my mailbox and found one mail that attracted my attention. It was a request for a Data Warehouse Web Service API. The sender, a very experienced Lotus Notes consultant, was obviously dissatisfied with the only two interfaces to the Data Warehouse, ODBC and JDBC; 'rigid and old fashioned' he wrote. WTF, my apps should be flexible and futuristic. The consultant was kind enough to specify what he meant by Web service and suggested I study WSDL as a starting point.

I'm not an experienced Web programmer. I did what I usually do when I do not know a subject: I asked my friend Google, and after a while I found myself studying a Node.js tutorial. After some fiddling around with some simple node.js scripts I realised I can do a lot with this; actually, web services are something I should have done a long time ago. Why not use PHP, the web language I already know? The simple reason is I do not know Node.js, and it seems even more fit for creating Web Services than PHP. I decided to use a framework from the beginning, and I chose Flatiron.

Now the problems started: I do not know web development, I do not know JavaScript, I do not know Node.js, I do not know the Flatiron framework. Embarrassingly, I did not understand much when I started to create my first Flatiron project. The Flatiron documentation I find is meager. This morning I admitted to myself that I do not have the necessary skills to develop Flatiron/Node.js apps, so I bought the Kindle version of 'The Node Beginner Book' by Manuel Kiessling; if the book is what it says, it will teach me the basics of JavaScript and Node.js.

Yesterday morning I started by planning for an Ubuntu cloud and ended up utterly confused by the, for me, new concepts of Node.js. One thing I know though: if I manage to create Web Services they will be simple, flexible and futuristic node.js apps floating around in an Ubuntu cloud. When? I do not know, time flies and the backlog at the office is just piling up.


Using MySQL SET to squeeze out material reservations from SAP with RFC_READ_TABLE.

The RESB table in SAP contains order-dependent requirements of materials. The key is the reservation number (RSNUM). Over time this table becomes very large, and doing a full table scan for active reservations is not a very good idea. The reservation number is incremented for each new reservation, so if you want to fetch reservations made after reservation 10, you only need to fetch reservations >10. When a reservation is 'allocated' it will never go active again, so if you are looking for active reservations you can safely start from the earliest active reservation.
I used these facts to set up a performant extraction routine. We have a MySQL table containing the active RESB reservations.
0021006667 is the lowest active reservation, so I do not need to inspect lower RSNUMs; this is a good start since I can bypass 21+ million rows. But I still have more than 2 million (23053466 - 21006667) RSNUMs to look at. I do not want to inspect these 2+ million rows in one go; I want to chop this up into smaller intervals and process them in parallel. For this I created MySQL statements that extract min and max RSNUM from the MySQL RESB table and chop this interval into smaller chunks, and subsequently run all these intervals in parallel against SAP with RFC_READ_TABLE, scanning for active reservations and reloading the MySQL table with the results.
The first job above, 'getHiLow', retrieves min and max RSNUM and calculates how many intervals are needed to cover the SAP RESB table from the low RSNUM. The second job, 'createDriver', creates a result table with low and high RSNUM for each interval. Note the MySQL LIMIT option in the last select, which stops the cross join from going into an infinite loop. (These SQL statements are somewhat complex and could be more elegantly expressed in a stored procedure. But when I created these jobs I did not really understand the problem or how to solve it, so I started to fiddle around with MySQL SET until I had something that worked well together with RFC_READ_TABLE, and this is the result.)
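The interval chopping the two jobs perform can be sketched in Python (the post does it with MySQL SET variables; the RSNUM boundaries below are the ones from the post, the chunk size is an arbitrary choice):

```python
def chunk_intervals(low, high, chunk_size):
    """Split [low, high] into consecutive (start, end) intervals
    of at most chunk_size reservation numbers each."""
    intervals = []
    start = low
    while start <= high:
        end = min(start + chunk_size - 1, high)
        intervals.append((start, end))
        start = end + 1
    return intervals

# The ~2 million RSNUM span split into 100k chunks, runnable in parallel
parts = chunk_intervals(21006667, 23053466, 100000)
```

Each (start, end) pair then becomes one RSNUM range predicate for a parallel RFC_READ_TABLE call.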
The next task is to feed these intervals into RFC_READ_TABLE.
The job 'loadRESB' takes the result table (driver0) from the createDriver job, runs each interval in parallel, extracts the requested demands and inserts the result rows into the MySQL RESB table. If you look at the selection (the OPTIONS array) you see what I mean by active reservations. (The only appropriate index I have for this query is RSNUM. If I had a suitable index this whole exercise would be unnecessary.)
The job takes about 50 minutes to run (which is too much). If we look at the log we see execution 58869 ended with a bad (0, zero, boolean false) result. This was due to a timeout condition for one of the intervals. (Tasks in SAP have an execution time limit.) To make this job go faster we can shrink the intervals, run more in parallel and/or analyze the problem and solve it.
Anyway, 50 minutes is too long; something should be done to make this go faster. If you have a more performant way to extract these material requirements, please drop me a mail or add a comment.
UPDATE - A closer inspection of the log revealed the typical duration is below nine minutes, which is good enough. These extreme execution times probably indicate database locking problems.
The jobs above are executed by my PHP job scheduler.


PHP, MySQL, Iterators, Jobs, Templates and Relation sets.

There are some simple integer ranges, or sets, that are important in programming, especially when databases are involved:
  1. Zero
  2. Zero or more
  3. One or Zero
  4. One
  5. One or more
Before you start to code you should establish the relations between entities. E.g. in a web trade system you should define the relations between Products and Sales orders. Normally a customer must add at least one Product to a Sales order: a Sales order contains one or more products. A product, on the other hand, may or may not have Sales orders: a product can appear on zero or more Sales orders. We can describe the relation between Products and Sales orders with two of my aforementioned sets. Of course there is more to this relation than our two 'relation sets', but it is a very good start, and it is essential to sort out these set relations before you start modeling your database and the logic in your system.
In my job scheduling system an iterator is a table (PHP array) that determines how many times a job should execute. I found it very convenient to validate an iterator against relation set definitions, i.e. the iterator's rows must match a relation set. For this my iterators have a 'rowrule', e.g. rowrule=1+ means this iterator must have one or more rows, otherwise the iterator is invalid. (Since iterators are used to traverse objects, they describe a relation in some way; I think all iterators should implement rowrules, they are very handy.) In this hypothetical example, if the <forevery> iterator fails the rowrule test, the job fails.
  Here you see how the rowrule is implemented in my PHP code:
As you can see, apart from the relation set tests it is also possible to specify a boolean expression, which is useful but looks awful: rowrule=’(“$day” ==”Monday”)’ has nothing to do with the number of rows. I have thought about implementing an ‘onlyif’ alias for rowrule to make this look prettier. (Onlyif is the general versioning condition in my job scheduler.)
Note the very important GOTO statement at the end. If an iterator fails validation, subsequent iterator definitions are tried until one passes the test or we run out of iterators. This opens up a variety of possibilities to implement logical conditions in jobs. If we take the first example and add an alternative <forevery> iterator we come up with an even more hypothetical example: if the first SQL iterator returns zero rows, we proceed to the next iterator, which produces one row, so we run the job once with the query (or queries) found in ‘report_queries/NoQueries.SQL’.
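The fallback mechanism can be sketched like this (a hypothetical sketch, not the scheduler’s PHP; the function and data shapes are my own invention):

```python
# Hypothetical sketch of the iterator fallback: each candidate iterator has a
# rowrule ('1+' = one or more rows, etc.); the first one whose rows pass the
# rule is used, mimicking the GOTO in the scheduler.
def pick_iterator(candidates):
    """candidates: list of (rowrule, rows) pairs; rows is a list of dicts."""
    for rowrule, rows in candidates:
        n = len(rows)
        ok = {'0': n == 0, '0+': True, '01': n <= 1,
              '1': n == 1, '1+': n >= 1}[rowrule]
        if ok:
            return rows
    return None  # no iterator passed the test: the job fails

# The first iterator (an SQL result) returned zero rows, so the fallback
# one-row iterator is selected and the job runs once:
sql_rows = []                     # pretend the query found nothing
fallback = [{'file': 'report_queries/NoQueries.SQL'}]
rows = pick_iterator([('1+', sql_rows), ('1+', fallback)])
print(len(rows))   # 1
```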
Now we look at a real example. We have a MySQL database table with report names and schedule events (when these reports should run, like ‘DAY’, ‘MONTH’ etc.). The queries are stored in a query library under the same name as the report name in the database table.
We want to schedule a job that runs all ACTIVE=’YES’ reports for a given EVENT, but we also want to be able to run a single report (in case a report fails and we must rerun it; of course this never happens, but just in case…). The result of each report should be logged.
Standard logging granularity is job level, so we need to run each report in a separate job. To do this we must create a job per report on the fly, and for that we use a job template. The purpose of <template> is to generate jobs from a template with the help of a template iterator.
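The idea of multiplying a template into one job per iterator row can be sketched like this (a hypothetical sketch; the @REPORTNAME@ placeholder syntax and the function are my own, not the scheduler’s actual substitution mechanism):

```python
# Hypothetical sketch of <template> expansion: one job definition is
# generated per row of the template iterator, substituting a placeholder.
def expand_template(job_template, iterator_rows):
    jobs = []
    for row in iterator_rows:
        job = {k: v.replace('@REPORTNAME@', row['REPORTNAME'])
               for k, v in job_template.items()}
        jobs.append(job)
    return jobs

template = {'name': 'run_@REPORTNAME@',
            'file': 'report_queries/@REPORTNAME@.SQL'}
reports = [{'REPORTNAME': 'CAMILLA_TEST_REPORT'},
           {'REPORTNAME': 'CAMILLA_TEST_REPORT2'}]
for job in expand_template(template, reports):
    print(job['name'], job['file'])
```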
There is a lot under the hood when processing this schedule. There are two variants (run time parameters); both have a default value and a validation list against which the given parameter is tested. The REPORTNAME variant’s validation list is created by an SQL query! After the variants are validated, the template iterators are created and validated according to the rowrule and the right iterator is selected; now the template job definition is multiplied accordingly. Executing this schedule with event=’TEST’ gives the following expanded schedule:
As you see we found two TEST report queries, CAMILLA_TEST_REPORT and CAMILLA_TEST_REPORT2. All template definitions are replaced by comments marking the boundaries of the template. This expanded schedule is now ready for XML parsing:
This is the ‘syntax tree’ of the parsed schedule. The jobs (defined in the nested ‘job’ array) are executed one by one; the SQL queries defined by the ‘file’ entry are submitted to MySQL and the result is processed. At the end the result of the execution is logged in MySQL database tables.
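The execute-and-log loop can be sketched like this (a hypothetical sketch, not the scheduler itself; ‘1’ as the success code is from my return-code convention, while the failure code ‘2’ here is just an assumption for the example):

```python
# Hypothetical sketch: walk the parsed schedule's nested 'job' array,
# run each job's query file and log a return code ('1' = success).
def run_schedule(schedule, submit):
    """schedule: parsed dict with a 'job' list; submit: runs one SQL file."""
    log = []
    for job in schedule['job']:
        try:
            submit(job['file'])
            log.append((job['name'], '1'))   # '1' = success in this scheduler
        except Exception:
            log.append((job['name'], '2'))   # assumed failure code
    return log

schedule = {'job': [{'name': 'CAMILLA_TEST_REPORT',
                     'file': 'report_queries/CAMILLA_TEST_REPORT.SQL'}]}
print(run_schedule(schedule, submit=lambda f: None))
# [('CAMILLA_TEST_REPORT', '1')]
```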
And the log entries:
If you have read my post about return codes you know ‘1’ is success. If you follow the execution of this schedule you see there is quite some logic and processing packed in this tiny XML script (here is the uncluttered version).
Now I will try to use the same technique in a more advanced application: a report application that sends Excel reports as email attachments. For this I need to build a more complex database with reports, recipients and scheduled events. If I succeed I will post about it.


Amazon got recommendations for me

A few weeks ago I happened to hear a cool recording of Baby please don’t go by Muddy Waters and The Rolling Stones: http://www.youtube.com/watch?v=Mbao_laqF8E

Inspired by the music I spent some hours listening to these guys and other favourites like Lightnin’ Hopkins and Champion Jack Dupree, mostly via YouTube.

Then this week I got recommendations from Amazon:

Wow, CJD isn’t the most popular artist these days; Amazon must have found my interest in CJD via my looking for him on YouTube. (I’m pretty sure I have not searched for CJD on Amazon. I also wrote a mail where I mentioned CJD, but Amazon hopefully does not scan my mails; that would be a step too far :-)

Anyway, this recommendation probably comes from some Business Intelligence activity. Of course I was curious how Amazon found out about my interest in CJD, so I sent a query to Amazon:

08/01/12 22:19:46

Your Name: Lars Johansson

Comments: I'm curious about your recommendation to buy CDs by Champion Jack Dupree. I would appreciate an explanation of the process resulting in you recommending me to buy Champion Jack Dupree CDs.

And I got this response:


I'm sorry to hear you're unhappy with some of the products that were recommended for you on Amazon. To prevent certain purchases from affecting your recommendations, please follow the instructions here:


To prevent certain items you've viewed recently from affecting your recommendations, please follow the instructions here:


If you don't want to see our recommendations, you can log out of your Amazon.com account by moving your cursor to the "Your Account" menu at the top of the home page and selecting "Sign Out" from the expanded list of options. On the next page, leave the e-mail and password spaces blank and click the Amazon.com logo. Your recommendations won't appear on the website until the next time you sign into your Amazon.com account.

I hope this helps! We look forward to seeing you again soon.

Thank you for your inquiry. Did I solve your problem?

If yes, please click here:


If no, please click here:



But I was not unhappy about the recommendation, I was curious how they found my interest in CJD, so I once again asked Amazon how they did it. I’m still waiting for Amazon’s reply.

Working with Business Intelligence I have analyzed our customers’ behavior to come up with Sales Leads. I’m not using some sophisticated web scanner; I only try to analyze existing customers’ purchases, e.g. has the customer bought spare parts but no service? When did they buy our products? Have they bought the new model? etc. But we have plans to be more active and come up with better Sales Leads and targeted advertisements using our Sales and Marketing statistics.
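A question like “spare parts but no service” is essentially an anti-join. As a small sketch (table and column names are invented for the example, and I use an in-memory SQLite database here rather than our actual Sales database):

```python
import sqlite3

# Hypothetical sketch of one sales-lead question from the text: which
# customers bought spare parts but never bought service?
con = sqlite3.connect(':memory:')
con.executescript("""
    CREATE TABLE purchases (customer TEXT, category TEXT);
    INSERT INTO purchases VALUES
        ('Acme', 'spare_parts'),
        ('Acme', 'service'),
        ('Beta', 'spare_parts');
""")
leads = con.execute("""
    SELECT DISTINCT customer FROM purchases
    WHERE category = 'spare_parts'
      AND customer NOT IN (SELECT customer FROM purchases
                           WHERE category = 'service')
""").fetchall()
print(leads)   # [('Beta',)]
```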

Many years ago I worked as a DBA, administering the Sales database of a TV-shop. I got complaints from the TV-shop that the database was down early Sunday mornings. I took the database down for housekeeping like reorgs at 04.00 on Sunday mornings, and the database was down for an hour. To my surprise I was told there are many (lonely?) drunkards buying like crazy the night between Saturday and Sunday, and this was a respectable TV-shop selling normal TV-shop merchandise. So I had to reschedule the jobs to the following night, between Sunday and Monday, which was a low-activity night.


If Amazon’s recommendation had come the night I was listening to CJD I would probably have bought some music. Business Intelligence is very much a question of speed. The analysts need the reports now, not tomorrow.