2014-05-29

MySQL data marts in Excel and SQLite

The other week I was looking at SQLite for use in a web app, which reminded me of a MySQL-to-SQLite conversion routine I wrote some years ago. I had to create a detached data mart on the fly for a user who needed to analyze some data over the weekend, so I created a multi-tab Excel workbook with a database view on each sheet. Just for fun I also created an SQLite database that can act as a detachable data mart. That time (still the only time I have created such a data mart) the user preferred the Excel sheet. I have long since lost the ITL workflow that produced the data mart, but I found a test script that still works.
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<schedule mustcomplete='yes' logresult='no' period='test' Xnotify='admfail.xml' logmsg='TorquearmsWizard datamarts'>
  <!-- This schedule creates two minimal detached datamarts which are selections from the Torquearm Wizard database. -->
  <!-- This schedule consists of three jobs: -->
  <!-- 1: convert2sqlite - Create an sqlite database -->
  <!--    - Note there is a nested job zipit which we call to zip the database -->
  <!-- 2: convert2excel - Create an Excel sheet using PHPExcel -->
  <!-- 3: send_mail - Send this mail to you -->
  <tag name='C_DB' value='accessoryConfigurator'/>
  <tag name='OUTBI' value='datamart'/>
  <tag name='sqlscript'>
    <!-- Datamart extract queries -->
    <value>
      use @C_DB;
      select * from tools;
      select * from attachments;
      select * from tool_attachments;
      select * from torquearms;
      select * from accessories;
      select * from subaccessories;
      select * from tool_subaccessories;
      select * from torquearm_accessories;
      select * from accessory_subaccessories;
      select * from comments;
    </value>
  </tag>
  <job name='convert2sqlite' type='sql' parallel='yes'>
    <sql>@sqlscript</sql>
    <sqlconverter>
      <name>sqlconverter_2sqlite.php</name>
      <target>table</target>
      <delim>,</delim>
      <sqlite>
        <database db='@OUTBI.sq3' directory='@J_convert2sqlite'/>
      </sqlite>
    </sqlconverter>
    <job name='zipit' type='script' pgm='zipfile.php' library='ziplib' create='yes'>
      <file name='@J_convert2sqlite/@OUTBI.sq3'/>
    </job>
  </job>
  <job name='convert2excel' type='sql' parallel='yes'>
    <sql>@sqlscript</sql>
    <sqlconverter>
      <name>sqlconverter_PHPExcel03A.php</name>
      <target>@OUTBI0</target>
      <writer>sqlconverter_PHPExcelWriter01.php</writer>
    </sqlconverter>
  </job>
  <job name='send_mail' type='sendmail'>
    <mailer>phpmailer</mailer>
    <recipient>lars.a.johansson@se.atlascopco.com</recipient>
    <subject>@OUTBI database</subject>
    <body>
      Hi, the TAW datamart is attached as an SQLite database, and replicated in the Excel workbook, one sheet per table.
      The mart is a subset of the TAW Data Warehouse.
    </body>
    <attachment name='@OUTBI.zip' file='@J_convert2sqlite/zipit/ziplib'/>
    <attachment name='@OUTBI.xlsx' file='@J_convert2excel/@OUTBI0.xlsx'/>
  </job>
</schedule>
There is a lot going on here: first we create an SQLite database and zip it, while the Excel workbook is created in parallel. Finally we ship the result as a mail.


The Excel workbook consists of one sheet per SQL query. Here you see the Attachments sheet, from the
query select * from attachments.




The SQLite database is similar to the Excel workbook, with one SQLite table per sheet.


If you look closely at the ITL workflow above you see that the conversions to Excel and SQLite are done via SQL converters processing the MySQL result tables. For the conversion to Excel I use the excellent PHPExcel package; the conversion to SQLite I have written myself. I'm pretty happy with this script: quite a lot of functionality per line of ITL code. ITL started as a primitive job scheduler, but these days we build more or less any type of background workflow with ITL. As the name Integration Tag Language suggests, though, most workflows are ETL integration routines. You can follow ITL job status in our Data Warehouses on Twitter.


The SQL converter code creating an SQLite table out of a MySQL result table

<?php
/* Copyright (C) 2010 Lars Johansson
This code is free software;
you can redistribute it and/or modify it under the terms of the
GNU General Public License as published by the Free Software Foundation;
either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. */
/**
* SQL result converter - dynamically included in function {@link execSql()}
*
* This converter converts SQL select(s) result to an SQLite database.
* Note! First we call {@link sqlconverter_CSV.php} to create a result, which we then insert into the SQLite database.
* Syntax:
*<sqlconverter> <br>
* <name>sqlconverter_2sqlite.php</name> <br>
* <target>table</target> <br>
* <delim>,</delim> <br>
* <sqlite> <br>
* <database db='accessoryconfigurator.sq3' directory='@J_convert2sqlite'/> <br>
* </sqlite> <br>
*</sqlconverter>
*
* @see sqlconverter_default.php
* @version  1.0.0
* @package adac
* @subpackage sqlconverter
*/
$sqllog->logit('Note',"Enter sqlconverter_2sqlite.php using target=$sqltarget,xml=$xmlconverter");
/** Include the CSV converter to do a basic conversion of the result table */
include("$sqlconverterDir".'sqlconverter_CSV.php');
$sqllog->enter('Note',"sqlconverter_2sqlite.php meta=$metafile,sql=$sqltarget");
$xmlSqlite = $xmlconverter['sqlite'][0];
$sqliteDb=$xmlSqlite['database'][0]['db'];
$sqliteDbDir=$xmlSqlite['database'][0]['directory'];
$metarows = file_get_contents($metafile);
try {
    $sqlitehdl = new SQLite3("$sqliteDbDir/$sqliteDb");
    $sqllog->logit('Note',"sqlite_open $sqliteDbDir/$sqliteDb");
} catch (Exception $e) {
    // The SQLite3 constructor throws on failure; it never returns FALSE
    $log->logit('Error',"sqlite_open $sqliteDbDir/$sqliteDb, msg=".$e->getMessage());
}
$liteq = $sqlitehdl->query("begin transaction");
$metarows = explode("\n",$metarows);
$querytxt = '';
foreach($metarows as $metarow) {
    if ("$metarow" == '') continue;
    $metawords = explode(';',$metarow);
    $metatmp = explode(':',$metawords[0]);
    $metacol = trim($metatmp[1]);
    $metatmp = explode(':',$metawords[2]);
    $metatab = trim($metatmp[1]);
    $metatmp = explode(':',$metawords[4]);
    $metadefault = trim($metatmp[1]);
    $metatmp = explode(':',$metawords[6]);
    $metalen = trim($metatmp[1]);
    $metatmp = explode(':',$metawords[7]);
    $metacharsetnum = trim($metatmp[1]);
    $metatmp = explode(':',$metawords[8]);
    $metaflags = trim($metatmp[1]);
    $metatmp = explode(':',$metawords[9]);
    $metatype = trim($metatmp[1]);
    $metatmp = explode(':',$metawords[10]);
    $metadecimals = trim($metatmp[1]);
    $formatlen = $metalen;
    // Map MySQL protocol column type codes to SQLite column types
    switch ($metatype) {
    case 3:   // MYSQLI_TYPE_LONG
        $metaformat = 'integer';
        break;
    case 12:  // MYSQLI_TYPE_DATETIME; a TIMESTAMP in char format yyyy-mm-dd hh:mm:ss
        $metaformat = 'char';
        break;
    case 254: // MYSQLI_TYPE_STRING
        $metaformat = 'char';
        break;
    case 0:   // MYSQLI_TYPE_DECIMAL
    case 246: // MYSQLI_TYPE_NEWDECIMAL
        $metaformat = 'numeric';
        $formatlen = "$metalen" . '.' . "$metadecimals";
        break;
    case 253: // MYSQLI_TYPE_VAR_STRING
        $metaformat = 'varchar';
        break;
    default:
        $metaformat = 'invalid';
        $sqllog->logit('Error',"Column $metatab.$metacol metatype=$metatype does not have a data format!");
        break;
    }
    $querytxt .= ",'$metacol' $metaformat";
}
$querytxt = "create table $metatab (" . substr($querytxt,1) . ');';
$sqllog->logit('Note',"$querytxt");
$qresult = $sqlitehdl->query("$querytxt");
if (!$qresult) {
    $sqllog->logit('Error',"Could not execute query: $querytxt");
    $log->logit('Error',"Could not execute query: $querytxt");
}
unset($metarows);
$liteq = $sqlitehdl->query("commit");
$liteq = $sqlitehdl->query("begin transaction");

$datarows = file_get_contents("$sqltarget");
$datarows = explode("\n",$datarows);
for ($di=1; $di < count($datarows); $di++) { // start at 1: row 0 is the CSV header line
    if ($datarows[$di] == '') continue;
    $datarow = $datarows[$di];
    $querytxt = "insert into $metatab values($datarow)";
    $qresult = $sqlitehdl->query($querytxt);
    if (!$qresult) {
        $sqllog->logit('Error',"Could not execute query: $querytxt");
        $log->logit('Error',"Could not execute query: $querytxt");
    }
}
unset($datarows);
$liteq = $sqlitehdl->query("commit");
$liteq = NULL;
$sqlitehdl = NULL;
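For readers without the ITL runtime, the converter's two phases above (build a create table statement from column metadata, then bulk insert inside one transaction) can be sketched with Python's built-in sqlite3 module. The column list and sample rows here are made up for illustration; they are not the real Torquearm Wizard data:

```python
import sqlite3

# Illustrative stand-ins for what the converter reads from the metafile
# and the CSV result file produced by sqlconverter_CSV.php.
columns = [('id', 'integer'), ('name', 'varchar')]
rows = [(1, 'hose'), (2, 'nipple')]

con = sqlite3.connect(':memory:')  # in real use, the detached .sq3 mart file

# Phase 1: derive the create table statement from the column metadata.
cols = ', '.join("'%s' %s" % c for c in columns)
con.execute("create table attachments (%s);" % cols)

# Phase 2: insert all rows inside one transaction, like begin/commit above.
with con:
    con.executemany("insert into attachments values (?, ?)", rows)

print(con.execute("select count(*) from attachments").fetchone()[0])
```

The single surrounding transaction matters for speed: committing per row makes SQLite fsync on every insert.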

2014-05-04

PHP calling Python

For no reason at all, I decided to make Python a first-class scripting language in the Data Warehouse. It turned out to be very simple: it took about six hours to implement the Python support, most of the time spent on reading and deciding how to design the Python interface. I have no previous knowledge of Python, so it took a while to get going. I did as I usually do: Google plus trial and error until it worked. What I have up and running now needs some polishing and fine tuning, but it works pretty well.



This schedule consists of two jobs: the first one calls Python code, and the second job, 'getPythonTags', picks up two job references created by the first job, 'testPython'. Normally a script job creates output in the job map, e.g. graphic images, but it may also create symbolics like @testPython.FNAME and @testPython.LNAME for use in subsequent jobs.
Implementing Python support was done by this code:


The result is stored by the Python code in the PHP array PYTHON_RESULT.



This array consists of RESULT (true) and the job array, which carries the symbolics in the _xlate array. Having implemented the Python support, I only had to write a Python skeleton program. And here it is.



With the help of the serek package, imported at the top of the code, it was child's play to import the important control blocks (PHP arrays) from the Data Warehouse and export the result back to the Data Warehouse as a PHP array.
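The screenshot of the actual skeleton is not reproduced here, but the essence of the exchange is PHP's serialization format travelling both ways. As a rough, self-contained sketch (the real code uses the serek package for this; the tiny serializer below handles only the strings, ints, booleans and dicts needed for the FNAME/LNAME example):

```python
def php_serialize(value):
    """Serialize a small subset of Python values to PHP's serialize() format."""
    if isinstance(value, bool):          # bool before int: bool subclasses int
        return 'b:%d;' % int(value)
    if isinstance(value, int):
        return 'i:%d;' % value
    if isinstance(value, str):
        return 's:%d:"%s";' % (len(value.encode('utf-8')), value)
    if isinstance(value, dict):          # a PHP array is key/value pairs
        items = ''.join(php_serialize(k) + php_serialize(v)
                        for k, v in value.items())
        return 'a:%d:{%s}' % (len(value), items)
    raise TypeError('unsupported type: %r' % type(value))

# The skeleton program's job: hand back RESULT plus the job array with
# the symbolics in _xlate, for PHP to unserialize into PYTHON_RESULT.
result = {'RESULT': True,
          'job': {'_xlate': {'FNAME': 'Kalle', 'LNAME': 'Kula'}}}
print(php_serialize(result))
```

On the PHP side, a plain unserialize() of this string yields the PYTHON_RESULT array shown above.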



In this hideous display you can see how the SQL query  
select '@testPython.FNAME', '@testPython.LNAME'; 
is translated to select 'Kalle', 'Kula';
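How ITL resolves these symbolics internally is not shown here, but the effect amounts to a dictionary lookup over the @job.TAG references in the statement, roughly:

```python
import re

# Illustrative only: the _xlate symbolics produced by the testPython job.
xlate = {'testPython.FNAME': 'Kalle', 'testPython.LNAME': 'Kula'}

sql = "select '@testPython.FNAME', '@testPython.LNAME';"
resolved = re.sub(r'@(\w+\.\w+)', lambda m: xlate[m.group(1)], sql)
print(resolved)  # select 'Kalle', 'Kula';
```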

It was fun to implement the Python support; whether it will be used I do not know, we will see. But I will probably implement support for Node.js and Perl 6 just for the hell of it. Looking at the watch, it took me about ten hours (not six) to do this. Time flies when you have fun.