2016-05-22

Adding just a tiny little feature 2

In Part 1 I described how I wanted to add a tiny little feature to the Data Warehouse’s SAP connector. And as far as part 1 goes it looks like a walk in the park, but then reality hit me.  
I wanted to implement support for prereqs in  <sql> statements  like this:
<sql>
  <prereq>@TABLE_ROWS</prereq>
  <autoload duplicate='replace' database='@C_DB' truncate='yes'/>
</sql>
.
But after spending some four five hours looking at the code, I realised it would take me a weekend implementing this, so I decided to implement prereq on autoload level, which was much easier like this:
<sql>
  <autoload duplicate='replace' database='@C_DB' truncate='yes/>
      <prereq>@TABLE_ROWS</prereq>
  </autoload>
</sql>

Then I needed to carry the new @TABLE_ROWS symbolic variable over to subsequent jobs, so I placed it into the job symbolic table. But it turned out that the symbolic table was already ‘fixed’ at this late phase of the execution of the job. It turned out to be a somewhat tedious job to analyse and code the ability to carry over the @TABLE_ROWS, since it appeared in two different symbolic tables (I have still not figured out why, it looks awkward) written to disk. Finally after some ten hours of looking at the code I decided to do a somewhat ugly hack in the SAP connector:   
if (array_key_exists('prereq',$job['sql'][0]['autoload'][0])){
 if (file_exists('TABLE_ROWS.CSV')) {
  file_put_contents('TABLE_ROWS.CSV',trim(trim(file_get_contents('TABLE_ROWS.CSV')),"$Zsfd"));
    $TABLE_ROWS = trim(trim(file_get_contents('TABLE_ROWS.CSV')),"$Zsfd");
    file_put_contents('TABLE_ROWS.CSV',$TABLE_ROWS);
 } else {
$TABLE_ROWS = 0;
 }
  $Zjdir = $job['_directory'];
 $job['_xlate']['TABLE_ROWS'] = $TABLE_ROWS;
 file_put_contents("$Zjdir".'/JOB',serialize($job));
 $log->enter('Note',"Table rows inserted into $Zjn _xlate; Saving job in $Zjdir");
 $Zsdir = $schedule['_directory'];
 $Zjn = $job['name'];
 $Zji = findJobIndex($Zjn,$schedule['job']);
 $schedule['job'][$Zji]['_xlate']['TABLE_ROWS'] = $TABLE_ROWS;
 file_put_contents("$Zsdir".'/SCHEDULE',serialize($schedule));
 $log->enter('Note',"Table rows inserted into $Zjn($Zji) _xlate; Saving schedule in $Zsdir");
}

Yes, it took me some ten hours to get those lines of code right. Now I can do this:
<job name=’extractFromSAP’>
<sap>
...
</sap>
  <sql>
     <autoload duplicate='replace' database='@C_DB' truncate='yes/>
         <prereq>@TABLE_ROWS</prereq>
     </autoload>
  </sql>
</job>
<job name=’postProcessSAPData’>
  <prereq>@extractFromSAP.TABLE_ROWS</prereq
  
</job>
TABLE_ROWS is the number of data lines extracted from SAP

All in all adding just a tiny little feature took some twenty hours. It is long time since I looked at the code, most of the time I spent on understanding the code and the consequences of the code change. As you have seen the actual code added and changed is small. If I had a better understanding of the code it would have taken one to three hours complete the change. Now it took about 20 hours, it could have taken 200 hours, you never know what it will cost or what the end result will be when you add a tiny little ‘thing’ to a complex system. Was this change worth the efford? Nah can’t say that, but it was a bit fun.
Still a question remains, what is the result of the extractFromSAP job if the prereq prevent autoloading? Was it a success or a failure? Or was it bypassed?
You can argue for all three alternatives, now it is counted as a success, but I think 'bypassed' better reflects the result. I have to think about that.
Excerpt from the job log:
160522 090816.992827 20590 Note Enter execJob, name=extractFromSAP
160522 090816.997957 20590 Note Enter evaluateTagArray, Stage=execute    
160522 090817.021057 20590 Note Enter include, script=sap2.php    
160522 090817.039549 20590 Note Enter connect2sap, Connecting to SAP=acta_prod with codepage=1100    
160522 090821.602623 20590 Note Enter writeSqlCreateLoad, Table=DATA, id=6
160522 090821.604143 20590 Note Enter writeSqlCreateLoad, Table=FIELDS, id=7
160522 090821.605823 20590 Note Enter writeSqlCreateLoad, Table=OPTIONS, id=8
160522 090821.607159 20590 Note Enter writeSqlCreateLoad, Table=TABLE_ROWS, id=0
160522 090821.608432 20590 Note Enter writeRRT_stmts, table=VBAP
160522 090821.609183 20590 Note SAP rfc_read_table row length=101
160522 090821.610056 20590 Note Enter fixRfcReadTableRowsDelim, from=DATA.CSV, to=VBAP.CSV
160522 090821.611289 20590 Note Enter include, Table rows inserted into  _xlate; Saving job
160522 090821.612002 20590 Note Enter include, Table rows inserted into extractFromSAP(0) _xlate; Saving schedule
160522 090821.625621 20590 Note Enter connect2mysql, DB=OPS,database=OPS,user=xxx
160522 090821.626233 20590 Note Current character set in Mysql: utf8
160522 090821.626996 20590 Note Prereq Autoload  0 statement 0 result=False
160522 090821.627232 20590 Warning Prereq(s) for autoload noname not satisfied, processing intercepted.
160522 090821.627557 20590 Note Exit script=/home/tooljn/dw/pgm/opsproc/sap2.php result=1 autoload bypassed (prereq)
160522 090821.628088 20590 Note script /home/tooljn/dw/pgm/opsproc/sap2.php ending with rc=1
160522 090821.646820 20575 Note JOB extractFromSAP result=Success,(True,True,True) start=090816, duration=5
160522 090821.650473 20575 Note Enter connect2mysql, DB=OPS,database=OPS,user=KalleAnka - dwdb via TCP/IP
160522 090821.651115 20575 Note Current character set in Mysql: utf8
160522 090821.656999 20575 Warning Prereq(s) for job PostProcessSAPData not satisfied, processing intercepted.
160522 090821.657559 20575 Note SCHEDULE hua_bulkrep result=Success,(True,True,True) start=090816, duration=5  
160522 090821.658128 20575 Log  executed successfully (True,True,True), of 2 jobs, executed OK=1 Failed=0, Bypassed=1

2016-05-15

Adding just a tiny little feature

Often but not always when I revisit old code I realise there are room for improvements. This time I had reason to review the SAP connector code I done for importing data from SAP into an relational database, the basic structure is very simple:
<job>
<rfc>
SAP extraction code goes here…
</rfc>
<sql>
SQL import code goes here...
</sql>
</job>


I looked into this because I came across a job that runs frequently, but seldom finds any new data to export from SAP, (the job is the first in a workflow that creates an alarm that signals ‘incoming bulk order’). It annoyed me when I realized there is no way to stop the SQL part nomatter of the result from the SAP rfc part of the job. The rest of the workflow was stopped by a prereq:
<prereq type='sql'>
SELECT IF(COUNT(*)>0,1,0) FROM import_table;
</prereq>


Prereqs are booleans saying stop or go, I decided I would implement support for prereqs in the sql part of the job like this:
<job>
<rfc>
SAP extraction code goes here…
</rfc>
<sql>
<prereq>got data from rfc</prereq>
SQL import code goes here...
</sql>
</job>


This is a pretty neat design. I’m reusing the prereq which is a very commonly used instruction of my data tag language and it’s self explanatory. The rfc part already creates a file with the number of rows found, so it’s just a matter of making the number accessible to the sql part. And very easy to evaluate the prereq, if you just feed the extracted rows to the prereq it will be evaluated as a boolean, 0 results in false and anything above to true. I decided to call the extracted rows count TABLE_ROWS and put it into the jobs symbolic table for accessibility that was easy I only changed


 file_put_contents('TABLE_ROWS.CSV',trim(trim(file_get_contents('TABLE_ROWS.CSV')),"$Zsfd"));   
Into this:
$TABLE_ROWS = trim(trim(file_get_contents('TABLE_ROWS.CSV')),"$Zsfd");   file_put_contents('TABLE_ROWS.CSV',$TABLE_ROWS);
$job['_xlate']['TABLE_ROWS'] = $TABLE_ROWS;          // create TABLE_ROWS @symbolic


And now I can add a prereq like this:  
<prereq>@TABLE_ROWS</prereq>


Next was to add support for a prereq in the sql part so I took a look at the real job’s sql:
<sql>
  <fieldclosure>"</fieldclosure>
   <autoload>replace</autoload><database>@C_DB</database><truncate>yes</truncate>
  <table>
     <name>@TABLE1=@DWTABLE1</name>
     <idxdef>PRIMARY KEY `VBELN`,`POSNR`),INDEX(`MATNR`,`WERKS`)</idxdef>
     <comment>This table is derived from SAP table @TABLE1</comment>
  </table>
</sql>


The line beginning with <autoload is parameters for automagically load a table into a RDBMS, and it looks like crap. I decided a facelift was needed this is what I would like it to look:
<sql>
  <prereq>@TABLE_ROWS</prereq>
  <fieldclosure>"</fieldclosure>
  <autoload duplicate='replace' database='@C_DB' truncate='yes'>
     <table>
        <name>@TABLE1=@DWTABLE1</name>
        <idxdef>PRIMARY KEY (`VBELN`,`POSNR`),INDEX(`MATNR`,`WERKS`)</idxdef>
        <comment>This table is derived from SAP table @TABLE1</comment>
     </table>
  </autoload>
</sql>


Not pretty maybe, but less ugly and support for sql prereqs. Now I only had implement the new functionality. I started with the new XML syntax. Just to keep things simple I decided to downgrade the new syntax to the old syntax. I had a function that scans for <autoload> and if found pass it over to a function that does the autoload of SAP data into tables.
function autoloadSapTables($context,$job,$sapParms,$driverIndex) {
$log = $GLOBALS['log'];
$sqlarr = array_key_exists('script',$job) ? $job['script']['sql'] : $job['sql'];
$log->enter('Info',"");
for($i = 0; $i<count($sqlarr); $i++){
if (array_key_exists('autoload',$sqlarr[$i])) {
 $log->logit(info',"autoload found i=$i");
 $sqlxml = parseAutoload($context,$job,$sqlarr[$i],$sapParms,$driverIndex);
}
if ($sqlxml === FALSE) return FALSE;
$sqlarr[$i] = $sqlxml['sql'][0]; // Note this is our copy of the job control block
}
}
return array('sql' => $sqlarr);
}


This is the function with downgrading code inserted:
function autoloadSapTables($context,$job,$sapParms,$driverIndex) {          
 $log = $GLOBALS['log'];
$sqlarr = array_key_exists('script',$job) ? $job['script']['sql'] : $job['sql'];
$log->enter('Info',"");
for($i = 0; $i<count($sqlarr); $i++){
if (array_key_exists('autoload',$sqlarr[$i])) {
$al = $sqlarr[$i]['autoload'][0];
if (array_key_exists('duplicate',$al)
or array_key_exists('database',$al)
or array_key_exists('truncate',$al)
or array_key_exists('table',$al[0]) ){
// This is the new restructured autoload, downgrade to old format
    $log->logit('note',"autoload new format found downgrading i=$i");
 $al['autoload'] = [['value' => $al['duplicate']]];
 $al['database'] = [['value' => $al['database']]];
 $al['truncate'] = [['value' => $al['truncate']]];
 $sqlxml = parseAutoload($context,$job,$al,$sapParms,$driverIndex);
}
else {
 $log->logit('note',"autoload old format i=$i");
 $sqlxml = parseAutoload($context,$job,$sqlarr[$i],$sapParms,$driverIndex);
}
if ($sqlxml === FALSE) return FALSE;
$sqlarr[$i] = $sqlxml['sql'][0]; // Note this is our copy of the job control block
}
}
return ['sql' => $sqlarr];
}


And that is where I am now, I didn’t do what I originally wanted to accomplish. I suppose that is what happen to most of us when start looking into old code where you only intend to add or change something minor. Next weekend I hope to get some time to finalish the task.


P.s.
You may rightly wonder why is this bulk order alarm not implemented directly in SAP? I do not know, probably it was too expensive or took too long time to get into production.