2016-08-28

The Data Warehouse, the first 15 years



It was 15 years ago today I started up the Data Warehouse, and it has been in service ever since. In its first years it was not much more than some basic PHP scripts and a MySQL database. But the success of the Data Warehouse, and the steep growth of data imports, made it necessary to restructure the data flow in the Data Warehouse.
In 2004 I decided a more formal approach was needed. I wanted to build a workflow engine (a job scheduler) and a supporting language. The new version of the Data Warehouse was ready for production in 2006, and over the following years some features were added.
From 2009 on, very little if any development has been done. Still, the Data Warehouse is alive and kicking remarkably well. There have been attempts to replace it with mainstream products on the market. These attempts have failed, mainly due to the excessive complexity and slowness of those products.

When I started in 2000 I could not get funds for Business Intelligence activities in the company at all. I had to use scrapped PCs as servers, do the coding myself and use free software. Much has happened since then: today BI activities are allowed to cost, management has realised the value of good BI, and the big guys are prepared to spend millions on financial analytics.
Unfortunately you cannot say the same for what I call operational analytics, i.e. what goes on in the factories: purchasing, logistics and production. But the workshop guys at the company do not care much, they already have my low-cost Data Warehouse. I was told some months ago that in one of the major factories the ERP system is used for registering only; all planning etc. is done in the Data Warehouse plus some surrounding apps. The reason for the increased use of the Data Warehouse is speed and cost! I'm told development of extra features in standard ERP systems takes forever and the cost is ridiculously high.
Fifteen years is not much for an IT system and I expect the Data Warehouse will celebrate 20 years five years from now. What's more remarkable about the Data Warehouse: it has barely been maintained at all for the last five to seven years, and still it cracks on, now with a new BI crew. Take any of the standard BI systems on the market five to seven years ago. Would you like to run any of those with a new, almost untrained crew? I wouldn't.
Of course I'm not happy about everything in the Data Warehouse, but it is a well designed, stable and balanced Data Warehouse. 'World class', a senior consultant Business Analyst once told me, 'it beats everything I have worked with'. All in all I'm very happy with the Data Warehouse.

2016-08-06

SharePoint, the OData interface

In my last post I wrote about my agonies when I tried to extract data from a SharePoint list. Finally I put together an OData request that worked, but there was still one thing nagging me: in the request there was a 'dynamic' filter:
$filter=Year eq '2016' and Month eq 7


Since I want the last month's currency rate figures, I tried to create a view in SharePoint showing only the last month's currency rates, something similar to an SQL request like:
Create view last_period as
 select R1.* from MY_EXCHANGE_RATE as R1
 join
  (select year, max(month) as month
     from MY_EXCHANGE_RATE
    where year = (select max(year) from MY_EXCHANGE_RATE)
    group by year) as R2
 on R1.year = R2.year and R1.month = R2.month


I knew this was not going to work with SharePoint lists, since this is pretty advanced and SharePoint lists are just lists, plain and simple; these lists do not even qualify as a basic database manager. Of course I was right, so for now I'm stuck with my dynamic filter.
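Since the view is off the table, at least the 'dynamic' part of the filter can be computed instead of typed in. A minimal sketch, assuming 'last month' means the previous calendar month (the script below prompts for year and month instead):

```powershell
# Compute the previous calendar month and build the OData filter from it.
$prev   = (Get-Date).AddMonths(-1)
$yr     = $prev.Year
$mo     = $prev.Month
$filter = "Year eq '$yr' and Month eq $mo"
write-host "Filter: $filter"
```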
I struggled on with my OData interface. In the last post I wrote that the response from the OData call was a shitload of complex XML:


This is just the very first part of the output. I knew even before I tried to parse it that it was going to be a rough ride. This is a good example of why XML has got such a bad reputation; just because there is an option you do not necessarily have to use it, keep it simple stupid. I decided to go with PowerShell scripting; I have played around with it before and I liked it. My intention with this script was to call SharePoint to extract the monthly currency rates, then parse the response into a text file, and finally deliver the text file via FTP to a target application, in this case my Data Warehouse.
After the struggle of figuring out how to set up the OData request, the first part of my script, calling SharePoint OData and extracting the XML response, is straightforward:
#Before we start, where are we?
$cwd = Get-Location
#write-host "$($MyInvocation.ScriptName) runs in $cwd"

write-host "Starting"
#Set up constants
$currsel     = 'Currency_x0020_Code/currencyName,Currency_x0020_Code/Title,Currency_x0020_Code/currencyNumCode,Currency_x0020_Code/currencyAC_Reporting,Currency_x0020_CodeId,Currency_x0020_Code/Title,Currency_x0020_Code/currencyNumCode,Currency_x0020_Code/currencyAC_Reporting';

$rtesel      = 'Currency_x0020_CodeId,Year,Month,Value%5Fx0020%5FEnd%5Fx0020%5Fof%5Fx0020%5FP,Average%5Fx0020%5FValue%5Fx0020%5FEnd%5Fx0,Value,Average%5Fx0020%5FValue';

$spsel       = '$select='+"$currsel,$rtesel"
$spexp       = '$expand=Currency_x0020_Code';
$splist      = 'MonthlyCurrencyRates'
$sproot      = 'https://thecompany/codes/_api/Web/Lists'
$tempfile    = 'ftp.txt';

$FTPserver   = "ftp://thedatawarehouse/"
$FTPmap      = "ftpload/indw/"
$FTPfileName = "monthlyCurrencyRates.txt"

#Credentials
$Username    = 'xxx'
$Password    = 'xxx'
$FTPuser     = 'xxx'
$FTPpw       = 'xxx'  

#Load the webinterface and fix credentials
#The credentials are supposed to be a certificate but it is unclear to me how to use certificates
# in PowerShell scripting at this time. For now I use userid & password
$wc = New-Object System.Net.WebClient
$wc.Credentials = New-Object System.Net.NetworkCredential($Username, $Password)

#Get Year and month for the web service
Get-Date -format "yyyy-MM-dd HH:mm";
$defaultValue = (get-date).year;
#Keep the default unless something non-blank was entered (the last value matching '\S' wins)
($defaultValue,(Read-Host "Press enter to accept [$($defaultValue)]")) -match '\S' |% {$yr = $_}
$defaultValue = (get-date).month;
($defaultValue,(Read-Host "Press enter to accept [$($defaultValue)]")) -match '\S' |% {$mo = $_}
write-host "Fetching currency rates for $yr - $mo"

#Set up the parameters for the web service
$uri = "$sproot/GetByTitle('$splist')/Items?" + "$spsel&$spexp" + '&$filter=' + "Year eq '" + $yr + "' and Month eq " + $mo;

write-host "Call SharePoint via the web service"
$xml = [xml]$wc.downloadString("$uri");


I got substantial help setting up the authentication to SharePoint from a colleague; apart from that, the first part was a walk in the park.
The next thing, parsing the horrendous XML output, proved to be trickier. First I tried a number of googled solutions that promised to deal with namespaces and other 'fancy stuff' in the OData XML; nothing worked. After many failed attempts I decided to try the built-in XPath capabilities of PowerShell. It turned out to be a very nice experience, simple, straight to the point and pragmatic, except for one little detail that took me more than a day to figure out: in the overly verbose response some of the columns were declared with both attributes and text/data.
 <d:Month m:type="Edm.Double">2</d:Month>


Whatever I tried I could not grab the text part, I only got the attribute. After many hours and many, many foul words, I decided to step back and think. Over the years I have learned not to mix attributes and data in an XML tag; it always comes back and bites you, sooner rather than later. All of a sudden I remembered I had once seen '#text' in an article on XML parsing, maybe… And yes, it worked: adding an extra '#text' node gave me the month node's text. All of a sudden it was a breeze to skip all the bogus, just point to the data and get it:
#For every currency extract the monthly figures
$op = "Code NumCode ACrpt Year Month val avg_val orig_val orig_avg_value currency name `n" #Header line
foreach($entry in $xml.feed.entry){
   $op += '"'   + $entry.link.inline.entry.content.properties.Title+ '"'
   $op += ',"' + $entry.link.inline.entry.content.properties.currencyNumCode.'#text' + '"'
   $op += ',"' + $entry.link.inline.entry.content.properties.currencyAC_Reporting + '"'
   $op += ',"' + $entry.content.properties.Year + '"'
   $op += ',"' + $entry.content.properties.Month.'#text' + '"'
   $op += ","  + $entry.content.properties.Value_x0020_End_x0020_of_x0020_P.'#text'
   $op += ","  + $entry.content.properties.Average_x0020_Value_x0020_End_x0.'#text'
   $op += ","  + $entry.content.properties.Value
   $op += ","  + $entry.content.properties.Average_x0020_Value
   $op += ',"' + $entry.link.inline.entry.content.properties.currencyName + '"'
   $op += "`n"
}
$op = $op.TrimEnd("`n") #Remove the last new line
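The '#text' trick in isolation, a self-contained sketch (the namespace URIs here are made up, the mechanism is the same as in the real response):

```powershell
# When an element carries both an attribute and text, PowerShell's XML
# adapter returns an XmlElement; the text is reached via the '#text' node.
$xml = [xml]@'
<feed xmlns:m="urn:example-meta" xmlns:d="urn:example-data">
  <d:Month m:type="Edm.Double">2</d:Month>
</feed>
'@

$month = $xml.feed.Month.'#text'   # the text content, '2'
$type  = $xml.feed.Month.type      # the attribute, 'Edm.Double'
write-host "month=$month type=$type"
```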


I know the produced verbose XML is a standard XML format, normally Atom feeds, and that with the help of support libraries you should not need to bother about the details, but I could not make any of those work. And for me, parsing the XML would have been easier with something less verbose; the fancy Atom feed I had to deal with was just crappy noise surrounding the data.


Now I wanted to save the data in a file before FTPing it away; of course I used the Out-File command to do that. It worked nicely, except for one little thing: when I FTPed the file it was corrupted when it reached the target FTP server. After many futile attempts I recalled a problem I had a year ago. For some reason Microsoft software insists on writing BOM markers in UTF-8 files, but Microsoft's own software seldom handles these BOM markers well. To get rid of the unwanted BOM marker I replaced the Out-File command with [IO.File]::WriteAllLines.
write-host "Write the result to file ($tempfile)"
$Localfile = "$cwd\$tempfile"
If (Test-Path $Localfile){
    Remove-Item $Localfile
}
[IO.File]::WriteAllLines($Localfile, $op) # Note! No UTF-8 BOM marker.
  # Out-File plus System.Net.FtpWebRequest sent a corrupt file!
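The BOM difference is easy to see for yourself; a self-contained sketch (temp file names are made up, and I force the BOM with an explicit encoder so it behaves the same in every PowerShell version):

```powershell
# A UTF-8 BOM is the three bytes EF BB BF at the start of the file.
$withBom    = Join-Path ([IO.Path]::GetTempPath()) 'bom_demo_1.txt'
$withoutBom = Join-Path ([IO.Path]::GetTempPath()) 'bom_demo_2.txt'

$bomEncoding = New-Object System.Text.UTF8Encoding($true)    # $true = emit BOM
[IO.File]::WriteAllLines($withBom, @('hello'), $bomEncoding)
[IO.File]::WriteAllLines($withoutBom, @('hello'))            # default: UTF-8, no BOM

[IO.File]::ReadAllBytes($withBom)[0..2]     # 239 187 191 (EF BB BF)
[IO.File]::ReadAllBytes($withoutBom)[0]     # 104 ('h')
```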


The rest, sending the data over the FTP server was easy-peasy:
$FileContent = gc -en byte $Localfile
$FileLength  = $FileContent.Length

#Send the result to the recipient via FTP
#Constants
$FTPfile = $FTPserver + $FTPmap + $FTPfileName

#Create Request Object
$FTPRequest = [System.Net.FtpWebRequest]::Create("$FTPfile")
$FTPRequest.Method = [System.Net.WebRequestMethods+Ftp]::UploadFile
$FTPRequest.Credentials = new-object System.Net.NetworkCredential($FTPuser, $FTPpw)
$FTPRequest.UseBinary = $FALSE
$FTPRequest.ContentLength = $FileContent.Length
$Run = $FTPRequest.GetRequestStream()

write-host "Send $FileLength bytes from $Localfile to $FTPfile"
$Run.Write($FileContent, 0, $FileContent.Length)

#FTP Cleanup
$Run.Close()
$Run.Dispose()

write-host "Done"


Now all I had to do was run the script:


And write a Data Warehouse import job:
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<schedule logmsg='Import currency rates into the Data Warehouse'>

   <job name='getCurrencyRates' input='monthlyCurrencyRates.txt' minrows='2' >
<!-- Bombs out if there is less than two rows in the input file, which means no data (1 header row) -->

 <tag name='RTETABLE' value='MY_EXCHANGE_RATE'/>
 <sql>
   USE test;
   
   drop table if exists `@RTETABLE`;
   
   CREATE TABLE if not exists `@RTETABLE` (
   `currcd` char(3) NOT NULL DEFAULT '',
   `currcdn` char(3) NOT NULL DEFAULT '',
   `rpt` char(1) NOT NULL DEFAULT '',
   `year` char(4) NOT NULL DEFAULT '',
   `month` char(2) NOT NULL DEFAULT '',
   `rpt_value` decimal(11,8) DEFAULT NULL,
   `rpt_value_avg` decimal(11,8) DEFAULT NULL,
   `rate` decimal(11,8) DEFAULT NULL,
   `rate_avg` decimal(11,8) DEFAULT NULL,
   `name` char(30) NOT NULL DEFAULT '',
   `periodstart` date NOT NULL,
   `CREATED` datetime DEFAULT CURRENT_TIMESTAMP,
   PRIMARY KEY (`currcd`,`year`,`month`)
   )  COMMENT='Monthly Exchange Rates';
   
   LOAD DATA LOCAL INFILE '@J_infile0' replace INTO TABLE @RTETABLE
                   FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
   IGNORE 1 LINES
   set periodstart = DATE(concat(year,'-',month,'-01'))
   ;
 </sql>  
   </job>
</schedule>


And run it:


Finally look at the glorious result:


The experience of writing a SharePoint OData interface was not entirely unpleasant; PowerShell scripting is fun. Even though I still haven't got the hang of it (and .Net), I can produce some pretty advanced stuff, at least compared with what I can do with BAT scripts, and with BASH scripts too, but it is not a fair comparison; PowerShell with .Net is so much more. I cannot say the same about SharePoint, e.g. the lists suck when you try to do anything more than the simplest of tasks with them. SharePoint's design goal is only to do simple tasks in a simple way, but it is marketed as a platform that can do more.
Unfortunately you have to make design decisions and compromises between functionality and user friendliness. The internal field names of SharePoint lists are a good example. The internal names are very important for a developer trying to do something beyond the simplest of tasks, yet as it is now you are encouraged to neglect internal names when you define your data.
"SharePoint is so simple and intuitive anyone can create advanced applications". This is just a marketing hoax, but more about that some other time; for now I am happy with the working integration.

This is a 'no frills' integration. Next is to make an OData client work; I mentioned in this post that I failed to make such clients work, so now is the time for a serious attempt to get one working.

2016-08-01

Going bonkers with SharePoint ODATA

This is the SharePoint OData interface I created to extract the currency rates of the month, a join of two SharePoint lists:
https://company.com/metadata/currcodes/_api/Web/Lists/GetByTitle('MonthlyCurrencyRates')/Items/?$select=Currency_x0020_Code/currencyName,Currency_x0020_Code/Title,Currency_x0020_Code/currencyNumCode,Currency_x0020_Code/currencyAC_Reporting,Currency_x0020_CodeId,Year,Month,Value%5Fx0020%5FEnd%5Fx0020%5Fof%5Fx0020%5FP,Average%5Fx0020%5FValue%5Fx0020%5FEnd%5Fx0,Value,Average%5Fx0020%5FValue&$expand=Currency_x0020_Code&$filter=Year eq '2016' and Month eq 7


Almost self-explanatory, don't you agree? It took me a weekend to pull this together, but maybe I'm a bit daft. Anyway, I started by selecting the entire list:
https://company.com/metadata/currcodes/_api/Web/Lists/GetByTitle('MonthlyCurrencyRates')/Items
 
I got a shitload of XML in response, but it was a good start: simple request, complex response. No worries, we just limit the response with odata=nometadata in the Accept header later on. I added a $filter on the year and month columns (for reasons I do not know these were not defined as a datetime value); anyway, the filter was easy to create:
$filter=Year eq '2016' and Month eq 7
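A side note, my own observation rather than anything from the SharePoint docs: the filter contains raw spaces and quotes, and .Net will normally escape them for you when the string is turned into a URI, but if a client downstream complains the escaping can be done explicitly, a sketch:

```powershell
# Escape the OData filter for use in a URL; spaces become %20 etc.
$filter  = "Year eq '2016' and Month eq 7"
$escaped = [uri]::EscapeDataString($filter)
write-host $escaped
```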


It worked after just a few tries. Wow, SharePoint ODATA is easy! Let's limit the columns; I started with:
$select=Year,Month


Yikes! It worked too, happy smile on my face. Now I wanted to join my currency rates list with the currency list, and all of a sudden hell broke loose. After a lot of googling around I realised joining lists in SharePoint is both limited and complex. I understood you need to specify the join as $expand, using a lookup field in the list. After searching for a while I found the field 'Currency Code' linking to the Currency list. As $expand seemed a bit complex I first tried to display 'Currency Code', and nothing fucking worked; whatever I tried I just got errors in return. I assumed it was the blank in 'Currency Code' that was the root of my problems. After hours of testing and googling I found there is an internal name, which you can find by looking at the last part of the URI in SharePoint's change-column dialog:
The picture is a bit small, but the important item here is the Internal name at the upper right.

For 'Currency Code' it was 'Currency%5Fx0020%5FCode'; silly me who couldn't figure out that blanks are replaced by '_x0020_', that is the most natural thing to do. Anyway, my query didn't improve a bit from this hard-earned knowledge. Whatever I tried, no darned 'Currency%5Fx0020%5FCode' showed up in the response.
I decided to sleep on it. Next morning, after a healthy breakfast, I pondered: maybe you cannot display the lookup field, you have to do this $expand thing. After a lot of googling and some testing I got the currency name from the currency list with:
$select=Currency_x0020_Code/currencyName,Year,Month&$expand=Currency_x0020_Code
Finally I could create the SharePoint ODATA interface. This is the PowerShell script snippet I use to define the interface:
$usel = '$select=Currency_x0020_Code/currencyName,Currency_x0020_Code/Title,Currency_x0020_Code/currencyNumCode,Currency_x0020_Code/currencyAC_Reporting,Currency_x0020_CodeId,Currency_x0020_Code/Title,Currency_x0020_Code/currencyNumCode,Currency_x0020_Code/currencyAC_Reporting,Currency_x0020_CodeId,Year,Month,Value%5Fx0020%5FEnd%5Fx0020%5Fof%5Fx0020%5FP,Average%5Fx0020%5FValue%5Fx0020%5FEnd%5Fx0,Value,Average%5Fx0020%5FValue';

$uexp = '$expand=Currency_x0020_Code';

$uri = "https://thecompany.com/metadata/currcodes/_api/Web/Lists/GetByTitle('MonthlyCurrencyRates')/Items?" + "$usel&$uexp" + '&$filter=' + "Year eq '" + $yr + "' and Month eq " + $mo;

$res = $wc.downloadString("$uri");


I still do not know how to continue a PowerShell statement on the next line, so you have to live with the horrible formatting. If you study the ODATA interface at the top of the post carefully, you'll see the column Currency_x0020_CodeId. After I created the interface I stumbled upon a post explaining that if you want to display a lookup column you must add 'Id' as a suffix; silly me again, such a lack of imagination not to figure that out. This small problem caused me to do hundreds of futile tests.
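For the record, PowerShell does offer ways to continue a statement on the next line; a quick sketch (a shortened, made-up version of the $usel and $uri above):

```powershell
# 1) End the line with a backtick, PowerShell's escape character
$usel = '$select=Currency_x0020_Code/currencyName,' + `
        'Year,Month'

# 2) End the line with an operator (or a comma); the parser keeps reading
$uri = "https://thecompany.com/metadata/currcodes/_api/Web/Lists" +
       "/GetByTitle('MonthlyCurrencyRates')/Items?" + $usel
write-host $uri
```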

To be a bit serious, if you are still reading: take another look at the interface at the top. You can argue the column names in our lists were not wisely chosen, but any good software should prevent or warn users from choosing names that will lead to confusion. This is a classic design miss: hiding complexity to achieve a user-friendly system almost always fails. In this case the internal column name is too important to hide away.

Now I only have to take care of the output. Remember I wrote I would limit the output by specifying odata=nometadata in the Accept header? It turns out this does not work for XML, and I cannot make it work for JSON either. But that is for another day and another post; I need some mental recreation after this interface writing.