Yesterday I got a call "the users are displeased with some reports. The column 'project_definition' is defined as char(9) while it should be char(11), this makes many joins usesless or even invald. How can we fix this?'.
Since the most data coming from SAP into the Data Warehouse is automagically created by the import mechanism is not only redefine a definition from char(9) to char(11), it's correcting the import mechanism, that's why I was called.
I was puzzled, I was pretty sure the algorithm calculating column length in MySQL was right. It had worked untouched for ten years. Recently I done some minor changes in other parts of the Data Warehouse (first in about 5 years), ‘that could not possibly be the reason’ I said to myself, but as any experienced programmer knows when you say something as stupid as that it often is the very reason for the bug, it’s called bit rot, and it can do very awkward things, so I decided to take a look at the code, it was untouched and looked alright to me. Then I looked at the produced definition of ‘project_definition' it showed char(14) not char(11). That’s funny now I got three disparate definitions I decided to have a look at the definitions of 'project_definition' in the Data Warehouse.
Using the information_schema database in MySQL:
SELECT `TABLE_NAME`FROM `COLUMNS` WHERE `COLUMN_NAME` = 'project_definition'
Project_definition showed up in 69 tables (including test and obsolete tables), here are a few:
Then I checked the length of project_definition:
SELECT distinct `CHARACTER_MAXIMUM_LENGTH` FROM `COLUMNS` WHERE `COLUMN_NAME` = 'project_definition'
Oh golly! now I had 5 lengths (including 11). What was going on here?
Then I remembered, I did this ‘SAP Project’ 9 year ago for a workshop in Nantes. Instead of extracting data directly from SAP tables, I decided to extract data via BAPI’s. I didn’t know the SAP projects subsystem and I had very little time, so I thought BAPIs would be a shortcut since SAP had prepared the data I needed, or so I thought.
I asked my Data Warehouse accomplice at the time, Ulf Davidsson, he told me ‘It’s an exceptionally bad idea, BAPIs never give you what you want, there is always data missing, you have to enhance the BAPI and then you have lost the advantage you are looking for, and it takes longer time than finding the data yourself, grab table data or write your own ABAP code, then you have control’.
In retroperspective Ulf was right, actually in more ways than he knew at the time. But I’m a stubborn son of a bitch, so I created the Projects database entirely of BAPIs. The first thing the users told me ‘we are missing X,Y,Z…’. I stupidly asked ‘is that really necessary?’. ‘Without X,Y,Z… this is useless’ was the reply, so I had to roll up my sleeves and start digging in the SAP database, fortunately it was not that much missing. The real problem for me if I recall this right, I did not have control of the data I just run the Bapis enforced it with some table data. I had to spend much time figuring out how to join the data, eventually I got it right and the users were happy. But due to the atomic nature of BAPIs it is extremely time consuming to extract data via BAPIs. It took about an hour to extract the requested data, it was bad but the night is long. After a few years, there were many more projects, another workshop started to do projects and then a distribution center started to use projects and the night was not so long anymore. Grabbing purchase requisitions began to take ridicules long time. I did a quick fix by parallelize the extractions and cut the time down to about 3 hours. It was bad, but manageable.After a few years it was not manageable any more.
I had to rewrite the worst performers into table extractions. This stabilized the performance it’s stable under the hour now, it’s bad but stable.It is a mess, a mix of BAPIs and tables, but apparently a useful mess, when the extraction is late or have failed, users start complaining early morning.
I have asked a number of coworkers to rewrite the projects app, but the reply is always ‘NO, that mess you have to clean up yourself, I will not touch it’. And to be honest neither will I. And now I have this length problem. I suspect the BAPIs do not have as strict and consistent definitions as the SAP tables. And that the BAPI definitions have changed over time. And this has gone unnoticed as long as project names were not longer than 9 chars. I can probably just delete the projects database and it will correctly rebuild itself, but I’m not sure, so I change the char(9) definitions into char(14) in MySql and wait and see what happens.
BAPIs or the BAPI concept is not bad. Mass extractions is clearly not the intended use of BAPIs, they are low volume schnittstellen into SAP, e.g. extract one Sales order and eventually update it is the intended use of BAPIs. Nor should you combine the output of BAPIs to create a reporting structure,.
For mass extraction of SAP data be cautious with BAPIs, use table extraction (RFC_READ_TABLE et al.) or own ABAP code you have control over.