2013-09-08

Moving a Data Warehouse - 3

In some posts I have written about the migration of my Data Warehouse  from my own hardware to Dell servers. This move was partly initiated by my transition to a new job in the company, the management didn’t dare to run the Data Warehouse on the hardware I built .

When I originally designed the Data Warehouse infrastructure two key principles were low cost  and simplicity . I needed a database so I created a database server. I needed an ETL engine so I created an ETL server. Then I needed PhpMyAdmin so I created a PhpMyAdmin server. One function one physical server.  The only extra in my Irons was an extra Network Interface for an internal server network. And in the beginning servers were scrapped IBM desktops. I maximized RAM, replaced the hard disk and added a network interface, dirt cheap. Then I installed a Linux and one application, fired up the new server and forgot about it for two to five years. My servers were mostly replaced when I needed more capacity not due to hardware failures. One of very few hardware problems I have had is described here .

I avoid software tweaking and optimization, I try do do standard installs right from the distro, e.g. I choose ‘big’ for Mysql config file that’s is about how much tuning I do. I have all databases and indexes on the same disk! About half a terabyte database with about eight million queries a day ( I have seen peaks over 15 million queries a day), this is on a custom made server with 16GB RAM.

Now this has changed with the migration to ‘real’ servers. The one server one function  approach would have been all too expensive, so I had a choice either pack more functions in one server or go virtual. I have for some years wanted to test a virtual solution, so I decided to go virtual without testing. I decided I go for two servers one physical database server and virtual host for all other servers. I do not believe for a second you can have a low cost simple virtual high performance database server. But the rest of my Data Warehouse servers could well be virtual, this way I could keep my one function one server  philosophy and still be reasonable cost efficient. I was right and I was wrong.

The new environment is much more complex, virtual servers add a software abstraction layer between the iron and Linux, and by going virtual you also need a software layer between your hard disks and the virtual servers for practical space management. For all this to work you need an expert to manage this environment. And an expert costs and the expert has his own preferences and experiences, e.g. a Linux professional does not necessarily know Mageia Linux. Since we didn’t have in house Linux operations expertise we hired a consultant. A mistake was not to listen to the consultants recommendation of virtualization software and Linux distro.  Not that it’s difficult for a Linux professional to learn another distro it just takes some time, but more important, the support of my now non standard infrastructure it will always be exotic for the consultants operations team. I should have spent more time with the consultants upfront going thru the server setup. We would probably have had a better server setup still adapted to my Data Warehouse.

This is complication I would not have had if we had used Windows Server instead of Linux distros, since Windows is a singular opsys. I do not know if this is good or bad.

I end this post with a humble statement. Still few people seem to have my insights in hardware  and infrastructure for Business Intelligence systems. Actually very few I talk to make any distinctions between any type of applications in this respect, the same hardware fits all give or take some RAM and CPU; that is the adaptation to applications you see. I believe most hardware infrastructure is grossly overpowered/priced and designed for ERP transactional applications.    

No comments:

Post a Comment