During the writing of a series of posts on parallel scheduling of workflows in PHP, we have migrated an ETL application from my old physical server with an Intel I7 8 core processor to a virtual server in a Dell R720XD with a XEON E5-2609 4 core processor, and the ETL virtual server is assigned 2 CPUs. If you look at the CPU consumption within the ETL server it does not look very alarming.
You can see the server is working hard, but not that hard. I knew the E5-2609 was a weak processor for the load, but I figured it wouldn’t matter since the hard work is background processes run at night. If the jobs takes a bit longer no one would ever notice. But I was wrong, and I’m a bit embarrassed I should have known better. When you stress any kind of system things starts to happen, bad things. Computers, operating systems and programs are no exceptions.
I fork threads by PHP pcntl function and they failed random and silently in the new server. It is hell to debug these kind of failures where and why does failures occur, the root cause might not even be in the server since most threads communicate with remote ERP systems the prolonged execution time might trigger erratic behavior in those ERP systems or the communications layer or in the operating system or in … there are many many reasons parallel processes can go wrong, many more than single threaded processes. And I have to admit I run this in in PHP 5.6 RC2, yes I know you should not run production on RC versions, but that's just the way I am, early adopter and a bit lazy. During the Christmas holiday I will upgrade to a better PHP version, it’s not an easy task since I have to compile and link in some plugins, and do a lot of testing before I can switch. As a temporary circumvention we have moved the ‘biggest’ jobs back to the old server. Next week I will buy a new processor to the server. The good thing is I can have a much better processor than I could when I bought the server, now I choose between Intel XEON E5-2660 v2 and XEON E5-2670 v2, both of them are monster performers compared with the original E5-2609. The most important thing is the whopping 20 CPU’s these new processors can muster with hyper-threading. If I assign 10 processors to my ETL server, it will be more than sufficient. Then there is an interesting question, can I do the operation myself? I’m pretty experienced as a computer engineer, but I never tinkered with a rack server before. I’m not even sure I know how to get the server out of the rack. Anyway I do hope the thread failures will go away with the new processor. Debugging parallel processing is hard, swapping a processor is hopefully a clean simple operation.