A faulty software patch that nearly cost me my career turned into an invaluable lesson in crisis handling.
Five minutes that changed history
A few years further into my stay in China, I became a technical consultant at Tecnomatix for SCADA (Supervisory Control and Data Acquisition) and MES (Manufacturing Execution System), two products that represent the pillars of shop-floor IT. It was 2004, and I was sitting in the control room of a large steel mill in Jinan, Shandong.
I had been invited by Schneider Electric, which was reselling Tecnomatix's SCADA software. I was there to solve all sorts of problems with a large application that controlled the furnaces and all the steps and machinery involved in production.
It was a multimillion-dollar project that Schneider Electric had sold, mostly as automation hardware. The SCADA software, as often happens when a large hardware corporation sells it, was a trivial part of the total value, so very little budget had been set aside for it. I learned that the engineers assigned to the software package had received only introductory training and were then left on their own to cope with a very complex project.
The problems I was facing were mostly due to the engineering team's lack of knowledge and experience. Despite their goodwill, they had introduced a number of errors with unpredictable effects, from alarms not being displayed to instrumentation settings changing spontaneously.
After trying every possible way to solve the problems on their own, Schneider Electric finally agreed to call me, the guru of that software, for 5 days of assessment and correction.
The apocalypse
Well into the third week of my visit, I was finally about to run a patched version of the system. I asked for and obtained a 30-minute test window during a production pause to install, test, and then either validate or roll back the patch.
I received the go-ahead from the production team, who had been asked to back up all data in the system so that it could be restored in case of problems. I looked at my colleague from Schneider and he entered the run command.
All seemed fine for about 2 minutes. Then the phone rang and the operators reported that all set-up values in the system had been zeroed. Just after that, the production siren announced the immediate resumption of production, and all hell broke loose because no equipment was ready to run.
Within 20 minutes I managed to restore the previous version of the system, but the production department had not prepared any backup of the data, so it took another good 15 minutes to manually re-enter the most important settings and resume production, with a delay of nearly one hour.
The power of truth
My colleague at Schneider Electric was shaking but managed to lead me out of the place and back to the hotel. It took some beers to cope, along with some jokes about the two of us being fired and possibly used as additives in the next batch of steel.
The next morning we arrived at the plant and a meeting was convened right away with all the division leaders and managers and the plant leader, a huge man with a broad smile and a calm voice.
My colleague was visibly nervous. I had gone over what happened in my mind and I was sure I had a strong case, but I was not at all sure that the leader would be willing to sacrifice anyone other than me, the representative of the supplier of a software package seen by everyone as the cause of all the problems, and now even of a loss of production possibly costing millions of CNY.
Then the plant leader asked the leaders, one by one, to explain what had happened. Each of them gave a version from their own point of view, asking their managers to add the relevant details. My comprehension of Chinese at the time was still quite limited, but I got the impression that the accounts they gave were quite fair. Despite the palpable tension, at times there was even laughter when someone’s attempt to escape responsibility was promptly exposed. I took that as a good sign.
Then it was my turn. I was allowed to speak in English, and my colleague from Schneider Electric translated. The plant leader asked me why the parameters had been zeroed. I explained that they had been zeroed because of a mistake I made in designing the software patch.
Then he asked why production had been affected by my test. I explained the whole sequence of events in very simple, plain terms, and finally how the untimely restart of production and the lack of a backup had caused the production delay.
The turning point
The next questions were not for me, but for the production leader. I learned that he had been under pressure from his commercial and financial counterparts to resume production early.
I was granted a new 30-minute window to test the patch. All departments were informed that safety procedures had to be followed.
My 5-day visit to the plant lasted 3 months, during which I was paid a very generous consulting fee and won the respect of the whole engineering team and its leaders.
Schneider Electric obtained full approval for the project and won the bid for a new, even bigger one.
I received personal letters of appreciation from the heads of Schneider and Tecnomatix, and landed a job with UGS PLM, which had meanwhile acquired Tecnomatix, with a very nice paycheck.
The lessons I learned
- Be honest and take responsibility for your work and your mistakes
- Do not try to impose your own judgement on whoever is in charge; give them what they need to make an informed judgement
- Do not panic, grab a beer!