We are back online and will be fully up to date by the end of the month. All payments will be processed as normal, and most of you won't even notice that anything has occurred, BUT for those who want the gory details, read on. There are some excellent lessons here for anyone who relies heavily on technology for the day-to-day running of their business.
The perfect storm!
We encountered 2 consecutive hard drive failures; the second came 4 minutes after the first drive had been repaired. Each failure resulted in total data loss from our server, including the phone system.
Our manufacturer's warranty was a next-business-day guarantee, yet it took 4 days to provide the necessary parts.
Our daily backup to an external hard drive, which is swapped and checked every 7 days, had (unknown to us) been failing for the 6 days prior to the crash.
As with all mistakes, you are not judged on the mistake or mistakes made but on how you respond. I will leave the final judgment on our performance to our owners and tenants, but I do want to take this opportunity to thank my team.
I am incredibly proud of how the team responded to this crisis and their performance under pressure was truly inspiring. Thanks guys, I would be happy to face any crisis with the same team.
Read on below for the lessons we have learned and the disaster timeline. At the bottom, I have listed some of the people who assisted. Without these folks, our crisis could have been a whole lot worse. Thanks, guys.
Coral Sea Property
PS: I know it was fun, BUT let's NEVER do that again!
1. Back Up, Back Up, Back Up
Double redundancy is not enough.
We had a double-redundant server with backups done daily and removed off site every 7 days. This prevented the disaster from escalating even further, but 7 days of data is HUGE!
Back up to a separate storage device EVERY DAY.
Our Score 7/10
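As a minimal sketch of the lesson above, a daily backup should also verify that the archive it just wrote can actually be read back, so a failing backup is caught the same day instead of 6 days later. The paths here are illustrative placeholders, not anything from our actual setup:

```shell
#!/bin/sh
# Minimal sketch: daily backup with an explicit verification step.
# SRC and DEST are illustrative placeholders, not real paths from this post.
SRC="${SRC:-/tmp/demo-data}"
DEST="${DEST:-/tmp/backup-$(date +%Y%m%d).tar.gz}"
mkdir -p "$SRC"   # demo data so the sketch runs end to end

# Create the archive; fail loudly if it cannot be written.
tar -czf "$DEST" -C "$(dirname "$SRC")" "$(basename "$SRC")" || {
    echo "backup FAILED: $DEST" >&2
    exit 1
}

# Verify the archive can be listed back. A backup that silently fails
# is worse than none -- ours had been failing unnoticed for 6 days.
if tar -tzf "$DEST" >/dev/null 2>&1; then
    echo "backup OK: $DEST"
else
    echo "backup CORRUPT: $DEST" >&2
    exit 1
fi
```

In practice the "FAILED" branch should page or email someone, not just print; the point is that success must be positively confirmed every day, not assumed.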
2. Keep people in the loop.
People want to know what is happening.
We lost phones, email & all data TWICE. The perfect storm!
Have alternate arrangements in place for communication.
The only numbers we had available were from our old-style hard copies (kept for just such an event!!). We also had full email contacts held off site on the web, and we used our web homepage for regular updates.
Our Score 8/10
3. Have a disaster plan
Once the server is down, it is too late to make a contingency plan. You can only control the controllables, BUT if you plan for the worst case you will have breathing space to make better-informed decisions. Avoid reactive decisions; take time and look at the big picture.
Our score 5/10
4. Remove/delete excessive data.
We had over a year's worth of daily backups on our hard drive. Fantastic, I hear you say. This unnecessary data added over 18 hours to our first rebuild and 10 hours to our second recovery. Backups are fantastic, but put them where they need to be: off site, and on a separate hard drive or similar storage device. Keep your server lean and mean, both for performance and for saving time rebuilding WHEN disaster strikes.
Our score 1/10
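The pruning above can be automated so old archives never pile up on the server in the first place. This is a hedged sketch only; the directory and the 7-day window are assumptions for illustration, and anything pruned locally should already exist off site:

```shell
#!/bin/sh
# Sketch: keep the server lean by pruning local backup archives older
# than KEEP_DAYS. BACKUP_DIR and the 7-day window are illustrative
# assumptions, not settings from the original post.
BACKUP_DIR="${BACKUP_DIR:-/tmp/backups}"
KEEP_DAYS=7
mkdir -p "$BACKUP_DIR"

# -mtime +7 matches files last modified more than 7 full days ago.
# Only prune what has already been copied off site!
find "$BACKUP_DIR" -type f -name '*.tar.gz' -mtime +"$KEEP_DAYS" -delete
echo "pruned archives older than $KEEP_DAYS days in $BACKUP_DIR"
```

Run daily from cron, this keeps the rebuild-time cost of stale data bounded instead of growing for a year.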
5. Treat the routine as an emergency
In the days leading up to the failure, our database failed to back up on 3 separate occasions (three different users). Our network was slow, and our email server bounced incoming emails at random times. These routine glitches were the warning signs of the full-scale emergency to come. This was the server firing warning shots of its imminent failure; unfortunately, we were not listening. Some of these glitches were routinely reported to our IT company; most were overlooked.
Report EVERY fault in writing and hold your IT department responsible to fix the problem.
No Band-Aids, no half solutions.
Our score 4/10
6. Have great people.
You cannot always choose your hardware suppliers (ours let us down and contributed an extra 3 full days to the emergency), BUT you can choose your people and service suppliers.
Our IT tech worked day and night and despite multiple heartbreaking setbacks, he kept getting up off the floor and coming up with solutions.
Our team worked together and daily came up with innovative and creative solutions to keep servicing our customers despite almost impossible circumstances; then worked almost constantly for 48 hours to get the system back on line.
Our business coach, bank manager and other associates all helped provide an outsider's perspective and were excellent sounding boards for public perception and damage control.
Our score 9.5/10
We were not prepared for a disaster of this kind and found ourselves making reactive decisions on the hop. We will be substantially better prepared for any future emergencies.
To avoid this kind of disaster happening in your business: plan NOW for the worst case, look for the warning signs, and always treat the routine as an emergency.
Disaster Timeline
Sun 13 May, 9pm approx
Server fails – no alarms, no notice.
Mon 14 May, 8am
1 hard drive of 4 has failed; a small setup fault prevents the redundancy configuration from allowing continued operation.
Server covered under manufacturer's warranty: next-business-day guaranteed repair.
IT tech requests 2x hard drives + other parts for the fix.
Parts despatched from Sydney STANDARD FREIGHT* – no explanation for the delay in despatch.
Tuesday 15 May
Investigations discover parts not yet sent (internal fault with the manufacturer). Parts now sent STANDARD FREIGHT* (2-day delivery time). So much for the next-day guarantee!!
Thursday 17 May
Parts picked up at the airport, 7am.
Parts installed – first hard drive. 36-hour server rebuild begins.
Friday 18 May
11.55am Server rebuild complete – System restored
11.59am Second hard drive fails. (AAAAAAAAAAHHHHHHHH!!!!)
12.30pm Tech installs 2nd hard drive (fantastic planning to order 2!!)
1.00pm Data transfer from offsite backup unit begins (18 hours)
Sat 19 May
9.30 am Server operational
9.40 am Backup data corruption discovered
Sat 19 May – Sun 20 May
31 hours: the Coral Sea team rebuilds and re-enters all lost data (lots of coffee!)
Sun 20 May
6pm The pirates of the Coral Sea are up to date and back online! Crisis averted; a frosty beer earned by all!!
Our owners have been fantastic and we appreciate the understanding shown in trying circumstances.
Our tenants have been awesome. After 7 days offline, we have A TOTAL OF 2 TENANTS IN ARREARS!!
Our IT tech Jarrod Lowe from Onboard IT must have felt like a heavyweight boxer fighting Mike Tyson. Every time he came up with a solution, he got knocked back down by a hardware failure of some sort. A great man to have on your team in a crisis.
The calm in the storm: James Hooper, our business coach, provided sound advice and practical solutions for keeping our customers happy and informed. As one of our owners, he was also uniquely positioned to give us a running commentary on the crisis from an owner's perspective.
Ian worked with me from Saturday afternoon until well after all normal people were in bed, sometime on Sunday morning, to reconstruct the over 430 transactions that needed to be entered and to help balance our trust account. His logic, calm manner and attention to detail saved untold extra hours of work and ensured we met our promise to be back online by Monday morning.
Thanks mate you truly are a champion bloke.
My Awesome team
You guys rock! This was the team's first real test under fire and, in my eyes, you passed with flying colours.
My lovely wife
She had to put up with a half-crazed and stressed man living on no sleep and coffee for 7 days. You deserve a medal, but maybe a weekend away will have to do. Thanks, Verena.