The Hitchhiker's Guide
To The Life And Luck of Thea Rivera


Great Server Fiasco of Thanksgiving Day Weekend

Mood:
Annoyed


For those of you who do not know what I do, I manage tech support people for a customer base of shared server customers. That is, we have a bunch of servers, and each of those servers houses the sites and stores of many, many businesses.

Many businesses who rely on people who do Christmas shopping after Thanksgiving.

Many people who have the tendency to get pissed off when something happens, and even more pissed off if the excuse for that something happening is really, really lame.

Did I mention that I get to handle escalations of all angry customers?

Okay, with that being said, here's a rundown of how wonderfully perfect things went here in web hosting land.

Yes. That was sarcasm, but everything from here on out is actual fact. The stupidity you read here has not been embellished in any way.



Friday evening, while in Rialto, I got a call on my cell phone. It was the Boston data center, telling me that one of the shared servers, with about 60 customers on it, had been down for a while due to a hardware failure. I asked if they had called our on-call person, and they said no. So I told them to call the on-call person (whom they should be calling before calling me), and they did so... over 4 hours after talking with me.

Not hearing anything else about it, I had assumed that everything was handled appropriately. How wrong I was!

Sunday morning I awoke to a call. It was Boston, telling me that the server was still down, and that they NEVER got hold of the on-call person.

So of course I went nuts, and was ready to tear into my on-call tech. I called all of his numbers, and called my other techs to see if anyone could do anything about the mess.

Finally, the on-call tech called me back and said that he had been working on that machine with Boston for many hours, ever since Friday night, except for late Saturday/early Sunday when he was asleep.

And then I heard how screwed up the server rebuild was.

An IDE box was built instead of a SCSI one, with nowhere to mount the backup drive to set up the server.

The server was not provisioned to be anywhere close to having what was needed for the customers. There were no scripts, no applications, 256 MB of RAM instead of the 1 GB that is standard for the shared servers, the network card was fucked, and the drives weren't mounted correctly.

Meanwhile, our tech was trying to help the guys in the NOC figure out what was going on, but he had to walk the NOC tech through using vi, as the guy didn't know how to edit a file. All the tech needed was one line commented out of a script. He gave up after quite some time. Come on! If you're going to be a sysadmin for a NOC full of Unix servers, know how to edit a text file!
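
For the curious, commenting a line out of a script doesn't even need an editor. Here's a rough sketch of the idea in Python. To be clear, this is not what we actually ran: the function name `comment_out` and the demo script contents are made up purely for illustration, and it assumes a shell-style script where `#` starts a comment.

```python
def comment_out(script_text: str, needle: str, marker: str = "#") -> str:
    """Return script_text with every line containing `needle` commented out.

    Lines that are already comments are left alone, and leading
    indentation is preserved.
    """
    out = []
    for line in script_text.splitlines(keepends=True):
        stripped = line.lstrip()
        if needle in line and not stripped.startswith(marker):
            indent = line[: len(line) - len(stripped)]
            line = f"{indent}{marker} {stripped}"
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical script contents, not the real script from that weekend.
    demo = "#!/bin/sh\nmount /dev/sdb1 /backup\necho done\n"
    print(comment_out(demo, "mount /dev/sdb1"), end="")
```

Read the file in, run it through the function, write it back out, and that's the whole job. In vi it's even less: go to the line, insert a `#` at the front, save.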

We finally got customers up, but only barely, and not for long. When Monday came around, I really reamed the other managers about this. I had to do this around escalated calls, but at least I had some time to gather my thoughts.

And the worst thing about it is that I don't think anyone else even realizes how many points of failure there were in only 4 days.

*sigh*

We moved customers over to a totally different server because we couldn't leave them on the screwed up one.

I'm hoping the rest of the week goes smoother.



Powered by JournalScape © 2001-2010 JournalScape.com. All rights reserved.
All content rights reserved by the author.
custsupport@journalscape.com