attempting to ignore snarkiness, but failing - and that says what about the doze...

jacquesm · on Oct 28, 2009

There once was a really nice quote here on HN: "you can't outsource responsibility".

If in two years' time you've never once looked at which kernel you are running, especially while tuning a system for performance, you only have yourself to blame.

Don't tell me you're running a 'stock' kernel and never bothered tuning it for your application, or considered upgrading it. Also, your resources list should include the exact machine configuration; there are tools to retrieve that sort of info automatically.
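As a rough illustration of the kind of automatic collection meant here, the following is a minimal Python sketch using only the standard library. It records just a handful of facts about the running box (the field names are my own, hypothetical choice); real inventory tools gather far more, such as RAM, disks and NICs.

```python
import json
import os
import platform


def machine_snapshot() -> dict:
    """Collect a few basic facts about the machine this runs on."""
    uname = platform.uname()
    return {
        "hostname": uname.node,
        "os": uname.system,
        "kernel_release": uname.release,  # the same string `uname -r` prints
        "kernel_version": uname.version,  # on many Linux builds this mentions "SMP"
        "machine": uname.machine,
        "logical_cpus": os.cpu_count(),   # may be None if it cannot be determined
    }


if __name__ == "__main__":
    # Print a JSON record that could be stored alongside other machine records.
    print(json.dumps(machine_snapshot(), indent=2, sort_keys=True))
```

Storing one such record per machine, and diffing it after every hardware change, is one cheap way to notice that something like the running kernel did not change when you expected it to.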

Then, when you're done, store it in http://inventory.sf.net/ or something like that.

It's typical that the people at rackspace would simply drop in the requested hardware, and that you yourself deal with the configuration.

The smart money is on running some tests after they've done that to make sure it went OK. Asking for a CPU upgrade and not checking whether the new CPUs are operational is just plain stupid.

I figure you literally asked rackspace to upgrade the CPU, and that's what they did.

Did you explicitly ask them to install an SMP kernel with a specific version and they didn't do it? Or did you expect them to do it but not check whether they actually did until today?

Two full years of trying to tune a box for performance without noticing this, and then publicly blaming rackspace, is simply cheap: an attempt at pinning the blame on rackspace for something that you should have noticed long ago yourself.

Kudos for writing about it, but the title should be "How I messed up". That's taking responsibility, and then making sure it never ever happens again.

nailer · on Oct 28, 2009

'Don't tell me you're running a 'stock' kernel and never bothered tuning it for your application, or considered upgrading it.'

Not sure why you've got 'stock' in quotes. Vendor kernels are used by hundreds of thousands of servers, each sharing the same bug reports and security updates. There's a massive benefit unless you think you can do those bug reports and security updates better than your OS vendor.

Most custom compiles are by people who don't understand loadable modules, or who read something written before they existed.

jacquesm · on Oct 28, 2009

Ok, point taken, but there are definite advantages to 'rolling your own'.

forkqueue · on Oct 28, 2009

Such as?

Using a vendor-supplied kernel means that there are extremely likely to be other people using most of the same stack as you, many of them on the same hardware. If there are problems, it's likely that other people have noticed the issue, even if the bug hasn't been found, so it's much more likely to get fixed.

If you compile your own kernel (and/or copy of Apache, MySQL etc etc) you're running something unique to you. If you have problems, you're on your own.

If you're paying for Red Hat Enterprise, use the Red Hat Enterprise packages unless there's a good reason not to. If something goes wrong, you can call Red Hat support and at least get a steer in the right direction. Custom-compiling everything just for the sake of it, just to have new 'shiny' stuff, is crazy.

jacquesm · on Oct 28, 2009

It's not for the 'new shiny' at all, it's got to do with optimizing your kernel to match your hardware and getting rid of loadable module support in favor of a kernel that has on board exactly that which is needed to operate your system.

A 'stock' kernel has a whole pile of things in it that might be the next remote exploit; by removing such stuff you marginally increase security.

Other things you might need:

   - kernel support for booting from raid filesystems without trickery
   - processor family optimizations
   - maximum number of cores (stock = 8, we run 16 on quite a few machines)
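Whether a given kernel was built with SMP support, and how many CPUs it was compiled to handle, can be read from its build configuration rather than guessed. The sketch below assumes the distro ships that config as `/boot/config-<release>` (common on Debian and RHEL-style systems, but not universal; some kernels expose it as `/proc/config.gz` instead). `CONFIG_SMP` and `CONFIG_NR_CPUS` are real kernel options; the helper function names are hypothetical.

```python
import platform
from pathlib import Path


def parse_kernel_config(text: str) -> dict:
    """Parse CONFIG_FOO=value lines from a kernel .config file."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Skips blank lines and "# CONFIG_FOO is not set" comments.
        if not line.startswith("CONFIG_") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        options[key] = value.strip('"')  # string options are quoted
    return options


def smp_settings(options: dict) -> tuple:
    """Return (SMP enabled?, compiled-in maximum CPU count or None)."""
    smp = options.get("CONFIG_SMP") == "y"
    nr_cpus = options.get("CONFIG_NR_CPUS")
    return smp, int(nr_cpus) if nr_cpus is not None else None


if __name__ == "__main__":
    # Assumption: the build config lives at /boot/config-<running release>.
    path = Path("/boot") / f"config-{platform.release()}"
    if path.exists():
        print(smp_settings(parse_kernel_config(path.read_text())))
    else:
        print(f"{path} not found; this kernel's config may live elsewhere")
```

A result such as `(True, 16)` would mean an SMP kernel built for up to 16 CPUs; `(False, None)` would point at a uniprocessor build.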

As for compiling, I do that anyway; it's a small job compared to the number of times that you need to do it. And you're just as much 'on your own' to solve problems, though the chances of having them are smaller (because the system you are running is considerably leaner).

I second your Red Hat Enterprise suggestion. It's not what I'm using on most of our machines (either CentOS or Debian), but it's a good solution too.

nailer · on Oct 29, 2009

Not a big user of software RAID, but AFAIK booting with / on a metadisk is still doable out of the box, as it was a few years ago.

"- processor family optimizations"

RHEL / CentOS include a variety of kernels precompiled for different CPU architectures.

"- maximum number of cores (stock = 8)"

What distro? RHEL / CentOS support far more than that; we've got quite a number of 32-core machines and have a few 64-core boxes in test.

inaka · on Oct 28, 2009

Honestly, your tone stings a bit, jacquesm. I guess I shouldn't have titled it 'don't trust rackspace', but my larger point is exactly the opposite of what you wrote: take responsibility for your servers, and don't trust anyone, even the most expensive hosting provider, to do it for you.

jacquesm · on Oct 28, 2009

That's a whole lot better. If that was the message, it somehow got lost on me.

Again, apologies for the tone, but it really seems to be a trend to make a mistake, 'blame someone', then blog about it.

What I would suggest you do, and this is meant very seriously, is find a cheaper hosting provider (EV1/The Planet is about half of what you pay right now) and spend the rest on getting a part-time sysadmin that really knows his stuff.

The difference in $ should be minimal. Then look over the guy's shoulder at how it is done, but keep doing what you know is at your 'level' anyway. That way you get the best of both worlds: excellent care without breaking the bank, and at the same time you'll learn a huge amount.

And if your startup grows you just might have found an employee for the future.

Spend some time looking around, your best bet would be a guy or girl that does sysadmin duties for a larger company using UNIX that wants to make some extra $ in their spare time.

sailormoon · on Oct 28, 2009

I really do not dig this tone. The guy is obviously not a system admin. He paid top dollar for rackspace managed hosting precisely so he wouldn't have to do the kinds of things you mention.

"You can't outsource responsibility" is utter nonsense. It is completely impossible to "own" responsibility for everything important in a complex society. Meaningless platitudes should not distract from the fact - Rackspace did not do their job.

Yes, he messed up. He messed up by making assumptions and not checking Rackspace's work more closely. That's not the same as messing up in your own work. His post is a reminder to be more careful checking on the work of your "upstream". There's no need to pile on with the "if you didn't know 'top 1' you shouldn't be running a startup!" etc.

thaumaturgy · on Oct 28, 2009

I'm not a big fan of the tone either; however, jacquesm is spot-on in his assessment.

For one thing, my understanding of Rackspace's business practices -- and I've only dealt with them peripherally, so I might be a bit wrong here -- is that they "manage" things like their network, and the actual server hardware, and stuff like that. So, if you want a CPU upgrade, sure, they'll do that. If you need your server rebooted, they'll do that too. But, they don't have anyone sitting there monitoring your system's performance metrics and doing your sysadmin duties for you.

The way I read it, Rackspace did do their job: they upgraded the hardware. It was up to the server admin -- not Rackspace -- to check that the software was then configured correctly.

And finally, I don't generally agree with statements of the form, "If you don't know X, you shouldn't be doing Y", but ... looking at dmesg and top are both really, really, really standard sysadmin operations. Entry level stuff, really. Sysadmin work doesn't just mean messing around with Apache's configuration; there are many more nuances, and it's likely that their system is vulnerable to problems that they don't even know about.

jacquesm · on Oct 28, 2009

The tone is probably in large part because the OP does not take any responsibility for his own part in this and instead is pointing his finger at a third party that may have been partially at fault. But that is by no means sure.

This is typical of what I think is a real problem in society, the 'externalization of blame'.

Inability to see your own responsibility is a serious issue, and it is really pervasive. If I were in the OP's position I would be headbutting a piece of concrete for 20 minutes to make sure I never ever make a mistake like that again, and I would thank rackspace for finally finding the fault that I could have noticed in 5 minutes two years ago.

That's why you have post-delivery checklists, burn-in tools and inventory management, staples for everybody who has root (#) on machines that do customer work.

I'll try to keep my 'tone' better under control, apologies for that.

At least it wasn't in Dutch ;)

inaka · on Oct 28, 2009

Probably not a good idea to comment authoritatively on a company you haven't worked with, but no: kernel management is part of rackspace's job. Performance and monitoring are part of their job. They have an SLA and this is absolutely part of it...

jacquesm · on Oct 28, 2009

I didn't say he shouldn't be running a startup, I said he should not be managing the servers their customers stuff runs on.

As for the tone, you may disagree with it, but that does not detract from the fact that if you operate a business, you should know your stuff.

And if you outsource something you should at least know how to check up on the bits that you've outsourced.

Outsourcing does not mean that your responsibility disappears, it simply changes from 'doing' to 'monitoring'.

Maybe rackspace did not do their job; I have no insight into the communications that went on between the party involved and rackspace.

All we get here is a pointing finger without any responsibility taken, that is not a realistic picture.

It could be the difference in the wording of the upgrade request ("please install another CPU in our machine" vs "please install and configure another CPU in our machine").

Even then, rackspace probably should get part of the blame, but really not all of it. The fact that the situation persisted for two years is completely on the OP's account: in two years you have many more opportunities than your hosting provider to find this out. After all, they will leave your machine alone unless it malfunctions, there is no indication that they were ever asked to look into this, and when they were, they actually found the problem.

I quote from the article "In investigating an unrelated issue, we followed up with Rackspace on a Kernel patch that couldn’t be applied to our server. One of the technicians immediately realized why – we were not running the SMP kernel."

How can someone try to patch a kernel, fail to apply the patch, and still not clue in to the situation?

Also, we do not know if the SMP kernel was installed or not, it might have been, and then on the final reboot the wrong kernel was brought up. And that's a very easy mistake to make.

But dmesg would tell you in a heartbeat, as would running top and pressing '1' to show each CPU separately, which you would be doing plenty of times while debugging performance issues to make sure all your cores are doing the right amount of work.
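The same check can go into a post-delivery checklist as a script. Below is a minimal Python sketch, assuming a Linux box where `/proc/cpuinfo` lists one `processor` entry per logical CPU the running kernel is using; a uniprocessor kernel on a multi-CPU machine will typically list just one. `EXPECTED_CPUS` and the function names are hypothetical; set the expected count to whatever the hardware order says.

```python
import os


def count_cpus_in_cpuinfo(text: str) -> int:
    """Count logical CPUs listed in /proc/cpuinfo-style text."""
    count = 0
    for line in text.splitlines():
        key, sep, _ = line.partition(":")
        # x86 entries look like "processor\t: 0"
        if sep and key.strip() == "processor":
            count += 1
    return count


def check_cpu_count(expected: int, seen: int) -> str:
    """Compare the CPU count the kernel reports with what was ordered."""
    if seen == expected:
        return f"OK: kernel sees {seen} CPUs"
    return f"MISMATCH: expected {expected} CPUs, kernel sees {seen}"


if __name__ == "__main__":
    EXPECTED_CPUS = 8  # hypothetical: the count the upgraded box should have
    try:
        with open("/proc/cpuinfo") as f:
            seen = count_cpus_in_cpuinfo(f.read())
    except OSError:
        seen = os.cpu_count() or 0  # fallback where /proc/cpuinfo is absent
    print(check_cpu_count(EXPECTED_CPUS, seen))
```

Run once right after the hardware work is done, a `MISMATCH` line is the cue to look at dmesg and at which kernel actually booted.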

sailormoon · on Oct 28, 2009

"I said he should not be managing the servers their customers stuff runs on."

And what if it's only him? No go then huh?

You've been saying a lot of this kind of thing lately. That guy before with the App Store payment problem? You came down on him like a ton of bricks. And now this. Just because people haven't dotted every i and crossed every t. It's not exactly the hacker mentality is it?

jacquesm · on Oct 28, 2009

Then he could hire a part-time sysadmin, there are plenty of those looking for work. I figure for $200 / month he can switch to a similar powered dedicated server with a competitor and pay a guy for 4 hours worth of real hands on sysadmin time every month. That way he pays roughly the same and comes out ahead in every way.

Managing UNIX systems that have to perform well under load takes quite a bit of knowledge. Sure, everybody can install 'ubuntu', 'redhat', 'gentoo' or whatever flavor is popular this week. But that does not make you a system administrator. I wouldn't trust myself with my customers' machines either, simply because staying up to date on all the holes in all the packages you may have installed, and keeping them patched, is real work.

I don't think I came down on the app-store guy 'like a ton of bricks', in fact I gave what I thought was pretty sensible advice and offered (after Sam Odio did) to help him out.

But it's essentially the same problem as what is happening here, blame company X because of something that you caused yourself.

The app store guy:

  - quit job before having money in the bank
  - set up overly complicated corporate structure to avoid non-existent liability

This guy:

  - took responsibility for a part of the operation that he's not qualified to do
  - kept on messing around for two years without calling in outside help (sure, it would cost)

And both of them point the finger at another party.

So maybe that's why it seems to you that this is a 'lot of this kind of thing'.

As for whether or not it's the hacker mentality, that's not my thing; I call it as I see it.

I've had people here rip me to bits for making a stupid remark (and rightly so). If you can dish it out ('Rackspace is at fault because they don't know how to upgrade a cpu' or 'Apple is at fault because they don't pay me'), then you should be able to take it.

sailormoon · on Oct 28, 2009

Since it's basically personal I'll take it to e-mail if that's OK with you.

jacquesm · on Oct 28, 2009

Anytime!

dagw · on Oct 28, 2009

It all comes down to what I paid for. Check your contracts carefully.

Some places basically just give you a computer and make sure that it always has power and network and that the hard drive is backed up, and everything after that is up to you. Other places give you 24 hour sys-admins that continuously monitor everything and basically manage every aspect of your server. The former obviously costs a lot less than the latter.

It's perfectly OK to outsource responsibility, but you've got to pay top dollar for someone to take on that responsibility. You cannot go for the cheap option and expect all the services offered by the expensive option.