Womack Report

November 13, 2009

Observation Beats Logic

Filed under: Computers, Work — Tags: — Phillip Womack @ 7:12 pm

I had a good day today. A good day, in a sort of terrible fashion.

It’s a good day because I reinforced a valuable lesson I thought I had learned a long time ago.  I apparently haven’t learned it well enough yet.

It’s terrible because I spent most of the week blundering about uselessly on a problem that I could have fixed much more quickly and easily.  It’s also terrible because I have a potentially hideous computer virus problem at work, but I take the useless blundering much more seriously as a personal failing.

Earlier this week, one of my coworkers reported that she couldn’t get to various resources on the network. That's not an unusual event; it could happen for any number of reasons. She mentioned that she had clicked on something and then her connection had shut down. She thought the click was at fault.

So, I went and tinkered around with it. I couldn’t ping any servers. I checked the status of the network connections… huh. It showed that no network connections existed at all. In Device Manager, the network adapter drivers were all flagged as having problems, and there were also a number of devices I didn’t recognize. “Aha!” I said to myself. “Here’s the problem!”

I try to reinstall the drivers, with no success. The network adapter is integrated into the motherboard, so it’s not a loose connection or anything. I check everything I can think of, then decide it must be a random hardware failure. The machine is under warranty, so I just contact Dell to get a new motherboard put in. Dell has me troubleshoot a bit more, covering ground I’ve already checked, and then the support tech agrees to send a service tech with the motherboard. Easy enough; it’ll take a few days.

Today, the tech came and swapped the motherboard.  When I started the machine up again, the same issue occurred.  Drivers don’t work right, can’t reinstall them.  The on-site tech immediately made himself scarce when he realized the problem was still there; he pointed out that he just installed the parts Dell told him to and jetted before he could get sucked into any responsibility for getting it working.  Not the most courageous move for him, but whatever.

So, at this point I figured the motherboard had to be good; not much chance of two failing the same way.  I decided to do a full repair of Windows, replace all the system files and reinstall all the devices, and see what I could figure out that way.  Not a clean format, but next best thing.

Most of the process runs fine.  When I’m ready to start it up again, the system freezes at the loading screen.  Hard reset it, it comes up, and the network works again!  Hooray!  Then my email proxy starts screaming about all the virus messages it’s blocking.  Hooray?

One disconnected network cable and a few virus scans later, I’ve discovered a moderately horrifying trojan infection on that machine. Moreover, when I clean up all the infected files, the network adapter screws up in exactly the same way I’d been seeing before. So, it's not too hard to connect the dots there.

At this point in the story, I kick myself for being an arrogant idiot.  See, right back at the beginning, the user told me she’d clicked on something bad, and then the network stopped functioning.  So, why was I chasing a hardware failure?  Clicking a link in a web browser is never going to cause a hardware problem on a computer motherboard.  It’s impossible.  Seeing what looked like a hardware problem should just have tipped me off that some software issue was screwing with the network card drivers, and I should have responded accordingly.  Instead, I wasted several days, and seriously inconvenienced my user.  I could have been better than I was.  Next time I will be.

What kills me on this whole story is that I used to gripe at my support techs when I worked at Video Insight about this exact issue.  “Listen for clues,” I frequently said, “don’t just assume you know what’s going on and disregard the person on the other end of the call”.  That was me, today.  My user gave me the clues to solve the problem, and then I ignored them to focus on what I thought the problem ought to be.

You can never win by fitting your problem to your solution.  It’s tempting, when you see a problem that’s very similar to something you’ve seen before, to ignore the small differences and apply the familiar solutions.  It’s satisfying, and it’s easy.  You build a chain of reasoning, and it’s totally perfect, flawless.  All you have to do is ignore this one tiny fact someone observed.

But observation beats reasoning every time.  What actually happened is what happened, and all the clever logic in the world won’t change that.  You have to build your reasoning on top of your observations, not build your tower of reasoning and then pick out the observed facts that give you the conclusion you want.

I apparently needed to hear that again.  And I’m glad to learn that lesson again so cheaply.
