Sysadmin Stories: Depends on the machine

by Stephen on October 19, 2009

1

From: kochmar@sei.cmu.edu (John Kochmar)
Organization: The Software Engineering Institute

A long time ago, back when the Apollo 460 was around and I had just
graduated from college, I had the good fortune of being one of two
administrators in charge of making a cluster of 460s a part of our
environment. One of the things I was tasked with was getting them onto
our network.

Well, I was young, I had the manuals, and a guy from Apollo tech
support was there to help. How hard could it be, right?

Well, we got out the manuals, configured the system (relying heavily on
the defaults), and within 2 hours, we had that puppy on the network.
Life was good.

About 3 hours later, I get a phone call from a systems programmer /
developer from CMU campus (the SEI is a part of CMU, and we are on their
network.) He told me that if I didn’t take the &%@*ing Apollo off the
network, he was going to do hurtful things to me physically.
Life was not so good.

As it turned out, in default mode, the Apollo answered every address
request it saw, even if it was not the machine the request was for.
Kind of a “hey, I’m not who you are looking for, but I’m out here in
case you decide you’d rather talk to me.” Apollo considered this a
feature, and they took advantage of it in their OS environment.

However, one of the earlier versions of a heavily network-dependent OS
developed at CMU considered this a bug. The OS would issue a request,
and expect only the machine it was looking for to answer it. Of
course, it would assume that if it got an answer to its request, it
must be the machine it expected to talk to. It didn’t look at the
address of the answer it got, so if it wasn’t the correct machine, most
of the time the OS would hang or panic.

The outcome? Over about 3 hours' time, more and more of campus was
talking to our little 460, which had just enough muscle to keep up with
the requests. By the time campus figured out what was going on, we had
an Apollo merrily answering the network requests for hundreds of
machines (the ones that were still up, that is). The result: the part
of campus that used the new OS going to hell in a bucket, one very busy
Apollo 460, and one very warm ethernet.

Well, we turned off the Apollo, configured it not to chat to all of
campus before putting it back on the ethernet (this time, we did it
while talking with campus, making sure we didn’t cause the same
problems we did the last time — we didn’t have a packet monitor at the
time), and campus changed their OS to look at the request response
before assuming it was the correct one. I also learned to think very
carefully about default values before using them.
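
The failure and the fix described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `Reply` type, the host names, and both resolver functions are hypothetical, and nothing here reflects the actual Apollo or CMU protocol. It just contrasts trusting the first answer with checking who the answer came from.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Reply:
    """A made-up reply to an address request (not a real packet format)."""
    sender: str    # name of the machine that sent the reply
    hardware: str  # hardware address the sender reports


def resolve_naive(wanted: str, replies: Iterable[Reply]) -> Optional[str]:
    """Trust whichever reply arrives first, without checking its sender."""
    for reply in replies:
        return reply.hardware
    return None


def resolve_checked(wanted: str, replies: Iterable[Reply]) -> Optional[str]:
    """Accept a reply only if it comes from the machine that was asked for."""
    for reply in replies:
        if reply.sender == wanted:
            return reply.hardware
    return None


# Hypothetical traffic: the chatty host answers first, then the real target.
replies = [
    Reply(sender="apollo", hardware="hw-apollo"),
    Reply(sender="fileserver", hardware="hw-fileserver"),
]

naive = resolve_naive("fileserver", replies)
checked = resolve_checked("fileserver", replies)
print(naive, checked)
```

With the chatty host answering first, the naive resolver ends up talking to the wrong machine, while the checked one skips that reply and waits for the host it actually asked for.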

2

From: dinicola@itnux2.cineca.it (Attilio Dinicola)
Organization: Laboratorio di Fisica Computazionale, INFM. Trento Italia

I was more'ing something at the system console, with Ultrix as the OS
under me!

I wanted to press ^L and, unfortunately, hit the nearby ^P instead,
which suspended system activity: a console mode prompt appeared.

So, I typed:
res
thinking .. resume .. but res meant restart, and the system
rebooted, destroying all processes.

Naturally, Murphy was right in front of me: some batch jobs had been
running for four or five days. WERE .. RUNNING!

3

From: sam@bsu-cs.bsu.edu (B. Samuel Blanchard)
Organization: Dept. of CS Ball State University Muncie IN

kill -1 1 on an Altos SV box is not good. I pulled this one trying to show
off. No more gettys appeared when users logged off. When I went to the console,
I calmly typed 0 to the Run Level request prompt. 2 would have been nice?
It was my first System V-like box, and it seemed to have such nice Berkeley
commands.

A control-s on a Sequent S27 console can cause processes to hang waiting to
write to the console. Unfortunately, su is one such process. No real problem
since I don’t blindly reboot on request ;-)
