<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>UnixNewbie.org &#187; Sysadmin Stories</title>
	<atom:link href="http://www.unixnewbie.org/category/server-admin-tips/sysadmin-stories/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.unixnewbie.org</link>
	<description></description>
	<lastBuildDate>Mon, 30 Nov 2009 16:58:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Sysadmin Stories: Moral of these stories</title>
		<link>http://www.unixnewbie.org/ss-moral-of-these-stories/</link>
		<comments>http://www.unixnewbie.org/ss-moral-of-these-stories/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 23:28:38 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Sysadmin Stories]]></category>

		<guid isPermaLink="false">http://www.unixnewbie.org/?p=531</guid>
		<description><![CDATA[Moral of these sysadmin stories...]]></description>
			<content:encoded><![CDATA[<p>1</p>
<blockquote><p>From: jarocki@dvorak.amd.com (John Jarocki)<br />
Organization: Advanced Micro Devices, Inc.; Austin, Texas</p>
<p>- Never hand out directions on &#8220;how to&#8221; do some sysadmin task<br />
  until the directions have been tested thoroughly.<br />
  &#8211; Corollary:  Just because it works on one flavor<br />
    of *nix says nothing about the others. &#8216;-}<br />
  &#8211; Corollary:  This goes for changes to rc.local (and<br />
    other such &#8220;vital&#8221; scripties).</p></blockquote>
<p>2</p>
<blockquote><p>From: ericw@hobbes.amd.com (Eric Wedaa)<br />
Organization: Advanced Micro Devices, Inc.</p>
<p>-NEVER use &#8216;rm &lt;any pattern&gt;&#8217;, use &#8216;rm -i &lt;any pattern&gt;&#8217; instead.<br />
-Do backups more often than you go to church.<br />
-Read the backup media at least as often as you go to church.<br />
-Set up your prompt to do a `pwd` every time you cd.<br />
-Always do a `cd .` before doing anything.<br />
-DOCUMENT all your changes to the system (We use a text file<br />
 called /Changes)<br />
-Don&#8217;t nuke stuff you are not sure about.<br />
-Do major changes to the system on Saturday morning so you will<br />
 have all weekend to fix it.<br />
-Have a shadow watching you when you do anything major.<br />
-Don&#8217;t do systems work on a Friday afternoon. (or any other time<br />
 when you are tired and not paying attention.)</p></blockquote>
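<p>The first rule above (confirm before every delete) can be sketched in a few lines. This is only an illustration, assuming Python is available; <code>remove_with_confirmation</code> and its <code>confirm</code> callback are hypothetical names, with the callback standing in for the y/n question that rm -i asks.</p>
<pre>
```python
import glob
import os
import tempfile

def remove_with_confirmation(pattern, confirm):
    """Expand pattern, then delete each matching file only if confirm(path) is true."""
    removed = []
    for path in sorted(glob.glob(pattern)):
        if os.path.isfile(path) and confirm(path):
            os.remove(path)
            removed.append(path)
    return removed

# Demo in a throwaway directory; the lambda plays the human answering y or n.
with tempfile.TemporaryDirectory() as d:
    for name in ("a.log", "b.log", "keep.txt"):
        open(os.path.join(d, name), "w").close()
    gone = remove_with_confirmation(os.path.join(d, "*.log"),
                                    lambda p: p.endswith("a.log"))
    left = sorted(os.listdir(d))
```
</pre>
<p>Nothing is deleted unless the callback says yes for that exact path, so a pattern that matches more than expected costs a few extra questions, not the files.</p>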
<p>3</p>
<blockquote><p>From: rca@Ingres.COM (Bob Arnold)<br />
Organization: Ask Computer Systems Inc., Ingres Division, Alameda CA 94501</p>
<p>1) The &#8220;man&#8221; pages don&#8217;t tell you everything you need to know.<br />
2) Don&#8217;t do backups to floppies.<br />
3) Test your backups to make sure they are readable.<br />
4) Handle the format program (and anything else that writes directly<br />
   to disk devices) like nitroglycerine.<br />
5) Strenuously avoid systems with inadequate backup and restore<br />
   programs wherever possible (thank goodness for &#8220;restore&#8221; with<br />
   an &#8220;e&#8221;!).<br />
6) If you&#8217;ve never done sysadmin work before, take a formal<br />
   training class.<br />
7) You get what you pay for.<br />
8) There&#8217;s no substitute for experience.<br />
9) It&#8217;s a lot less painful to learn from someone else&#8217;s experience<br />
   than your own (that&#8217;s what this thread is about, I guess <img src='http://www.unixnewbie.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  )</p></blockquote>
<p>4</p>
<blockquote><p>From: jimh@pacdata.uucp (Jim Harkins)<br />
Organization: Pacific Data Products</p>
<p>If you appoint someone to admin your machine you better be willing<br />
to train them.  If they&#8217;ve never had a hard disk crash on them you might want<br />
to ensure they understand hardware does stuff like that.</p></blockquote>
<p>5</p>
<blockquote><p>From: dvsc-a@minster.york.ac.uk<br />
Organization: Department of Computer Science, University of York, England</p>
<p>Beware anything recursive when logged in as root!</p></blockquote>
<p>6</p>
<blockquote><p>From: matthews@oberon.umd.edu (Mike Matthews)<br />
Organization: /etc/organization</p>
<p>*NEVER* move something important.  Copy, VERIFY, and THEN delete.</p></blockquote>
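<p>A minimal sketch of copy, verify, then delete, assuming Python and a SHA-256 comparison as the verify step (the post does not say how to verify). The helper names <code>sha256_of</code> and <code>careful_move</code> are made up for this example.</p>
<pre>
```python
import hashlib
import os
import shutil
import tempfile

def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def careful_move(src, dst):
    """Copy src to dst, compare checksums, and delete src only if they match."""
    shutil.copy2(src, dst)
    if sha256_of(src) != sha256_of(dst):
        os.remove(dst)  # discard the bad copy; the original is untouched
        raise IOError("copy of %s does not match the original" % src)
    os.remove(src)

# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "important.dat")
    dst = os.path.join(d, "moved.dat")
    with open(src, "wb") as f:
        f.write(b"payroll")
    careful_move(src, dst)
    src_gone = not os.path.exists(src)
    with open(dst, "rb") as f:
        dst_data = f.read()
```
</pre>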
<p>7</p>
<blockquote><p>From: almquist@chopin.udel.edu (Squish)<br />
Organization: Human Interface Technology Lab (on vacation)</p>
<p>When you are doing something BIG, type the command and reread what you&#8217;ve typed<br />
about 100 times to make sure it&#8217;s sunk in (:</p></blockquote>
<p>8</p>
<blockquote><p>From: Nick Sayer &lt;mrapple@quack.sac.ca.us&gt;</p>
<p>If / is full, du /dev.</p></blockquote>
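<p>The tip is terse. Presumably the point is that /dev should hold device nodes, so an ordinary file there (say, from a mistyped tape device name) may be what is eating the root filesystem. A rough sketch of that check, assuming Python; <code>regular_files_under</code> is a hypothetical helper and the demo uses a scratch directory, not a real /dev.</p>
<pre>
```python
import os
import stat
import tempfile

def regular_files_under(root):
    """Return (size, path) for every regular file below root, largest first."""
    found = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode):
                found.append((st.st_size, path))
    return sorted(found, reverse=True)

# Demo: a scratch directory with one stray 1000-byte file named like a tape device.
with tempfile.TemporaryDirectory() as d:
    stray = os.path.join(d, "rmt0")
    with open(stray, "wb") as f:
        f.write(b"x" * 1000)
    os.mkdir(os.path.join(d, "pts"))
    suspects = regular_files_under(d)
    expected = [(1000, stray)]
```
</pre>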
<p>9</p>
<blockquote><p>From: TRIEMER@EAGLE.WESLEYAN.EDU<br />
Organization: Wesleyan College</p>
<p>Never ever assume that some prepackaged script that you are running does<br />
anything right.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.unixnewbie.org/ss-moral-of-these-stories/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sysadmin Stories: Miscellaneous</title>
		<link>http://www.unixnewbie.org/sysadmin-stories-miscellaneous/</link>
		<comments>http://www.unixnewbie.org/sysadmin-stories-miscellaneous/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 23:25:21 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Sysadmin Stories]]></category>

		<guid isPermaLink="false">http://www.unixnewbie.org/?p=529</guid>
		<description><![CDATA[Miscellaneous sysadmin stories...]]></description>
			<content:encoded><![CDATA[<p>1</p>
<blockquote><p>From: hirai@cc.swarthmore.edu (Eiji Hirai)<br />
Organization: Information Services, Swarthmore College, Swarthmore, PA, USA</p>
<p>We were running a system software that had a serious bug where if anyone<br />
had logged out ungracefully, the system wouldn&#8217;t let any more users onto the<br />
system and users who were logged on couldn&#8217;t execute any new commands.  (The<br />
newest release of the software later on did fix this bug.) I had to reboot<br />
the machine to restore the system to a sane state.  I did a wall &lt;&lt;EOF We<br />
need to shutdown blah blah... EOF and then shutdown.  Well, I should've<br />
waited since at the precise moment, one of our users was doing a once-a-year<br />
massive conversion of our financial data (talk about bad luck).  I had<br />
shut down in the middle of a very long disk write and thus, data was lost.<br />
We did recover that data and life went on.</p>
<p>Moral: make damn sure that *no one* is doing anything on your system before you<br />
reboot, even if other users are vociferously clamoring for you to reboot.</p></blockquote>
<p>2</p>
<blockquote><p>From: robjohn@ocdis01.UUCP (Contractor Bob Johnson)<br />
Organization: Tinker Air Force Base, Oklahoma</p>
<p>Management told us to email a security notice to every user on our<br />
system (at that time, around 3000 users).  A certain novice administrator<br />
on our system wanted to do it, so I instructed them to extract a list of<br />
users from /etc/passwd, write a simple shell loop to do the job, and<br />
throw it in the background.  Here&#8217;s what they wrote (bourne shell)&#8230;</p>
<p>       for USER in `cat user.list`; do<br />
          mail $USER &lt;message.text &#038;<br />
       done</p>
<p>Have you ever seen a load average of over 300 ???</p></blockquote>
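<p>The loop above backgrounds every mail command, so it starts one process per user almost at once. One way to keep the load bounded is a small worker pool. The sketch below assumes Python; <code>send_all</code>, <code>Gauge</code> and <code>fake_send</code> are hypothetical names, and the fake sender only sleeps rather than sending mail.</p>
<pre>
```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

def send_all(users, send, max_workers=4):
    """Call send(user) for every user, with at most max_workers running at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, users))

class Gauge:
    """Records the highest number of sends that were in flight together."""
    def __init__(self):
        self.lock = threading.Lock()
        self.running = 0
        self.peak = 0

gauge = Gauge()

def fake_send(user):
    # Stand-in for the real mail delivery; it only sleeps briefly.
    with gauge.lock:
        gauge.running += 1
        gauge.peak = max(gauge.peak, gauge.running)
    time.sleep(0.01)
    with gauge.lock:
        gauge.running -= 1
    return user

users = ["user%d" % i for i in range(40)]
sent = send_all(users, fake_send, max_workers=4)
```
</pre>
<p>However long the user list is, the number of simultaneous sends never exceeds the pool size.</p>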
<p>3</p>
<blockquote><p>From: Iain.Lea%anl433.uucp@Germany.EU.net (Iain Lea)<br />
Organization: ANL A433, Siemens AG., Germany.</p>
<p>I used to work at Siemens R&#038;D in Erlangen (33000 people out of 115000<br />
population work at Siemens &#8211; 12000 in the R&#038;D area).  We were working<br />
on a project porting an ISO FTAM implementation in Ada to C.</p>
<p>About 2 months into the project we received a new project leader who<br />
decided there were too few people working on the project (sigh!).<br />
Anyway we were promised that a &#8220;Spitzen Klasse&#8221; (Outstanding) SW guy<br />
was being sent over from the next lab.</p>
<p>The fateful day turned up (had to be a monday) and there was our very<br />
own &#8216;Einstein&#8217;. We gave him a tour of the lab (ie. Coffee machine on<br />
the left, laser on the right etc.) finally getting to our work area.<br />
We had a couple of fast 386&#8242;s (this happened in &#8217;89) running Xenix 386.<br />
We told Einstein that I was the sysadmin for both machines and that if<br />
*anything* was strange or not working to speak with me.  OK so the first<br />
morning went off without a hitch and we all went to get something to eat<br />
around midday.  All except Einstein who said he wanted to check a few<br />
things out (Code practices we thought etc. &#8211; turned out to be Page 3 of<br />
that month&#8217;s Playboy).</p>
<p>We came back from eating to find Einstein twiddling his thumbs and<br />
saying that he could no longer log in on either machine.  Ermmm&#8230;</p>
<p>I asked him if *anything* had happened while we were away.  He thought<br />
and thought and then said &#8220;Nothing really but the lights went out for<br />
a few minutes&#8221;.  OK I thought &#8220;fsck the disks, remount them and away<br />
we go&#8221; but then I stopped and asked him again &#8220;Anything else?&#8221;.  He<br />
then really started looking around and found the palms of his hand<br />
the most interesting thing he&#8217;d ever seen.  He answered &#8220;Well I know<br />
a little about Unix and fsck is the &#8216;ajax&#8217; cleaning program of Unix<br />
so when it started again after the lights came back on it started<br />
fsck and asked me for a scratchpad file.  I just took the one it<br />
printed on the line above!&#8221; (ie. the name of the filesystem to clean).</p>
<p>Another comment he made was &#8220;Must be a fast machine as fsck ran quick&#8221;.</p>
<p>Bad you might say until he told me he had done the same thing to our<br />
backup machine.</p>
<p>Needless to say Einstein &#038; our project leader exited stage left&#8230;</p>
<p>And we eventually got a backup tape from our data safe stored at<br />
another lab. The SW guy is kind of a living legend around here <img src='http://www.unixnewbie.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p></blockquote>
<p>4</p>
<blockquote><p>From: rca@Ingres.COM (Bob Arnold)<br />
Organization: Ask Computer Systems Inc., Ingres Division, Alameda CA 94501</p>
<p>Many moons ago, in my first sysadmin job, learning via &#8220;on-the-job<br />
training&#8221;, I was in charge of a UNIX box whose user disk developed a<br />
bad block.  (Maybe you can see it already &#8230;)</p>
<p>The &#8220;format&#8221; man page seemed to indicate that it could repair bad<br />
blocks.  (Can you see it now?)  I read the man page very carefully.<br />
Nowhere did it indicate any kind of destructive behavior.</p>
<p>I was brave and bold, not to mention boneheaded, and formatted the user disk.<br />
Heh.</p>
<p>The good news:<br />
	1) The bad block was gone.<br />
	2) I was about to learn a lot real fast <img src='http://www.unixnewbie.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /><br />
The bad news:<br />
	1) The user data was gone too.<br />
	2) The users weren&#8217;t happy, to say the least.</p>
<p>Having recently made a full backup of the disk, I knew I was in for a<br />
miserable all day restore.  Why all day?  It took 8 hours to dump<br />
that disk to 40 floppies.  And I had incrementals (levels 1, 2, 3, 4,<br />
and 5, which were another sign of my novice state) to layer on top<br />
of the full.</p>
<p>Only it got worse.  The floppy drive had intermittent problems reading<br />
some of the floppies.  So I had to go back and retry to get the files<br />
which were missed on the first attempt.</p>
<p>This was also a port of Version 7 UNIX (like I said, this was many<br />
moons ago).  It had a program called &#8220;restor&#8221;, primordial ancestor of<br />
BSD&#8217;s &#8220;restore&#8221;.  If you used the &#8220;x&#8221; option to extract selected files<br />
(the ones missed on earlier attempts), &#8220;restor&#8221; would use the *inode<br />
number* as the name of the extracted files.  You had to move the<br />
extracted files to their correct locations yourself (the man page said<br />
to write a shellscript to do this <img src='http://www.unixnewbie.org/wp-includes/images/smilies/icon_sad.gif' alt=':-(' class='wp-smiley' /> ).  I didn&#8217;t know much about shell<br />
scripts at the time, but I learned a lot more that week.</p>
<p>Yes, it took me a full week, including the weekend, maybe 120 hours or<br />
more, to get what I could (probably 95% of the data) off the backups.<br />
And there were a few ownership and permissions problems to be cleaned up<br />
after that.</p>
<p>Once burned twice shy.  This is the only truly catastrophic mistake I&#8217;ve<br />
ever made as a sysadmin, I&#8217;m glad to be able to say.</p>
<p>I kept a copy of my memo to the users after I had done what I could.<br />
Reading it over now is sobering indeed!  I also kept my extensive notes<br />
on the restore process &#8211; thank goodness I&#8217;ve never had to use them since.</p></blockquote>
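<p>The clean-up step described above, moving files named by inode number back to their proper paths, might look something like this today. It is a sketch only: it assumes Python and assumes an inode-to-path table is available from somewhere (the post does not say where it came from). <code>relocate_by_inode</code> and the sample paths are invented for illustration.</p>
<pre>
```python
import os
import tempfile

def relocate_by_inode(extract_dir, inode_to_path, dest_root):
    """Move files named by inode number to dest_root/path; return names with no mapping."""
    unknown = []
    for name in sorted(os.listdir(extract_dir)):
        rel = inode_to_path.get(name)
        if rel is None:
            unknown.append(name)  # leave it in place for a human to look at
            continue
        target = os.path.join(dest_root, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        os.rename(os.path.join(extract_dir, name), target)
    return unknown

# Demo: three extracted files, two of which appear in the (hypothetical) table.
with tempfile.TemporaryDirectory() as d:
    extract_dir = os.path.join(d, "extracted")
    dest_root = os.path.join(d, "restored")
    os.mkdir(extract_dir)
    for inode in ("1042", "1077", "9999"):
        open(os.path.join(extract_dir, inode), "w").close()
    table = {"1042": "home/alice/notes.txt", "1077": "home/bob/todo"}
    unknown = relocate_by_inode(extract_dir, table, dest_root)
    restored_ok = os.path.isfile(os.path.join(dest_root, "home", "alice", "notes.txt"))
```
</pre>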
<p>5</p>
<blockquote><p>From: jimh@pacdata.uucp (Jim Harkins)<br />
Organization: Pacific Data Products</p>
<p>A friend of mine admins an RS6000 for a state college.  The weekend before<br />
the fall semester started the Powers That Be decided to physically move the<br />
system to a different room.  She stayed late friday night, moved the machine,<br />
and then it wouldn&#8217;t boot.  I was in Sunday afternoon looking at it, wouldn&#8217;t<br />
boot for nothing.  Monday morning, first day of classes, an IBM rep comes in<br />
and reformats the hard disk without telling her.  Turns out this was the<br />
machine all the professors were doing their class plans on.  So not only<br />
couldn&#8217;t they have them printed out, but when school started monday morning<br />
the teachers discovered they had lost all the work they&#8217;d done in the week<br />
before school started.  Seems she never did backups because the teachers<br />
always bitched about how slow the system was when she did, and she hadn&#8217;t<br />
learned about cron yet (I told her about that one).</p>
<p>In her defense, she&#8217;d only been using the RS6000 for less than a month before<br />
this happened.  She didn&#8217;t know UNIX.  She hadn&#8217;t had any training.  She<br />
still had her regular job to do.</p>
<p>To make things worse, when she called me monday night she was in tears as<br />
she told me how she had to personally visit all the professors and tell them<br />
their work was gone.  I blurted out &#8220;Stupid of you not to make backups&#8221;.  Here<br />
she is looking for a shoulder to cry on and I go and tell her the same thing<br />
everybody from the department chair on down to the janitor had been saying.<br />
Oops.</p>
<p>The moral?  If you appoint someone to admin your machine you better be willing<br />
to train them.  If they&#8217;ve never had a hard disk crash on them you might want<br />
to ensure they understand hardware does stuff like that.  I also found out<br />
she was unplugging and plugging cables all over the place without powering<br />
down the system.  Her hardware knowledge was essentially &#8220;this thing goes into<br />
the wall, then the lights blink&#8221;.</p></blockquote>
<p>7</p>
<blockquote><p>From: rick@sadtler.com (Rick Morris)<br />
Organization: Sadtler Research Laboratories</p>
<p>Slightly off the subject, but not too far off, is the phenomenon of &#8220;Sysadmin<br />
Wannabees.&#8221;  I&#8217;ve been Sys Admin of UNIX at 3 sites now.  The phenomenon has<br />
occurred at all three.</p>
<p>You are talking to a fellow programmer, or a programmer is within ear shot.<br />
A new user (or even an old user) comes up to you and asks something like:<br />
&#8220;How would I list only directory files within a directory?&#8221;</p>
<p>Now it has been my experience that the question is not complete.  Is this a<br />
recursive list?  Is this a &#8220;one-time&#8221; thing, or are you going to do it many<br />
times?  Is it part of a program?  (Sometimes questions like this end up as<br />
an answer to a C question executed as a system(3) call rather than a preferred<br />
library call.)  Anyway, as you ponder the question, the many alternatives (in<br />
unix there&#8217;s always another way), the questioner&#8217;s experience, whether or not<br />
they want a techie answer or a DOSie answer, the programmer within ear shot<br />
pipes in with an answer of how *THEY* do or would do it.</p>
<p>It is invariable.  It happens every time.  I don&#8217;t think I take all that<br />
long to answer.  But the Wannabee answer is rapid.  Like the kid in class<br />
who raises his hand going &#8220;oo&#8221; &#8220;oo&#8221; &#8220;oo&#8221;.</p>
<p>I have seen my predecessors get all bent out of shape when the Sysadmin<br />
Wannabees jump on their toes.  I usually let the answer proceed, indeed,<br />
often these Wannabees give a complete answer, even doing it for the<br />
questioner.  After a bit I return to the questioner and ask if the question<br />
was properly answered, if they understand the answer, or if they want any<br />
more information.  It also shows me how deeply the Wannabee understands<br />
just what is going on inside that pizza box.</p>
<p>Have any other of you sys admins seen this phenomenon, or is it my slow<br />
pondering of potential answers that drives the Wannabee to jump in?</p></blockquote>
<p>8</p>
<blockquote><p>From: rslade@cue.bc.ca (Rob Slade)<br />
Organization: Computer Using Educators of B.C., Canada</p>
<p>I had a job one time teaching Pascal at a &#8220;visa school&#8221;.  The machine was a<br />
multi-user micro that ran UNIX.  I have enough stories from that one course<br />
to keep a group of computer educators in stitches for at least half an hour.</p>
<p>The finale of the course was on the last day of classes.  When I showed up<br />
and powered up the system, it refused to boot.  Since all the students&#8217; term<br />
projects and papers were in the computer, it was fairly important.  After<br />
a few hours of work, and consultation with the other teacher, who did the<br />
sysadmin and maintenance, we were finally informed that the new admin<br />
assistant around the place had decided that the layout of the computer lab<br />
was unsuitable.  (I had noticed that all the desks were repositioned: I thought<br />
the other teacher had done it, he thought I had.)  The AA had, the night<br />
before, moved all the furniture, including the terminals and the micro.  She<br />
did not know anything about parking hard disks.</p>
<p>We knew now, that we were in trouble, but we didn&#8217;t realize how much until<br />
we started reading up on emergency procedures.  For some unknown reason,<br />
booting the micro from the original system disks would automatically reformat<br />
the hard disk.</p>
<p>(The visa school refunded the tuition for all the students in that course.)</p></blockquote>
<p>9</p>
<blockquote><p>From: corwin@ensta.ensta.fr (Gilles Gravier)<br />
Organization: ENSTA, Paris, France</p>
<p>I am sysadmin at my office&#8230; I won&#8217;t name it, because that&#8217;s not<br />
the subject&#8230; Of course, UNIX is my cup of tea&#8230; But, at home, I have an<br />
MS DOS machine&#8230; As old habits die hard, I have set up MKS toolkit on my home<br />
PC&#8230; And, as I have a C:\TMP directory where Windows and other applications<br />
put stuff, that remains, as I sometimes have to reboot fast&#8230; (ah, the fun<br />
of developing at home!)&#8230; So, in my AUTOEXEC.BAT file, I have the following:<br />
rm -rf /tmp<br />
mkdir c:\tmp<br />
the recursive rm coming from MKS, and mkdir from horrible MSDOS.</p>
<p>At the time, I didn&#8217;t have a tape streamer on my pc&#8230; I was working,<br />
and the mains went down&#8230; so did the PC.   Windows was running, \TMP full<br />
of stuff&#8230; So, when power comes back on, rm -rf /tmp has things to do&#8230;<br />
While it&#8217;s doing those things, power goes down again (there was  a storm).<br />
Power comes back up, and this time, it seems that the autoexec takes really<br />
too much time&#8230; So, I control C it&#8230; And, to my horror, realize that I don&#8217;t<br />
have anymore C:\DOS C:\BIN C:\USR and that my C:\WINDOWS was quite depleted&#8230;</p>
<p>After some unsuccessful investigation, I did the following: cd \tmp<br />
and then DIR&#8230; And there, in C:\TMP, I find my C:\ files! The first power<br />
down had resulted in the cluster number of C:\ being copied to that of C:\TMP,<br />
actually resulting in a LINK! (Now, this isn&#8217;t supposed to happen under MSDOS!)<br />
I had to patch in the DIRECTORY cluster to change TMP&#8217;s name, replacing the<br />
first T by the letter Sigma, so that DOS thought that TMP wasn&#8217;t there anymore,<br />
then do a chkdsk /F, and then undelete the files that I could&#8230; And rebuild<br />
the rest&#8230;</p></blockquote>
<p>10</p>
<blockquote><p>From: gert@greenie.gold.sub.org (Gert Doering)</p>
<p>I was on a 5 days vacation, the first day my machine crashed&#8230;</p>
<p>How?  Well&#8230;</p>
<p>cron started a shell script to extract some files from a &#8220;.lzh&#8221;-Archive.<br />
LHarc found that the target file already existed, asked</p>
<p>&#8220;file &lt;foo&gt; exists, overwrite (y/n)?&#8221;</p>
<p>&#8230; since it was started from cron, it just read &#8220;EOF&#8221;.  Tried again.  Read<br />
&#8220;EOF&#8221;.  And so on.</p>
<p>All output went to /tmp&#8230; which was full after the file reached 90 MB!<br />
What happened next?  I&#8217;m using a SCO machine, /tmp is in my root filesystem<br />
and when trying to login, the machine said something about not being able<br />
to write login information &#8211; and threw me out again.</p>
<p>Switched machine off.</p>
<p>Power on, go to single user mode.  Tried to login &#8211; immediately thrown out<br />
again.</p>
<p>I finally managed to repair the mess by booting from Floppy disk, mounting<br />
(and fsck-ing) the root filesystem and cleaning /tmp/*</p></blockquote>
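<p>The failure above comes from a prompt that keeps asking after it has read end-of-file. A prompt that treats EOF as &#8220;no&#8221; and gives up after a few tries would not have filled /tmp. A minimal sketch, assuming Python; <code>ask_overwrite</code> is a hypothetical helper, not how LHarc actually works.</p>
<pre>
```python
import io

def ask_overwrite(name, stdin, out, max_tries=3):
    """Return True only on an explicit yes; treat EOF or repeated junk as no."""
    for _ in range(max_tries):
        out.write("file %s exists, overwrite (y/n)? " % name)
        line = stdin.readline()
        if line == "":  # EOF: nobody is there to answer, e.g. when run from cron
            return False
        answer = line.strip().lower()
        if answer in ("y", "yes"):
            return True
        if answer in ("n", "no"):
            return False
    return False

# An empty stream behaves like stdin under cron; the second one like a person typing.
cron_log = io.StringIO()
from_cron = ask_overwrite("foo", io.StringIO(""), cron_log)
from_tty = ask_overwrite("foo", io.StringIO("maybe\ny\n"), io.StringIO())
prompts_under_cron = cron_log.getvalue().count("overwrite")
```
</pre>
<p>Under cron the question is printed once and the answer defaults to no, so the output stays a single line however long the job is left alone.</p>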
]]></content:encoded>
			<wfw:commentRss>http://www.unixnewbie.org/sysadmin-stories-miscellaneous/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sysadmin Stories: Depends on the machine</title>
		<link>http://www.unixnewbie.org/ss-depends-on-the-machine/</link>
		<comments>http://www.unixnewbie.org/ss-depends-on-the-machine/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 23:21:44 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Sysadmin Stories]]></category>

		<guid isPermaLink="false">http://www.unixnewbie.org/?p=527</guid>
		<description><![CDATA[Depends on the machine...]]></description>
			<content:encoded><![CDATA[<p>1</p>
<blockquote><p>From: kochmar@sei.cmu.edu (John Kochmar)<br />
Organization: The Software Engineering Institute</p>
<p>A long time ago, back when the Apollo 460 was around and I had just<br />
graduated from college, I had the good fortune of being one of two<br />
administrators in charge of making a cluster of 460&#8242;s a part of our<br />
environment.  One of the things I was tasked with was getting them onto<br />
our network.</p>
<p>Well, I was young, I had the manuals, and a guy from Apollo tech<br />
support was there to help.  How hard could it be, right?</p>
<p>Well, we got out the manuals, configured the system (relying heavily on<br />
the defaults), and within 2 hours, we had that puppy on the network.<br />
Life was good.</p>
<p>About 3 hours later, I get a phone call from a systems programmer /<br />
developer from CMU campus (the SEI is a part of CMU, and we are on their<br />
network.)  He told me that if I didn&#8217;t take the &#038;%@*ing Apollo off the<br />
network, he was going to do hurtful things to me physically.<br />
Life was not so good.</p>
<p>As it turned out, in default mode, the Apollo answered every address<br />
request it saw, even if it was not the machine the request was for.<br />
Kind of a &#8220;hey, I&#8217;m not who you are looking for, but I&#8217;m out here in<br />
case you decide you&#8217;d rather talk to me.&#8221;  Apollo considered this a<br />
feature, and they took advantage of it in their OS environment.</p>
<p>However, one of the earlier versions of a heavily network dependent OS<br />
developed at CMU considered this a bug.  The OS would issue a request,<br />
and expect only the machine it was looking for to answer it.  Of<br />
course, it would assume that if it got an answer to its request, it<br />
must be the machine it expected to talk to.  It didn&#8217;t look at the<br />
address of the answer it got, so if it wasn&#8217;t the correct machine, most<br />
of the time the OS would hang or panic.</p>
<p>The outcome?  Over about 3 hours time, more and more of campus was<br />
talking to our little 460, which had just enough muscle to keep up with<br />
the requests.  By the time campus figured out what was going on, we had<br />
an Apollo merrily answering the network requests for hundreds of<br />
machines (the ones that were still up, that is).  The result: the part<br />
of campus that used the new OS going to hell in a bucket, one very busy<br />
Apollo 460, and one very warm ethernet.</p>
<p>Well, we turned off the Apollo, configured it not to chat to all of<br />
campus before putting it back on the ethernet (this time, we did it<br />
while talking with campus, making sure we didn&#8217;t cause the same<br />
problems we did the last time &#8212; we didn&#8217;t have a packet monitor at the<br />
time), and campus changed their OS to look at the request response<br />
before assuming it was the correct one.  I also learned to think very<br />
carefully about default values before using them.</p></blockquote>
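<p>The fix on the campus side, looking at who answered before trusting the answer, can be sketched as follows. This assumes Python and models replies as plain dictionaries; the addresses and the names <code>accept_reply</code> and <code>first_valid_reply</code> are invented for illustration and do not reflect the real protocol.</p>
<pre>
```python
def accept_reply(requested_addr, reply):
    """Accept a reply only if it comes from the address that was asked for."""
    return reply["sender"] == requested_addr

def first_valid_reply(requested_addr, replies):
    """Skip answers from other machines and return the first one from the right host."""
    for reply in replies:
        if accept_reply(requested_addr, reply):
            return reply
    return None

# Demo: an over-eager host answers first, the intended host answers second.
replies = [
    {"sender": "10.0.0.46", "payload": "talk to me instead"},
    {"sender": "10.0.0.7", "payload": "here I am"},
]
chosen = first_valid_reply("10.0.0.7", replies)
nobody = first_valid_reply("10.0.0.99", replies)
```
</pre>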
<p>2</p>
<blockquote><p>From: dinicola@itnux2.cineca.it (Attilio Dinicola)<br />
Organization: Laboratorio di Fisica Computazionale, INFM. Trento Italia</p>
<p>I was more&#8217;ing something at the system console, Ultrix OS under me!</p>
<p>I wanted to press a ^L and, unfortunately, the nearby ^P suspended<br />
system activities: a console mode prompt appeared.</p>
<p>So, I pressed:<br />
        res<br />
Thinking .. resume .. but res became restart and the system<br />
rebooted destroying all processes.</p>
<p>Naturally, Murphy was in front of me and some batch jobs had been<br />
running for four or five days. WERE .. RUNNING!</p></blockquote>
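<p>The trap here is an abbreviation that matches more than one command. A command parser can refuse ambiguous prefixes instead of picking one. The sketch below assumes Python, and the command list is hypothetical (the post only tells us that &#8220;res&#8221; was taken as restart).</p>
<pre>
```python
def resolve(prefix, commands):
    """Expand a command prefix, refusing to guess when it is ambiguous."""
    if prefix in commands:
        return prefix  # an exact name always wins
    matches = [c for c in commands if c.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError("unknown command: %s" % prefix)
    raise ValueError("ambiguous command %s: %s" % (prefix, ", ".join(matches)))

# Hypothetical console command set, for illustration only.
CONSOLE_COMMANDS = ["halt", "restart", "resume"]

picked = resolve("resu", CONSOLE_COMMANDS)
try:
    resolve("res", CONSOLE_COMMANDS)
    ambiguous_rejected = False
except ValueError:
    ambiguous_rejected = True
```
</pre>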
<p>3</p>
<blockquote><p>From: sam@bsu-cs.bsu.edu (B. Samuel Blanchard)<br />
Organization: Dept. of CS Ball State University Muncie IN</p>
<p>kill -1 1  on an Altos SV box is not good.  I pulled this one trying to show<br />
off.  No more gettys appeared when users logged off.  When I went to the console,<br />
I calmly typed 0 to the Run Level request prompt.  2 would have been nice?<br />
It was my first System V-like box, and it seemed to have such nice Berkeley<br />
commands.</p>
<p>A control-s on a Sequent S27 console can cause processes to hang waiting to<br />
write to the console.  Unfortunately, su is one such process.  No real problem<br />
since I don&#8217;t blindly reboot on request  <img src='http://www.unixnewbie.org/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.unixnewbie.org/ss-depends-on-the-machine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sysadmin Stories: All about file permissions&#8230;</title>
		<link>http://www.unixnewbie.org/ss-all-about-file-permissions/</link>
		<comments>http://www.unixnewbie.org/ss-all-about-file-permissions/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 23:18:49 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Sysadmin Stories]]></category>

		<guid isPermaLink="false">http://www.unixnewbie.org/?p=523</guid>
		<description><![CDATA[All about file permissions...]]></description>
			<content:encoded><![CDATA[<p>1</p>
<blockquote><p>From: jdell@maggie.mit.edu (John Ellithorpe)<br />
Organization: Massachusetts Institute of Technology</p>
<p>Here&#8217;s a pretty bad story.  I wanted to have root use tcsh instead of the<br />
Bourne shell.  So I decided to copy tcsh to /usr/local/bin.  I created the<br />
file, /etc/shells, and put in /usr/local/bin/tcsh, along with /bin/sh and<br />
/bin/csh.</p>
<p>All seems fine, so I used the chsh command and changed root&#8217;s shell to<br />
/usr/local/bin/tcsh.  So I logged out and tried to log back in.  Only to find<br />
out that I couldn&#8217;t get back in.  Every time I tried to log in, I only got<br />
the statement: /usr/local/bin/tcsh: permission denied!</p>
<p>I instantly realized what I had done.  I forgot to check that tcsh has<br />
execute privileges and I couldn&#8217;t get in as root!</p>
<p>After about 30 minutes of getting mad at myself, I finally figured out to just<br />
bring the system down to single-user mode, which ONLY uses the /bin/sh,<br />
thankfully, and edited the password file back to /bin/sh.</p></blockquote>
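<p>A pre-flight check before changing anyone&#8217;s login shell might look like the sketch below. It assumes Python; <code>shell_is_usable</code> is a hypothetical helper, and the demo uses scratch files, not the real /etc/shells. It tests the execute bits directly rather than asking whether the current user may execute the file.</p>
<pre>
```python
import os
import tempfile

def shell_is_usable(shell_path, shells_file):
    """True if shell_path is listed in shells_file and is a file with an execute bit set."""
    with open(shells_file) as f:
        listed = [line.strip() for line in f
                  if line.strip() and not line.startswith("#")]
    if shell_path not in listed or not os.path.isfile(shell_path):
        return False
    # Any of the owner/group/other execute bits counts for this rough check.
    return bool(os.stat(shell_path).st_mode & 0o111)

# Demo: a fake shell that starts without execute permission, as in the story.
with tempfile.TemporaryDirectory() as d:
    fake_shell = os.path.join(d, "tcsh")
    shells_file = os.path.join(d, "shells")
    open(fake_shell, "w").close()
    os.chmod(fake_shell, 0o644)
    with open(shells_file, "w") as f:
        f.write("# valid login shells\n" + fake_shell + "\n")
    before = shell_is_usable(fake_shell, shells_file)
    os.chmod(fake_shell, 0o755)
    after = shell_is_usable(fake_shell, shells_file)
```
</pre>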
<p>2</p>
<blockquote><p>From: djd@csg.cs.reading.ac.uk (David J Dawkins)<br />
Organization: University of Reading</p>
<p>About a year back, I was looking through /etc and found that a few<br />
system files had world write permission.  Gasping with horror, I went<br />
to put it right with something like</p>
<p>dipshit# chmod -r 664 /etc/*</p>
<p>(I know, I know, goddamnit!.. now)</p>
<p>Everything was OK for about two to three weeks, then the machine went<br />
down for some reason (other than the obvious).  Well, I expect that you<br />
can imagine the result.  The booting procedure was unable to run fsck,<br />
so barfed and mounted the file systems read-only, and bunged me into<br />
single-user mode. Dumb expression..gradual realisation..cold sweat. Of<br />
course, now I can&#8217;t do a frigging chmod +x on anything because it&#8217;s all<br />
read-only. In fact I can&#8217;t run anything that isn&#8217;t part of sh.<br />
Wedgerama. Hysteria time. Consider reformatting disks. All sorts of<br />
crap ideas. Headless chicken scene. Confession.</p>
<p>&#8220;You did WHAT??!!&#8221;</p>
<p>Much forehead slapping, solemn oaths and floor pacing.</p>
<p>Luckily, we have a local MegaUnixGenius who, having sat puzzled for an hour<br />
or more, decided to boot from a cdrom and take things from there. He fixed<br />
it.</p>
<p>My boss, totally amazed at the fix I&#8217;d got the system into, luckily<br />
saw the funny side of it.  I didn&#8217;t.  Even though at that stage, I didn&#8217;t<br />
know much about unix/suns/booting/admin, I did actually know enough to NOT<br />
use a command like the one above. Don&#8217;t ask. Must be the drugs.</p>
<p>BTW, if my future employer _is_ reading this (like they say he/she might),<br />
then I have certainly learned tonnes of stuff in the last year, especially<br />
having had to set up a complete Sun system, fix local problems, etc <img src='http://www.unixnewbie.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>Anyone else got a tale of SGS (Spontaneous Gross Stupidity) ?</p></blockquote>
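<p>The chmod above replaced every permission bit, execute bits included, when the goal was only to remove world write access. A narrower change clears that one bit and leaves the rest alone. A sketch, assuming Python; <code>drop_world_write</code> is a hypothetical helper and the demo touches only a scratch file.</p>
<pre>
```python
import os
import stat
import tempfile

def drop_world_write(path):
    """Clear only the world-write bit, leaving every other permission bit alone."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    new_mode = mode & ~stat.S_IWOTH
    if new_mode != mode:
        os.chmod(path, new_mode)
    return new_mode

# Demo: a world-writable executable keeps its execute bits afterwards.
with tempfile.TemporaryDirectory() as d:
    script = os.path.join(d, "rc.local")
    open(script, "w").close()
    os.chmod(script, 0o757)
    result = drop_world_write(script)
    on_disk = stat.S_IMODE(os.stat(script).st_mode)
```
</pre>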
<p>3</p>
<blockquote><p>From: mfraioli@grebyn.com (Marc Fraioli)<br />
Organization: Grebyn Timesharing</p>
<p> I was happily churning along developing something on a Sun workstation,<br />
and was getting a number of annoying permission denieds from trying to<br />
write into a directory hierarchy that I didn&#8217;t own.  Getting tired of<br />
that, I decided to set the permissions on that subtree to 777 while I<br />
was working, so I wouldn&#8217;t have to worry about it.  Someone had recently<br />
told me that rather than using plain &#8220;su&#8221;, it was good to use &#8220;su -&#8221;,<br />
but the implications had not yet sunk in.  (You can probably see where<br />
this is going already, but I&#8217;ll go to the bitter end.)  Anyway, I cd&#8217;d<br />
to where I wanted to be, the top of my subtree, and did su -.  Then I<br />
did chmod -R 777.  I then started to wonder why it was taking so damn<br />
long when there were only about 45 files in 20 directories under where I<br />
(thought) I was.  Well, needless to say, su - simulates a real login,<br />
and had put me into root&#8217;s home directory, /, so I was proceeding to set<br />
file permissions for the whole system to wide open. I aborted it before<br />
it finished, realizing that something was wrong, but this took quite a<br />
while to straighten out.</p></blockquote>
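<p>Editor's sketch for the story above: one way to avoid depending on the current directory is to name the target explicitly and refuse to proceed when it resolves to /. The helper name safe_target and the sandbox paths are hypothetical, and the recursive chmod is only echoed here, not run.</p>

```shell
# safe_target DIR: print the resolved directory, or fail if it is missing or is /.
safe_target() {
    dir=$(cd "$1" 2>/dev/null || exit 1; pwd -P) || return 1
    if [ "$dir" = "/" ]; then return 1; fi
    printf '%s\n' "$dir"
}

work=$(mktemp -d)            # hypothetical stand-in for the development tree
mkdir "$work/subtree"

if target=$(safe_target "$work/subtree"); then
    echo "would run: chmod -R u+rwX $target"
fi

if safe_target / >/dev/null; then
    echo "unexpected: / was accepted"
else
    echo "refused to operate on /"
fi
```

<p>Passing an explicit path means it no longer matters where a login shell drops you after su -.</p>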
<p>4</p>
<blockquote><p>From: jerry@incc.com (Jerry Rocteur)<br />
Organization: InCC.com Perwez Belgium</p>
<p>I sent one of my support guys to do an Oracle update in Madrid.</p>
<p>As instructed he created a new user called esf and changed the files<br />
in /u/appl to owner esf, however in doing so he *must* have cocked up<br />
his find command, the command was:</p>
<p>find /u/appl -user appl -exec chown esf {} \;</p>
<p>He rang me up to tell me there was a problem, I logged in via x25 and<br />
about 75% of files on system belonged to owner esf.</p>
<p>VERY little worked on system.</p>
<p>What a mess, it took me a while and I came up with a brain wave to<br />
fix it but it really screwed up the system.</p>
<p>Moral: be *very* careful of find execs, get the syntax right!!!</p></blockquote>
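<p>Editor's sketch for the story above: whatever went wrong with that find, a dry run would probably have shown it. The same expression can first be run with -print, and then with echo in front of the command, before anything is changed. The tree below is a made-up stand-in for /u/appl, and the current user's numeric ID stands in for the old owner; the new owner name esf is taken from the story.</p>

```shell
tree=$(mktemp -d)           # hypothetical stand-in for /u/appl
mkdir "$tree/bin"
touch "$tree/bin/report" "$tree/config"
me=$(id -u)                 # stand-in for the old owner ("appl" in the story)

# Step 1: list exactly what matches, without changing anything.
find "$tree" -xdev -user "$me" -print

# Step 2: same expression, but echo the chown command instead of running it.
plan=$(find "$tree" -xdev -user "$me" -exec echo chown esf {} +)
echo "$plan"
```

<p>Only when the echoed command lines look right would the echo be dropped. The -xdev keeps find from wandering onto other mounted filesystems.</p>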
<p>5</p>
<blockquote><p>From: weave@bach.udel.edu (Ken Weaverling)<br />
Organization: University of Delaware</p>
<p>A friend of mine called me up saying he no longer could log into his<br />
system. I asked him what he had done recently, and found out that he<br />
thought that all executable programs in /bin /usr/bin /etc and so on<br />
should be owned by bin, since they were all binaries! So he had<br />
chown&#8217;ed them all.</p></blockquote>
<p>6</p>
<blockquote><p>From: rob@wzv.win.tue.nl (Rob J. Nauta)<br />
Organization: None</p>
<p>At my previous employer, the sysadmin would create new user accounts by<br />
hand by editing the passwd file, create a home dir, put some files in<br />
it, and chown &#8216;*&#8217; and &#8216;.*&#8217; to that new user.  Thus, /home/machine<br />
was also chowned (&#8216;.*&#8217; also matches &#8216;..&#8217;).  It was quite handy to see<br />
who was added last, but after a while I slipped him the hint to<br />
chown &#8216;.[a-z]*&#8217; which works much better of course.</p>
<p>But the stories told now are more folklore than real horror.  Having read<br />
2 Stephen Kings this weekend I beg everyone to tell more interesting<br />
stories, about demons, the system clock running backwards, old files<br />
reappearing etc !</p></blockquote>
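<p>Editor's sketch for the story above: the suggested pattern .[a-z]* avoids the parent directory but also skips dot files that start with an uppercase letter or a digit. A commonly used alternative is the pair .[!.]* and ..?*, which together cover every dot file except . and .. themselves. The home directory and file names below are invented, and the chown is only described, not executed.</p>

```shell
home=$(mktemp -d)            # hypothetical new home directory
cd "$home" || exit 1
touch .profile .cshrc ..hidden notes.txt

# Collect names using patterns that can never expand to "." or "..".
names=""
for f in * .[!.]* ..?*; do
    # An unmatched pattern stays literal, so skip anything that does not exist.
    if [ -e "$f" ]; then names="$names $f"; fi
done
echo "would chown:$names"
```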
<p>7</p>
<blockquote><p>From: alan@spuddy.uucp (Alan Saunders)<br />
Organization: Spuddy&#8217;s Public Usenet Domain</p>
<p>About inexperienced sysadmins .. One such had been on a Sun sysadmin<br />
course, and learned all about security.  One of the topics was on file<br />
and group access.  On his return, he decided to put what he had learned<br />
into practice, and changed the ownership of all files in /bin, /usr/bin<br />
to bin.bin!  I was called in when no one could log in to the system<br />
(of course /bin/login needs to be setuid root!)</p></blockquote>
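<p>Editor's sketch for stories 5 and 7: before any bulk ownership change, it helps to record the current owners and modes and to list the setuid and setgid files separately, since those are the ones whose owner matters most. The directory below is a made-up stand-in for /bin, and the manifest format is simply whatever ls -ld prints on the local system.</p>

```shell
bindir=$(mktemp -d)              # hypothetical stand-in for /bin
touch "$bindir/login" "$bindir/ls"
chmod 4755 "$bindir/login"       # setuid, like the real /bin/login
chmod 755 "$bindir/ls"

# Record owner, group and mode of everything before any bulk change.
manifest=$(mktemp)
find "$bindir" -exec ls -ld {} + > "$manifest"

# List the setuid/setgid files separately; these are the ones that break logins.
special=$(find "$bindir" -type f \( -perm -4000 -o -perm -2000 \) -print)
echo "$special"
```

<p>With a manifest like this on hand, putting ownerships back is a lookup rather than guesswork.</p>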
<p>8</p>
<blockquote><p>From: pete@tecc.co.uk (Pete Bentley)<br />
Organization: T.E.C.C. Ltd, London, England</p>
<p>The guys next door had just got a Sun 3/360 (or some such) to host a<br />
VME-bus image processing system &#8211; none of them knew much (or cared<br />
much) about Un*x and so early on a student on loan to them got a<br />
space in the wrong place and did<br />
  pillock# chmod -r -x ~ /*<br />
with the same results (system in single user, refusing to run any commands<br />
or go multi-user).</p>
<p>As it happened<br />
a) This was a government establishment, and so the order for the QIC tapes<br />
   for backups had not yet been approved, hence no backups&#8230;<br />
b) The install script for the kernel drivers for the image processing stuff<br />
   had not worked &#8216;out of the box&#8217;, and so the company had sent an<br />
   engineer down to install it.  I hadn&#8217;t been around when he came and<br />
   built their drivers, and they hadn&#8217;t a clue what he had done.  So,<br />
   there was no way to rebuild the drivers without another engineer call<br />
   and because of (a) there were no backups of the driver&#8230;  Anyway, a complete<br />
   reload was therefore out of the question.</p>
<p>These were the days before SunOS on CD-ROM.  In the end I managed to get<br />
the thing up by booting from tape, installing the miniroot into the swap<br />
partition and booting from that.  This gave me a working tar and a working<br />
mount, but no chmod.  Also no mt command.  Also at this time very little<br />
of my Un*x experience was on Suns, so I had no idea of the layout of the<br />
distribution tape.  Various experiments with dd and the non-rewinding tape<br />
device eventually found the file on the tape with a chmod I could extract.<br />
chmod +x /etc/* /bin/* /usr/bin/* on the system&#8217;s existing disk was enough<br />
to make it bootable.  After that I sat the student down with a SunOS manual<br />
and let him figure out the mess and correct the permissions that had been<br />
todged all over the system&#8230;</p></blockquote>
<p>9</p>
<blockquote><p>From: dvsc-a@minster.york.ac.uk<br />
Organization: Department of Computer Science, University of York, England</p>
<p>I was changing the UIDs of a few users on one of our major servers, due to<br />
a clash with some machines newly connected to the net. Fine, edit<br />
/etc/passwd then chown all their files to the new UID. So, rather than just<br />
assume that all files owned by &#8220;fred&#8221; live in /home/machine/fred I did this:</p>
<p>machine# find / -user old_uid -exec chown username {} \;</p>
<p>This was fine&#8230; except it was late at night and I was tired, and in a hurry<br />
to get home. I had six of these commands to type, and as they would take a<br />
long time I&#8217;d just let them run in the background over night&#8230;..</p>
<p>So, you come in the next morning and a user complains&#8230; I can&#8217;t login to the<br />
4/490 &#8211; it says &#8220;/bin/login: setgid: not owner&#8221;.</p>
<p>Okay&#8230;. naive user problem no?</p>
<p>rlogin machine -l root<br />
/bin/login: setgid: not owner</p>
<p>machine console<br />
login: root<br />
/bin/login: setgid: not owner</p>
<p>Okay &#8211; I REALLY can&#8217;t get in&#8230; let&#8217;s reboot single user and see what&#8217;s on&#8230;<br />
this worked. /bin/login is owned by (and setuid to) one of the users whose UID<br />
I changed the previous day&#8230; in fact ALL FILES in the ENTIRE filesystem are<br />
owned by this user..problem!</p>
<p>We `only&#8217; lost about 200 man hours through my little typing mistake.  The<br />
moral: Beware anything recursive when logged in as root!</p></blockquote>
<p>10</p>
<blockquote><p>From: joslin_paul@ae.ge.com<br />
Organization: GE Aircraft Engines</p>
<p>True confession time: Cron is a great way to hide your flubs.  I installed<br />
the COPS security package on a system, then set up cron to recheck the<br />
system once a month.  No problem, right?  Except that I had configured COPS<br />
to put the reports in /.  As a security measure, COPS chmods its directory<br />
to u-rwx,w-rwx so that only the COPS owner can read the reports.</p>
<p>The chronology was</p>
<p>1) Run cops.  Add cops entry to root&#8217;s crontab.  Later that day, notice<br />
that / was 600; change it back.</p>
<p>2) 30 days later: get calls from users &#8211; can&#8217;t log in, &#8220;No shell&#8221; error<br />
messages.  Find / is 600; change it.  Vaguely remember that this<br />
happened once before.  The machine was a sandbox, so almost anything<br />
could have changed /.</p>
<p>3) 30 days later: get calls from users &#8211; can&#8217;t log in, &#8220;No shell&#8221; error<br />
messages.  Find / is 600; change it.  Vaguely remember that this<br />
happened once before.  Happen to think &#8220;cron&#8221;; notice that the only cron<br />
activity for root last night was COPS.  Read COPS source and discover<br />
problem.</p>
<p>Moral: RTFM.  Keep logs, so that you can notice patterns in your data.<br />
Don&#8217;t do anything as root that you can do as a mortal.</p></blockquote>
<p>11</p>
<blockquote><p>From: johnd@cortex.physiol.su.oz.au (John Dodson)<br />
Organization: Department of Physiology, University of Sydney, NSW, Australia</p>
<p>Some years ago when we went from Version 7 Unix on a PDP11 to a flavour of BSD<br />
on a Vax, I was working on the Vax in my home directory &#038; came across a file<br />
that I had no permission on (I&#8217;d created it as root) so the following ensued&#8230;</p>
<p>        $ /bin/su -<br />
        Password:<br />
        # chown -R me *</p>
<p>        mmmmm this seems to be taking a long time !<br />
        kill.<br />
        # ls -l</p>
<p>        the result was that I was in / after the su !<br />
        (good old V7 su used to leave you in the current directory <img src='http://www.unixnewbie.org/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>It took me quite a while to restore all the right ownerships to /bin /etc &#038;<br />
/dev (especially the suid/sgid files)<br />
I&#8217;d managed to kill it before it got off the root filesystem.</p></blockquote>
<p>12</p>
<blockquote><p>From: adb@geac.com (Anthony DeBoer)<br />
Organization: Geac Computer Corporation</p>
<p>I was once called in to save a system where most things worked, but the<br />
main application package being used on it hung the moment you entered it<br />
(leaving the system more than a little useless for getting things done).<br />
I poked around for awhile, verified that the application&#8217;s files were all<br />
present, undamaged, and had the right permissions.  The folks who<br />
normally used the machine had also discovered that all was well if root<br />
tried to run it.  But nothing was visibly wrong anywhere.  So, being a<br />
bit hungry by then, I took a break for supper, and about halfway through,<br />
the little voice at the back of my head that sometimes helps me said,<br />
&#8220;/dev/tty&#8221;.  Sure enough, somebody had chmod&#8217;ded it to 0644, and the<br />
application directed (or tried to direct, in this case) all its I/O<br />
through it rather than just using stdin/stdout like a sane normal process.</p></blockquote>
<p>13</p>
<blockquote><p>From: mike@sojurn.lns.pa.us (Mike Sangrey)</p>
<p>To set the stage:<br />
   We used the csh.<br />
   We were fairly new to Unix.<br />
   We were developing a fairly elaborate system in &#8220;C&#8221;.</p>
<p>We made some fairly harmless (most of the time) mistakes:<br />
   We had &#8220;.&#8221; (dot) in root&#8217;s PATH. (Yeah, I know, so sue me.)</p>
<p>We had the foresight to set up a pseudo-user for our package.  Certain<br />
of these programs were to run setuid as the pseudo-user; others weren&#8217;t<br />
setuid and were to be run only as that pseudo-user.  You know the<br />
scenario.  The problem was that sometimes during development, one of<br />
us didn&#8217;t have the permission to execute a program.  We frequently<br />
fell into executing things as root.  One particularly frustrating day<br />
we did something even more stupid:</p>
<p>    chmod 777 *.</p>
<p>Then, just to make sure (of how stupid we can be) we flipped to a<br />
virtual terminal that was su&#8217;ed to root.  The next command, which used<br />
the csh&#8217;s history mechanism, executed a &#8220;C&#8221; program &#8212; NOT the<br />
executable, mind you, the source.  Believe it or not, the end effect<br />
was the same as</p>
<p>    cd /<br />
    rm -fr *</p>
<p>Sort of reminds me of the story of a hurricane, a junk yard and the<br />
creation of a 747.  Who&#8217;d a thunk it?!!</p>
<p>Take some inexperienced people and a powerful system; add profuse<br />
doses of frustration and wha-la!  &#8212; You have a Stephen King shell<br />
script.</p></blockquote>
<p>14</p>
<blockquote><p>From: mba@controls.ccd.harris.com (Belinda Asbell)<br />
Organization: Harris Controls</p>
<p>In article &lt;Bw40Gz.Kw8@cen.ex.ac.uk&gt;, JRowe@cen.ex.ac.uk (J.Rowe) writes:<br />
>> Am I the only one to have mangled a root shell?</p>
<p>Probably not.  I learned the hard way to be careful if messing with<br />
/etc/passwd.  One day, for some reason, I couldn&#8217;t login as root (pretty<br />
scary, since I knew the root passwd and hadn&#8217;t changed it).</p>
<p>Turned out that somehow I&#8217;d blitzed the first letter of /etc/passwd<br />
(vi does bizarre things sometimes).  So I logged in as &#8216;oot&#8217; and fixed it.</p>
<p>NEVER do a &#8220;chmod -R u-s .&#8221;, especially not in /usr&#8230;.</p>
<p>I think that &#8220;mount -o&#8221; or something similar will mount a filesystem read-write<br />
if it&#8217;s come up in singleuser mode and is mounted read-only&#8230;..</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.unixnewbie.org/ss-all-about-file-permissions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sysadmin Stories: Upgrading the system</title>
		<link>http://www.unixnewbie.org/ss-upgrading-the-system/</link>
		<comments>http://www.unixnewbie.org/ss-upgrading-the-system/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 23:05:25 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Sysadmin Stories]]></category>

		<guid isPermaLink="false">http://www.unixnewbie.org/?p=521</guid>
		<description><![CDATA[Upgrading the systems...]]></description>
			<content:encoded><![CDATA[<p>1</p>
<blockquote><p>From: rsj@wa4mei (Randy Jarrett)<br />
Organization: Amateur Radio Gateway WA4MEI, Chamblee, GA</p>
<p>Here&#8217;s one that will show that you shouldn&#8217;t work on a system<br />
that you don&#8217;t thoroughly understand.</p>
<p>At my &#8220;previous&#8221; employer I was instructed to install a new<br />
(larger) disk drive in a RS/6000 system.  Since a full backup<br />
of the system was done the previous day I just looked at the file<br />
systems via a df to see which were on the drive that I was replacing.<br />
After this I did a tape backup of these filesystems, ran smit and<br />
did a remove of these filesystems.  I then installed the new disk<br />
and brought the system back up.  When I ran smit and was able to<br />
install the new drive and set up the file systems, I figured that<br />
this was going to be an easy one.  WRONG!!  I was<br />
aware that you could expand filesystems under AIX but was not aware<br />
that it would expand them &#8216;across physical drives&#8217;!!!  I first<br />
realized that I was in trouble when I went to read in the backup tape<br />
and cpio was not found.  I did an ls of the /usr/bin directory and it<br />
said that the file was there but when I tried to run it it was not<br />
found.  And of course when I went looking for the original install tape<br />
it was not to be found&#8230;.</p></blockquote>
<p>2</p>
<blockquote><p>From: matthews@oberon.umd.edu (Mike Matthews)<br />
Organization: /etc/organization</p>
<p>When I had first gotten my NeXTstation, it had the lil&#8217; 105M hard drive in<br />
it.  I had a 330M external, but alas, no cable for it.  (Life was not fun<br />
when I was essentially netbooting off a &#8220;test&#8221; machine&#8230;. &#8220;.. um, guys, did<br />
you just reboot is-next?&#8221;)</p>
<p>Finally got the cable, just in time for the winter holiday (read: no<br />
network).  Brought the machine home, and I figured I&#8217;d just copy the<br />
configuration files over from the internal to the external (as a nice gesture<br />
to my users so they wouldn&#8217;t have to change their passwords and everything).</p>
<p>The external was a brand new BuildDisk&#8217;d disk (had stock NeXTstep on it).<br />
NeXT keeps the private information of each machine (/dev, /etc, stuff like<br />
that) in a /private directory to make netbooting easier.</p>
<p>Hey, I&#8217;ll just move /private from the 105M to /private on the external.  So I<br />
deleted the external&#8217;s /private and tried to move it via the workspace.</p>
<p>/dev is in /private.</p>
<p>/dev contains device files.  Can&#8217;t move them.</p>
<p>BUT.  The workspace happily deleted all the files it DID copy, so the<br />
internal couldn&#8217;t boot (no /etc) and the external couldn&#8217;t boot (no /dev).<br />
This is before the advent of boot floppies so I was stuck for about a week at<br />
home with $5000 of NeXT computer that I couldn&#8217;t boot.</p>
<p>The moral?  *NEVER* move something important.  Copy, VERIFY, and THEN delete.</p></blockquote>
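<p>Editor's sketch of the moral above (copy, verify, then delete) as a small shell function. The function name move_verified, the paths and the file contents are hypothetical. Note also that diff -r only compares regular files and directories, so a tree containing device nodes, such as the /dev in the story, would need a different verification step.</p>

```shell
move_verified() {
    # $1 = source directory, $2 = destination path that must not exist yet
    if [ -e "$2" ]; then
        echo "refusing: $2 already exists"
        return 1
    fi
    cp -Rp "$1" "$2" || return 1              # copy
    diff -r "$1" "$2" >/dev/null || return 1  # verify
    rm -rf "$1"                               # and only THEN delete
}

old=$(mktemp -d)                 # hypothetical stand-in for the old /private
new="$old.moved"
mkdir "$old/etc"
echo "example setting" > "$old/etc/hostconfig"

if move_verified "$old" "$new"; then echo "moved to $new"; fi
```

<p>If the copy or the comparison fails, the function returns early and the source is left untouched.</p>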
<p>3</p>
<blockquote><p>From: grog@lemis.uucp (Greg Lehey)<br />
Organization: LEMIS, W-6324 Feldatal, Germany</p>
<p>I&#8217;m currently trying to work out how ISC Unix/386 handles COFF files, and<br />
discovered the /shlib directory, which I suspected wasn&#8217;t really used<br />
(*wrong*). So, to try it out, I did:</p>
<p>+ root adagio:/ 819 -> mv shlib slob<br />
+ root adagio:/ 820 -> xterm<br />
+ /usr/bin/X11/xterm: Can not access a needed shared library</p>
<p>So far, so good. So, put it back:</p>
<p>+ root adagio:/ 821 -> mv slob shlib<br />
+ /bin/mv: Can not access a needed shared library</p>
<p>Oops! So, tried it from a different system, but didn&#8217;t have<br />
permission, so:</p>
<p>+ root adagio:/ 822 -> chmod 777 slob<br />
+ /bin/chmod: Can not access a needed shared library</p>
<p>OK, so let&#8217;s just cp them across.</p>
<p>+ root adagio:/ 823 -> cd slob<br />
+ root adagio:/slob 824 -> mkdir /shlib<br />
+ /bin/mkdir: Can not access a needed shared library<br />
+ root adagio:/slob 825 -></p>
<p>Then I wrote a program which just did a link(2) of the directories.<br />
Yes, gcc and ld didn&#8217;t have any problems, but even after the link was<br />
in place, it still didn&#8217;t work. I had to reboot (but nothing else),<br />
after which it did work. No idea why that made any difference.</p></blockquote>
<p>4</p>
<blockquote><p>From: erik@src4src.linet.org (Erik VanRiper)<br />
Organization: The Source for Source</p>
<p>I run on a 386/25.  Small system, 4 inbound lines, etc.  I was installing a<br />
new SCSI drive to complement my 2 MFM&#8217;s.  Took me forever to get everything<br />
just right.  Things finally worked, so I figured I would shutdown and play<br />
with the jumper settings to see what this thing could do.  What did I do?<br />
Well, I just turned off the power, that&#8217;s all.</p>
<p>erk.  Just rebuilt the kernel, did not do a haltsys, or a shutdown, or anything.<br />
Just shut the power off.  ARGH!  Took me 3 weeks to clean up the mess.</p>
<p>You tend to get in this cycle of &#8220;try&#8221; &#8220;haltsys&#8221; &#8220;power off&#8221; &#8220;change jumpers&#8221;<br />
&#8220;power on&#8221; &#8220;try&#8221;.  Well, once everything worked, I guess I was a wee bit<br />
excited and forgot a step.  <img src='http://www.unixnewbie.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p></blockquote>
<p>5</p>
<blockquote><p>From: almquist@chopin.udel.edu (Squish)<br />
Organization: Human Interface Technology Lab (on vacation)</p>
<p>Two miserable flubs:</p>
<p>1) /etc/rc cleans tmp but it wasn&#8217;t cleaning up directories so I changed<br />
the line:<br />
  (cd /tmp; rm -f - *)<br />
to<br />
  (cd /tmp; rm -f -r - *; rm -f -r - .*)</p>
<p>About 15 minutes later I had wiped out the hard drive.</p>
<p>2) One of the user discs got filled so I needed to move everyone over to<br />
the new disc partition.  So, I used the tar to tar command and flubbed:</p>
<p>cd /user1; tar cf - . | (cd /user1; tar xfBp - )</p>
<p>Next thing I know /user1 is coming up with lots of weird consistency errors and<br />
other such nonsense.  I meant to type /user2 not /user1.  OOOPS!</p>
<p>My moral of the story is that when you are doing something BIG, type the command and<br />
reread what you&#8217;ve typed about 100 times to make sure it&#8217;s sunk in (:</p></blockquote>
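<p>Editor's sketch for the second flub above: the tar-to-tar copy can be wrapped in a function that resolves both directories and refuses to run when they turn out to be the same place. The function name copy_tree and the two temporary directories (standing in for /user1 and /user2) are assumptions made for illustration.</p>

```shell
copy_tree() {
    # Resolve both arguments to physical paths so /user1 and /user1/. compare equal.
    src=$(cd "$1" 2>/dev/null || exit 1; pwd -P) || return 1
    dst=$(cd "$2" 2>/dev/null || exit 1; pwd -P) || return 1
    if [ "$src" = "$dst" ]; then
        echo "refusing: source and destination are both $src"
        return 1
    fi
    (cd "$src" || exit 1; tar cf - .) | (cd "$dst" || exit 1; tar xpf -)
}

user1=$(mktemp -d)   # hypothetical stand-ins for /user1 and /user2
user2=$(mktemp -d)
echo "thesis" > "$user1/draft.txt"

copy_tree "$user1" "$user2"
copy_tree "$user1" "$user1" || echo "second call was refused, as intended"
```

<p>The check costs two extra lines and catches exactly the /user1 versus /user2 slip, without relying on rereading the command.</p>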
<p>6</p>
<blockquote><p>From: anne@maxwell.concordia.ca (Anne Bennett)<br />
Organization: Concordia University, Montreal, Canada</p>
<p>After about four months as a Unix sysadm, and still feeling rather like a<br />
novice, I was asked to &#8220;upgrade&#8221; a Sun lab (3/280 server and ten 3/50<br />
diskless clients) from SunOS 4.0.3 to 4.1 &#8212; of course, this &#8220;upgrade&#8221; was<br />
actually a complete re-install.</p>
<p>Well, the server had no tape drive, not even any SCSI controller.  There<br />
were no other machines on its subnet other than the clients, so I had no<br />
boothost (at that time, I did not know that the routers could be<br />
reconfigured to pass the appropriate rarp packets, nor do I think our<br />
network people would have taken kindly to such a hack!).  The clients did<br />
have SCSI controllers, but I had no portable tape drive.  Luckily, I had<br />
a portable disk.</p>
<p>So, with great trepidation (remember, I was still a novice), I set up<br />
one of the clients, with the spare disk, to be a boothost.  I booted<br />
the server off the client and read the miniroot from a tape on a remote<br />
machine, and copied it to the server&#8217;s swap partition.  Then I manually<br />
booted the miniroot on the server by booting off the temporary boothost<br />
with the appropriate options, and specified the server&#8217;s swap partition<br />
as containing the kernel to be loaded.  Once in the miniroot, I started<br />
up routed to permit me to reach the tapehost, and finally invoked<br />
suninstall.  From then on, it worked like a charm.</p>
<p>Needless to say, I was extremely pleased with myself for figuring all of<br />
this out.  I then settled down to do the &#8220;easy stuff&#8221;, and got around to<br />
configuring NIS (Yellow Pages).  I decided to get rid of everything I<br />
didn&#8217;t need, under the assumption that a smaller system is easier to<br />
understand and keep track of.  The Sun System and Network Administration<br />
Manual, which is in many ways an admirable tome, had on page 476 a<br />
section on &#8220;Preparing Files on NIS Clients&#8221;, which said:</p>
<p>   &#8220;Note that the files networks, protocols, ethers, and services need<br />
    not be present on any NIS clients.  However, if a client will on<br />
    occasion not run NIS, make sure that the above mentioned files do<br />
    have valid data in them.&#8221;</p>
<p>So I removed them.  Several hours later, when I had finished configuring<br />
the server to my satisfaction, reloading the user files, etc., I finally<br />
got around to booting up the clients.  Well, I *tried* to boot up the<br />
clients, but got the strangest errors: the clients loaded their<br />
kernels and mounted /, but failed trying to mount /usr with the message<br />
&#8220;server not responding. RPC: Unknown protocol&#8221;.  I was mystified. I tried<br />
putting back the generic kernels on server and clients, several different<br />
ifconfig values for the ethernet interfaces, enabling mountd and rexd on<br />
server&#8217;s inetd.conf, removing the clients&#8217; /etc/hostname.le0 (which I had<br />
added)&#8230; all to no avail.  &#8216;Twas the last work day before the Christmas<br />
break, and I was flummoxed.</p>
<p>Of course, I finally connected the error message &#8220;unknown protocol&#8221;<br />
with the removed /etc/protocols (and other) files, restored these<br />
files, after which everything was fine again.  I was pretty mad, since<br />
I had wasted a whole day on this problem, but *technically*, the Sun<br />
manual above is correct.</p>
<p>It just neglected to mention that of course, *no* machine is running<br />
NIS at boot time, therefore *every* machine needs valid data in the<br />
networks, services, protocols, and ethers files *at boot time*. Grrr!</p></blockquote>
<p>7</p>
<blockquote><p>From: yared@anteros.enst.fr (Nadim Yared)<br />
Organization: Telecom Paris, France</p>
<p>My story happened on a Sun Sparcstation 2</p>
<p>I once wanted to update the libc.so.1.7 to libc.so.1.8 by myself, so<br />
I got root, and then ftp&#8217;d the /lib/libc.so.1.8 to my /lib. Unfortunately<br />
there was not enough room on this partition. So all I got was a file<br />
with zero length.</p>
<p>The problem is that I ran /usr/etc/ldconfig in the directory /lib,<br />
and that was all. No command could be executed any more, because ld.so<br />
checked for /libc.so.1.8, being the newest one. All I needed was a<br />
statically linked mv, but Sun does not usually provide the source.<br />
Even going single user didn&#8217;t do anything. So I had to install a<br />
miniroot on the swap partition, and cp /bin/mv from the CD-ROM,<br />
and execute it.</p></blockquote>
<p>8</p>
<blockquote><p>From: TRIEMER@EAGLE.WESLEYAN.EDU<br />
Organization: Wesleyan College</p>
<p>I have been trying to put an at&#038;t 3b2/310 machine on the net for a<br />
while, I&#8217;ll skip the unbelievable hardware problems.  I&#8217;ll skip the<br />
paranoid system admins that forced me to build a temporary net to show<br />
them that the ethernet board worked.  Anyway, I get it up and running<br />
on the temp net &#8211; it works fine &#8211; a little slow, but hey.  Ok, so I&#8217;m<br />
ready to stick it on the net &#8211; you need to power down to do that right.<br />
So, I powered down.  Bad, bad bad mistake.  I had been running a sysadm<br />
shell script &#8211; I needed to change a password so that I could get into an<br />
account.  Well, would you believe that the script, despite the fact that<br />
I wasn&#8217;t in the passwd option anymore held onto the passwd file!  Stupid<br />
machine, stupid script.  Anyway&#8230; what that means is that when I boot<br />
up the machine, it passes diagnostics (A small miracle) runs unix and<br />
doesn&#8217;t let anyone log in!  I almost freaked.  Anyway, so&#8230;</p>
<p>There&#8217;s an undocumented option on the installation disks called<br />
&#8216;magic mode&#8217;  At one point it offers 4 options (none of which is magic)<br />
If you type magic mode at that point, you can get it&#8230; believe it or not<br />
some at&#038;t person had the nerve, and bizarre sense of humor to add one<br />
extra line to magic mode- you see when you type &#8216;magic mode&#8217; it says</p>
<p>  Poof!</p>
<p>That was just about the last thing I wanted to see&#8230; the rest was in a<br />
sense trivial&#8230; ran an fsck&#8230; it fixed it all for me.  So the moral of<br />
the story&#8230; never ever assume that some prepackaged script that you are<br />
running does anything right.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.unixnewbie.org/ss-upgrading-the-system/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sysadmin Stories: Configuring the system</title>
		<link>http://www.unixnewbie.org/ss-configuring-the-system/</link>
		<comments>http://www.unixnewbie.org/ss-configuring-the-system/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 23:01:16 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Sysadmin Stories]]></category>

		<guid isPermaLink="false">http://www.unixnewbie.org/?p=518</guid>
		<description><![CDATA[Configuring the system...]]></description>
			<content:encoded><![CDATA[<p>1</p>
<blockquote><p>From: peter@NeoSoft.com (Peter da Silva)<br />
Organization: NeoSoft Communications Services</p>
<p>Well, we had one system on which you couldn&#8217;t log in on the console for a<br />
while after rebooting, but it&#8217;d start working sometimes.  What was happening<br />
was that the manufacturer had, for some idiot reason, hardcoded the names<br />
of the terminals they wanted to support into getty (this manufacturer&#8217;s own<br />
terminals, that I can understand, but also a handful of common types like<br />
adm3a) so getty could clear the screen properly (I guess hacking that into<br />
gettydefs was too obvious or something).  If getty couldn&#8217;t recognise the<br />
terminal type on the command line, it&#8217;d display a message on the console<br />
reading &#8220;Unknown terminal type pc100&#8221;.  We ignored this flamage, which was<br />
a pity.  &#8216;Cos that was the problem.</p>
<p>It did this *before* opening the terminal, so if it happened to run between<br />
the time rc completed and the getty on the console started the console got<br />
attached to some random terminal somewhere, so when login attempted to open<br />
/dev/tty to prompt for a password it failed.</p>
<p>Moral: always deal with error messages even when you *know* they&#8217;re bogus.<br />
Moral: never cry wolf.</p></blockquote>
<p>2</p>
<blockquote><p>From: hirai@cc.swarthmore.edu (Eiji Hirai)<br />
Organization: Information Services, Swarthmore College, Swarthmore, PA, USA</p>
<p>rik.harris@fcit.monash.edu.au writes:<br />
> I&#8217;ll mount it in /tmp</p>
<p>Though this may strike most sane sysadmins as bad practice, SunOS (3.4 or so<br />
- my memory is vague) shipped a command called &#8220;on&#8221;.  If you were logged on<br />
machine A and wanted to execute a command on machine B, you said &#8220;on B<br />
command&#8221;, sort of like rsh.</p>
<p>However, A would mount B&#8217;s disks under some invocations of &#8220;on&#8221; and it would<br />
mount it in /tmp!  Of course, lots of folks got bitten by this stupid<br />
command and it was taken out after a long delay by Sun.</p>
<p>Anyone remember the details?  I&#8217;ve blocked out my memory of pre-4.0 SunOS.<br />
Am I just hallucinating?</p></blockquote>
<p>3</p>
<blockquote><p>From: robjohn@ocdis01.UUCP (Contractor Bob Johnson)<br />
Organization: Tinker Air Force Base, Oklahoma</p>
<p>After changing my /etc/inittab file, I was going to kick init by sending<br />
it a HUP signal to tell it the file had changed.  Unfortunately, I missed<br />
and the 1 became a Q&#8230; kill -q 1.  Large systems die in interesting ways<br />
when you lose init!</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.unixnewbie.org/ss-configuring-the-system/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sysadmin Stories: Partitioning the drives</title>
		<link>http://www.unixnewbie.org/ss-partitioning-the-drives/</link>
		<comments>http://www.unixnewbie.org/ss-partitioning-the-drives/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 22:57:04 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Sysadmin Stories]]></category>

		<guid isPermaLink="false">http://www.unixnewbie.org/?p=514</guid>
		<description><![CDATA[Partitioning the drives...]]></description>
			<content:encoded><![CDATA[<p>1</p>
<blockquote><p>From: hirai@cc.swarthmore.edu (Eiji Hirai)<br />
Organization: Information Services, Swarthmore College, Swarthmore, PA, USA</p>
<p>I wanted to create a second swap partition on another disk and made the<br />
partition start at sector 0 of the disk! (which sounded ok at the time since<br />
all other regular &#8216;a&#8217; partitions started on sector 0) Every time I rebooted,<br />
fsck would complain about missing partition tables &#8211; I initially suspected<br />
that the disk was bad but I later realized that swapping was overwriting the<br />
partition table.  I had lost an unknown percentage of the financial data for<br />
the institution that I was working for at the time, right when they were<br />
being audited!  Yikes!  Anyway, we were able to recover the data and life<br />
returned to normal but I did wonder at the time whether I could still keep<br />
my job there.</p></blockquote>
<p>2</p>
<blockquote><p>From: matthews@oberon.umd.edu (Mike Matthews)<br />
Organization: /etc/organization</p>
<p>We had just gotten a 1.2G disk drive for our Sun (which direly needed it) so<br />
we felt we&#8217;d repartition everything.</p>
<p>All went well, except&#8230; on reboot, one of the partitions that was newly<br />
restored from backup got a fsck error.  Fixed it, it rebooted, then another<br />
one got an error.  fscked that one, rebooted it, and doggone it, the first<br />
error was back!</p>
<p>We had a one cylinder overlap.  Sheesh.  At least Ultrix WARNS you of that.</p></blockquote>
<p>3</p>
<blockquote><p>From: mt00@eurotherm.co.uk (Martin Tomes)<br />
Organization: Eurotherm Limited</p>
<p>We had something really weird happen one day.  I copied a file to<br />
/usr/local on someone else&#8217;s machine and all seemed to be OK.  A bit<br />
later the user of the machine noticed that the files and directories they<br />
were using on another disk partition were corrupted.  There were 2<br />
gigabyte files on a 650Mb disk &#8211; and lots of them with weird names and<br />
permissions.  At first I did not connect the two events.  This disk<br />
had given trouble when the power failed a week before, so I fsck&#8217;ed<br />
it.  Now I have run fsck more times than I can begin to imagine and<br />
seen plenty of errors, some needing &#8216;manual intervention&#8217; but I had<br />
never seen anything like this before!  It was spectacular.  And what<br />
was more, when I ran it a second time things got worse.  Then I tried<br />
to backup the /usr/local partition before restoring this corrupt data<br />
and lo, that was corrupt too.  It turned out that our sysadmin had<br />
created the /usr/local disk partition in the wrong place on the disk<br />
and put it over the top of the alternate sectors partition.  By<br />
writing to the /usr/local disk I had written all over the alts which<br />
were mapped into the user&#8217;s partition.  Oh dear, what a mess.</p>
<p>Solution, rebuild all the partitions so they don&#8217;t overlap and<br />
restore, also buy the sysadmin a calculator.</p>
<p>Moral, always do your sums on the /etc/partitions file very carefully<br />
before using mkpart.</p></blockquote>
<p>4</p>
<blockquote><p>From: caa@Unify.Com (Chris A. Anderson)<br />
Organization: Unify Corporation, Sacramento, California</p>
<p>At a company that I used to work for, the CEO&#8217;s brother was the<br />
&#8220;system operator&#8221;.  It was his job to do backups, maintenance,<br />
etc.  Problem was, he didn&#8217;t have a clue about Unix.  We were required<br />
to go through him to do anything, though.</p>
<p>Well, I was setting up a Plexus P-95 to be a<br />
news/mail/communications machine and needed to wipe the disks and<br />
install a new OS.  El CEO requested that his brother do the<br />
installation and disk partitioning.  He had done this before, so I<br />
gave him the partition maps and let him at it.  When he was done,<br />
everything seemed to be ok.  Great, on with the install and<br />
setup.</p>
<p>Things went fine until I started compiling the news and mail<br />
software.  All of a sudden, the machine panicked.  I brought it<br />
back up and the root file system was amazingly corrupt.  After<br />
rebuilding things, it all seemed to be fine; diagnostics all<br />
ran fine, etc.  So I started again, this time keeping an eye on<br />
things.  Sure enough, the root file system became corrupted again<br />
when the system started to load.</p>
<p>This time I brought it down and checked everything.  The problem?<br />
Swap space started at block zero and so did the root file system.<br />
ARRRGGGHHHHH!!</p>
<p>Oh yes, the brother still works there.</p></blockquote>
<p>5</p>
<blockquote><p>From: obi@gumby.ocs.com (Obi Thomas)<br />
Organization: Online Computer Systems, Inc.</p>
<p>I once mistakenly partitioned my Sun&#8217;s boot disk so that the swap<br />
partition overlapped the usr partition. The machine ran fine for a long<br />
time (many months), presumably because the swap space was always nearly<br />
empty. Then, one day there was a memory parity error and the system crash<br />
dumped at the *end* of the swap partition. What should have been a simple<br />
reboot after the crash dump turned into a long and painful re-install of<br />
the entire system (Suns cannot boot without a /usr partition).</p>
<p>Now when I partition a disk I sit there with a calculator and make sure<br />
all the numbers add up correctly (offsets, number of cylinders, number of<br />
blocks, and so on).</p></blockquote>
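<p>The calculator check described in the stories above can also be scripted. What follows is only a minimal sketch, not any vendor tool: the <code>Partition</code> tuple, the <code>find_overlaps</code> helper and the sample layout are all made up for illustration, and it assumes every partition is given as a start offset plus a length in the same unit (sectors or cylinders).</p>

```python
from typing import List, NamedTuple, Tuple


class Partition(NamedTuple):
    name: str    # hypothetical label, e.g. "b" for swap
    start: int   # first sector (or cylinder) of the partition
    length: int  # size, in the same unit as start


def find_overlaps(parts: List[Partition]) -> List[Tuple[str, str]]:
    """Return pairs of partition names whose extents overlap."""
    ordered = sorted(parts, key=lambda p: p.start)
    overlaps = []
    for i, first in enumerate(ordered):
        first_end = first.start + first.length  # one past the last unit
        for second in ordered[i + 1:]:
            if second.start >= first_end:
                break  # sorted by start, so nothing later can overlap first
            overlaps.append((first.name, second.name))
    return overlaps


# Made-up layout: swap "b" runs one unit into "g".
layout = [
    Partition("a", 0, 100),
    Partition("b", 100, 201),
    Partition("g", 300, 500),
]
print(find_overlaps(layout))
```

<p>A partition that covers the whole disk by convention (such as &#8220;c&#8221; on SunOS) would have to be left out of the list before running a check like this, since it overlaps everything on purpose.</p>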
<p>6</p>
<blockquote><p>From: dp@world.std.com (Jeff DelPapa)<br />
Organization: The World Public Access UNIX, Brookline, MA</p>
<p>obi@gumby.ocs.com (Obi Thomas) writes:<br />
[story about overlapping partitions deleted]</p>
<p>I remember a similar thing once &#8211; on a symbolics machine, a customer<br />
declared a file in the FEP filesystem as a paging file, and as part of<br />
the file system (it was one way to solve their disk space crunch) It<br />
was caught before damage was done &#8211; we weren&#8217;t sure if it was because<br />
they hadn&#8217;t done anything real yet, or simply the machine knew not to<br />
mess with the IRS (the customer).</p></blockquote>
<p>7</p>
<blockquote><p>From: kevin@sherman.pas.rochester.edu (kevin mcfadden)<br />
Organization: University of Rochester</p>
<p>Me and my co-system admin were in the process of repartitioning a drive<br />
so that we could allocate more space for incoming mail.  We had<br />
just finished backing up our Data directory, from which we were going<br />
to take 10MB.  Next step was to actually repartition it, which<br />
includes formatting.  Anyway, it comes time to give a device name<br />
and we do a df to see which one.  To make a short story long, there<br />
was a /dev/sd2g and a /dev/sd3g, one which was 300MB of stuff we<br />
could delete and the other was 600MB of applications.  We confused the<br />
two and accidentally formatted the 600 MB of applications, which<br />
of course had been backed up&#8230;&#8230;a month ago.  It could have been<br />
worse.</p>
<p>        BUT WAIT!!! It did.  Turns out it took 3 or 4 tries to get<br />
the partition size correct (what the hell is it with telling it<br />
how long it is in hex or whatever?).  It was at this point where<br />
I started to cover my eyes and wander around the building because<br />
we only found out the partition didn&#8217;t work after spending 3 hours<br />
restoring the applications.  4 * 3 = 12 hours to repartition!</p></blockquote>
<p>8</p>
<blockquote><p>From: Nick Sayer &lt;mrapple@quack.sac.ca.us&gt;</p>
<p>I had to swap out a 327M disk on a Sun with a 669.  So I partitioned the<br />
669, then newfs&#8217;d a /, /usr and /home filesystem on partitions a, g and<br />
h respectively.  I then copied the / and /usr partition from the 327 over<br />
to the 669.</p>
<p>First, I forgot to run installboot on the new boot partition.  Whoops.<br />
Get out the tape and boot miniroot (5 minutes), then mount / and<br />
use installboot.  Fine.  Now it finds /vmunix correctly.</p>
<p>But on the 327, /usr was on the h partition, not g.  So when<br />
I rebooted with the 669 in place, it mounted the home partition<br />
on /usr.  fsck not found, reboot failed.  Well, that&#8217;s simple, I&#8217;ll just<br />
edit /etc/fstab and reboot.  But vi is on /usr. And home is mounted<br />
on /usr.  No problem, I&#8217;ll just mount usr on /mnt or something and<br />
do it that way.  Nope. vi is dynamically linked, and there&#8217;s no<br />
/usr/lib/ld.so.  Ok, so I&#8217;ll go back to single user and try it there.<br />
But how to reboot gracefully?  sync, shutdown, reboot&#8230; all in /usr,<br />
(mounted on /mnt) and dynamically linked.  So I gave it the vulcan neck<br />
pinch and booted into miniroot (5 minutes).  So miniroot is up.<br />
Fine.  Mount the / partition and use ed on /a/etc/fstab. Panic,<br />
dup ialloc.  The vulcan neck pinch had introduced a slight corruption<br />
in the filesystem.  But how to preen it?  fsck is in /usr, and it&#8217;s<br />
dynamically linked.  Sigh.</p>
<p>The solution was to mount the usr partition as /usr right on top<br />
of the home partition, run fsck to preen the root partition, reboot,<br />
mount /usr again, then remount / read-write, change /etc/fstab<br />
and reboot again.  So all was ok after an hour of fussing.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.unixnewbie.org/ss-partitioning-the-drives/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sysadmin Stories: Blaming it on the hardware</title>
		<link>http://www.unixnewbie.org/ss-blaming-on-the-hardware/</link>
		<comments>http://www.unixnewbie.org/ss-blaming-on-the-hardware/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 21:00:45 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Sysadmin Stories]]></category>

		<guid isPermaLink="false">http://www.unixnewbie.org/?p=510</guid>
		<description><![CDATA[Blaming it on the hardware...]]></description>
			<content:encoded><![CDATA[<p>1</p>
<blockquote><p>From: kelley@epg.nist.gov (Mike Kelley)<br />
Organization: NIST</p>
<p>We have a cluster of HP workstations and, once upon a time, were using<br />
1/4-inch tape as the backup medium.  This was very slow and cumbersome, as<br />
we were forever increasing the amount of disk space on our system, and<br />
we decided to purchase HP&#8217;s optical jukebox to use both as large<br />
removable media and as the primary backup device.</p>
<p>We had been experiencing occasional problems with the 1/4-inch tape<br />
backups, but HP&#8217;s hardware service engineer convinced us that the<br />
problems were resolved.  A complete backup was performed prior to<br />
installation (by the HP engineer) of the jukebox.  Two unfortunate<br />
things happened.  First, the problems on our backup tapes were due to<br />
intermittent hardware problems on the tape drive which were not<br />
discovered by the extensive diagnostics performed on the tape drive.<br />
Second, the engineer installed the jukebox with the same hardware SCSI<br />
address as our root file system.</p>
<p>As you may have anticipated, the attempt to mediainit the first<br />
optical cartridge resulted in a rather ungraceful failure of the root<br />
file system.  This was compounded by the fact that much of the data on<br />
the backup tapes was not recoverable.</p></blockquote>
<p>2</p>
<blockquote><p>From: robjohn@ocdis01.UUCP (Contractor Bob Johnson)<br />
Organization: Tinker Air Force Base, Oklahoma</p>
<p>We had an operator lay a book on the console keyboard, throwing the console<br />
into system monitor mode.  This stops the system clock, which locks every<br />
session dead in its tracks. At that time we had over 100 user sessions<br />
running.  Most of our inbound lines are essentially modem lines on a very<br />
large &#8220;rotor&#8221;.  After their session hung for a minute or so, many users<br />
disconnected and called back.  They got connected, but received no login<br />
prompt (the system was in a sort of suspended animation).  Little did they<br />
know that they were now on a different port than the one they just abandoned.</p>
<p>A call to the computer room soon identified the problem, and the operator was<br />
given the commands to resume normal system operation.  As near as we can<br />
figure, somewhere around half of the users had disconnected but the system<br />
didn&#8217;t notice because it never saw carrier drop on those ports (being dead).<br />
New, different users had now connected to those ports.  We received several<br />
semi-confused user calls, realized what had happened and invoked the magic<br />
&#8220;/etc/shutdown NOW&#8221; command.  The procedure (should this ever happen again)<br />
will be to manually panic the system and reboot.  I also surgically removed<br />
the keycap from that particular key on our terminal &#8211; you have to work to<br />
press it now!</p></blockquote>
<p>3</p>
<blockquote><p>From: stehman%citron.cs.clemson.edu@hubcap.clemson.edu (Jeff Stehman)<br />
Organization: Clemson University</p>
<p>Many years ago a tiny little college in the middle of nowhere purchased an<br />
NCR tower, then a newfangled contraption.  A half-dozen of us were using it<br />
for an assembly class.  The prof should have made his warnings about TRAP a<br />
little more clear.  One student runs his program and it suddenly begins<br />
spawning processes, rapidly filling the machine.  The prof came in, amused,<br />
logged on as superuser, and killed a process.  Another process was<br />
immediately spawned.  The prof tried again.  He was ignored.  He was also no<br />
longer amused.  After several minutes he gave up and turned off the box.<br />
The tower didn&#8217;t even flinch.  He pulled the plug.  Nothing.  He ripped the<br />
back off the box and dug around.  Finally he found the fuse and pulled it,<br />
killing the machine.  Some of us later claimed we heard laughter as it went<br />
down.</p>
<p>Many times since then I have wished other computers came with a backup<br />
battery as standard issue.</p></blockquote>
<p>4</p>
<blockquote><p>From: pinard@IRO.UMontreal.CA (Francois Pinard)<br />
Organization: Universite&#8217; de Montre&#8217;al</p>
<p>Many things happened in those many years I&#8217;ve been with computers.<br />
The most horrorful story I&#8217;ve seen is not UNIX related, but it is<br />
certainly worth a tale.  Here it goes.</p>
<p>This big (:-) CDC 6600 system was bootable from tape drive 0, using<br />
these 12 inches wheels containing 1/2&#8243; tape.  The *whole* system was<br />
reloaded anew from the tape each time we restarted the machine,<br />
because there was no permanent file system yet, the disks were not<br />
meant to retain files through computer restarts (unbelievable today, I<br />
know <img src='http://www.unixnewbie.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> .  The deadstart tapes (as they were called) were quite<br />
valuable, and we were keeping at least a dozen backups of those, going<br />
back maybe one or two years in development.</p>
<p>The problem was that the two vacuum capstans which were driving the<br />
tape 0, near the magnetic heads, were not perfectly synchronized, due<br />
to a hardware misadjustment.  So they were stretching the tape while<br />
they were reading it, wearing it in a way invisible to the eye, but<br />
nevertheless making the tape irrecoverable.  Besides that, everything<br />
was looking normal in the tape physical and electrical operations.  Of<br />
course, nobody knew about this problem when it suddenly appeared.</p>
<p>All this happened while all the system administration team went into<br />
vacation at the same time.  Not being a traveler, I just stayed<br />
available `on call&#8217;.  The knowledgeable operators were able to solve many<br />
situations, and being kind guys for me (I was for them <img src='http://www.unixnewbie.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> , they would<br />
not disturb me just for a non-working deadstart tape.  Further, they<br />
had a full list of all deadstart backup tapes.  So, they first tried<br />
(and destroyed) half a dozen backups before turning the machine to the<br />
hardware guys, who destroyed a few more themselves.</p>
<p>The technicians had their own systems for diagnostics, all bootable<br />
from tape drive 0, of course.  They had far fewer backups than we did.<br />
They destroyed almost all of them before calling me in.  Once told what<br />
happened, my only suggestion was to alter the deadstart sequence so as to<br />
be able to boot from another tape drive.  Strangely enough, nobody<br />
had thought about it yet.  In these old times, software guys were always<br />
suspecting hardware, and vice versa <img src='http://www.unixnewbie.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> .</p>
<p>Happily enough, the few tapes left started, both for production and<br />
for the technicians.  Tape drive 0 being quite suspect, the<br />
technicians finally discovered the problem and repaired it.  My only<br />
job left was to upgrade the system from almost one year back, before<br />
turning it to operations.  This was at the time, now seemingly lost,<br />
when system teams were heavily modifying their operating system<br />
sources.  This was also the time when everything not on big tapes was<br />
all on punched Hollerith cards, the only interactive device being the<br />
system console.  It took me many days, alone, having the machine in<br />
standalone mode.  The crowd of users stopped regularly in the windows<br />
of the computer room, taking bets, as they were used to do, on how<br />
fast I would get the machine back up (I got some of my supporters<br />
losing their money, this time <img src='http://www.unixnewbie.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> .</p>
<p>This was quite hard work for me, done under high pressure.  When the<br />
remainder of the staff returned from trip, and when I told them the<br />
whole tale, we decided to never synchronize our holidays again.</p></blockquote>
<p>5</p>
<blockquote><p>From: ravi@usv.com (Ravi Ramachandran)</p>
<p>At one time, there were three of us working on a unique SVR3.2 motorola<br />
based machine, on a R&#038;D project.  I took care of all the SysAdmin tasks,<br />
I had a back up administrator, and the third person had been stuck into<br />
my group (company politics).  The group project files were in /user and<br />
the individual ones in /user2.  We had managed to get backup from the<br />
operations department for /user only (not even /; security paranoia?).<br />
Anyway, I had another scsi hard disk that I used for making a disk copy<br />
of the primary scsi hard disk every Friday.  This disk was connected, but<br />
not mounted, so that I could do the disk backup from my desk when I wanted<br />
to.  This machine used to sometimes get a scsi error such that you could<br />
not log in, but the processes already running on the machine were not<br />
affected.  If you were logged in on the console, you just powered off the machine<br />
for a few minutes and rebooted it.  Around holidays time the other Admin<br />
was off in a long vacation.  I had taken Monday off, and headed off for a<br />
four day weekend.  The machine does the same blurp.  The third person<br />
decides to power off the machine &#038; turn it back on immediately.  It does<br />
not come up properly.  She decides to reinstall the machine using the<br />
installation tape that I had unfortunately left in the open.  Reformats the<br />
hard disk, installs the base system, and is stuck at that point when I come<br />
back in on Tuesday.  I almost blow a blood vessel but try to keep calm<br />
&#8217;cause I had made a disk copy about 10 days before (too anxious to get on<br />
my holiday the previous week).  Try to mount the disk&#8230; hit vacuum.  Try<br />
using dd to look at the disk&#8230; Seemed to be a large /dev/null <img src='http://www.unixnewbie.org/wp-includes/images/smilies/icon_confused.gif' alt=':-?' class='wp-smiley' />   When the<br />
lady decided to reinstall the system, it asked her what scsi disks she<br />
wanted to reformat, and she said &#8220;y&#8221; for both 0 &#038; 1!!  All my<br />
sample/trial&#038;error work for a year had bitten the dust.<br />
My only (small) consolation was that I was not the only one affected.
</p></blockquote>
<p>6</p>
<blockquote><p>From: williams@nssdcs.gsfc.nasa.gov (Jim Williams)<br />
Organization: NASA Goddard Space Flight Center, Greenbelt, Maryland</p>
<p>Story One is about The Sun 3/260 That Froze Solid.  One day a user<br />
reported that the Sun 3/260 he was using was &#8220;dead&#8221;.  On inspection, I<br />
found the Sun at the console prompt and the keyboard totally<br />
unresponsive.  The L1-A sequence did nothing.  So I power cycled it.<br />
Nothing.  A blank screen, no activity.  I was ready to call service,<br />
then decided to try rebooting with the normal/diag switch set to diag.<br />
On looking at the back of the pedestal, I saw that the ethernet cable<br />
had been pressed up against the reset switch!  ARGGGHHHH!  The user<br />
had pushed the machine back just enough to press the switch and keep<br />
it pressed.  (I don&#8217;t recall if there was a &#8220;watchdog reset&#8221; message<br />
on the console when I found it, but I was new enough to Suns that that<br />
would not have been a dead giveaway.)</p>
<p>Story Two involved connecting an HP laserjet to a Sun 3/280.  This<br />
sucker just would NOT do flow control correctly.  I put a dumb<br />
terminal in place of the HP and manually typed ^S/^Q sequences to<br />
prove that the serial port really was honoring X-ON/X-OFF.  But for<br />
some reason the ^Ss from the HP didn&#8217;t &#8220;taste right&#8221; to the Sun, which<br />
ignored them.  Switching the HP serial port between RS422/RS232 had no<br />
effect.  It eventually turned out to be some sort of flakiness with<br />
the Sun ALM-II board.  Everything worked fine after I moved the<br />
printer to one of the built-in Zilog ports.  Death to flakey hardware&#8230;
</p></blockquote>
<p>7</p>
<blockquote><p>From: ken@sugra.uucp (Kenneth Ng)<br />
Organization: Private Computer, Totowa, NJ</p>
<p>In article <1992Oct16.152629.29804@nsisrv.gsfc.nasa.gov: williams@nssdcs.gsfc.na<br />
[story about connecting HP LJ to a Sun 3/280 with an ALM-II board deleted]</p>
<p>ARRRGGGHHH!!!! DEATH TO ALM-II BOARDS!  Funny though, I do have an HPLJ-2<br />
hooked up to a SUN 690MP through the ALM-2 boards without problems.  However<br />
I also had Sun going up the wall with myself with an Okidata 320 printer<br />
that would hang the port until we reboot the machine  (not a nice thing to<br />
do with a dozen stock brokers).  Funny thing is, we had ANOTHER Okidata 320<br />
printer attached to the same Sun on another ALM-2 port, no problem with that<br />
one.  Hm, switch the printers, no change.  Switch the cables, no change.<br />
Switch the ports, no change.  Weird.  Finally discovered it was the DATA that<br />
was being sent.  The printer with problems was a label printer, which was<br />
sending a control-s every 10-20 characters or so to pause the Sun.  Apparently<br />
the Sun ALM-2 drivers can not handle control-s'es too frequently.  No problem,<br />
Sun said, just switch to hardware flow control.  Puzzled me, because my docs<br />
said the ALM boards had no hardware flow control.  But his docs said they<br />
were there.  Took the printer off line, started the lpd, data scope showed the<br />
data going out.  Talked to Sun again, tried RTS-CTS, DTR, 'crtscts' in printcap,<br />
'-crtscts' in printcap.  Trying all kinds of combinations.  Finally he asked me<br />
which ALM-2 port I was using, 13 I responded.  Oh, ALM-2 ports only have the<br />
hardware flow control in the first four ports.  Whoops <img src='http://www.unixnewbie.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> .  Both docs were<br />
true: my docs said there was no hardware flow control, which was right, on<br />
the last 12 ports.  His docs said that there was hw flow control, but he<br />
missed the 'on the first four ports' part.  Now it works, and I hope Sun<br />
now has this better documented.</p></blockquote>
<p>8</p>
<blockquote><p>From: gary@resumix.portal.com (Gary M. Lin)<br />
Organization: Resumix Inc.</p>
<p>My company markets turnkey solutions for resume-processing, so most of our<br />
customers are non-technical HR recruiters.  We contract third-party field<br />
service to a fairly recognizable name in the industry.</p>
<p>I received a call from an irate user who noticed intolerable delays after<br />
some upgrades were done to the customer&#8217;s branch offices.  His ELC would use<br />
dial-up to establish a link before running software off the server in a<br />
different site.</p>
<p>He attributed the delay to slow dial-up links and software changes, but then<br />
the customer mentioned that quitting WordPerfect and switching to our<br />
application took over an hour.  I asked what the system was doing during that hour.<br />
He replied the disk was constantly spinning.  Puzzled, I checked his swap,<br />
which was more than sufficient.  Then finally I noticed his ELC booted with<br />
only 4 meg of memory.</p>
<p>Think the field technician swapped their CPU board a month ago and forgot to<br />
move the SIMMs over.  The worst part of it was the customer went on with this<br />
situation for a month before bringing it to our attention!</p>
<p>Moral of the story:  Check that the service guy puts everything back in.</p></blockquote>
<p>9</p>
<blockquote><p>From: greep@Speech.SRI.COM (Steven Tepper)<br />
Organization: SRI International</p>
<p>I once had problems with files that mysteriously refused to stay<br />
changed for very long.  It was a PDP-11 Unix system that had crashed,<br />
and I brought it up single-user.  I would change some file and it<br />
would stay changed for a minute or so but then revert to its earlier<br />
state (contents, protection mode, etc).  What happened was that the<br />
write-protect switch on the disk drive had gotten bumped into the &#8220;on&#8221;<br />
position but the device driver failed to report any write errors.  As<br />
long as the data stayed in kernel buffers the changes &#8220;took&#8221;, but they<br />
would disappear once the buffers were reused and the system had to<br />
reread the disk.</p></blockquote>
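<p>The behaviour in the last story (changes that &#8220;took&#8221; only while they sat in kernel buffers) can be imitated with a toy model. This is an illustrative sketch only, not how any real kernel is written: <code>ReadOnlyDisk</code> and <code>BufferCache</code> are invented names, and the cache is a simple least-recently-used write-back cache in front of a disk that silently drops every write.</p>

```python
from collections import OrderedDict


class ReadOnlyDisk:
    """Toy disk: write-protect is on and the driver reports no errors."""

    def __init__(self, blocks):
        self._blocks = dict(blocks)

    def read(self, block_no):
        return self._blocks[block_no]

    def write(self, block_no, data):
        pass  # the write is silently dropped


class BufferCache:
    """Tiny LRU write-back cache in front of a disk (illustrative only)."""

    def __init__(self, disk, capacity):
        self.disk = disk
        self.capacity = capacity
        self.buffers = OrderedDict()

    def _insert(self, block_no, data):
        self.buffers[block_no] = data
        self.buffers.move_to_end(block_no)
        while len(self.buffers) > self.capacity:
            old_no, old_data = self.buffers.popitem(last=False)
            self.disk.write(old_no, old_data)  # flush on eviction

    def write(self, block_no, data):
        self._insert(block_no, data)

    def read(self, block_no):
        if block_no in self.buffers:
            self.buffers.move_to_end(block_no)
        else:
            self._insert(block_no, self.disk.read(block_no))
        return self.buffers[block_no]


disk = ReadOnlyDisk({0: "old contents", 1: "other", 2: "more"})
cache = BufferCache(disk, capacity=2)
cache.write(0, "new contents")
before = cache.read(0)  # served from the buffer
cache.read(1)
cache.read(2)           # block 0 is evicted and "flushed" to the disk
after = cache.read(0)   # reread from the disk
print(before, "->", after)
```

<p>In this model the change survives exactly as long as block 0 stays cached; once other blocks push it out, the next read returns the old data from the disk.</p>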
]]></content:encoded>
			<wfw:commentRss>http://www.unixnewbie.org/ss-blaming-on-the-hardware/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sysadmin Stories: Making backups</title>
		<link>http://www.unixnewbie.org/ss-making-backups/</link>
		<comments>http://www.unixnewbie.org/ss-making-backups/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 20:54:50 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Sysadmin Stories]]></category>

		<guid isPermaLink="false">http://www.unixnewbie.org/?p=508</guid>
		<description><![CDATA[Making backups...]]></description>
			<content:encoded><![CDATA[<p>1</p>
<blockquote><p>From: rickf@pmafire.inel.gov (Rick Furniss)<br />
Organization: WINCO</p>
<p>Murphy&#8217;s law #?? , preventive maintenance doesn&#8217;t.</p>
<p>try this one:   /etc/dump /dev/rmt/0m /dev/dsk/0s1<br />
          Or:   tar cvf /dev/root /dev/rmt0</p>
<p>Backups on unix can be one of the most dangerous commands used, and they<br />
are used to prevent rather than cause a problem.  If any Unix utility were<br />
a candidate for a warning message, or error checking, this would be it.</p>
<p>Just in case you didn&#8217;t catch the HORROR above, the parameters are backwards<br />
causing a TOTAL wipe out of the root file systems.</p>
<p>More systems have been wiped out by admins than any hacker could do in<br />
a life time.</p></blockquote>
<p>2</p>
<blockquote><p>From: grant@unisys.co.nz (Grant McLean)<br />
Organization: Unisys New Zealand</p>
<p>One of my customers (who shall remain nameless) was having a problem with<br />
insufficient swap space.  I recommended that he back up the system, boot<br />
off the OS tape, repartition the disk, remake the filesystems and restore<br />
the data (any idiot could do this, right? <img src='http://www.unixnewbie.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  ).  I also suggested that if<br />
he wasn&#8217;t confident of achieving all this, we could provide a skilled<br />
person for a modest fee.  Of course he was fully confident so I left him<br />
to it.</p>
<p>Next day I get a call from the guy to say he&#8217;d been there all night and<br />
he&#8217;d had all sorts of funny messages when restoring from tape.</p>
<p>Eventually we tracked his problem down to the backup script he&#8217;d been<br />
using.  It was a simple one liner:</p>
<p>  find / -print | cpio -oc | dd obs=100k of=/dev/rmt0 2>/dev/null</p>
<p>This was a problem because:</p>
<p>  1) His system had two 300MB drives<br />
  2) He only had a 150MB tape drive<br />
  3) The same script was being run every night by a cron job<br />
  4) All his backups were created by this script</p>
<p>(In case you haven&#8217;t worked it out, the dd is to speed up writes to tape<br />
but it has the unfortunate side effect that CPIO never finds out about<br />
the end of tape.  Because the errors were going to the bit bucket, they<br />
never knew their backups were incomplete until they came to restore from<br />
them).</p>
<p>I would have loved to be a fly on the wall when he explained to his boss<br />
that the data was gone and there was no way of getting it back.</p></blockquote>
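<p>One way to avoid trusting a backup pipeline blindly is to keep its error output and check the exit status of every stage, not only the last one. The sketch below is an assumption-laden illustration rather than the customer&#8217;s script: <code>run_pipeline</code> is a hypothetical helper, the two stages are harmless stand-ins for the real commands, and it only helps if the failing stage actually exits non-zero.</p>

```python
import subprocess
import sys
from typing import List, Sequence


def run_pipeline(stages: Sequence[Sequence[str]]) -> List[int]:
    """Run the stages as a pipeline and return every stage's exit status.

    stderr is inherited from the caller, so error messages stay visible
    instead of being sent to the bit bucket.
    """
    if not stages:
        raise ValueError("need at least one stage")
    procs = []
    prev_out = None
    for argv in stages:
        proc = subprocess.Popen(argv, stdin=prev_out, stdout=subprocess.PIPE)
        if prev_out is not None:
            prev_out.close()  # only the next stage keeps the read end open
        prev_out = proc.stdout
        procs.append(proc)
    prev_out.read()  # drain whatever the final stage prints
    prev_out.close()
    return [p.wait() for p in procs]


# Stand-in stages: the first "lists files", the second reads them and fails.
producer = [sys.executable, "-c", "print('file list')"]
failing_writer = [sys.executable, "-c",
                  "import sys; sys.stdin.read(); sys.exit(1)"]

statuses = run_pipeline([producer, failing_writer])
backup_ok = all(status == 0 for status in statuses)
print(statuses, backup_ok)
```

<p>A periodic test restore remains the only real proof that a backup is usable; a status check like this just catches the failures that the tools themselves report.</p>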
<p>3</p>
<blockquote><p>From: ravi@usv.com (Ravi Ramachandran)</p>
<p>Live 24 hour online system.  Does backup over the ethernet to a SCSI tape.<br />
Unfortunately, no SCSI on this system to recover if root/ethernet dies.<br />
This was a Compaq Systempro running SCO Unix.  Slated a downtime of 4-6am.<br />
I thought that it will take me only 30 minutes, as I had installed a<br />
similar (Adaptec) SCSI board on a similar hardware on SCO. Only difference<br />
was that this machine was running MPX (multiprocess extension) and you had<br />
to deinstall it, install the SCSI, and then reinstall MPX (proper procedure).<br />
I had made all my slot/IRQ charts the previous day, and so got busy removing<br />
MPX.  Then said &#8220;mkdev tape&#8221;, go through the IDs, and am almost at home<br />
base.  Then&#8230; &#8220;link kit not installed, use floppy X1&#8221; when I tried to remake<br />
the kernel.  For some reason, when I removed the multiprocessor extension,<br />
the single processor files were not moved to their right location.  And if<br />
I reinstalled the single, all my changes would be lost.  Finally, restored the<br />
OS (from backup) on the remote machine, and then rcp-ed them over to bring back<br />
the MPX version.  Unfortunately, rcp does not maintain the date/ permissions,<br />
etc.  Got a limping version of the machine back on-line about 45 minutes<br />
after its slated time, and spent the rest of the day fixing vagrant files.<br />
The next week, I moved the online programs to another machine (a headache),<br />
and reinstalled this machine from scratch.</p></blockquote>
<p>4</p>
<blockquote><p>From: keith@ksmith.uucp (Keith Smith)<br />
Organization: Keith&#8217;s Computer, Hope Mills, NC</p>
<p>My dumbest move ever.  Client in Charlotte, NC (3 hours + away) has<br />
Xenix box with like 15 users running single app.  They have a tape<br />
backup of course.  Anyway they ran slam out of space on the 70MB disk<br />
drive so I upgraded them from an MFM to a SCSI 150MB disk.  Restored<br />
their app &#038; data files, and they were off and running.  Anyway they did<br />
an application directories backup (tar) on a daily basis and backed the<br />
rest of the system up with tar on Monday morning.</p>
<p>Being a nice guy I built a menu system and installed the backups on the<br />
menu so they could do it with a push of the button.  Swell,  It&#8217;s Monday.<br />
Call if anything else comes up.  1 week later I get a call.  Console is<br />
scrolling messages, App seems to be missing yesterday&#8217;s orders, etc.<br />
Call in, and cannot log in.  &#8216;w&#8217; doesn&#8217;t work.  Crazy stuff.  Really<br />
strange.</p>
<p>Grab old drive/controller, fly to Charlotte, replace drive, install<br />
app backup tape.  They re-key missing stuff, etc.  Bring new disk back.<br />
Won&#8217;t boot, won&#8217;t do anything.  Boot emergency floppy set.  Looking<br />
around.  Can&#8217;t figure but have backup tape from that morning that<br />
&#8220;completed successfully&#8221;.  tar tvf /dev/rct0.  Hmm, why all these<br />
files look very OLD.  Uh, Where, Uh.  Look at menu command for the<br />
&#8220;backup&#8221; is &#8216;tar xvf /dev/rct0 /&#8217;</p>
<p>Anyway, I owned up to the mistake, re-loaded the SCSI drivers and<br />
changed the command to &#8216;tar cvf ..&#8217;</p>
<p>Hehehe,  Now I DOUBLE check what I put on a menu, and try not to be in a<br />
*HURRY* when I do this stuff.</p></blockquote>
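<p>[A minimal sketch of the check that would have caught the swapped tar flag: create the archive, then read its listing back and compare it with what is on disk. The function name <code>backup_and_verify</code> and its arguments are hypothetical, not from the original post, and a real tape backup would need more care than this.  -ed.]</p>

```python
import os
import tarfile


def backup_and_verify(source_dir, archive_path):
    """Create a tar archive of source_dir and return any paths it is missing.

    Hypothetical helper for illustration; archive_path should live
    outside source_dir so the archive is not compared against itself.
    """
    # Mode "w" creates an archive (the 'c' in 'tar cvf').
    # Nothing is ever extracted here, so nothing on disk is overwritten.
    with tarfile.open(archive_path, "w") as archive:
        archive.add(source_dir, arcname=".")

    # Read the listing back, like 'tar tvf', instead of trusting
    # a "completed successfully" message.
    with tarfile.open(archive_path, "r") as archive:
        archived = set(os.path.normpath(name) for name in archive.getnames())

    # Collect every directory and file currently under source_dir.
    expected = set()
    for root, dirs, files in os.walk(source_dir):
        for name in dirs + files:
            rel = os.path.relpath(os.path.join(root, name), source_dir)
            expected.add(os.path.normpath(rel))

    # Anything on disk that is absent from the archive is a failed backup.
    return sorted(expected - archived)
```

<p>[An empty list back means every path on disk made it into the archive; anything else deserves a look before the menu entry is trusted.  -ed.]</p>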
<p>5</p>
<blockquote><p>From: mike@pacsoft.com (Mike Stefanik)<br />
Organization: Pacific Software Group, Riverside, CA</p>
<p>One of the more interesting problems that I ran into was a customer that<br />
was having problems with their SCSI tape drive on a XENIX box. Around midnight,<br />
every night, the system would automatically backup and verify their data. One<br />
day, the customer needed to restore some data files from the last night&#8217;s<br />
backup. She called because, although the restore worked just fine, she didn&#8217;t<br />
see the busy light on the drive come on, and it didn&#8217;t sound like the tape was<br />
moving. I dialed up the system, had her put a tape in and did a retension &#8211;<br />
the drive started winding the tape back and forth, and we both concluded that<br />
she was mistaken. After all, the tape was retensioning, and she wasn&#8217;t getting<br />
any backup or verify errors at all. I just chalked this one up to user<br />
confusion.</p>
<p>A few days later, she called back saying that there really is something wrong<br />
with the tape. She needed to restore some data from a few days ago, and like<br />
before, the busy light on the drive didn&#8217;t come on, but files did restore.<br />
However when she started the application program, the data hadn&#8217;t changed. I<br />
dialed up the system again, and just on a fluke, issued a &#8220;df&#8221; &#8212; it showed<br />
their rather large root filesystem to be nearly full. Confused, I did a &#8220;find&#8221;,<br />
searching for files over 1MB. Of course, what I found was this huge file named<br />
/dev/rct0. As I later discovered, their system had crashed a few weeks ago,<br />
and she had simply answered &#8220;yes&#8221; to a bunch of questions that it asked when<br />
she brought it back up. The /dev/rct0 device was removed (but /dev/xct0 was<br />
still there, which allowed me to retension the tape) and the backup script<br />
never checked to make sure that it was actually writing to a character device.</p>
<p>Needless to say, I modified the backup program to make sure that it was really<br />
writing to a device, and I made her promise to call me whenever the system<br />
crashed or asked &#8220;funny questions&#8221; when it was booting.</p></blockquote>
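<p>[The post does not show how the backup program was modified. The following is only a sketch of one way to make sure the target really is a character device before writing to it; the helper names <code>is_character_device</code> and <code>open_tape_for_writing</code> are made up for illustration.  -ed.]</p>

```python
import os
import stat


def is_character_device(path):
    """Return True only if path exists and is a character special file."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return False
    return stat.S_ISCHR(mode)


def open_tape_for_writing(path):
    """Open a tape device for writing, refusing anything else.

    Hypothetical helper: returns a raw file descriptor on success.
    """
    if not is_character_device(path):
        raise RuntimeError(f"{path} is not a character device; refusing to write")
    # O_CREAT is deliberately absent: if the node disappears between the
    # check and the open, the open fails instead of quietly creating a
    # regular file under /dev that fills the root filesystem.
    return os.open(path, os.O_WRONLY)
```

<p>[With a check like this, a missing /dev/rct0 would stop the nightly backup with an error instead of growing a huge regular file on the root filesystem.  -ed.]</p>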
<p>6</p>
<blockquote><p>From: Nick Sayer &lt;mrapple@quack.sac.ca.us&gt;</p>
<p>And then there was the time the / disk was full but nobody knew where<br />
the space was going. &#8216;Course this was on an Ultrix box and everyone&#8217;s<br />
used to using Suns, so they were tarring to /dev/rst*. Sure enough,<br />
/dev/rst8 was a 20M file in a 25M partition.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.unixnewbie.org/ss-making-backups/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sysadmin Stories: Dealing with dev files</title>
		<link>http://www.unixnewbie.org/ss-dealing-with-dev-files/</link>
		<comments>http://www.unixnewbie.org/ss-dealing-with-dev-files/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 20:40:07 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Sysadmin Stories]]></category>

		<guid isPermaLink="false">http://www.unixnewbie.org/?p=506</guid>
		<description><![CDATA[Dealing with dev files...]]></description>
			<content:encoded><![CDATA[<p>1</p>
<blockquote><p>From: nickp@BNR.CA (&#8220;Nick  Pitfield&#8221;, N.T.)</p>
<p>One of my colleagues had been itching to get into sys admin for some time,<br />
so last week he was finally sent on a 5-day sys admin course run by HP in<br />
Bracknell.</p>
<p>On the following Sunday, he decided to try out his new-found knowledge by<br />
trying to connect and configure a DAT drive on one of our critical test<br />
systems. He connected the cables up okay, and then created the device file<br />
using &#8216;mknod&#8217;.</p>
<p>Unfortunately, he gave the device file the same minor &#038; major device numbers<br />
as the root disk; so as soon as he tried to write to this newly installed<br />
&#8216;DAT drive&#8217;, the machine went tits up with a corrupt root disk&#8230;.ho hum.</p></blockquote>
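<p>[A rough sketch of a sanity check to run before &#8216;mknod&#8217;: look up which existing device files already use the major and minor numbers you are about to hand out. The function names <code>describe_node</code> and <code>find_existing_nodes</code> are hypothetical, and the scan only covers the top level of the device directory.  -ed.]</p>

```python
import os
import stat


def node_kind(mode):
    """Return "block" or "char" for device files, None for anything else."""
    if stat.S_ISBLK(mode):
        return "block"
    if stat.S_ISCHR(mode):
        return "char"
    return None


def describe_node(path):
    """Return (kind, major, minor) for a device file, or None otherwise."""
    info = os.stat(path)
    kind = node_kind(info.st_mode)
    if kind is None:
        return None
    return kind, os.major(info.st_rdev), os.minor(info.st_rdev)


def find_existing_nodes(kind, major, minor, dev_dir="/dev"):
    """List device files in dev_dir that already use these numbers."""
    matches = []
    for entry in os.scandir(dev_dir):
        try:
            # Do not follow symlinks; only real device nodes are of interest.
            info = entry.stat(follow_symlinks=False)
        except OSError:
            continue
        if node_kind(info.st_mode) != kind:
            continue
        if (os.major(info.st_rdev), os.minor(info.st_rdev)) == (major, minor):
            matches.append(entry.path)
    return sorted(matches)
```

<p>[If the numbers planned for the new tape device come back matching a disk, that is the moment to stop and reread the manual.  -ed.]</p>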
<p>2</p>
<blockquote><p>From: philip@haas.berkeley.edu (Philip Enteles)<br />
Organization: Haas School of Business, Berkeley</p>
<p>As a new system administrator of a Unix machine with limited space I<br />
thought I was doing myself a favor by keeping things neat and clean.  One<br />
day as I was &#8216;cleaning up&#8217; I removed a file called &#8216;bzero&#8217;.  Strange<br />
things started to happen: vi didn&#8217;t work, and then the complaints started<br />
coming in.  Mail didn&#8217;t work.  The compilers didn&#8217;t work.  About this time<br />
the REAL system administrator poked his head in and asked what I had<br />
done.  Further examination showed that bzero is the zeroed memory without<br />
which the OS had no operating space so anything using temporary memory<br />
was non-functional.  The repair?  Well things are tough to do when most of<br />
the utilities don&#8217;t work.  Eventually the REAL system administrator took<br />
the system to single user and rebuilt the system including full<br />
restores from a tape system.  The Moral is don&#8217;t be too anal about things<br />
you don&#8217;t understand.  Take the time to learn what those strange files are before<br />
removing them and screwing yourself.</p></blockquote>
<p>3</p>
<blockquote><p>From: broberts@waggen.twuug.com (Bill Roberts)<br />
Organization: Brite Systems</p>
<p>My most interesting experience in this regard was when I deleted &#8220;/dev/null&#8221;.  Of<br />
course it was soon recreated as a &#8220;regular file&#8221;, then permission problems<br />
started to show up.</p>
<p>I was new at the game at the time and couldn&#8217;t figure out what happened!<br />
It looked good to me.  I didn&#8217;t know about &#8220;special files&#8221; and &#8220;mknod&#8221; and<br />
major and minor device codes.  A friend finally helped out and started<br />
laughing and put me on the right track.  That one episode taught me a<br />
lot about my system.</p></blockquote>
<p>4</p>
<blockquote><p>From: Frank T Lofaro &lt;fl0p+@andrew.cmu.edu&gt;<br />
Organization: Sophomore, Math/Computer Science, Carnegie Mellon, Pittsburgh, PA</p>
<p>    Well one time I was installing a minimal base system of Linux on a<br />
friend&#8217;s PC, so that we would have all the necessary utilities to bring<br />
over the rest of the stuff.  His 3 1/2 inch disk was dead, so we had to<br />
get the 5 1/4 inch version of the boot/root disk.  Too bad that version,<br />
having to fit in 1.2M instead of 1.44, didn&#8217;t have tar.  We could get a<br />
version of tar, but it was in a tar file (nice chicken and egg<br />
scenario).  I said, okay, since we don&#8217;t have tar, we can&#8217;t use that to<br />
copy the files from floppy to the hard disk, I&#8217;ll use cp instead (bad<br />
move).  It actually seemed to work for a while, then the machine<br />
rebooted!  I did it again, the same thing happened.  Then I realized cp<br />
wouldn&#8217;t work on device files! (this is what happens when you try to<br />
install un*x at 3 AM).  It just read the contents of the device and made<br />
a file containing such, which is undesirable in any event.  (when it<br />
read /dev/port, the device file that references I/O ports, it must&#8217;ve<br />
done something to reboot the machine, that was the file that was causing<br />
the reboots).</p>
<p>    I finally got it working by having him get the tar archive of the<br />
linux binaries (including the tar we needed), and untarring it on one of<br />
the public decstations here, so we could ftp tar to his PC using his dos<br />
tcp/ip stuff.  A funny aside was that it untarred into ~/bin, and<br />
superseded all his normal commands.  We were wondering why everything<br />
wouldn&#8217;t run.  Luckily it wasn&#8217;t too hard to fix after we realized what<br />
happened.</p></blockquote>
<p>5</p>
<blockquote><p>From: hirai@cc.swarthmore.edu (Eiji Hirai)<br />
Organization: Information Services, Swarthmore College, Swarthmore, PA, USA</p>
<p>A consultant we had hired (and not a very good one) was installing Unix<br />
on one of our workstations.  He was mucking with creating and deleting<br />
/dev/tty* files and made /dev/tty a regular file.  Weird things started to<br />
happen.  Commands would only print their output if you pressed return twice,<br />
etc.  Fortunately, we solved the problem by re-mknod-ing /dev/tty.  However,<br />
it took a while to realize what was causing this problem.</p></blockquote>
<p>6</p>
<blockquote><p>From: lingnau@math.uni-frankfurt.de (Anselm Lingnau)<br />
Organization: University of Frankfurt/Main, Dept. of Mathematics</p>
<p>broberts@waggen.twuug.com (Bill Roberts) writes:<br />
[story about deleting /dev/null deleted.  -ed.]</p>
<p>Years ago when I was working in the Graphics Workshop at Edinburgh University,<br />
we used to have a small UNIX machine for testing. The machine wasn&#8217;t used too<br />
much, so nobody bothered to set up user accounts, and so everybody was running<br />
as root all the time. Now one of the chaps who used to come in was fond of<br />
reading fortunes (/usr/games/fortune having been removed from the University&#8217;s<br />
real machines along with all the other games). Guess what happened when the<br />
machine said</p>
<p># fortune<br />
fortune: write error on /dev/null &#8212; please empty the bit bucket</p>
<p>Quite a lot of stuff wouldn&#8217;t work after the chap was done with the machine<br />
for the day. You bet we put up proper accounts after that!
</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.unixnewbie.org/ss-dealing-with-dev-files/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
