Sysadmin Stories: Partitioning the drives

by Stephen on October 19, 2009



From: (Eiji Hirai)
Organization: Information Services, Swarthmore College, Swarthmore, PA, USA

I wanted to create a second swap partition on another disk and made the
partition start at sector 0 of the disk! (It sounded OK at the time, since
all the other regular 'a' partitions started at sector 0.) Every time I
rebooted, fsck would complain about missing partition tables. I initially
suspected that the disk was bad, but I later realized that swapping was
overwriting the partition table. I had lost an unknown percentage of the
financial data for the institution I was working for at the time, right
when they were being audited! Yikes! Anyway, we were able to recover the
data and life returned to normal, but I did wonder at the time whether I
would still have a job there.


From: (Mike Matthews)
Organization: /etc/organization

We had just gotten a 1.2 GB disk drive for our Sun (which badly needed it),
so we decided to repartition everything.

All went well, except… on reboot, one of the partitions that was newly
restored from backup got an fsck error. We fixed it and rebooted, then
another one got an error. We fscked that one, rebooted, and doggone it,
the first error was back!

We had a one-cylinder overlap. Sheesh. At least Ultrix WARNS you about that.


From: (Martin Tomes)
Organization: Eurotherm Limited

We had something really weird happen one day. I copied a file to
/usr/local on someone else's machine and all seemed to be OK. A bit
later the user of the machine noticed that the files and directories they
were using on another disk partition were corrupted. There were
2-gigabyte files on a 650 MB disk, and lots of them, with weird names and
permissions. At first I did not connect the two events. This disk
had given trouble when the power failed a week before, so I fsck'ed
it. Now, I have run fsck more times than I can begin to imagine and
seen plenty of errors, some needing 'manual intervention', but I had
never seen anything like this before! It was spectacular. What's
more, when I ran it a second time, things got worse. Then I tried
to back up the /usr/local partition before restoring this corrupt data,
and lo, that was corrupt too. It turned out that our sysadmin had
created the /usr/local disk partition in the wrong place on the disk
and put it over the top of the alternate sectors partition. By
writing to /usr/local, I had written all over the alternate sectors,
which were mapped into the user's partition. Oh dear, what a mess.

Solution: rebuild all the partitions so they don't overlap, restore,
and buy the sysadmin a calculator.

Moral: always do your sums on the /etc/partitions file very carefully
before using mkpart.


From: caa@Unify.Com (Chris A. Anderson)
Organization: Unify Corporation, Sacramento, California

At a company I used to work for, the CEO's brother was the
"system operator". It was his job to do backups, maintenance,
etc. Problem was, he didn't have a clue about Unix. We were required
to go through him to do anything, though.

Well, I was setting up a Plexus P-95 to be a
news/mail/communications machine and needed to wipe the disks and
install a new OS. El CEO requested that his brother do the
installation and disk partitioning. He had done this before, so I
gave him the partition maps and let him at it. When he was done,
everything seemed to be OK. Great, on with the install and setup.

Things went fine until I started compiling the news and mail
software. All of a sudden, the machine panicked. I brought it
back up and the root file system was amazingly corrupt. After
rebuilding things, it all seemed to be fine; diagnostics all
ran fine, etc. So I started again, this time keeping an eye on
things. Sure enough, the root file system became corrupted again
once the system came under load.

This time I brought it down and checked everything. The problem?
Swap space started at block zero and so did the root file system.

Oh yes, the brother still works there.


From: (Obi Thomas)
Organization: Online Computer Systems, Inc.

I once mistakenly partitioned my Sun's boot disk so that the swap
partition overlapped the /usr partition. The machine ran fine for a long
time (many months), presumably because the swap space was always nearly
empty. Then, one day, there was a memory parity error and the system wrote
a crash dump at the *end* of the swap partition. What should have been a
simple reboot after the crash dump turned into a long and painful
re-install of the entire system (Suns cannot boot without a /usr partition).

Now when I partition a disk I sit there with a calculator and make sure
all the numbers add up correctly (offsets, number of cylinders, number of
blocks, and so on).
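The sums described above can be automated. What follows is a minimal Python sketch, not any vendor's format or mkpart tool: it assumes a hypothetical layout described as (name, starting cylinder, cylinder count, swap flag), treats partition "c" as the whole-disk partition per the usual Sun convention, and assumes one block per sector when converting cylinders to blocks. It flags partitions that run past the end of the disk, every pair that overlaps (a one-cylinder overlap, or swap and root both starting at block zero), and any swap partition starting at cylinder 0, where swapping may overwrite the partition table.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Partition(NamedTuple):
    """One slice of a disk, measured in cylinders (hypothetical model)."""
    name: str            # partition letter, e.g. "a", "b", "g"
    start_cyl: int       # offset: first cylinder used by the partition
    ncyl: int            # length in cylinders
    is_swap: bool = False

    @property
    def end_cyl(self) -> int:
        """First cylinder *after* the partition (exclusive end)."""
        return self.start_cyl + self.ncyl


def cyl_to_blocks(ncyl: int, heads: int, sectors_per_track: int) -> int:
    """Convert cylinders to blocks, assuming one block per sector."""
    return ncyl * heads * sectors_per_track


def check_layout(parts: List[Partition], total_cyl: int,
                 whole_disk: Tuple[str, ...] = ("c",)) -> List[str]:
    """Return human-readable problems found in a partition layout."""
    problems = []
    # Skip partitions meant to cover the whole disk (assumed Sun "c").
    real = [p for p in parts if p.name not in whole_disk]
    for p in real:
        if p.end_cyl > total_cyl:
            problems.append(
                f"{p.name} ends at cylinder {p.end_cyl}, past the disk's {total_cyl}")
        if p.is_swap and p.start_cyl == 0:
            problems.append(
                f"{p.name} is swap starting at cylinder 0; it may overwrite the disk label")
    # Check every pair, not just neighbours: one big partition can
    # swallow several smaller ones.
    for a, b in combinations(real, 2):
        # Half-open ranges [start, end) overlap iff each starts before the other ends.
        if a.start_cyl < b.end_cyl and b.start_cyl < a.end_cyl:
            lo = max(a.start_cyl, b.start_cyl)
            hi = min(a.end_cyl, b.end_cyl)
            problems.append(
                f"{a.name} and {b.name} overlap by {hi - lo} cylinder(s) starting at {lo}")
    return problems


if __name__ == "__main__":
    layout = [
        Partition("a", 0, 100),
        Partition("b", 100, 50, is_swap=True),
        Partition("g", 149, 200),   # off by one: should start at 150
    ]
    for problem in check_layout(layout, total_cyl=400):
        print(problem)  # b and g overlap by 1 cylinder(s) starting at 149
```

Checking every pair rather than only sorted neighbours costs nothing for the handful of partitions on a disk, and it catches a large partition that overlaps several smaller ones at once.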


From: (Jeff DelPapa)
Organization: The World Public Access UNIX, Brookline, MA

(Obi Thomas) writes:
[story about overlapping partitions deleted]

I remember a similar thing once: on a Symbolics machine, a customer
declared the same file in the FEP filesystem both as a paging file and
as part of the file system (it was one way to solve their disk space
crunch). It was caught before any damage was done; we weren't sure if
that was because they hadn't done anything real yet, or simply because
the machine knew not to mess with the IRS (the customer).


From: (kevin mcfadden)
Organization: University of Rochester

My co-sysadmin and I were in the process of repartitioning a drive
so that we could allocate more space for incoming mail. We had
just finished backing up our Data directory, from which we were going
to take 10 MB. The next step was to actually repartition it, which
includes formatting. Anyway, it comes time to give a device name
and we do a df to see which one. To make a short story long, there
was a /dev/sd2g and a /dev/sd3g, one of which was 300 MB of stuff we
could delete and the other 600 MB of applications. We confused
the two and accidentally formatted the 600 MB of applications, which
of course had been backed up……a month ago. It could have been worse.

BUT WAIT!!! It got worse. Turns out it took 3 or 4 tries to get
the partition size correct (what the hell is it with telling it
how long it is in hex or whatever?). It was at this point that
I started to cover my eyes and wander around the building, because
we only found out the partition didn't work after spending 3 hours
restoring the applications. 4 * 3 = 12 hours to repartition!


From: Nick Sayer

I had to replace a 327 MB disk on a Sun with a 669 MB one. So I
partitioned the 669, then newfs'd a /, /usr and /home filesystem on
partitions a, g and h respectively. I then copied the / and /usr
partitions from the 327 over to the 669.

First, I forgot to run installboot on the new boot partition. Whoops.
Get out the tape and boot miniroot (5 minutes), then mount / and
use installboot. Fine. Now it finds /vmunix correctly.

But on the 327, /usr was on the h partition, not g. So when
I rebooted with the 669 in place, it mounted the home partition
on /usr. fsck not found, reboot failed. Well, that's simple, I'll just
edit /etc/fstab and reboot. But vi is on /usr. And home is mounted
on /usr. No problem, I'll just mount usr on /mnt or something and
do it that way. Nope: vi is dynamically linked, and there's no
/usr/lib/ for it to use. OK, so I'll go back to single user and try
it there. But how to reboot gracefully? sync, shutdown, reboot… all
in /usr (now mounted on /mnt) and all dynamically linked. So I gave it
the Vulcan neck pinch and booted into miniroot (5 minutes). So miniroot
is up. Fine. Mount the / partition and use ed on /a/etc/fstab. Panic,
dup ialloc. The Vulcan neck pinch had introduced a slight corruption
in the filesystem. But how to preen it? fsck is in /usr, and it's
dynamically linked. Sigh.

The solution was to mount the usr partition as /usr right on top
of the home partition, run fsck to preen the root partition, reboot,
mount /usr again, then remount / read-write, change /etc/fstab
and reboot again. So all was ok after an hour of fussing.
