Question for the old hands, about disks

Fri Apr 15 11:55:11 BST 2011

gene heskett posted on Thu, 14 Apr 2011 23:54:39 -0400 as excerpted:

> Well, I hitched a ride over to seacrates site and picked up the latest
> firmware updating iso, this about 3 days ago, maybe 4, time has been a
> blur here since.

Seeing the below, I can certainly understand why.

> Following their instructions, I pulled the cables on the other drives
> and booted the cd I burnt.  It updated the firmware on that drive, to
> cc49, from some version in the late 20's.
> 
> So now the drive is stable, but the write speeds are running about 2
> megs/sec.  I replaced the SATA cable with another and doubled the from
> platter read speeds, but they still don't match the other nearly
> identical drive.  But its working with no errors now.

Sounds like they made it retry more times before resetting, and possibly 
now use a more complex ECC mechanism that can recover data it would 
previously have returned an error on.  You get stability, because data 
that's difficult to read correctly gets reread many more times before the 
drive gives up.  But you also get slowness, because the retries and/or ECC 
recovery take time.  (It may be that data written with the new firmware 
would have better ECC and could be recovered faster, but it's dealing with 
data written with the old firmware, so...)
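Here is a toy model of that trade-off, purely as a sketch. The firmware's real retry and ECC logic isn't public, and `read_with_retries`, `make_flaky_sector` and the retry budgets below are all hypothetical. A sector that only reads cleanly on its fifth attempt is an error under a small retry budget. Under a larger one it is a slow success.

```python
from typing import Callable, Optional, Tuple


def read_with_retries(
    read_once: Callable[[], Optional[bytes]], max_tries: int
) -> Tuple[Optional[bytes], int]:
    """Retry a sector read until it succeeds or max_tries is used up.

    Returns (data, attempts); data is None if every attempt failed.
    """
    for attempt in range(1, max_tries + 1):
        data = read_once()
        if data is not None:
            return data, attempt
    return None, max_tries


def make_flaky_sector(payload: bytes, failures: int) -> Callable[[], Optional[bytes]]:
    """Hypothetical weak sector: fails `failures` times, then reads cleanly."""
    remaining = [failures]

    def read_once() -> Optional[bytes]:
        if remaining[0] > 0:
            remaining[0] -= 1
            return None
        return payload

    return read_once


# The same weak sector under two made-up retry budgets:
print(read_with_retries(make_flaky_sector(b"data", 4), max_tries=3))   # (None, 3)
print(read_with_retries(make_flaky_sector(b"data", 4), max_tries=20))  # (b'data', 5)
```

Every extra attempt is another trip around the platter, which is where the throughput goes.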

But the 2 MB/sec rate sounds like a failing drive too.  Unfortunately...

FWIW, my experience in this area goes back to the event I believe I 
mentioned earlier: the system left running when the AC failed in the heat 
of a Phoenix summer, overheating the drive and resulting in a head-crash.  
In the aftermath of that, I used IIRC dd-rescue on it.  On the parts of 
the disk that weren't damaged, it comparatively flew.  The damaged parts 
were another story.  dd-rescue reads in one direction until it can't read 
any further, then reads from the other end until it can't read.  Then it 
tries spots in the middle to see whether the whole area is damaged or just 
some of it.  Wherever it can read something, it expands what it can read 
in both directions from there until it can't read again.  So it spends a 
**LOT** of time trying to read damaged sectors...
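Here is a much-simplified Python model of that strategy. It assumes one read attempt per sector and a caller-supplied `read(i)` that reports success or failure. The names are made up for illustration. The real dd_rescue and GNU ddrescue tools are considerably more elaborate, with variable block sizes, retries and log files. Take this as a sketch of the idea, not of either program.

```python
from typing import Callable, List, Set, Tuple


def rescue(n: int, read: Callable[[int], bool]) -> Set[int]:
    """Simplified model of the read-order strategy above; returns sectors read."""
    recovered: Set[int] = set()

    def try_read(i: int) -> bool:
        if read(i):
            recovered.add(i)
            return True
        return False

    # 1. Read forward from the start until the first failure.
    lo = 0
    while lo < n and try_read(lo):
        lo += 1
    # 2. Read backward from the end until a failure (or until we meet pass 1).
    hi = n - 1
    while hi > lo and try_read(hi):
        hi -= 1
    # 3. Probe the middle of each untried gap. From any good sector,
    #    expand in both directions until reads fail again.
    gaps: List[Tuple[int, int]] = [(lo + 1, hi - 1)]  # inclusive, all untried
    while gaps:
        a, b = gaps.pop()
        if a > b:
            continue
        mid = (a + b) // 2
        if try_read(mid):
            left = mid - 1
            while left >= a and try_read(left):
                left -= 1
            right = mid + 1
            while right <= b and try_read(right):
                right += 1
            # left/right now sit on a failed sector or just outside the gap.
            gaps += [(a, left - 1), (right + 1, b)]
        else:
            gaps += [(a, mid - 1), (mid + 1, b)]
    return recovered
```

On a 16-sector toy disk with a few bad sectors, every sector gets tried exactly once. Only the bad ones are missing from the result. On a real damaged disk, each of those failed reads is where the time goes.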

It was as you say below, like watching paint dry!

So I know the feeling!  Unfortunately!

> What pulled the plug and flushed the whole system was when I plugged in
> the cable to /dev/sda and booted the cd, it updated the firmware on it
> too, and somehow managed to also do the MBR on the drive so that even
> selecting it as the boot device in the F8 bios menu, it simply hung with
> a blank screen deadlock.  It is still that way despite fdisk showing it
> as bootable, and I have mounted the individual partitions and have the
> majority of the data recovered now.  But, that drives partition labels
> are meaningless & scrambled so that the partition bearing the sea-slash
> label, is actually, from its contents, the old /opt partition.  Other
> labels are similarly miss-placed.  Entirely possible since all the
> distros have fallen in love with the UUID numbers because they are
> supposedly more unique than a human generated label.

Sounds like the firmware update on it redid the way it handles the data 
areas that store critical partition data, thus scrambling some of it.  The 
vast majority of the drive is fine, and the partition layout map itself is 
apparently fine.  Tho I'd be careful with it, as it's possible the 
partitions overlap now and writing to one might damage another; inspect 
the partition data with fdisk or the like to be sure...  But the data 
about those partitions, the UUIDs, labels, etc... not so fine.
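Here is a minimal sketch of the overlap check. It assumes you've copied each partition's start and end sector by hand from fdisk's listing. The partition names and sector numbers in the usage below are made up. Sort the partitions by start sector. Two of them overlap when the next one starts at or before the end of the current one.

```python
from typing import List, Tuple


def find_overlaps(parts: List[Tuple[str, int, int]]) -> List[Tuple[str, str]]:
    """parts: (name, first_sector, last_sector), inclusive. Returns overlapping pairs."""
    ordered = sorted(parts, key=lambda p: p[1])
    overlaps: List[Tuple[str, str]] = []
    for i, (name_a, _start_a, end_a) in enumerate(ordered):
        for name_b, start_b, _end_b in ordered[i + 1:]:
            if start_b > end_a:
                break  # sorted by start, so no later partition can overlap name_a
            overlaps.append((name_a, name_b))
    return overlaps


# Made-up layout where sda3 starts inside sda2:
print(find_overlaps([("sda1", 2048, 206847),
                     ("sda2", 206848, 411647),
                     ("sda3", 400000, 500000)]))  # [('sda2', 'sda3')]
```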

FWIW, this is one reason I switched here to GPT-based partitions from the 
legacy MBR-based system.  GPT stores two copies of the partition data, one 
at the beginning of the drive and one at the end.  Unlike MBR, it 
checksums the data.  If one copy gets corrupted, its checksum is bad and 
it can try reading the other table to see if it's good.  Even if it can't 
do so directly (the onboard logic must be small and simple), there's a 
good chance that gptfdisk or other GPT partitioning tools can recover the 
partitions.  (gptfdisk was formerly gdisk, a GPT counterpart to 
fdisk/sfdisk, tho there are other tools available.)  Ever since that 
bad-AC-triggered head-crash, I've been rather more paranoid about data 
integrity than I used to be.  This seemed to cover one of the gaps left by 
my 4-way md/RAID-1 setup rather well.
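Here is a sketch of that checksum check in Python. It follows the header layout in the UEFI spec: signature at offset 0, header size at 12, and header CRC32 at 16, computed with the CRC field zeroed. `gpt_header_ok`, `pick_header` and `make_header` are hypothetical helpers. `make_header` builds a synthetic header only so there's something to test against. This sketch doesn't validate the partition entry array, which carries its own CRC.

```python
import struct
import zlib

GPT_SIGNATURE = b"EFI PART"


def gpt_header_ok(header: bytes) -> bool:
    """Check a GPT header's signature and its CRC32."""
    if len(header) < 92 or header[:8] != GPT_SIGNATURE:
        return False
    header_size, stored_crc = struct.unpack_from("<II", header, 12)
    if header_size < 92 or header_size > len(header):
        return False
    # The CRC covers header_size bytes with the CRC field itself zeroed.
    zeroed = header[:16] + b"\x00" * 4 + header[20:header_size]
    return zlib.crc32(zeroed) == stored_crc


def pick_header(primary: bytes, backup: bytes) -> str:
    """Prefer the primary header; fall back to the backup copy at the end of the disk."""
    if gpt_header_ok(primary):
        return "primary"
    if gpt_header_ok(backup):
        return "backup"
    return "neither"


def make_header(my_lba: int, alternate_lba: int) -> bytes:
    """Build a synthetic 92-byte GPT header (demo only; not a full disk image)."""
    body = struct.pack(
        "<8sIIIIQQQQ16sQIII",
        GPT_SIGNATURE, 0x00010000, 92, 0, 0,  # signature, revision, size, CRC placeholder, reserved
        my_lba, alternate_lba, 34, 966,       # this header, the other header, first/last usable LBA
        b"\x11" * 16,                         # disk GUID (dummy)
        2, 128, 128, 0,                       # entry array LBA, entry count, entry size, array CRC (dummy)
    )
    crc = zlib.crc32(body)
    return body[:16] + struct.pack("<I", crc) + body[20:]
```

Corrupting one byte of the primary header breaks its CRC, so `pick_header` falls back to the backup copy. That is the redundancy described above.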

The other big advance of GPT is that it does away with the 
primary/extended/logical partition distinction.  All partitions are 
treated the same.  The minimal reserved area specified by the spec allows 
128 such partitions, with room to expand beyond that if anyone were to 
actually find a reason 128 partitions isn't enough.  (Here, I find that 
simply tracking them starts getting difficult after about two dozen or 
so.  That's even with the help of the GPT partition-level labels: *NOT* 
filesystem-level, actual partition-level, readable/settable in GPT 
partitioning tools even before the filesystem is made.)
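The 128 figure falls out of some simple arithmetic, sketched below assuming 512-byte sectors. The spec reserves at least 16384 bytes for the partition entry array, and each entry is 128 bytes. The same entry also carries the partition-level label mentioned above, stored as up to 36 UTF-16LE characters. `parse_entry` and `make_entry` are illustrative helpers, not part of any real tool.

```python
import struct
import uuid

SECTOR_SIZE = 512        # assumption: 512-byte logical sectors
ENTRY_SIZE = 128         # bytes per partition entry
MIN_ENTRY_ARRAY = 16384  # minimum bytes the spec reserves for the entry array

# 16384 / 128 = 128 entries, occupying 16384 / 512 = 32 sectors.
# LBA 0 = protective MBR, LBA 1 = header, LBAs 2..33 = entries,
# so the first usable LBA on such a disk is typically 34.
ENTRIES = MIN_ENTRY_ARRAY // ENTRY_SIZE
ARRAY_SECTORS = MIN_ENTRY_ARRAY // SECTOR_SIZE
FIRST_USABLE_LBA = 2 + ARRAY_SECTORS


def parse_entry(raw: bytes) -> dict:
    """Decode one 128-byte GPT partition entry (GUIDs use the mixed-endian layout)."""
    first_lba, last_lba, attributes = struct.unpack_from("<QQQ", raw, 32)
    return {
        "type_guid": uuid.UUID(bytes_le=raw[0:16]),
        "unique_guid": uuid.UUID(bytes_le=raw[16:32]),
        "first_lba": first_lba,
        "last_lba": last_lba,
        "attributes": attributes,
        # The partition-level label: 72 bytes of UTF-16LE, NUL-padded.
        "name": raw[56:128].decode("utf-16-le").rstrip("\x00"),
    }


def make_entry(type_guid: uuid.UUID, unique_guid: uuid.UUID,
               first_lba: int, last_lba: int, name: str) -> bytes:
    """Build a synthetic entry for demonstration."""
    encoded = name.encode("utf-16-le")
    if len(encoded) > 72:
        raise ValueError("GPT partition names are limited to 36 UTF-16 code units")
    return struct.pack("<16s16sQQQ72s", type_guid.bytes_le, unique_guid.bytes_le,
                       first_lba, last_lba, 0, encoded)  # "72s" NUL-pads the name
```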

(I'm interested in btrfs for much the same reason.  It has built-in 
checksumming, so it will know immediately if the data it gets from the 
disk is bad.  It can then, for instance, request it from another disk 
using its built-in RAID-1, if it was configured for that.  But as I've 
explained in previous btrfs discussion, that filesystem is still 
experimental at present; they're only now developing an fsck for it, for 
instance!  I'm eagerly anticipating it, but the day I decide it's ready 
for me isn't likely to come for another couple of kernel releases at 
least.  It'll almost certainly be sometime next year, if not the second 
half of this year.  We'll see.)
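Here is a toy illustration of checksum-plus-mirror reads, loosely modeled on the behavior described above. `MirroredStore` is hypothetical. zlib's CRC-32 stands in for btrfs's default crc32c, which Python's standard library doesn't provide. None of this reflects how btrfs actually lays out data on disk.

```python
import zlib
from typing import Dict, List


class MirroredStore:
    """Toy N-copy block store that checksums every block on write and verifies on read."""

    def __init__(self, copies: int = 2) -> None:
        self.copies: List[Dict[int, bytes]] = [{} for _ in range(copies)]
        self.checksums: Dict[int, int] = {}  # kept separately from the data, as metadata

    def write(self, block: int, data: bytes) -> None:
        self.checksums[block] = zlib.crc32(data)
        for copy in self.copies:
            copy[block] = data

    def read(self, block: int) -> bytes:
        expected = self.checksums[block]
        for copy in self.copies:
            data = copy.get(block)
            if data is not None and zlib.crc32(data) == expected:
                # Good copy found: rewrite any copy that doesn't match it.
                for other in self.copies:
                    if other.get(block) != data:
                        other[block] = data
                return data
        raise OSError(f"block {block}: no copy matches its checksum")
```

A plain RAID-1 without checksums can't tell which of two differing copies is right. The stored checksum is what lets the read pick the good one.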

GPT is part of the new EFI transition from legacy BIOS-based systems, but 
at least on Linux, it doesn't require EFI hardware.  The kernel has an 
option that can be enabled for it.  Grub can understand it: grub-2 
directly, apparently, and there are patches for grub-0.97 or whatever, 
which is what I'm still using, the patches being built-in for gentoo's 
grub.  I believe current lilo does too (tho I'm not sure of that).  And a 
number of the partitioning tools do as well (I already mentioned 
gptfdisk, and libparted-based partitioners support it too).  That covers 
all three, the bootloader, the kernel, and the tools, so Linux is good to 
go, even on BIOS-based hardware.  (FWIW, Apple's already EFI.  For MS, it 
depends on the version, but at least some versions require EFI-based 
hardware to handle GPT on the boot device, while handling it fine on data 
devices with just BIOS.)

> Anyway, I installed a freshly burnt version of pclos, and have been
> watching paint dry at 2megs/sec while mc gets my data back.

"Watching paint dry".  Yeah, that's about accurate when the disk is having 
to reread the data several times to get it correct, due to damage or 
whatever.  As I said, I've been there.

> And yes, I run amanda, so it ought to be easy, but my next outgoing msg
> will be to the amanda-users list because I can't get it to build,
> missing includes bailouts.  With zero clues as to what the package name
> that would fix it.  And since the rpm packages are a totally different
> setup, I'll wait till the local build problems get sorted, its one of
> the few programs that I have been building and using the svn versions of
> since back in the late 90's.

Well, at least the rpm metadata should tell you what the direct 
dependencies are, giving you /some/ clue as to what packages it needs.  
The srpm metadata should do the same for the build-time dependencies.

Of course, when you're building it on your own, you may not enable all the 
options the rpm versions do, so you might not actually need /all/ those 
packages, but that's the list I'd start checking against if I ran into 
build-time header errors on a normally rpm based system.  (It's what I 
used to do back in the early '00s, on Mandrake, anyway.)

FWIW, rpmfind is a great resource for this sort of thing.  Try the 
following link, click on the html-page link for a version that looks close 
to what you're trying to build (rpm and srpm, the rpm for run-time deps, 
the srpm for build-time deps), and take a look at the dependencies it 
lists.

Or download the srpm from pclinuxos or rpmfind, and use rpm's "install 
dependencies only, dryrun" functionality, to see what it thinks it needs 
that you don't already have installed.  You can then do it without the 
dryrun if you think it looks right, and that should set you up for 
building.  (Unfortunately, it's been years since I worked with rpm, and 
the switches for the above, even if I did remember them, may well have 
changed by now.  But the functionality is there, with the manpage 
explaining if you need it, I should think.)

Or maybe you're already doing that, having taken for granted that you 
didn't have to actually spell it out, and it's still throwing missing 
headers errors.

Meanwhile, as they say, /test/ your recovery plan, because those backups 
are only as good as your ability to recover them. =:^)  

IOW, if you're going to use self-built backup software, be sure to keep 
copies of it off-disk somewhere, so you can quickly copy it back for 
recovery if necessary.  And test that those copies work on a clean 
install.
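One cheap way to check that a copy matches its source before you need it: walk both trees and compare SHA-256 digests file by file. This is only a sketch, and `tree_digest` and `differing_paths` are hypothetical names. It compares regular-file contents and symlink targets only. It does not check ownership, permissions, timestamps or special files, so it complements an actual test boot rather than replacing one.

```python
import hashlib
import os
from typing import Dict, List


def tree_digest(root: str) -> Dict[str, str]:
    """Map each regular file or file symlink under root (relative path) to a fingerprint."""
    result: Dict[str, str] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Deterministic order. Symlinks to directories show up in dirnames;
        # os.walk doesn't follow them by default, and this sketch doesn't record them.
        dirnames.sort()
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if os.path.islink(path):
                result[rel] = "symlink:" + os.readlink(path)
            elif os.path.isfile(path):
                h = hashlib.sha256()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
                result[rel] = h.hexdigest()
            # Device nodes, FIFOs and sockets are skipped in this sketch.
    return result


def differing_paths(original: str, backup: str) -> List[str]:
    """Relative paths that are missing from one side or whose contents differ."""
    a, b = tree_digest(original), tree_digest(backup)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```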

Either that, or don't use backup software at all, but simple copies 
instead, as I do.  I copy whole partitions at a time.  For root, I 
bind-mount it elsewhere, which gives a view of it without the other 
filesystems mounted over it.  That way I can copy the root partition 
itself, and only the root partition.  I copy from the old working 
partition to the new, freshly mkfsed partition using normal copying, 
either cp in archive mode or file-manager copying.  (mc used to do this 
well, but the new version copies sparse files un-sparse, making them take 
FAR more room on the new partition than they did on the old one, so I 
don't use it for that any more.)
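Roughly, a sparse-preserving copy seeks over runs of zeros instead of writing them. That leaves holes that take no disk space. Here is a minimal sketch assuming Linux/Unix, where `st_blocks` counts 512-byte units. `copy_preserving_holes` and `allocated_bytes` are hypothetical helpers. GNU cp handles this itself through its `--sparse=auto|always|never` option.

```python
import os

BLOCK = 4096  # assumption: granularity at which zero runs are detected


def copy_preserving_holes(src: str, dst: str, block: int = BLOCK) -> None:
    """Copy src to dst, seeking over all-zero blocks instead of writing them."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block)
            if not chunk:
                break
            if chunk.count(0) == len(chunk):
                fout.seek(len(chunk), os.SEEK_CUR)  # leave a hole
            else:
                fout.write(chunk)
        # A trailing hole never gets written, so set the final length explicitly.
        fout.truncate(fin.tell())


def allocated_bytes(path: str) -> int:
    """Space actually allocated on disk (may be far less than the apparent size)."""
    return os.stat(path).st_blocks * 512
```

Comparing `allocated_bytes` with `os.path.getsize` on the source and the copy shows whether the sparseness survived.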

To test the backups at a grub-and-/boot-still-available level, I simply 
reboot and feed grub a different root= parameter on the kernel append 
line, pointing it at the backup root partition (same disk or external, I 
have both, just in case...).
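The only difference between booting the working root and booting the backup root is the root= parameter. Here is a small sketch of that substitution on a kernel command line. `with_root` and the device names are hypothetical. In practice the edit is done by hand at the grub prompt.

```python
def with_root(cmdline: str, new_root: str) -> str:
    """Point every root= argument at new_root, appending one if none exists.

    Naive whitespace split: quoted arguments containing spaces aren't handled.
    """
    args = cmdline.split()
    found = False
    for i, arg in enumerate(args):
        if arg.startswith("root="):
            args[i] = "root=" + new_root
            found = True
    if not found:
        args.append("root=" + new_root)
    return " ".join(args)


print(with_root("ro root=/dev/md0 quiet", "/dev/sdb5"))  # ro root=/dev/sdb5 quiet
```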

To test the grub-and-/boot-eaten scenario, I boot from the grub-enabled 
/boot I installed on a thumb drive, feeding it parameters as needed.  That 
thumb-drive grub installation has kernels and sample grub menu entries 
that I can modify as needed, for both my main 64-bit workstation and my 
32-bit netbook.  In all /boot locations (netbook, main workstation, and 
thumb drive) I keep a grubnotes file listing various additional kernel 
append parameters that I've found useful over the years while testing 
live-git kernels, etc.  I can cat that file from grub if I need to.  That 
reminds me whether it's, for instance, noapic or apic=off, if I suspect 
I'm having problems with it.

So I don't worry about backup software at all, because my backup software 
is mkfs, mount and cp, with the recovery software being grub and/or mount, 
depending on whether it's the root/system or a data partition. =:^)

What's great about it is that since the backup is a snapshot of my working 
system just as it was when I made the copy, I have a fully functional 
rescue partition too.  No limited function thing, the real deal, including 
all the software I had installed when I made the backup, everything.  So 
if my main working copy doesn't run, I can simply boot the backup, google 
a solution if I need to, and fix the working copy.  If it takes a while, 
no big deal.  I have the usual /home (or a backup if it was lost too) 
mounted, and can do all the usual stuff I'd normally do in the meantime: 
checking mail and newsgroups, keeping up with my RSS feeds, all that.  And 
if the former main working copy isn't recoverable, I simply update from 
the backup I'm already running, no problem.

> Anyway, I haven't fallen over, yet, just occupied trying to get a new
> install that didn't use the partitions I gave it, sorted, but now I have
> /home, /opt, /root, and /var back on their proper partitions.
> 
> One positive side effect seems to be that a lot of my kde4 problems have
> vanished too.  And I think I have it all back to 4.6.2 now too.  As I've
> rebooted 20+ times as I get the data moved & the only time I lost any
> config was when I mounted the new /home partition over the top of the
> install directory and rebooted after fixing fstab.  But give me time,
> I'm sure I can screw it up again. :-)

Gotta make it on topic! =:^)

Glad it did seem to fix things up for you, kde-wise.

> Cheers, Duncan & now to see if my recovered kmailrc can still send an
> email.

Seems it worked. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

___________________________________________________
This message is from the kde mailing list.
Account management:  https://mail.kde.org/mailman/listinfo/kde.
Archives: http://lists.kde.org/.
More info: http://www.kde.org/faq.html.