[kde-linux] Using a kubuntu LiveCD to stress test a troublesome Windows system
Duncan
1i5t5.duncan at cox.net
Sun Jul 29 00:01:22 UTC 2012
Doug posted on Sat, 28 Jul 2012 15:48:59 -0400 as excerpted:
> On 07/28/2012 02:57 PM, Bruce Miller wrote:
>> Please forgive the elementary nature of these questions. A
>> recently-widowed friend has asked me to help troubleshoot her late
>> husband's large and powerful Windows 7 system. I am myself somewhat
>> preoccupied with other matters and am sure that I am missing one (or
>> more) obvious solution(s).
>>
>> The major constraint is that her late husband was a professional
>> photographer and he has image files everywhere on his system. Even I
>> can recognize that he was a truly gifted photographer, but that his
>> computer skills were not to match.
>>
>> There is exactly one USB backup drive and so far I have no idea whether
>> the backup is complete (I doubt it) or restorable. I am being careful
>> therefore to limit anything I do to non-destructive testing.
>> Since I almost always use Kubuntu, my primary resource (at the outset)
>> is a CD of Kubuntu amd64 12.04 LTS.
>>
>> The obvious symptom on the Windows 7 machine is a BSOD (Blue Screen of
>> Death) almost every 20 minutes.
>> The first exception code on the BSOD is a memory location of <multiple
>> zeros>1. Her tech support person (owner of a well-known local computer
>> store and fellow photographic expert) says that the single digit in the
>> exception code suggests a hardware failure, probably on the
>> motherboard.
>> I have been running Kubuntu from the liveCD for over eighteen hours
>> which does not support the defective motherboard hypothesis. But my
>> mind has gone blank on how to stress-test the hardware without writing
>> to any hard disks. Google has not been a friend; on this problem, I
>> have found it surprisingly unhelpful.
I don't do MS any longer, and you mentioned google already, but if you
haven't already, try counting the zeros and googling the EXACT error. I
would have tried but without knowing the exact number of zeroes...
>> So far, I have been running just a browser, a konsole session and
>> glxgears. The latter will put some load on the CPU and more on the
>> graphics subsystem. Google did lead me to a sourceforge utility called
>> systester which at least tests the CPU by attempting to calculate pi to
>> many millions of significant digits. But I cannot get it to run.
The first thing I thought about here was memtest86 (or memtest86+, a fork
of the original; sometimes one will run on a given machine while the
other won't, so if the first one you try fails to start, see if you can
find the other). Some distros ship it on their liveCD/DVDs and/or install
media (I'm not sure whether Ubuntu does or not). If so, it would be an
option in the boot menu, as it runs in place of the normal Linux kernel.
Typical usage is to run all tests several times thru, say overnight. If
after ~8 hours of testing it's not showing any problems, you can probably
rule out the memory itself as a problem, tho it's still possible you'll
get memory bus errors under real-world load. (I once had a system that
tested fine on memtest86+, but that would occasionally have memory bus
errors under heavy real-world usage, which on gentoo meant building
packages from source, etc. Eventually a bios update came out that let me
underclock the memory from the rated speed, and at a notch down from
rated speed, I could actually tighten up various individual memory
latencies and it was still solid as a rock... it just couldn't quite
handle the rated speed on that board, probably because the board
tolerances skewed one way while the memory tolerances skewed the other...
just enough to cause occasional problems!) So a clean result doesn't
guarantee there are no problems, but if you do get errors, definitely
consider the memory unreliable.
Another alternative, not as thorough, is the kernel's built-in memtest.
It's a kernel build option, so it's quite easy to ship a kernel with it
turned on, which sometimes makes it available when memtest86* isn't. If
it's available, booting the kernel with memtest=<number> will run that
many test patterns. 0 of course disables it (the default), 1 is one test
pattern, 4 is four... Note that I've never actually run this myself, only
seen the kernel option, so I'm not sure how long the patterns take, or
whether it tests and then boots normally if it passes, just gives you the
results and lets you reboot, or what. But you may have the option built
into the kernel on the live media, so it's worth a try if you don't see
memtest86* as an option in the boot menu.
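If you do try the memtest=<number> route, one small sanity check from the
booted live system is to confirm the parameter actually reached the
kernel. What follows is only a sketch in Python: the function name is made
up for illustration, and it only tells you what was on the kernel command
line, NOT whether the kernel was built with the memtest option or what the
test found.

```python
import re
from pathlib import Path
from typing import Optional


def memtest_patterns(cmdline: str) -> Optional[int]:
    """Return N from the first memtest=N boot parameter, or None if absent."""
    for token in cmdline.split():
        match = re.fullmatch(r"memtest=(\d+)", token)
        if match:
            return int(match.group(1))
    return None


def main() -> None:
    # /proc/cmdline holds the parameters the running kernel was booted with.
    proc_cmdline = Path("/proc/cmdline")
    if not proc_cmdline.exists():
        print("no /proc/cmdline here (not a Linux system?)")
        return
    count = memtest_patterns(proc_cmdline.read_text())
    if count is None:
        print("memtest= was not passed on the kernel command line")
    elif count == 0:
        print("memtest=0 was passed, which disables the test")
    else:
        print(f"memtest={count} was passed ({count} test pattern(s) requested)")


if __name__ == "__main__":
    main()
```

Any results the kernel reports would be in the boot messages (dmesg), not
in the output of this script.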
>> So my questions are the following:
>> 1. Does anyone have suggestions on safe non-destructive hardware
>> stress-testing applications? I have no objection to downloading and
>> burning to CD a specialized distro.
The distro that immediately came to mind here is SystemRescueCD. Lots of
info at the link below, but among other things, it has memtest86, various
recovery and hard-drive testing (SMART, etc) tools including PhotoRec,
specifically designed to help with photo/video recovery (looks right up
your alley ATM), the usual Linux filesystems, PLUS NTFS-3G for MS/NTFS
filesystems, vfat of course, the chntpw windows password reset utility
(doesn't appear to be needed here, but invaluable if you have friends
that forget their password and call you...), all kinds of stuff.
http://www.sysresccd.org
>> 2. Is anyone familiar with systester?
I can't help you there.
>> The system is an Intel i7 with 8GB of RAM. It is a 2008-vintage Asus
>> P6T motherboard; the catch is that it takes an LGA1366 CPU, which is no
>> longer available. Replacing both the motherboard and CPU will cost
>> north of $500 and my reluctance to recommend that my friend spend that
>> sort of money is heightened by the last 18 hours of evidence that the
>> problem is not with the MB.
> Instead of spending $500, and not even knowing if that would solve the
> problem, why not spend less than $100 and buy a portable hard drive.
> Then, having booted up some Linux system from CD, copy all the picture
> files off the system to the portable hard drive. Or even copy the whole
> hard drive for cleanup later, if you're short of time.
I'll second that. You can run memtest86 easily enough since that doesn't
touch the hard drive (which you can even unplug entirely during that
test, if you like). Then consider getting a drive image ASAP. You can
even take the old drive out of that computer and plug it into another to
do the imaging, if you're unsure of the reliability of the existing
hardware.
But do get that second image (or even a third) and work with that. At
this point, it sounds like the photos and other data on the disk are
all-important, an irreplaceable connection to someone no longer around.
So taking a copy of that (or two or three) is likely to be the best
investment possible at this point, STILL well under the $500 mentioned
for new hardware, and you can then freely work with one copy, without
having to worry that you're risking the only copy of the data available.
SystemRescueCD is one tool that can help with that, and is all around a
good live-image to keep handy, for all the other stuff it makes possible
as well.
There's also Clonezilla, a distro purpose-built for disk image cloning
and recovery. (FWIW, it also has memtest.) You'd want the live version
for single machine use -- SE, the server edition, is for massive
simultaneous deployment (40+ systems...).
http://clonezilla.org/
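To make the "image first, work on the copy" idea concrete, here is a
minimal sketch in Python of what a read-only image copy amounts to. It is
NOT how SystemRescueCD or Clonezilla do it, and it makes no attempt to
cope with read errors, so for a disk that might be failing a purpose-built
imaging tool is the better choice. The function name and the paths in the
usage note are placeholders, not anything from the system in question.

```python
import hashlib


def copy_image(source: str, destination: str, block_size: int = 1024 * 1024) -> str:
    """Copy source to destination block by block.

    Returns the SHA-256 hex digest of the data that was read, so a second
    copy can be checked against the first.
    """
    digest = hashlib.sha256()
    # "rb" opens the source read-only, so nothing is ever written to it.
    # "xb" refuses to overwrite a destination file that already exists.
    with open(source, "rb") as src, open(destination, "xb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break  # end of the source reached
            dst.write(block)
            digest.update(block)
    return digest.hexdigest()
```

Usage would be something like copy_image("/dev/sdX", "/mnt/backup/disk.img"),
run as root from the live system, with /dev/sdX standing in for the
Windows disk and /mnt/backup for the mounted portable drive. Double-check
which device is which before running anything like this.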
Of course you could remove the disk to another machine for cloning, then
do memtest86 or other direct-from-CD/DVD tests on the now diskless
original barebones system, if you wanted to try doing both at once and
have another system available to stick the hard drive in for cloning...
Meanwhile, for testing the cpu, the product that comes to mind is cpuburn
and/or cpuburnin. I've never actually used these and just googled them,
but they should test the cpu and the system's thermal solution quite
well. I'd suggest using a live-boot with the hard drive disconnected,
and monitoring cpu and system temps for a few minutes before leaving it
to run, just in case the system does NOT have the best cooling, etc.
(Anything even half modern has hardware thermal shutdown on the CPU to
keep it from actually burning up, which should in theory save you from
totally ruining it, but monitoring it is still probably wise, just in
case.)
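For the temperature-watching part, a rough sketch in Python follows. It
assumes the live kernel exposes thermal zones under /sys/class/thermal
(values there are in millidegrees Celsius); not every board/kernel combo
does, in which case the dict simply comes back empty. The function names
and the 85 C limit are illustrative choices only, not a recommendation for
that particular CPU. It only watches; the load itself would come from
cpuburn or whatever else you run alongside it.

```python
import time
from pathlib import Path
from typing import Dict


def read_zone_temps(root: str = "/sys/class/thermal") -> Dict[str, float]:
    """Return {zone name: degrees Celsius} for every readable thermal zone."""
    temps: Dict[str, float] = {}
    for temp_file in sorted(Path(root).glob("thermal_zone*/temp")):
        try:
            # The kernel reports millidegrees Celsius, e.g. "45000" for 45 C.
            temps[temp_file.parent.name] = int(temp_file.read_text().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or parsed
    return temps


def too_hot(temps: Dict[str, float], limit_c: float) -> bool:
    """True if any zone is at or above limit_c."""
    return any(value >= limit_c for value in temps.values())


def watch(limit_c: float = 85.0, interval_s: float = 5.0,
          duration_s: float = 300.0) -> bool:
    """Poll temperatures for duration_s seconds.

    Returns False as soon as any zone reaches limit_c, True otherwise.
    """
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        temps = read_zone_temps()
        print(temps)
        if too_hot(temps, limit_c):
            return False
        time.sleep(interval_s)
    return True
```

Calling watch() in one konsole while the burn-in runs in another gives a
rolling readout for the first five minutes; if it returns False, stop the
load and look at the cooling before going any further.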
FWIW, google says cpuburn is available for ubuntu, and I've actually not
looked to see what live-distros carry it (tho tomsrtbt is mentioned), but
I would recommend a live-distro for running it, as I said. But since I'd
have to do the same googling you'd have to do for that, I might as well
let you do it. Here's where I started, tho.
http://www.google.com/search?q=cpuburn+linux
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman