How do I diagnose cause of corrupt verify? and a THANK YOU to Acronis!

Discussion in 'Acronis True Image Product Line' started by bob44, May 17, 2008.

Thread Status:
Not open for further replies.
  1. bob44

    bob44 Registered Member

    Joined:
    May 6, 2008
    Posts:
    7
    Hello,

    This post is both a thank you to Acronis, and a request for help.

    I have built a new PC with two new SATA drives (one Western Digital, one Seagate).

    I backed up one drive to the other using Acronis 10, then ran verify. Acronis says the archive is corrupt.

    At first, I thought ATI must be buggy, especially after reading some of the posts on this forum, but I ran some tests of my own.

    If I create a large, 2GB .RAR file, and copy it between drives with the "COPY /V" switch, twice (out of at least 200 tries) COPY will say there was an ERROR!
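
    For anyone wanting to repeat this kind of test without typing COPY /V 200 times, here is a rough Python sketch. It is not what was run above; the function names (`files_identical`, `repeated_copy_test`) and the scratch file name are made up for illustration. It copies a file repeatedly and compares every copy byte for byte against the source.

```python
import os
import shutil


def files_identical(path_a, path_b, chunk_size=1024 * 1024):
    """Compare two files byte for byte, reading them in chunks."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(chunk_size)
            block_b = b.read(chunk_size)
            if block_a != block_b:
                return False
            if not block_a:  # both files ended at the same point
                return True


def repeated_copy_test(source, dest_dir, attempts=200):
    """Copy `source` into `dest_dir` repeatedly; return the attempt numbers that mismatched."""
    failures = []
    dest = os.path.join(dest_dir, "copytest.tmp")  # hypothetical scratch file name
    for attempt in range(1, attempts + 1):
        shutil.copyfile(source, dest)
        if not files_identical(source, dest):
            failures.append(attempt)
        os.remove(dest)
    return failures
```

    Calling something like `repeated_copy_test(r"D:\test\big.rar", r"E:\test")` (placeholder paths) returns a list of the failed attempts, empty if every copy matched. One caveat: the read-back may be served from the operating system's file cache rather than the disk, so a clean run does not prove the data on the platters is good.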

    I have been using computers for almost 20 years, and building them for 15. I have never had an issue with data corruption - that I know of! I am incredibly depressed about this.

    By seeing the "COPY /V" fail, I know that the issue CAN NOT be a flaw with Acronis's verify, but an issue with my new computer.

    I would never have caught this if it weren't for ATI's very thorough verification process! Thank you!

    Can anyone help me solve this issue?

    I've tested both drives with the DOS-based diagnostic boot disks from WD & Seagate, full scan, no errors.
    I've run Memtest+ for 8 hours, no errors.
    Prime95 and Orthos for 8 hours, no errors.
    There are no error messages in the Windows XP (SP3) event log.
    I'm using the latest drivers.

    I don't know what else to do! I don't have access to other hardware to help in testing (i.e. a different motherboard, power supply or RAM).

    Again, thank you to Acronis for spotting this error before I moved my data to these drives, and thanks in advance to anyone who can help me.
     
  2. seekforever

    seekforever Registered Member

    Joined:
    Oct 31, 2005
    Posts:
    4,751
    A failed TI validation means that TI can't properly read the archive file and recreate all the checksums, so anything that could affect that process is suspect: RAM, cables, drives, processor, and the program itself. However, you have ruled out TI as the culprit by showing a failure with a non-TI application.

    The one problem I had with TI validates was caused by a marginally bad SATA drive cable. During normal operation there was no indication of a problem but TI wouldn't validate an archive most of the time. Unlike your situation, I did get a message in the event logger which pointed me right to changing the cable!

    There was a fair bit of problem with SATA cables a while ago and they were indeed considered a weak link. I actually had this problem with 2 SATA cables which came with my Asus motherboard - I replaced the first bad one with the second one and it failed after a short time. Replacing them with ones from another manufacturer fixed the problem.

    Also, make sure your SATA cables are not extra long, there is a spec for the maximum length.

    Are your drives SATA2 (3.0 Gb/s) as well as your motherboard? Since it is a new system I assume they are, but if your drives have a jumper, try forcing them to SATA 1 (that's what people usually call the 1.5 Gb/s transfer rate). Not all drives have the jumper these days.

    You may find that you can see what your power supply voltages are by looking in your BIOS or perhaps you have a voltmeter. I doubt this is a problem because of your extensive diagnostic testing but a diagnostic is only an approximation of real life.

    Having said that about diagnostics, see what your memory timings are. The PC should not be overclocked or use aggressive memory timings for reliable operation. Set the memory to SPD if it isn't. SPD settings for "high-speed" memory are often more relaxed than what the memory is rated for.

    If your PC will run with only one stick of memory and you have 2 or more, try running with each stick in turn. Diagnostics aren't the same as running the system in real life. In my previous life, we used to say that the only diagnostic that mattered was the OS and apps.

    Good luck and let us know if you find anything further.
     
  3. jmk94903

    jmk94903 Registered Member

    Joined:
    Jul 10, 2004
    Posts:
    3,329
    Location:
    San Rafael, CA
    First let me congratulate you on good detective work in proving that there is a problem that is not True Image (TI). So many people just blame TI in a case like this and won't accept that there is a system problem. And, I agree that it's really depressing to find this problem on your new system.

    Do you have the latest BIOS for your new motherboard? Large file errors can be caused by BIOS errors in rare cases.

    One simple thing to try is to boot from the TI Recovery CD. Try to validate your backups from that Linux environment. If the old backups don't validate, create a new backup and see if that validates. If it does, then something in Windows may be the problem. If it doesn't, then it's probably a hardware problem unrelated to Windows or any specific drivers.

    Try splitting your backups into 1GB pieces with the split option in TI. Sometimes small files are handled correctly while large files will show errors. If the split backups can be validated, it's almost certainly a hardware problem. Some systems only exhibit problems with large files, and this often shows up with files larger than 1GB.

    Instead of just the Copy command, you might run CRC checks on the file before and after copying between the drives. This often shows up the problem every time rather than sporadically. Here's a good CRC checker:
    http://www.brandonstaggs.com/filecheckmd5/
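
    If you would rather script the before-and-after check than use a GUI tool, a minimal Python sketch follows. It uses MD5 from the standard `hashlib` module; the helper names `file_md5` and `copies_match` are invented here and have nothing to do with the tool linked above.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copy):
    """True if both files hash to the same MD5 digest."""
    return file_md5(original) == file_md5(copy)
```

    Hash the file on the source drive, copy it, then hash the copy on the other drive. Hashing the same file several times is also worth trying: if the digest of an unchanged file varies between runs, the read path (RAM, cable, controller) is suspect rather than the copy itself.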

    Of course, replacing the RAM would be a good test, but I understand that you cannot do that. If you can remove half the RAM and repeat the validation, or the backup creation and validation, that may be helpful. You can then remove that half of the RAM, install the other half and try again.

    If swapping RAM can't be done or finds nothing, re-run Memtest but let it run for at least 24 hours. Some memory errors are very hard to find and only appear after 24-48 hours of testing. Since memory is so often the cause of problems, it's worth the long time to eliminate this "easy" fix.

    SATA cables can produce errors and they are the cheapest component. Replace one or both cables and repeat the validation, new image and validation tests.

    If none of the above turns up a fix that lets backups validate successfully, then things get harder. I certainly wouldn't expect either of two new hard drives to be defective, but not much else is left that's easily replaceable. One new drive would allow testing.

    If it's not the BIOS, Windows, RAM, cables or hard drives, what's left but the motherboard? As painful as it sounds, some motherboards create errors, and sometimes it's actually built into the board and not simply a defective unit. I hope this isn't the problem that you are having, but it's worth contacting the manufacturer to see if they know anything.

    Let us know what you find. This one will be an education for all of us.
     
  4. bob44

    bob44 Registered Member

    Joined:
    May 6, 2008
    Posts:
    7
    I am having trouble reproducing the error.

    I have not changed my setup, and have run the backup with validate 10 times in a row, and it is OK each time. But it DID fail several times earlier in the day. This is driving me mad! :doubt:
     
  5. seekforever

    seekforever Registered Member

    Joined:
    Oct 31, 2005
    Posts:
    4,751
    It is often better if the thing is dead in the water rather than marginal but such is life.

    Has anything changed on the system, such as hardware or software being added or removed? Any significant changes in room temperature?

    I agree with jmk94903 that a 24hr run of Memtest86+ wouldn't be a bad idea if nothing else turns up. For marginal errors I find that the Memtest random pattern test is most likely to pick up the problem so you could turn off the other tests and just hammer it with this one if you wish.

    Although I wouldn't likely do it now since you can't reproduce the problem, another thing to do is to reset the BIOS to all the default settings just in case there is an aggressive setting somewhere.

    Have you tried the TI rescue CD yet to do a validate? It uses Linux and so is different from the Windows environment. Note that this CD must properly run if you wish to do a restore so it must be tested anyway.
     
  6. Xpilot

    Xpilot Registered Member

    Joined:
    May 14, 2005
    Posts:
    2,318
    Another way of testing RAM is to use this test method
    http://oca.microsoft.com/en/windiag.asp
    It reads well but I have not yet used it myself.

    Tracking down intermittent faults can be one of the most time consuming and frustrating experiences.
    I have had such problems in the past. However my way of using TI enabled me to live with the odd failure and still sleep peacefully at night.
    It was only when the fails came thick and fast and eventually showed up in the computer's POST checks that I was stirred into investigating and thankfully was able to effect a cure. It was a poor connection in the 5.5V to the main hard drive.

    So my answer to "How to live with intermittent fails and still sleep at night?" is: forget all about running image verifications. They become superfluous if you run restores to a spare drive.
    I use a spare drive bay to which is fitted a removable drive slot. The replacement drive/s are in individual drawers something like these.
    http://www.startech.com/category/parts/data-storage/removable-enclosures/sata-internal/list.aspx
    I prefer the sort in covered trays but they all work. With a bit of shopping around they can be had for far less.

    The method I use is to make an image of the current working main drive. Now remove the main drive and put it safely to one side. Put in the replacement drive, boot from the recovery CD and in a few minutes you are up and running on the spare drive.
    If the rogue fault occurs and the restore fails just swap the drives over, re-image and repeat the process.

    This method, which I use all the time, is very appropriate for rare and random failures because at no time are you putting the contents of your main drive at risk, and you will have a ready-to-go copy safely tucked under your pillow.

    PS. I have read on this group that it is possible for Windows to choke if there is too much RAM installed. If you have more than 2 GB installed, remove some and try your COPY /V test again with some large files.

    Xpilot
     
  7. bob44

    bob44 Registered Member

    Joined:
    May 6, 2008
    Posts:
    7
    An update...

    My system has 4GB ram, 2x2GB.

    Running the Microsoft "Windows Memory Diagnostic" boot CD gives errors when doing the extended test "MATS+ uncached". However, all the other tests pass, and other programs like Memtest86 and Memtest+ have 0 errors, after 24 hours.

    I'm wondering how reliable this test is, or if this is just a red herring.
    Has anyone used this program? Specifically, with 4GB ram? I see almost no discussions online about it. It looks like it was written in 2003, before 4GB of memory was common. If Memtest was giving this error, I'd be more sure it's a RAM issue... but I've never heard of this Microsoft program until reading Xpilot's post.

    The addresses given as bad are always high, ie "dbfebb68". Most of the errors are "ffffffff" being read as "7fffffff", but sometimes I get "ffffdff".
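
    To make the pattern in those numbers easier to see, here is a small Python sketch (the helper name `flipped_bits` is made up) that XORs the expected and actual values and lists which bits differ. It also converts the sample address to GiB. It only uses the values quoted above.

```python
def flipped_bits(expected, actual):
    """Return the bit positions (0 = least significant) where the two values differ."""
    diff = expected ^ actual
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# Most common error reported: ffffffff read back as 7fffffff.
print(flipped_bits(0xFFFFFFFF, 0x7FFFFFFF))  # [31]

# Sample failing address from the test output, expressed in GiB.
print(round(0xDBFEBB68 / 2**30, 2))  # 3.44
```

    So the usual failure is a single dropped bit (bit 31, the top bit of the 32-bit word) at an address a little under 3.5 GiB. A single-bit error like that looks more like a hardware data-path problem than random garbage, though that is only my reading of it.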

    If I remove one of the two 2GB sticks, the test passes! Does not matter which stick I use, or which slot. Also, Acronis seems to validate OK.

    But when I put a 2nd stick back in, the problems come back. (It seems like there are fewer errors if I arrange the two sticks in single channel mode vs dual channel.)

    Has anyone ever heard of something like this?
     
  8. Xpilot

    Xpilot Registered Member

    Joined:
    May 14, 2005
    Posts:
    2,318
    I have never tried to run XP SP3 with 4 GB of RAM. You will find it is a full head-on no-no.
    XP will choke on it, as you have found from the Microsoft memory test I suggested, and this excessive amount of RAM will also cause TI to lose its way.

    Xpilot
     
  9. laserfan

    laserfan Registered Member

    Joined:
    Jan 19, 2005
    Posts:
    117
    What motherboard and RAM are you using? Latest BIOS?

    My mobo has 4 DIMM slots and while it works perfectly (dual ch 128bit) with 2 slots filled I've never gotten 4 to work! Though in my case it's likely because I didn't get 4 matching DIMMs.

    Anyway, post detailed specs?
     