Corrupt/Can't Verify Corrupt Archives: Let's uncover the problem!

Discussion in 'Acronis True Image Product Line' started by johnmeyer, Sep 11, 2007.

Thread Status:
Not open for further replies.
  1. Xpilot

    Xpilot Registered Member

    Joined:
    May 14, 2005
    Posts:
    2,318
    Perhaps someone could rattle the MUSTANG stable door :D

    A thought has just occurred to me. Is johnmeyer using the Backup Locations feature or a Secure Zone for his archives? Because if so, the rotten apple principle could apply: one old bad image can cause the latest and all the previous images to fail validation.

    Xpilot
     
  2. mustang

    mustang Developer

    Joined:
    Apr 12, 2005
    Posts:
    905
    Since you asked, here are my thoughts on the problem. First, it is a big problem with this program and it is getting worse. Even though TI hasn’t changed in quite some time, the number of corrupt image complaints in this forum is increasing. I’m going to take a middle-of-the-road position between blaming the problem on hardware and software. There’s no question the TI engine needs work. The corrupt image problems with TI are far greater than with other imaging programs. On the other hand, it works perfectly on many systems. I think the problem has grown as system speeds have increased. From a marketing standpoint, it’s easy to understand why speed is being emphasized over system stability. Another factor is the complication introduced by using external USB drives. Certainly, some USB drives work better than others. Also, high-quality motherboards matched with manufacturer-certified memory usually produce fewer corrupt images.

    To johnmeyer, here are some things I strongly recommend you try in your testing:

    1. Slow down memory timing settings in your BIOS. Following are the settings I see in my BIOS:

    a. “Configure DRAM Timing by SPD”. Set to Disable.

    b. “DRAM CAS# Latency”. Increase this number to slow down memory.

    c. “DRAM RAS# Precharge”. Increase this number to slow down memory.

    d. “DRAM RAS# to CAS# Delay”. Increase this number to slow down memory.

    e. “DRAM Precharge Delay”. Increase this number to slow down memory.

    f. “DRAM Burst Rate”. Decrease this number to slow down memory.

    g. “Performance Mode”. My BIOS offers only Enable or Auto, so I leave it at Auto. Disable it if your BIOS allows.

    h. “DRAM Idle Timer”. Leave set to Auto.

    i. “DRAM Refresh Rate”. Decrease this number to slow down memory.


    CAS Latency and DRAM Refresh Rate are the most important. You may even want to try reducing the DRAM Frequency setting in your BIOS. You may have to hunt for this. Configure System Frequency to Manual and look for DRAM Frequency.

    2. If slowing memory timing doesn’t help, try removing half of the memory from the system. This can be a huge factor and is a very important step in diagnosing the problem. Don’t be too ******* smart and skip this test. If you only have one stick, well, that’s an excuse. Adding memory to a system can cause instability. This makes sense: more memory makes data move through the system faster, which can cause instability at a weak link downstream in the system. It’s not necessarily the memory that is the problem, but memory is something we can control. A rule of thumb I came up with is to limit memory to ¼ of the maximum allowed by the motherboard manufacturer. (A quick way to exercise the memory/disk path is sketched after this list.)

    3. Try using TI8. It had very few corrupt image problems. The progression from TI8 to TI9 is very interesting. When TI9 first came out the corrupt image problem exploded. The image format was the same as TI8, but new features had been introduced in TI9. You could prove it was a software issue by restoring TI9 corrupt images with TI8. At a certain build, Acronis redesigned the engine and changed the image format. This eliminated most of the problem. Now, as system speeds have increased, the problem is coming back. You’ll have to make new images to try TI8. I’d be willing to bet your problems would disappear if you were using TI8. Obviously, you can get around the hardware limitations of the TI8 Recovery CD with BartPE and the TI8 plugin.
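
    As promised in item 2, here is a minimal, crude way to exercise the memory/disk data path (a sketch in Python, not an Acronis tool; the drive letter and sizes are only placeholders). It writes a large file of pseudo-random data while hashing it, then reads it back and hashes it again; a mismatch points at RAM, controller, cable or disk trouble somewhere in between. For the read-back to mean anything, the file needs to be bigger than your installed RAM, or the drive should be unplugged and reattached first, so Windows can't serve it straight from cache.

```python
import hashlib
import os

CHUNK = 4 * 1024 * 1024           # 4 MB per write
TOTAL = 2 * 1024 * 1024 * 1024    # 2 GB test file -- make this bigger than your RAM
TARGET = r"E:\stress_test.bin"    # placeholder: point at the suspect drive

def write_pass(path):
    """Write TOTAL bytes of pseudo-random data, hashing as we go."""
    h = hashlib.md5()
    with open(path, "wb") as f:
        written = 0
        while written < TOTAL:
            block = os.urandom(CHUNK)
            h.update(block)
            f.write(block)
            written += len(block)
    return h.hexdigest()

def read_pass(path):
    """Read the file back and hash it again."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()

if __name__ == "__main__":
    wrote = write_pass(TARGET)
    read_back = read_pass(TARGET)
    print("write hash:", wrote)
    print("read hash :", read_back)
    print("MATCH" if wrote == read_back else "MISMATCH - suspect the data path")
```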



    There's no way we are going to solve this problem here in the forum. Acronis needs to redesign the program for that to happen.
     
  3. jmk94903

    jmk94903 Registered Member

    Joined:
    Jul 10, 2004
    Posts:
    3,329
    Location:
    San Rafael, CA
    Yes, a self healing backup image would be very nice. It would be well worth the extra size for the assurance that one bad bit read wouldn't prevent an image from restoring.
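
    Just to make the idea concrete (a toy sketch in Python only, not how the .tib format actually works, and the 256 KB block size is just an assumption for illustration): keep a checksum per block plus one XOR parity block per group of blocks, and a single bad block can then be rebuilt at restore time instead of sinking the whole archive.

```python
import hashlib

BLOCK = 256 * 1024  # assumed block size, for illustration only

def split_blocks(data):
    """Split archive data into fixed-size blocks."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def xor_parity(blocks):
    """XOR all blocks together into a single parity block."""
    parity = bytearray(max(len(b) for b in blocks))
    for b in blocks:
        for i, byte in enumerate(b):
            parity[i] ^= byte
    return bytes(parity)

def heal(blocks, checksums, parity):
    """Rebuild at most one corrupt block from the parity block."""
    bad = [i for i, b in enumerate(blocks)
           if hashlib.md5(b).hexdigest() != checksums[i]]
    if not bad:
        return blocks                      # nothing to fix
    if len(bad) > 1:
        raise ValueError("more than one bad block - simple parity can't help")
    i = bad[0]
    others = [b for j, b in enumerate(blocks) if j != i]
    blocks[i] = xor_parity(others + [parity])[:len(blocks[i])]
    return blocks

# At backup time you would store, alongside the archive:
#   checksums = [hashlib.md5(b).hexdigest() for b in blocks]
#   parity    = xor_parity(blocks)
```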
     
  4. jmk94903

    jmk94903 Registered Member

    Joined:
    Jul 10, 2004
    Posts:
    3,329
    Location:
    San Rafael, CA
    I've never seen one, but it would certainly be a huge help in testing file transfer errors.

    I really wonder if the USB interface is the root of the problem with USB drives. If a system never has errors with internal drives but reports corruption with USB drives, the answer may be to use FireWire instead.

    I can't recall anyone reporting the corrupt backup problem on an external Firewire drive.
     
  5. Xpilot

    Xpilot Registered Member

    Joined:
    May 14, 2005
    Posts:
    2,318
    Hi Mustang,
    Glad you responded to the call in your usual helpful and incisive way.
    The point about the possibility of too much memory causing problems is something that would never have occurred to me!
    It seems to be the fashion these days for manufacturers and users to pile in as much memory as possible so they can feel they have the best. I wonder if these over-endowed computers really perform any better than their leaner counterparts.

    Xpilot
     
  6. tjhb

    tjhb Registered Member

    Joined:
    Mar 20, 2005
    Posts:
    11
    I'm grateful for this thread John.

    I use TI10, but I can't recall how bad I found this problem under TI9. I think 10 is slightly worse in my case.

    With one or two exceptions, I've only had a corrupt image problem if I've done one of these things:

    (a) made a new image to a folder already containing Acronis images;
    (b) made a new image to a new folder, created by TI.

    I've mainly had these problems on an external USB drive, and only once, I think, over FireWire. (I never make images to an internal disk, just out of habit, and I always use the ordinary full CD for image creation and restoration.) Always to NTFS (see below).

    So what I do to avoid problems is always to create a new image to the root folder of the external drive (which is otherwise empty, except for its subfolders) then, after rebooting to Windows, move it to the folder where it belongs, and check or reset its NTFS permissions. I usually validate it at that point too.

    My feeling is that the Linux file system drivers Acronis uses on its bootable CD do not handle NTFS permissions correctly, or possibly NTFS metadata; chiefly folder permissions/metadata perhaps. (A folder created by Acronis from CD conspicuously only has the SYSTEM permission on it; or is it only Administrators? I've forgotten because I avoid this now.)

    Given this feeling, it would be sensible for me to use FAT32 on my backup drives, but I don't. I prefer NTFS, even though I think the Acronis drivers are not within spec.
     
  7. seekforever

    seekforever Registered Member

    Joined:
    Oct 31, 2005
    Posts:
    4,751
    I'm in the never-had-a-problem camp when it comes to validating or restoring images. My only problem was caused by my often-mentioned marginal SATA cables. The cables caused no obvious problems in normal operation but caused TI to get validation errors. The lack of problems in normal operation was probably due to successful disk reads on retries. The Windows Event logger did indicate there were SATA cable issues.

    I think it is difficult to really say whether more corrupt image messages are appearing. If there are, a factor might be the redesign of motherboards for dual-core processors, along with their increased speed and perhaps a need for different "drivers".

    The corrupt image declaration is not saying whether the data contained in the archive is a true representation of your disk contents. It is saying that the archive cannot be read such that the resulting extracted data matches the archive contents. This is done with 4000 checksums per gigabyte of data. Naturally, the problem can be reading the disk the archive resides on, bad memory buffering the data when the checksum calculation is made, or any other hardware flaw such as (unlikely) CPU failure. If you do the MD5 checksum test for file copies, you are doing the same type of validation, except with only one checksum. The MD5 test can fail for the same reasons, but there is a chance it may map memory differently, so the odds that a bad result is caused by the disk transfer are increased.
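
    To illustrate the difference (a rough sketch in Python, not the actual TI code or archive format): the MD5 file-copy test gives you one checksum for the whole file, while a per-block pass at roughly 256 KB per block gives you the ~4000 checksums per gigabyte mentioned above and tells you approximately where in the archive a read went bad.

```python
import hashlib

BLOCK = 256 * 1024  # roughly 4000 blocks per gigabyte

def whole_file_md5(path):
    """The single-checksum test: one MD5 over the entire file."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(BLOCK), b""):
            h.update(chunk)
    return h.hexdigest()

def per_block_md5(path):
    """One checksum per block, so a bad read can be localised."""
    with open(path, "rb") as f:
        return [hashlib.md5(chunk).hexdigest()
                for chunk in iter(lambda: f.read(BLOCK), b"")]

# Usage idea: run per_block_md5() on the internal copy and on the external
# copy of the same archive and compare the two lists; the index of the first
# mismatch is roughly where the transfer or read failed.
```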

    Validating TI archives in Windows does not mean they will validate in the Linux environment. The two are totally different, and the Linux environment is generally the poorer one in terms of HW support. You must do a few validations with the Linux CD to verify it works with your HW. After that, the Windows validation should be adequate. Doing a full restore a few times is also a good test. If you have done the preceding, you do not have to restore every archive to test it. Always keep more than one archive!

    I don't know if it is because the owners are more likely to mention the brand of their PC but on the surface, Dell PCs seem to have more grief with TI. ??
     
  8. tgirard

    tgirard Registered Member

    Joined:
    Sep 16, 2007
    Posts:
    1
    Hello All,

    I had the issue occur today. I made the image to an external USB drive from within Windows XP Pro SP2 and mounted the image, all OK. I replaced a noisy hard drive, restored the system using the recovery CD (don't ask -- SATA issues with the boot CD), installed Acronis TI 10.0.4942, went to restore the image, and it said it was corrupt.

    I could, however, select another image on the same external drive, in the same folder, and it verified OK.

    I found the issue to be the NTFS permissions and ownership on the image file. I took ownership and gave my user account full control access, as well as the Administrators group. The file is restoring as I type this.
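
    For anyone who wants to script the same fix, here is a rough sketch in Python that shells out to the standard Windows tools (run it from an elevated prompt; the archive path is just a placeholder, and on plain XP you may need to take ownership through Explorer and use cacls instead of takeown/icacls):

```python
import getpass
import subprocess

archive = r"F:\Backups\MyBackup.tib"   # placeholder path to the "corrupt" image

user = getpass.getuser()

# Take ownership of the file, then grant the current user and the
# Administrators group full control -- the same steps described above.
subprocess.run(["takeown", "/f", archive], check=True)
subprocess.run(["icacls", archive, "/grant", f"{user}:F"], check=True)
subprocess.run(["icacls", archive, "/grant", "Administrators:F"], check=True)
```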

    Hope this helps, as I have had the same issue in the past.
     
  9. johnmeyer

    johnmeyer Registered Member

    Joined:
    Oct 18, 2005
    Posts:
    51
    Here's a small amount of additional information:

    I did an image backup directly to my one FireWire/1394 drive (the one where, when I copied "corrupt" images from the USB drives to it, they validated as OK). I did a validation immediately after the backup, and the validation failed ("corrupt").

    Thus, I have now experienced this on six USB drives and one Firewire drive. I actually did cable testing on this drive not many months ago, and I am 99% confident that the high-quality cable I am using is good.

    In answer to an earlier question, I have not experienced a problem backing up directly to an internal drive, but I have very seldom done this because it seems like VERY poor backup practice. Any power glitch or aggressive virus might take out any drive permanently attached. External drives are very nice because they are not connected to anything when not in use and therefore are only subject to theft, fire, and failure (although I use them for only a few hours a week, so the failure risk is exceedingly low).

    Someone mentioned in an earlier post that the compression level may make a difference. I will try testing this over the next few days. I always use one of the two highest compression levels in order to save space. The next few times I will try the "normal" compression (i.e., the default) and see if the "corrupt" problems go away. This seems like a long shot, but what the heck ...
     
  10. seekforever

    seekforever Registered Member

    Joined:
    Oct 31, 2005
    Posts:
    4,751
    It might be considered poor practice if it is your only backup. I always use an internal HD as my primary backup and copy the validated image via Windows Explorer to an external HD as an off-line backup. I do not copy every image to the external HD since I have encountered no problems with the images on the internal HD. I do live in an area where electrical storms happen in the summer; this hasn't been a problem yet, but it obviously could be.

    My primary backup is an internal HD, my secondary backup is an external HD, and my tertiary backup is a file server. I also copy the occasional backup to DVD and keep some DVDs at a friend's house. I also spend more time considering the security of personally created data files (pictures, spreadsheets, etc.) than the OS and apps.

    I guess I could say that I consider using an internal HD as a good backup practice (with some qualifiers) because it works reliably and it is fast which means I do backups before trying out some new software or bright idea.
     
  11. jmk94903

    jmk94903 Registered Member

    Joined:
    Jul 10, 2004
    Posts:
    3,329
    Location:
    San Rafael, CA
    But if the backup always verifies when created on the internal drive, that's important information.

    It means that no errors occur when writing to the internal drive to create the image. It also means that there are no errors reading the image from an internal drive to validate it. It means that TI behaves as we all believe it should when writing to and reading from internal hard drives. The problems, when they occur, are with external drives.

    Since some backup images that were created on external hard drives and fail to validate there can be copied back to the internal drive and validated successfully, it's clear that the prime error is in reading from external drives when validating and restoring.

    Why should that be so hard?
     
  12. Brian K

    Brian K Imaging Specialist

    Joined:
    Jan 28, 2005
    Posts:
    12,146
    Location:
    NSW, Australia
    Same here. I never do backups directly to external HDs. I copy occasional backups to external HDs at a later time for redundancy purposes.

    I guess I'm one of the lucky ones. TI has never failed to image, verify, restore or clone.
     
  13. Xpilot

    Xpilot Registered Member

    Joined:
    May 14, 2005
    Posts:
    2,318
    In the past I never had any reliability problems with external drives, only slowness and inconvenience (version 8 and some V9).

    Now I only use internal drives for backup imaging and restores. This has speed, reliability and scheduling advantages over external drives.

    ALL dangers associated with permanently connected hard drives are avoided by having them in removable drawers. This also opens up simple possibilities for physical off-site security.

    At any one time I have two restored hard drives outside the computer. Both of these have been the main drive in the last two days and either can be re-inserted very quickly.

    Because of this redundancy I only keep one copy of backup images, which remains on a slave drive in the computer. If I were fanatical about older backups I could install a second drive rack for swappable backup drives, but for my purposes that would be OTT.

    I do not run validations, and over the last year and a half the odd restore has failed from "image corruption". This really is a non-event, as the current main drive is not overwritten and a fresh image can be made with no problem. My actual failure rate is less than 0.5% of all restores.

    Xpilot
     
  14. tjhb

    tjhb Registered Member

    Joined:
    Mar 20, 2005
    Posts:
    11
    This makes good sense to me.

    Why should TI have problems with NTFS only on external drives? No idea. But I think that's the key to this problem.
     
  15. johnmeyer

    johnmeyer Registered Member

    Joined:
    Oct 18, 2005
    Posts:
    51
    Well, I tried "normal" compression, but that resulted in a corrupt file. So much for that idea.

    Good idea. While I don't like the two-step nature of this, I will be willing to do it if it results in reliable backups. I'll try this now and see what happens. It will be interesting to see if the archive validates as "corrupt" once copied. I haven't actually tried taking a good archive made on an internal drive, copying it to the external drive (with verification to make sure the copy is bit-for-bit identical) and then doing a TI validation on the external drive's copy of the backup.
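
    For the record, the "copy with verification" step I have in mind is roughly the following (a sketch in Python with placeholder paths; XCOPY /V or a binary file compare does the same job). For a strict test the external drive should be safely removed and reattached before hashing the copy, so the read doesn't come straight out of the write cache.

```python
import hashlib
import shutil

SRC = r"D:\Images\MyBackup.tib"    # placeholder: archive on the internal drive
DST = r"F:\Images\MyBackup.tib"    # placeholder: copy on the external USB drive

def md5_of(path, chunk=4 * 1024 * 1024):
    """Hash a file in chunks so large archives don't need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

shutil.copyfile(SRC, DST)
if md5_of(SRC) == md5_of(DST):
    print("copy is bit-for-bit identical")
else:
    print("copy differs from the original - the transfer itself went bad")
```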

    I'll report back ...

    [Edit] I just went back to TI (which was still open from my last attempted C: drive image backup), and when I went to edit the script (to change the target to an internal drive) I got a bunch of error messages saying that the script had been created in an earlier version of TI (I just created it today, so that's not right). I got a few other errors as well. I closed TI and then re-started it, and the problem went away. This sort of thing often happens when code is overwriting memory that it shouldn't or is in some other way broken. Perhaps this external drive "corrupt" problem is due to some unrelated memory corruption?
     
  16. johnmeyer

    johnmeyer Registered Member

    Joined:
    Oct 18, 2005
    Posts:
    51
    Re: Some progress to report: Corrupt/Can't Verify Corrupt Archives

    OK, I'm getting close to understanding the nature of this problem. I am quite certain that Acronis is partly to blame for this, and it is still up to them to fix the problem, but it is not entirely their fault.

    I'll explain.

    I made a C: image backup to a separate internal IDE hard drive, and verified/validated the backup. I then copied this to an external drive using XCOPY with the /V option to verify (I could have also done a file compare, but that seemed redundant). I then did a validation of this exact copy residing on the external USB drive, and that validation failed ("corrupt"). I then attempted validation using various permutations of cables (I tried three different cables) and USB connections on my computer. I found that one cable always gave me a validation failure (even though that was the cable that produced a perfect copy); one cable gave me validation errors sometimes; and a third cable always (so far at least) gave me perfect validation!!

    So it's all a cable issue and the problem is solved, and Acronis is off the hook, right?

    Well, not exactly.

    Here's what I suspect. First of all, not every transfer of data -- whether from memory, disk drive, CD, or DVD -- goes perfectly. There is always a check of some kind to verify that the data has been transferred. The more error-prone the technology, the more checking. CDs and DVDs, which are one-way devices and therefore don't allow the original data to be re-transmitted, have not only error checking but also extensive error correction (redundant data). Thus, when a program asks for a chunk of data and the error-checking bits show that the data didn't transmit correctly, the data is resent, or the error correction -- if available -- is used to attempt to recover the correct data.

    Usually all of this goes on "under the covers" and the application asking for the data isn't aware of the retry. However, many programs -- especially in the "old days," when we needed to squeeze out more performance -- bypass the standard Windows calls and attempt to deal more directly with the hardware. This has traditionally been true of backup programs. Some of you may be old enough to remember Fastback, one of the first backup programs. It wrote directly to the floppy drive and even used its own proprietary formatting for the floppy. As a result, the backup was very fast. Unfortunately, it resulted in a lot of unusable floppies, especially if you used junk media.

    I suspect that Acronis is dealing with the drives in some way that is non-standard, and is therefore interpreting the normal retries as an indication of a problem with the actual data. This is either done by design, to ensure the best possible data integrity, or by oversight, because they did something to improve performance but didn't realize the implications.

    Either way, I think I have now figured out what is going on. I am going to forward a version of this to Acronis support and ask that their engineers alter the way TI works, or at least give the user an option to "relax" the checks done on external drives so that only true failures are reported.
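
    To make the "relax the checks" idea concrete, here is a sketch of what I mean (pure speculation on my part about what TI could do, written as Python; the function and its arguments are hypothetical): if a block fails its checksum, re-read it a few times before declaring the archive corrupt, so a transient cable or bus glitch isn't reported as corruption.

```python
import hashlib

def read_block_with_retry(f, offset, size, expected_md5, retries=3):
    """Re-read a block a few times before giving up on it."""
    for attempt in range(retries + 1):
        f.seek(offset)
        block = f.read(size)
        if hashlib.md5(block).hexdigest() == expected_md5:
            return block            # good read, possibly only after a retry
    raise IOError("block at offset %d still bad after %d retries"
                  % (offset, retries))

# A validator built on this would only report "corrupt" for blocks that fail
# every retry, i.e. true failures rather than one-off transfer glitches.
```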

    So, when I started this, I said, "Let's uncover the problem!" I now think that we have.
     
  17. seekforever

    seekforever Registered Member

    Joined:
    Oct 31, 2005
    Posts:
    4,751
    While I won't bet my life on it, I don't think there is any checking of data when written to HD in normal circumstances. Like the optical disks, HDs also have the redundant data attached to each sector which permits correction of a certain number/type of errors. The controller hardware assembles the ECC and writes it with the data. Data errors are typically picked up on reads only.

    Your cable issue is similar to my SATA cable problem. Writing the archive was OK (I think, but I can't be sure whether I validated the problem archive after I fixed the problem), but validating gave corrupt messages. The Windows Event Viewer showed parity errors on the device and specifically said to change the cable. I did, and it fixed the problem. Interestingly, I had a similar failure with another SATA cable that was provided with the same ASUS motherboard. Both have since been replaced with third-party cables, and all has been fine for perhaps a couple of years.
     
  18. bilby

    bilby Registered Member

    Joined:
    Sep 4, 2007
    Posts:
    17
    My validation problems went away when I switched to BartPE with the Acronis 10.0 (build 4942) plugin. I back up to an external USB disk and did not change any of the cables when I switched to BartPE. This would seem to question whether blaming the cables is a complete solution to the problem. --bilby
     
  19. seekforever

    seekforever Registered Member

    Joined:
    Oct 31, 2005
    Posts:
    4,751
    You made a very significant change when you switched to BartPE, because you moved from the Linux environment to the Windows environment, which is generally considered the one most likely to have the best drivers for your USB device or whatever.
     
  20. bilby

    bilby Registered Member

    Joined:
    Sep 4, 2007
    Posts:
    17
    I realize that, but blaming USB cables, as johnmeyer's earlier post seemed to do, may be missing this key point. Changing drivers from Linux to Windows XP appears to solve the problem, too. So which is it, cables or drivers? I don't think the problem is resolved yet.

    As far as I know johnmeyer has not yet tested the BartPE route. I would be most interested in his results should he undertake that task. --bilby
     
  21. johnmeyer

    johnmeyer Registered Member

    Joined:
    Oct 18, 2005
    Posts:
    51
    In re-reading my post, I see now that I needed to be more explicit about the issue with the cables. What I was trying to say, but didn't say explicitly, is that I think that each cable results in a different error rate between the external USB drive and the host computer. This would be true regardless of what O/S is running on the computer. This is a function of the electronics. Once an error has occurred, then it is up to the firmware in the drive and in the computer, and also the O/S drivers, as to how many retries to do, and how to report success or failure back to the calling application.

    I am a little discouraged about the likelihood of getting any fix from Acronis, and I am not sure that I will continue recommending this program to my clients, although I still like the clean interface and so many other aspects of the design.
     
  22. jmk94903

    jmk94903 Registered Member

    Joined:
    Jul 10, 2004
    Posts:
    3,329
    Location:
    San Rafael, CA
    Re: Some progress to report: Corrupt/Can't Verify Corrupt Archives

    I think this is the first time that anyone has shown that USB cables are a very significant factor in writing/reading backup images. It explains a lot and offers a potentially inexpensive way to eliminate this problem in many cases.

    Was there any obvious physical difference between the cables? Was one longer/shorter, or not shielded or...?

    What cable would you recommend a person buy to minimize the problem?
     
  23. tjhb

    tjhb Registered Member

    Joined:
    Mar 20, 2005
    Posts:
    11
    John M's finding regarding cables seems substantial.

    Does anyone actually know (and why is no one from Acronis intervening in this thread?) whether TI9 or 10 use C/C++ calls, cached API, uncached API, or some sort of low-level access, in general, and/or particularly for external drives?
     
  24. Menorcaman

    Menorcaman Retired Moderator

    Joined:
    Aug 19, 2004
    Posts:
    4,661
    Location:
    Menorca (Balearic Islands) Spain
    Re: Some progress to report: Corrupt/Can't Verify Corrupt Archives

    I'm pretty sure you would have discovered this had you carried out the Large File Copy/MD5 Checksum test recommended in my Post #10 above. Ah well, better late than never I suppose.

    Regards

    Menorcaman
     
  25. Xpilot

    Xpilot Registered Member

    Joined:
    May 14, 2005
    Posts:
    2,318
    I presume all these USB cables are no more than 2 metres long and are of USB 2 specification. Although USB connectors are quite robust, it is important that the cables have enough slack so as not to pull on the sockets in any direction.
    Before I threw out the bad and semi-bad cables I would examine the connectors for any extraneous matter; they are, after all, not dirt- and dust-proof. Then I would clean the contacts with an electronic contact cleaner/lubricant, which could also clean out the sockets at each end. If the faults were still not cleared I would bin both cables and move on.
     