Corrupt/Can't Verify Corrupt Archives: Let's uncover the problem!

Discussion in 'Acronis True Image Product Line' started by johnmeyer, Sep 11, 2007.

Thread Status:
Not open for further replies.
  1. Michel Merlin

    Michel Merlin Registered Member

    Joined:
    May 23, 2006
    Posts:
    33
    Location:
    Versailles (France)
    Memtest86+ found no Mem error on the PC where TIH corrupted the FS

    I finally ran Memtest86+ (v1.70 of 14 Jan 2007) on the PC where Acronis True Image 10 had corrupted the FS (see my earlier reports); results:
    • Walltime: 37 hours 50 min (Sun 18 Nov 2007 23:48:40 GMT -> Wed 21 Nov 13:39:06 GMT)
    • Pass #: 76
    • Test #: 6 (each Pass has 8 Tests; each Test says "116K - 512M 512M" and applies an unknown number of patterns, prebuilt or random)
    • Errors found: 0
    This was only to answer the questions and "advice" I received, since of course I knew I would find no bad memory. That system (Athlon XP2000+, GA-7VA, 512MB carefully chosen at Crucial for that exact system, W2KSP4, Enermax Noisetaker EG375AX-VE(W)(24P)) has run for 5 years with no problems, often under load, with 2 PCI TV cards, 3×3.5" + 1×2.5" HDs, 2 additional PCI HD controller cards and plenty of USB devices. TIH archiving, restoring or mounting was done (in 2006 with TIH_9b2337) or unsuccessfully attempted (in 2007 with TIH_10.0b4942) after removing one or both TV cards and most USB devices; and so on.

    I recall (see reports) that:
    • the FS corruption I experienced on my system when trying to use TIH_10, while different from the "Corrupt/Can't Verify Corrupt Archives" subject of this thread, is similar and more worrying, and most probably has the same cause(s);
    • {I first had problems with my 1st build of TIH_9 (b2323); but they hit TIB archives, not the FS or anything else, and they were fixed by b2337}
    • the FS was corrupted immediately, the 1st time I tried to create a new TIB archive using TIH_10.0b4942 (upgrade bought 28 Aug 2007). Surprisingly, the corrupted partitions were NOT the one (FAT32) where I was trying (in vain) to create the new archive, but all those (5, all FAT32, of my 19 partitions on the 3 HDs plugged in at that time) holding the archives created in 2006 with TIH_9b2337 (a few of which I had mounted read-only a few minutes earlier that same day, 28 Aug 07), plus the main Windows partition (NTFS): 6 partitions corrupted in total, with corresponding data losses (each partition has two 16-KB files with the bits CHKDSK could retrieve from the lost data). TIH_10 aborted very early in the backup attempt, claiming a bad sector; it appeared later that there was most probably no bad sector at all, only the FS corruptions I reported. In those 5 years I had no other FS corruption (or bad sectors) on that system;
    • I sent Acronis (1 Sep, 2042MB message) the extensive and precise reports they requested, plus a clear synthesis; Acronis only replied by asking for more, or by proposing the new TIH_11 without offering a free upgrade even though I had bought TIH_10 less than one month before (I had to ask for it to finally obtain it);
    • but I no longer dare to try any of my 3 paid versions of TIH (before doing anything risky, I do a full backup; so what do I do when the risky operation is the backup program itself?)
    • So my problem is still pending, and I am still without a working backup program until I retrieve one of the older ones I used before buying Acronis.
    Versailles, Wed 21 Nov 2007 14:45:45 +0100
     
  2. tachyon42

    tachyon42 Registered Member

    Joined:
    Dec 26, 2004
    Posts:
    455
    Re: Memtest86+ found no Mem error on the PC where TIH corrupted the FS

    I also have an unresolved TI corruption problem and have similarly used my system over a number of years.
    My system is an Athlon 900 (1200 at 133 MHz) with an ASUS A7V133 motherboard.
    I have done a lot of investigation and believe it's probably due to some motherboard timing issue.

    Your motherboard has Via chipsets (as does mine).
    You might want to read this report
    then check your system at the Gigabyte website for BIOS and chipset driver versions.
    I know the above report mentions a fix but I'm not convinced it was 100% successful.

    I'd be interested to know if TI corruption issues are only occurring with motherboards using Athlons and Via chipsets.
     
  3. Logger

    Logger Registered Member

    Joined:
    Nov 13, 2006
    Posts:
    17
    I have just used DocMemory to discover a bad 1 GB DDR400 memory module that I have been unknowingly living with for at least six months. Now my corrupt TIB file problem seems fixed. Having removed the dodgy module, I can now create a backup and it validates just fine. It wasn't until I was troubleshooting other issues on my system that I considered doing a TI restore, and discovered that my previous archives were corrupt and failed to validate. This precluded using a TI restore to fix the problem. Further investigation led to the bad memory module.

    Anyway, I am just vouching for the fact that in my case a bad Veritech DDR400 ram was indeed the cause of my corrupt archives.
     
  4. seekforever

    seekforever Registered Member

    Joined:
    Oct 31, 2005
    Posts:
    4,751
    Glad you found your validation problem. I'm sure a lot of people think I'm nuts when I keep preaching about checking your RAM if you have a validation problem. Like everything else, it isn't the only reason a validate can fail but it is certainly a reason, especially if you appreciate how the validation process works. Furthermore, it is easy to run a diagnostic to rule it out with a reasonable degree of certainty.

    Also, your "unknowingly living" with it for 6 months shows that assuming the PC's RAM is OK just because everything else seems to work isn't good enough. Regular, non-ECC PCs assume the memory works, and if the failure doesn't trigger another problem, such as an attempt to jump outside the memory allocated to the program, then all will appear just fine. I don't know how the PC memory allocation system works in detail, but I'd be willing to guess that many large-memory PCs have locations that are rarely, if ever, used until a program comes along and wants to set up large buffers.
     
  5. Arranger

    Arranger Registered Member

    Joined:
    Oct 2, 2005
    Posts:
    21
    After reviewing the basics on Memtest86, I recalled that my recent image corruptions occurred after changing my system memory. I pulled out one stick and my images now verify without corruption. I didn't bother running Memtest86 yet.

    Interesting.....
     
  6. johnmeyer

    johnmeyer Registered Member

    Joined:
    Oct 18, 2005
    Posts:
    51
    Just a quick note: The problem with corrupt images continues, even though I have the latest TI 11. I can't even back up to internal drives anymore. The MEMTEST red herring does not help.

    Once in a while, the backup works, which is why I keep using this buggy program.

    Damn, I wish the people at Acronis had enough pride in themselves and in their work that they would fix this problem!!
     
  7. seekforever

    seekforever Registered Member

    Joined:
    Oct 31, 2005
    Posts:
    4,751
    I obviously haven't gone back and read through this giant thread about everything you have tried or info you have provided to requests from others trying to help.

    Your symptoms IMO point to flaky hardware. It works, then it doesn't work, then it works, ...

    You can take Windows out of the equation by using only the TI boot CD, but only you can determine whether it can produce and validate an archive and, ideally, restore it. Using a BartPE CD setup is another way to achieve almost the same thing.

    Somebody reported fixing problems by turning off their security software. This would only be a problem under Windows.

    Other possible causes are intermittently bad disks, disk cables, RAM, CPU, or anything else for that matter. I would also check the voltages, which you can sometimes find in the BIOS. Ideally, these would be checked with a voltmeter while the machine is running full-tilt creating an archive, with the CPU under load and the disks being hammered.

    Another thing you could try is to create a several GB archive and then use a free checksum calculator to calculate and record the checksum. Do it several times to ensure you indeed have a consistent value. Now copy the file to different folders, partitions, drives and recheck the checksum each time in the new location. It must be the same.
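    For anyone without a checksum utility at hand, the check described above can be sketched in a few lines of Python. This is only a minimal illustration: the function names are made up for this example, and SHA-256 is an arbitrary choice (any checksum will do for this purpose).

```python
import hashlib
import shutil

def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that multi-gigabyte archives do not have to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_stable(path, runs=3):
    """Hash the same file several times; every run must give the same value."""
    return len({file_sha256(path) for _ in range(runs)}) == 1

def copy_and_check(src, dst):
    """Copy src to dst, then report whether both files hash identically."""
    shutil.copyfile(src, dst)
    return file_sha256(src) == file_sha256(dst)
```

    Call is_stable() on the archive first, then copy_and_check() for each new folder, partition or drive. One caveat: right after a copy, the operating system may serve the re-read from its file cache rather than from the disk, so rechecking after a reboot is the stricter test.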

    You could also try a full system diagnostic program that checks the PC while it is exercising everything. This is a more realistic test than running a diagnostic such as Memtest86 all by itself. I would let it run overnight as a minimum. I don't know the names of any of these anymore.
     
  8. johnmeyer

    johnmeyer Registered Member

    Joined:
    Oct 18, 2005
    Posts:
    51
    The hardware is fine and has been working the same way for five and a half years.

    I finally got to the point today where I couldn't create an archive to any disk drive. I uninstalled/reinstalled, but no go. I then created an Acronis "rescue media" boot CD and did the backup from there. That worked, and the archive validated both from that Linux environment (I think that's what they use) and from Windows when I returned to it.

    Thus, the problem -- as I've suspected from the first post way back when -- is the Windows Acronis code. If it were my memory (and, yes, I ran the stupid Memtest, and yes, it is the latest version of Memtest, not the old one, and yes, my computer passed 100%) -- if it were my computer's memory then it would cause the backup to fail regardless of O/S. Oh, I'm sure someone could hypothesize some tortured tale about how everything is in a different memory location depending on how each O/S is loaded, but let's face it: everything else has run for almost six years on this PC, and this is the ONLY program having a problem. What's more, Trueimage 11 fails on many other PCs that I own (fails to validate, that is).

    I looked today for an update, but I see that nothing has changed since November, so I have the latest version.
     
  9. tachyon42

    tachyon42 Registered Member

    Joined:
    Dec 26, 2004
    Posts:
    455
    I had a similar issue with TrueImage being the only program that failed (most of the time) on my ASUS A7V133 motherboard based computer over the last 4 or 5 years.
    There is a known fault with early versions of the VIA chipset which was used on that motherboard.
    I was never sure this was the cause of the problem since it was supposedly fixed.
    If you've got the time you could search for my posts on this forum and you'll see that I spent a lot of time trying to find a workaround/solution.
    I'm now convinced it's the motherboard.
    Unfortunately Acronis didn't take up on suggestions I made which might have helped to confirm that it was a hardware issue.
    I expect you might have to accept that TrueImage just won't work with your computer motherboard.
    Of course, you should first follow the various suggestions on this forum to try to eliminate other components which might be the cause of the problem.
    Good luck.
     
  10. seekforever

    seekforever Registered Member

    Joined:
    Oct 31, 2005
    Posts:
    4,751
    The fact that your PC runs "normally" doesn't mean there isn't a problem, but it is a reasonable assumption. Regular operation normally doesn't mean dealing with multi-gigabyte files, which incidentally is the issue with the A7V133 motherboard that Tachyon42 refers to. I had bad SATA cables that showed no problem with normal operation at all, but TI validates would fail. There was an error recorded in the event log on bootup and it actually said the cause was likely a bad cable. I changed the cables from the ones that came with the Asus motherboard to new ones and the problem was cured. I would have sworn the machine had no problems until TI failed.

    The proper operation of the TI boot CD version does point to a Windows/application conflict and while live-imaging under Windows is not a source of many complaints, there certainly could be a conflict. Are you running some piece of software that might be down in the guts of the system on all the machines that fail?

    So you are convinced you don't have a memory problem, fine, but I wouldn't call Memtest86+ stupid. It also is not a tortured methodology to say that different OSs use different memory mappings or for that matter, the memory mapping at any one time can be different from the past time in Windows depending on how things were loaded.

    Since the bootcd version runs I would maybe try disabling everything possible using MSconfig. If that doesn't help, I would create a new version of Windows on a spare disk and only load TI and try it.

    Since the Windows version can't validate the archive it means that TI cannot read the archive file and recreate the checksums contained inside the archive file. So something is causing an improper data transfer or checksum calculation in memory. It could be TI software not handling your hardware properly, of course, but you have had this problem with multiple versions. I still recommend checking large file transfers under Windows as mentioned in my previous post.
     
  11. johnmeyer

    johnmeyer Registered Member

    Joined:
    Oct 18, 2005
    Posts:
    51
    Yes, I agree completely. It is a VERY reasonable assumption.
    We've discussed this before. I deal with multigigabyte files every single day (video files). That's almost ALL that I deal with. When I stated this before, someone responded by stating that a few missed bytes won't be noticed when dealing with video, but that person obviously does not understand video files.

    That statement was totally untrue.

    I edit HDV, which uses a "long GOP" format where most frames are dependent on as many as a dozen previous frames. One glitch, and the whole batch of frames becomes corrupt.

    Also, some of my corrupt validations are on smaller backups of TIB files that are less than 4 GBytes. In fact, I have even tried going to a FAT32 USB drive (which can't deal with files larger than 4 GBytes) and I got corrupt files on that as well.

    I do have an ASUS P4B motherboard. I have no idea if that is one of the "bad" boards, but again, I have used this computer almost every day for over five years, most of it video editing and photo editing (I've re-touched over 80,000 photos in that time), and many other things involving lots of disk activity. No problems except for this application (TrueImage).

    If I was the only one having the problem, then I'd accept that it is something weird on my computer. Obviously this isn't even close to being the case, as this thread -- which has become very long, and attracted dozens of other people with the same problem -- has proved. Also, there are MANY other threads where people are reporting the same thing, something I mentioned when I first started this thread, and which has continued unabated in other threads since that time.

    Finally, the hardware is off the hook because the BootCD version of TI works. The hardware didn't suddenly change. QED, the hardware is OK.
    As for gremlins in my computer, I have zero programs loading in the background. In Task Manager, in the Processes tab, under my user name, I have Explorer.exe and that is it. Nothing else (except for the Trueimage processes).
    I agree. That is a poor choice of words on my part. The program is actually incredibly smart and useful and has helped me on more than one occasion track down bad memory chips. Having actually had two bad memory chips, I can tell you that the effects are not subtle at all. EVERYTHING starts to go to hell, and the computer starts rebooting, with blue screens, strange lock-ups, etc. The symptoms of failing to validate a file under just one program do not at all fit the symptoms of bad memory. (Yes, I've seen the reports that a few people have found that bad memory was the problem, but once a person has run MemTest and let it run a few cycles, and nothing is reported, you really have to lay off and start looking elsewhere).

    Go ahead and Google "bad memory symptoms computer" and you will get this page:

    http://www.google.com/search?hl=en&rls=GGLG%2CGGLG%3A2005-23%2CGGLG%3Aen&q=bad+memory+symptoms+computer+

    One of the first links takes you here:

    http://www.pcstats.com/articleview.cfm?articleID=1565

    You can go to any of the hundreds of other sites and they'll all tell you the same thing, all of which agree with my statement above.

    So until the dozens of other people who have reported this problem come forward and say that TI is now working great for them, I think it should be obvious to everyone that the problem lies with Trueimage, and not with dozens of completely independent users who all coincidentally have the exact same problem with one and only one program while everything else on their computer runs just fine.
     
  12. tachyon42

    tachyon42 Registered Member

    Joined:
    Dec 26, 2004
    Posts:
    455
    It's not just situations involving multi-gigabyte files although of course the longer time taken in processing larger files gives the chipset bug more time to become apparent.
    The issue is related to PCI latency and affects other motherboards not just the A7V133.
    See http://www.au-ja.org/review-kt133a-1-en.phtml for a discussion of the problem and various motherboards affected.
    Note that the presence of a Creative Soundblaster card is one factor which can exacerbate the problem but it can occur without that card installed.
    I don't believe the latest chipset drivers completely fix the problem when using the A7V133. I've tested using various BIOS settings which affect PCI latency and can certainly change the frequency of corrupt images.
    However, I haven't been able to completely eliminate the problem.
    Unfortunately, my motherboard has recently become unusable due to a problem with the keyboard connector so it's unlikely I'll be doing any further work on this problem.
    I'd suggest anyone experiencing corrupt images should check if they are using any of the motherboards or chipsets mentioned at the above link.
    If so, cut your losses - TrueImage is unlikely to work for you.
     
  13. johnmeyer

    johnmeyer Registered Member

    Joined:
    Oct 18, 2005
    Posts:
    51
    Still no solution to this problem, but I now find that I can create and validate backups to my external USB drives, but that now I can no longer create backups to my secondary IDE drive (configured as a slave to my master drive). For years this has always worked. Also, I can't (at the moment) create and validate backups to this internal drive even if I do that backup from the non-Windows software on the Acronis recovery disk.

    So, I am more stumped than ever. There seems to be no consistency in this problem from one computer to the next, from one drive to the next, and now from one day to the next.

    Oh, and my almost six year old computer continues to run perfectly every day, except for this flawed program. No crashes, no blue screens, no hangs, nothing. Just one program, Acronis Trueimage, that has problems.
     
  14. laserfan

    laserfan Registered Member

    Joined:
    Jan 19, 2005
    Posts:
    117
    Haven't followed this whole thread--are you still on v9?

    I'm a long-time user of TI, and have stuck with v7 until a few months ago when I got sucked-in to buying v11 (over some what-turns-out-to-be-marketing-bs "copy pc" feature). In any case v11 (latest, last Nov I think) looked nice, but I immediately started seeing "corrupt archives" with it! WTF!? After a VERY short while of trying to use it, scratching my head, coming here to look around, trying it on a different/better/newer PC, then persisting to have Verify problems, I un-installed the bloody thing. Back on v7, it works great. First time, next time, every time, no problems, no corruption.

    Something is rotten with later versions of Acronis True Image. I feel your pain, johnmeyer (though not THAT much since I have a working v7!). ;)
     
  15. johnmeyer

    johnmeyer Registered Member

    Joined:
    Oct 18, 2005
    Posts:
    51
    I got started with version 9 and am now using version 11. I'd love to get version 7, if that works.
     
  16. tachyon42

    tachyon42 Registered Member

    Joined:
    Dec 26, 2004
    Posts:
    455
    Contact Acronis Support - they used to be fairly cooperative in providing an earlier version if you explained that you had a license for a later version that didn't work and wanted to test the earlier one. If they don't come across, then request your money back, since what they sold you does not work with your computer.
     
  17. johnmeyer

    johnmeyer Registered Member

    Joined:
    Oct 18, 2005
    Posts:
    51
    I have contacted Acronis about this specific problem on several occasions over the past two years. They did try to help, but it was mostly about running memtest and also doing the backup from the boot media in order to eliminate Windows as the source of the problem.
     
  18. tachyon42

    tachyon42 Registered Member

    Joined:
    Dec 26, 2004
    Posts:
    455
    They may still allow you to try the earlier version.
     
  19. sponna

    sponna Registered Member

    Joined:
    May 22, 2008
    Posts:
    2
    I too have the corrupt archive issue on one of my network machines. I image four machines to a central NAS and all machines except one are fine, i.e. the archives validate. However, for one of the machines, the tib fails validation under both the Windows and the CD-booted environment. Not good!

    However, here's the odd thing: if I validate the failed archive on the NAS using Acronis on a different machine, it's fine! Any explanation for that gratefully received.

    I think I'm at the point where I will have to purchase a new drive and try to restore to it from various images - seems the only way I can be truly confident that I can recover if it becomes necessary.

    Thanks
    Dave
     
  20. layman

    layman Registered Member

    Joined:
    May 20, 2006
    Posts:
    280
    I have experienced problems copying large TI image files - both from machine to machine on the network and locally. I can only conclude that, whatever the root cause of the loss of integrity may be (presumably hardware-related), the operating system is culpable for failing to detect the mutation. It appears to me that Microsoft has short-cut integrity checking to speed up certain operations. So, for example, you can defrag a disk containing large TI images and find that one or more of the images becomes corrupted. This really stinks. The root cause is no doubt hardware-related (either memory or disk), but there should be hardware and operating system safeguards against that inevitable kind of failure.
     
  21. seekforever

    seekforever Registered Member

    Joined:
    Oct 31, 2005
    Posts:
    4,751
    The validation process causes the archive file to be opened, read and the checksums to be recalculated and compared with the ones stored in the archive file. There are 4000 checksums per gigabyte of archive and they all must agree perfectly or the archive is declared corrupt.
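    To make the idea concrete, here is a rough sketch of block-wise checksum validation in Python. It is NOT the actual TIB format or Acronis' algorithm, neither of which I know: CRC32 and the 256 KiB block size are assumptions picked for illustration (256 KiB gives 4096 blocks per GiB, the same order as the figure above).

```python
import zlib

# Assumed block size, for illustration only: 256 KiB = 4096 blocks per GiB.
BLOCK_SIZE = 256 * 1024

def block_checksums(data, block_size=BLOCK_SIZE):
    """Compute one CRC32 per block; this is what gets stored at creation time."""
    return [zlib.crc32(data[i:i + block_size])
            for i in range(0, len(data), block_size)]

def validate(data, stored, block_size=BLOCK_SIZE):
    """Recompute the checksums and return the indices of blocks that differ.

    An empty list means every block agreed; any entry means 'corrupt'.
    """
    current = block_checksums(data, block_size)
    if len(current) != len(stored):
        raise ValueError("block count differs from the stored checksum list")
    return [i for i, (now, then) in enumerate(zip(current, stored))
            if now != then]
```

    A single flipped bit anywhere in a block changes that block's CRC32, so one bad bit picked up on the way from disk through cable, chipset and RAM is enough to fail the whole validation, even if the file on disk is intact.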

    Anything that will cause the checksums not to agree will trigger the corrupt message and this can be disk hardware, disk cables, RAM, motherboard failures, ... - take your pick and yes, program coding flaws.

    The fact that the other machines can validate the archive OK says the NAS drive is OK. Since it is a NAS device, you can add network card/cables to the above list of possible hardware. However, I'd think that problems with network transfers would show up in the Windows event logger. Good place to look as a starting point anyway.

    You can download memtest86+ from www.memtest.org (note the .org) and let it run for several hours on the problem machine. Overnight is best.

    While I don't think this is the problem, it is always good to run chkdsk X: /r on all your partitions. Replace X with drive letter of the partition being tested.

    You can also check your power supply voltages. Voltmeter is great but they are often displayed in BIOS.

    Try copying the archive file from the NAS drive to an internal drive on the bad machine and then try to validate.
    If it still does not validate, run a free checksum calculator on the archive file on the NAS device and then run it on the copy on the bad PC. They should agree if the copy process was good. Note that this is not a perfect test since bad RAM or other problems could cause the checksum calculator test to fail on the bad PC as well. You could try copying the copied file from the bad PC to a good PC and recheck the checksums of all 3 files.

    In normal operation, Windows and virtually all operating systems do not check data written to memory or to disk. The only time errors show up is when the data is read: in the case of a disk, this may show up as a CRC error; in the case of memory, it may cause the machine to do virtually anything, from something that is never even noticed to a crash. Writes are left unverified to save time, since checking would virtually double the time each write takes. This is one reason that PCs seem to be OK but fail on TI validates. The 4000 checksums per gigabyte is a pretty stringent test of archive file integrity.
     
  22. thecreator

    thecreator Registered Member

    Joined:
    Feb 12, 2007
    Posts:
    87
    Location:
    Baltimore Co., Maryland USA
    Hi All,

    The image that tests corrupt from a computer may not be corrupt at all. You may simply lack memory and good drivers from DOS. Test from Windows or while running in Windows.
     
  23. seekforever

    seekforever Registered Member

    Joined:
    Oct 31, 2005
    Posts:
    4,751
    It is true that often the image itself is not corrupt and the "corrupt" message should be interpreted to mean that TI cannot read the archive file and recreate the checksums. If the image was written on bad media or there is a program bug then it can be indeed corrupt but I tend to think this is rare since the image creation process is probably one of the few things that is carefully checked by Acronis. I do think running chkdsk from time-to-time before making an image is a good idea just in case some obscure filesystem problem could throw a curve at TI.

    If you are replying to Sponna, the validation was stated to fail in both Windows and in the Linux recovery environment. TI only uses a variant of DOS in the so-called Safe mode. The important thing is that the Validation must be able to succeed in the Linux environment since this environment must be able to run properly to restore the active partition unless you make a BartPE or VistaPE CD or run in the TI Safe mode which tends to run slowly and may not support USB or network devices.
     
  24. Thalassos

    Thalassos Registered Member

    Joined:
    Apr 5, 2008
    Posts:
    6
    My experience is:

    Validation OK, which means: your backup is in order.

    But when I actually run the restore from the backup, the application hangs on a black screen, after a promising start through the interface and the option choices. The disappointment could not be greater.

    Thalassos
     
  25. tachyon42

    tachyon42 Registered Member

    Joined:
    Dec 26, 2004
    Posts:
    455
    A couple of years ago I suggested to Acronis that they might want to consider an option during archive creation which reread each sector and matched it to the data being written. I know this would double the archive creation time, but I thought it might be a useful option to have when trying to identify the cause of archive corruption. I know it won't distinguish between the various potential causes of corruption during the sector write (RAM, CPU, cable and cable length (timing), motherboard/chipset faults, etc.), but if the sector data was displayed for every corrupted sector in the archive, then the corrupted bit (or bits), sector addresses, etc. might give a clue to a specific user's problem, including perhaps an obscure software fault. Of course, if no error was detected during this reread check, then the user should be confident that the archive was created correctly. I guess Acronis didn't think it was a worthwhile option and that the user could always do the archive validation check (which really isn't that useful as a diagnostic tool).
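    Roughly what I had in mind, as a Python sketch. This is hypothetical and has nothing to do with how TI is written internally; the function names and the 512-byte sector size are my own assumptions.

```python
import os

SECTOR = 512  # assumed sector size, for illustration only

def compare_sectors(expected, actual, sector=SECTOR):
    """Return (sector_index, byte_offset, xor_mask) for every differing byte.

    The XOR mask shows exactly which bits flipped, which is the kind of
    detail that might hint at a particular hardware fault.
    """
    if len(expected) != len(actual):
        raise ValueError("read-back length differs from the data written")
    return [(offset // sector, offset, e ^ a)
            for offset, (e, a) in enumerate(zip(expected, actual))
            if e != a]

def write_with_verify(path, data, sector=SECTOR):
    """Write data, force it out to the device, read it back and compare."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to flush its buffers to the device
    with open(path, "rb") as f:
        readback = f.read()
    return compare_sectors(data, readback, sector)
```

    An empty list means the reread matched. Note that a plain read-back like this is usually answered from the operating system's file cache, so a real implementation would have to bypass the cache to be sure it was checking what actually landed on the disk.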
     