Advice Needed: Rollback Rx 10.2 and Delayed NTFS File Corruption?

Discussion in 'backup, imaging & disk mgmt' started by HermitGeek, Jan 21, 2014.

Thread Status:
Not open for further replies.
  1. HermitGeek

    HermitGeek Registered Member

    Joined:
    May 1, 2013
    Posts:
    20
    Location:
    US
    I need some help in determining whether or not a catastrophic NTFS corruption I experienced was related to Rollback Rx 10.2 (Build 2698745870). The Horizon DataSys forum is a ghost town, and the staff there is ill-equipped to help.

    For more than a year, I ran a system with Rollback Rx 8.1 on an OCZ Agility 3 SSD under Windows Vista SP1 without any problems (including with Rollback Rx). Since Vista does not support TRIM, TRIM was not active (as far as I know).

    About 6 weeks ago, I cloned the drive to an Intel 520 SSD as part of an upgrade. The new SSD had been pre-tested and found to be error-free. All other hardware in the system remained the same. I also uninstalled Rollback Rx 8.1 and installed Rollback Rx 10.2 during this upgrade.

    For about 4 weeks, the system ran fine, including taking nearly hourly and daily snapshots. However, about a week ago, the first sign of trouble occurred: Rollback Rx suddenly reported that its console client was corrupted, and Windows' Event Viewer showed repeated, frequent NTFS file structure corruption errors (Event ID 130). At that point I could still access the Rollback Rx console at boot, so I was able to revert to a working snapshot.

    However, the same corruption occurred again just a few days ago. The system was largely idle during this period; no new software had been installed and no new services were running in the interim. This time the corruption was catastrophic: nearly all of my data files were corrupted, and reverting to older snapshots either failed to correct the corruption or led to a BSOD.

    I don't want to falsely attribute this to Rollback Rx, but I can't think of any other cause. I am aware of the long history of file and snapshot corruption with v10 and some earlier builds of v10.2. I don't use any disk defragmenter; the only defragmentation I do is snapshot defragmentation from the Rollback Rx console at boot (not from within Windows). Unfortunately, because the system was mission critical, I have since reformatted the entire drive and restored from a backup. I am now using v9.1 instead of v10.2.

    My questions are these:

    1. Is this NTFS corruption issue known with Rollback Rx 10.2 (Build 2698745870)?
    2. If so, why did the corruption only begin after nearly a month of flawless operation? What could have triggered it (again, no changes had been made to the system)?
    3. Is it possible that v10.2 is somehow incorrectly forcing TRIM to be active on my SSD even though Vista does not support it, causing the MBR to become corrupted?
    4. Am I safer using v9.1 with an older OS (Vista) instead of v10.2, since v10.x is based on new code and may not have been tested properly with Vista?
    5. Any other suggestions for monitoring my current system, now running v9.1, for the same corruption?
     
  2. MrBrian

    MrBrian Registered Member

    Joined:
    Feb 24, 2008
    Posts:
    6,032
    Location:
    USA
  3. HermitGeek

    Thanks for the tip, MrBrian.

    Any thoughts on my theory about Rollback Rx and why this happened? I have been a user of Rollback Rx for many years. This was my first use of v10.2, so it is very important for me to find out whether the software is no longer reliable.
     
  4. TheRollbackFrog

    TheRollbackFrog Registered Member

    Joined:
    Mar 1, 2011
    Posts:
    3,056
    Location:
    The Pond - USA
    HG, I follow the RBrx community pretty closely and, to date, have never heard of such a problem during the rollout of v10.x... lots of other problems (some of which I've had myself) but not that one.

    I would get v10.2 off that system for a while and see, over time, if the problem continues.
     
  5. HermitGeek

    Thank you for your advice, TheRollbackFrog.

    Can you elaborate on which part of the issue I reported you haven't seen previous reports of? Is it the "delayed" onset of the problem? I am aware of the multiple MBR issues with Rollback Rx v10.x. The NTFS corruption of recently written files, and how widespread it was, seem more consistent with part of the MBR being corrupted than with sector-level corruption (such as from a bad drive).

    Also, is it possible that v10.x has some underlying incompatibility with an older OS such as Vista SP1 (as in my case)? Understandably, much of the attention on the v10.x problems was related to Windows 7 and especially Windows 8. Have there been reports of issues with Vista specifically?

    Lastly, is it possible that something goofy is going on with TRIM on my Intel 520 SSD, even though Vista does not support TRIM? Could this mismatch cause problems?

     
  6. TheRollbackFrog

    The fact that it waited a month before failures started to occur makes me believe RBrx had little to do with the problem. That problem is what caused RBrx to say its CONSOLE client was corrupted.

    The timing looks more like hardware infant mortality... that's why I suggested removing RBrx for a bit. If it is infant mortality, you'll start to see it without RBrx.

    TRIM is an OS function only. VISTA does not support TRIM. Any TRIM command issued by RBrx (and those are even questionable) to a non-TRIM OS will fall into a black hole, never to be acted upon.

    Feel free to do this, but I wouldn't assume v10.x hasn't been tested properly with Vista. Their test suite does cover all the OSes they claim to support.

    I would run an occasional CHKDSK on an RBrx-FREE system first to see if things are degrading without RBrx. Then do the same with v9.1, then v10.2. CHKDSK will surely pick up the beginnings of such a problem.

    You mentioned "cloning" your system to an SSD but didn't mention the tool used. Are you sure your partition on the SSD is aligned on a 2048-sector (1 MB) boundary? That's important for SSDs.
     
  7. HermitGeek

    I cloned from one SSD to the other using Partition Magic to do a RAW SECTOR-TO-SECTOR copy. However, I do not know how to check alignment. Can you clarify? If it is misaligned, could that be the problem, and how do I fix it? Note that the WHOLE drive has only a single partition, which I expanded with the Disk Management tool in Windows Vista to fill the entire drive.

    I have since reformatted the drive and restored a backup to it. If it is an infant mortality issue, should I expect the problem to recur immediately?

     
    Last edited: Jan 22, 2014
  8. TheRollbackFrog

    Most partition tools give you access to the PROPERTIES of a partition. Once the partitions are displayed, select the one you're interested in and look for a properties option... it should tell you what you need to know. Don't select the DISK itself, select the partition. Its properties should include the starting sector #. If that number is divisible by 2048 (assuming 512-byte sectors), then you're properly aligned. Misalignment usually only causes speed issues, not data corruption.
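
    A minimal Python sketch of that divisibility check, assuming 512-byte logical sectors and a starting sector number read from your partition tool's properties. The function name `is_aligned` is hypothetical, not part of any tool. The value 63 in the example is the legacy XP-era starting sector, which is not 1 MB aligned.

```python
# Hypothetical helper: does a partition start on a 1 MiB boundary?
# Assumes 512-byte logical sectors unless another size is passed in.

ALIGNMENT_BYTES = 1024 * 1024  # 1 MiB = 2048 sectors of 512 bytes


def is_aligned(start_sector: int, sector_size: int = 512) -> bool:
    """Return True if the partition's byte offset falls on a 1 MiB boundary."""
    return (start_sector * sector_size) % ALIGNMENT_BYTES == 0


if __name__ == "__main__":
    # 63 is the legacy XP-era starting sector; 2048 is the usual modern default.
    for start in (63, 2048):
        print(start, "aligned" if is_aligned(start) else "NOT aligned")
```

    With 512-byte sectors this is the same as checking `start_sector % 2048 == 0`; working in bytes keeps the check valid if a tool reports 4096-byte sectors instead.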

    Not necessarily. With MLC NAND memory (the type most likely used by the SSD), it may develop over a short time. Keep an eye on it.
     
  9. HermitGeek

    In the meantime, is there a tool like CrystalDiskInfo that lets me see whether the SSD is already running into trouble, much like SMART for an HDD?

    Also, I just did some reading following your suggestion. It seems that improper alignment only slows down the SSD and would not explain the corruption I experienced. Is this correct?

     
    Last edited: Jan 22, 2014
  10. TheRollbackFrog

    CrystalDiskInfo should tell you most of what you need to know, although some of its SMART values aren't very relevant. The important SMART value is ID#5, the "Retired Block Count." This should be ZERO or very low at this point in the SSD's life.

    That is correct... that's what I mentioned at the end of my previous message.
     
  11. HermitGeek

    Would a high Retired Block Count be evidence supporting your theory of infant mortality (that is, a problem with the SSD itself) rather than Rollback Rx?

     
  12. TheRollbackFrog

    That SMART value in an SSD keeps track of the SSD's NAND blocks (groups of NAND "pages") that have become unusable and must be taken out of service. This number should go up over time (a long time) as all the NAND blocks (including spares) are put into use and eventually wear out. Eventually all NAND blocks become "worn out," and at that point your SSD becomes basically a READ ONLY device, as writing can no longer be done to the worn blocks.

    This process occurs over time, and the total time involved is proportional to the amount of writing done to your SSD device. In what I would call a "normal" system usage pattern, this process might take from 7-9 yrs. If your system is involved in a heavy WRITING environment (constant data conversion, compression... anything that causes lots of writing in the system), that time period will be significantly reduced (4-5 yrs?).
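
    To illustrate the "proportional to the amount of writing" point, here is a rough Python sketch. Every input is a hypothetical placeholder rather than a figure for any particular drive: `endurance_tb` stands in for a rated write endurance from a spec sheet, and `write_amplification` is an assumed factor for the extra internal writes the controller performs.

```python
def estimated_years(endurance_tb: float, host_gb_per_day: float,
                    write_amplification: float = 1.0) -> float:
    """Rough years until rated NAND endurance is used up at a steady write rate.

    All arguments are hypothetical inputs, not specs for any real SSD.
    """
    if host_gb_per_day <= 0:
        raise ValueError("host_gb_per_day must be positive")
    # NAND writes per year, in TB: host writes scaled by write amplification.
    nand_tb_per_year = host_gb_per_day * write_amplification * 365 / 1000
    return endurance_tb / nand_tb_per_year


if __name__ == "__main__":
    # Placeholder numbers only: same drive, "normal" vs. "heavy" writing.
    light = estimated_years(endurance_tb=70, host_gb_per_day=20)
    heavy = estimated_years(endurance_tb=70, host_gb_per_day=40)
    print(f"light: {light:.1f} yrs, heavy: {heavy:.1f} yrs")
```

    Doubling the daily writes halves the estimate, which is the relationship described above; real lifetimes also depend on factors this sketch ignores.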

    Since your SSD is fairly new, that number should probably be ZERO at this stage and not going up anytime soon... unless there's an internal problem with the device. If so, the device may be retiring those NAND blocks at a faster pace than should normally be expected.

    Rollback RX should have no effect on what's happening inside that SSD except for its questionable use of TRIM through the Windows system. This usage, if it's even happening at all (the product has been questioned significantly in this area since the introduction of SSDs, and it has never been definitively answered), will not degrade your SSD.
     
  13. HermitGeek

    Do you recommend that I use SpinRite to run a diagnostic on my Intel SSD (the one I have since restored from a backup and am currently using without any issues)? Would this allow SpinRite to find troublesome areas on the SSD and "lock them out"? If so, since the Intel SSD already has Rollback Rx 9.1 installed, do I need to remove Rollback before I can use SpinRite safely?
     
  14. TheRollbackFrog

    I would not use SpinRite on an SSD, especially in the diagnostic mode where it re-writes all the sectors... that would age the SSD tremendously with unnecessary WRITE operations. A READ ONLY pass won't hurt anything, but make sure it's not trying to correct the data if the block being tested comes up with an error... it may show you an erroring drive, but that's all you should expect from it. Any drive "reader" will do the same thing as far as revealing errors if the NAND cells are bad... it'll just give you a health indication, that's all.
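
    As an illustration of what a READ ONLY pass means, here is a minimal Python sketch that reads a target sequentially and only records offsets where a read fails; it never opens the target for writing. `read_only_scan` is a hypothetical helper, not a SpinRite feature. Pointing it at a raw device (for example `/dev/sdX` on Linux) typically needs administrator rights, and raw-device access on Windows has extra alignment rules this sketch does not handle, so treat it as a demonstration on ordinary files.

```python
import os


def read_only_scan(path: str, chunk_size: int = 1024 * 1024) -> list[int]:
    """Read `path` front to back without writing; return offsets of failed reads."""
    bad_offsets: list[int] = []
    # "rb" opens read-only; buffering=0 returns a raw FileIO object.
    with open(path, "rb", buffering=0) as f:
        total = f.seek(0, os.SEEK_END)  # size of the file or device in bytes
        offset = 0
        while offset < total:
            try:
                f.seek(offset)
                f.read(min(chunk_size, total - offset))
            except OSError:
                # Record the bad region and move on; never try to "fix" it.
                bad_offsets.append(offset)
            offset += chunk_size
    return bad_offsets
```

    On a healthy target the result is an empty list; any offsets returned only tell you where reads failed, which matches the "health indication, that's all" point above.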

    SSDs manage themselves very well at the controller level, including necessary garbage collection (don't ask... it's a long explanation :D ). If there are errors, the controller will react accordingly and properly.

    You didn't mention what the SMART ID#5 data was telling you... is it ZERO?
     
  15. HermitGeek

    Yes, it is ZERO.

    That said, this reading was taken just now, after I had reformatted the drive and restored the data to it, and long after the file corruption that may or may not be related to Rollback Rx. Does this reading help with troubleshooting whether or not it is a Rollback Rx issue?
     
  16. TheRollbackFrog

    This reading is valid from the beginning of the SSD's life to the end... no amount of formatting, reimaging, or reloading will cause it to change back to ZERO if it was non-ZERO... it's a LIFE number.

    If it's ZERO, there's a good chance your drive is just fine and your problem is elsewhere, although I know not where.

    Keep an eye on your system with that occasional ChkDsk operation to see if it's starting again. All I can comment on is your problem description has never been reported or discussed in any of the RBrx forums, nor has it been mentioned in any of the product's release notes. This would lead me to believe that it may not be RBrx related, although no one can guarantee that.

    Wish I could help more...
     
  17. HermitGeek

    Oops, TheRollbackFrog, it appears that I have been reading the wrong values.

    In CrystalDiskInfo, under ID 05 (which describes it as Re-Allocated Sector Count), the current value is 100, not 0. The 0 is listed under Threshold.

    Am I looking at the correct number?


    Actually, to the contrary, TheRollbackFrog, I am immensely grateful for your advice about this problem.


     
    Last edited: Jan 24, 2014
  18. TheRollbackFrog

    Sounds like an old version to me. Get a fresh copy of CrystalDiskInfo v6.0.4 Standard Edition and run it again. Check all the header/health information to be sure you're looking at the right disk.

    ID#5 should be called "Retired Block Count" (unless you're running it on an HDD). SSD "thresholds" are usually around 3 or 4, and the RAW value should be 0 if things are well.
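
    Part of the confusion above comes from the different columns shown for a single SMART attribute. As a rough Python sketch of the usual SMART conventions (an assumption about SMART tools in general, not CrystalDiskInfo's internals): "Current" is a normalized score, often 100 on a new drive and counting down; "Threshold" is the vendor's failure point for that normalized score, with 0 meaning the attribute never trips; and the RAW value is the actual counter, such as the number of retired blocks. The `SmartAttribute` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int    # attribute ID (CrystalDiskInfo lists these in hex, e.g. 05)
    name: str
    current: int    # normalized value; often 100 on a new drive, counts down
    worst: int      # lowest normalized value recorded so far
    threshold: int  # vendor failure threshold for the normalized value
    raw: int        # the real counter, e.g. retired/reallocated blocks for ID 5

    def tripped(self) -> bool:
        """True if the normalized value has fallen to or below a nonzero threshold."""
        return self.threshold > 0 and self.current <= self.threshold

    def has_retired_blocks(self) -> bool:
        """For ID 5, a RAW value above zero means some blocks have been retired."""
        return self.attr_id == 5 and self.raw > 0


if __name__ == "__main__":
    # Current/Threshold as reported in the thread; worst and raw are placeholders.
    reading = SmartAttribute(5, "Re-Allocated Sector Count",
                             current=100, worst=100, threshold=0, raw=0)
    print("tripped:", reading.tripped())
    print("retired blocks present:", reading.has_retired_blocks())
```

    In other words, a Current value of 100 with a Threshold of 0 says nothing bad on its own; the number to compare against ZERO is the RAW column.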
     
  19. HermitGeek

    I was using this version already. Still, I redownloaded and reinstalled the client, just to make sure.

    There is no row labeled "Retired Block Count". Here are the names of the first few rows:

    ID Attribute Name
    03 Spin Up Time
    04 Start/Stop Count
    05 Re-Allocated Sector Count
    09 Power-On Hours Count
    0C Power Cycle Count
    AA Available Reserved Space
    AB Program Fail Count
    AC Erase Fail Count

    Can you install this same version and check on this? Am I looking at the correct data columns and rows?



     
  20. TheRollbackFrog

    Those are the SMART values you'd get when checking an HDD. Does your CrystalDiskInfo header correctly identify your SSD as the disk it's testing? The "Disk" tab should let you test whichever disk you'd like... maybe it's mixed up.

    What's the manufacturer/model of the disk you're testing?
     
  21. HermitGeek

    Yes, it is correctly identifying the SSD in the info header. It is the right one.

    I ran that test on another Intel SSD. It is an Intel 160GB SSD (330 Series).
     
  22. TheRollbackFrog

    Sounds like CrystalDiskInfo is having a problem identifying certain SSDs.

    When I run it on OCZ SSDs I see SSD-related SMART categories.

    Can you post a screenshot of CDI's summary screen when you run it on your Intel SSD? I'd like to see the descriptions of all the fields and their RAW values.
     
  23. HermitGeek

  24. TheRollbackFrog

    HG, turns out that Intel doesn't use anything even close to something "standardized" as far as SMART data is concerned... BUT, their E9 SMART value is valid, and yours says there's no wear at this time.

    See this article for mention of this...
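
    CrystalDiskInfo lists attribute IDs in hexadecimal, so "E9" in Intel's data is attribute 233, and the "05", "0C" and "AA" rows in the earlier list are attributes 5, 12 and 170. Here is a small Python sketch that converts those IDs and reads Intel's E9 (Media Wearout Indicator) value, assuming, as Intel documents for its SSDs, that the normalized value starts at 100 on a new drive and declines toward 1 as the NAND wears. Both function names are hypothetical, and treating `100 - value` as "percent of rated wear used" is a simplification.

```python
def smart_id(hex_id: str) -> int:
    """Convert a hex attribute ID as shown by CrystalDiskInfo (e.g. 'E9') to decimal."""
    return int(hex_id, 16)


def intel_wear_used_percent(e9_normalized: int) -> int:
    """Approximate share of rated NAND wear used, from Intel's E9 normalized value.

    Assumes the value starts at 100 (new) and declines linearly toward 1.
    """
    if not 1 <= e9_normalized <= 100:
        raise ValueError("E9 normalized value is expected to be between 1 and 100")
    return 100 - e9_normalized


if __name__ == "__main__":
    for hex_id in ("05", "0C", "AA", "E9"):
        print(hex_id, "->", smart_id(hex_id))
    # Illustrative input only; read the real E9 value from your own drive.
    print("wear used at E9 = 100:", intel_wear_used_percent(100), "%")
```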
     
  25. TheRollbackFrog

    There are also descriptions of each ID value starting on Page #12 of this manual.
     