Advice Needed: Rollback Rx 10.2 and Delayed NTFS File Corruption?

HermitGeek · Jan 21, 2014

I need some help in determining whether or not a catastrophic NTFS corruption I experienced was related to Rollback Rx 10.2 (Build 2698745870). The Horizon DataSys forum is a ghost town, and the staff there is ill-equipped to help.

For more than a year, I had a system running Rollback Rx 8.1 on an OCZ Agility 3 SSD under Windows Vista SP1 without any problems (including with Rollback Rx). Obviously, because of Vista, TRIM was not active (as far as I know).

About 6 weeks ago, I cloned the drive to an Intel 520 SSD as part of an upgrade. The new SSD has been pre-tested to be error free. All other hardware in the system remained the same. I also uninstalled Rollback Rx 8.1 and installed Rollback Rx 10.2 during this upgrade.

For about 4 weeks, the system ran fine, including taking nearly hourly and daily snapshots. However, about a week ago, the first sign of trouble appeared. Suddenly, Rollback Rx reported that its console client was corrupted. Windows' Event Viewer showed repeated and frequent NTFS file structure corruption (Event ID 130). At that time, I was able to access the Rollback Rx console from boot, so I was still able to revert to a working snapshot.

However, the same corruption occurred just a few days ago. The system was largely idle during this period. There had been no new software installation or services running in the interim. The corruption was catastrophic. Nearly all of my data files were corrupted. Reverting to older snapshots either failed to correct the corruption or just led to BSOD.

I don't want to falsely attribute this to Rollback Rx, but I can't think of any other reason for this. I am aware of the long history of file and snapshot corruption with v10 and some earlier builds of v10.2. I don't use any disk defragger. The only defrag I do is snapshot defragmentation using the Rollback Rx console during boot (never from within Windows). Unfortunately, because the system was mission critical, I have since reformatted the entire drive and restored from a backup. I am now using v9.1 instead of v10.2.

My questions are these:

1. Is this NTFS corruption issue known with Rollback Rx 10.2 (Build 2698745870)?
2. If so, why did the corruption only begin to occur after nearly a month of flawless operation? What could have triggered it (again, the system had no new change made)?
3. Is it possible that v10.2 is somehow incorrectly forcing TRIM to be active on my SSD even though Vista is not supporting it, causing the MBR to corrupt?
4. Am I safer to use v9.1 with an older OS (Vista) instead of v10.2, because v10.x is based on new code and is not tested properly with Vista?
5. Any other suggestions for monitoring my current system, now running v9.1, for the same corruption?

MrBrian · Jan 21, 2014

HermitGeek said:

5. Any other suggestions for monitoring my current system, now running v9.1, for the same corruption?

Create custom event triggers in Vista Task Scheduler
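As a rough illustration of what such a trigger could look like, here is a minimal Python sketch that assembles a `schtasks` command for an on-event task. The task name and the alert script path are hypothetical placeholders, and the filter only matches on Event ID 130 in the System log as reported in the first post; check it against what Event Viewer actually shows on your machine before relying on it.

```python
def build_event_trigger_command(task_name, action, event_id=130, log="System"):
    """Return the argument list for a schtasks on-event task.

    task_name and action are hypothetical placeholders supplied by the caller.
    The XPath filter matches any event in the chosen log with the given ID.
    """
    query = f"*[System[(EventID={event_id})]]"
    return [
        "schtasks", "/Create",
        "/TN", task_name,   # name of the scheduled task
        "/TR", action,      # program or script to run when the event fires
        "/SC", "ONEVENT",   # trigger on an event log entry
        "/EC", log,         # event log (channel) to watch
        "/MO", query,       # XPath filter selecting the event
    ]


# Hypothetical example: run an alert script when Event ID 130 is logged.
command = build_event_trigger_command("NtfsCorruptionAlert", r"C:\Scripts\alert.cmd")
print(command)
```

The list can be passed to `subprocess.run` from an elevated prompt; building it as a list avoids quoting problems with the XPath filter.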

HermitGeek · Jan 22, 2014

Thanks for the tip, MrBrian.

Any thoughts on my theory about Rollback Rx and why this happened? I have been a user of Rollback Rx for many years. This was my first use of v10.2, so it is very important for me to find out if the software is no longer reliable.

TheRollbackFrog · Jan 22, 2014

HermitGeek said:

Thanks for the tip, MrBrian.

Any thoughts on my theory about Rollback Rx and why this happened? I have been a user of Rollback Rx for many years. This was my first use of v10.2, so it is very important for me to find out if the software is no longer reliable.

HG, I follow the RBrx community pretty closely and, to date, have never heard of such a problem during the rollout of v10.x... lots of other problems (some of which I've had myself) but not that one.

I would get v10.2 off that system for a while and see, over time, if the problem continues.

HermitGeek · Jan 22, 2014

Thank you for your advice, TheRollbackFrog.

Can you elaborate on which part of the issue I reported you have not seen reported before? Is it the "delayed" occurrence of the event? I am aware of the multiple MBR issues with Rollback Rx v10.x. The NTFS corruption of recently written files, and its widespread nature, is consistent with part of the MBR somehow being corrupted, rather than a sector-level corruption (such as a bad drive).

Also, is it possible that v10.x has some underlying incompatibility with an older OS such as Vista SP1 (as in my case)? Understandably, much of the attention given to the v10.x problems has been related to Windows 7 and, mostly, Windows 8. Have there been reports of issues with Vista specifically?

Lastly, is it possible that there is some goofy thing going on with TRIM on my Intel 520 SSD, even though Vista does not support TRIM? Could this mismatch cause a problem?

TheRollbackFrog · Jan 22, 2014

HermitGeek said:

My questions are these:

1. Is this NTFS corruption issue known with Rollback Rx 10.2 (Build 2698745870)?
2. If so, why did the corruption only begin to occur after nearly a month of flawless operation? What could have triggered it (again, the system had no new change made)?

The fact that it waited a month before failures started to occur makes me believe RBrx had little to do with the problem. That underlying problem is what caused RBrx to say its console client was corrupted.

The timing looks more like hardware infant mortality... that's why I suggested removing RBrx for a bit. If it is infant mortality, you'll start to see it without RBrx.

HermitGeek said:

3. Is it possible that v10.2 is somehow incorrectly forcing TRIM to be active on my SSD even though Vista is not supporting it, causing the MBR to corrupt?

TRIM is an OS function only. Vista does not support TRIM. Any TRIM command issued by RBrx (and whether it issues any at all is questionable) to a non-TRIM OS will fall into a black hole, never to be acted upon.

HermitGeek said:

4. Am I safer to use v9.1 with an older OS (Vista) instead of v10.2, because v10.x is based on new code and is not tested properly with Vista?

Feel free to do this but I wouldn't assume v10.x is not tested properly with Vista. Their test suite does cover all the OSes they claim to support.

HermitGeek said:

5. Any other suggestions for monitoring my current system, now running v9.1, for the same corruption?

I would run an occasional CHKDSK on an RBrx-free system first to see if things are degrading without RBrx. Then do the same with v9.1, then v10.2. CHKDSK will surely pick up the beginnings of such a problem.

You mentioned "cloning" your system to an SSD but didn't mention the tool used. Are you sure your partition on the SSD is aligned on a 2048-sector (1 MiB) boundary? That's important for SSDs.

HermitGeek · Jan 22, 2014

I cloned from one SSD to another SSD, using Partition Magic to do a RAW SECTOR TO SECTOR copy. However, I do not know how to check alignment. Can you clarify? If there is an error, could this be the problem and how do I fix it? Note that the WHOLE drive has only 1 single partition which I had expanded using the Disk Management tool in Windows Vista to fill the entire drive.

I have since reformatted the drive and restored a backup to it. If it is an infant mortality issue, would I expect the problem to recur immediately?

TheRollbackFrog · Jan 22, 2014

HermitGeek said:

I cloned from one SSD to another SSD, using Partition Magic to do a RAW SECTOR TO SECTOR copy. However, I do not know how to check alignment. Can you clarify?

Most partition tools give you access to the PROPERTIES of the partition. Once the partitions are displayed, select the one you're interested in and look for a properties option... it should tell you what you need to know. Don't select the DISK itself, select the partition. Its properties should include the starting sector #. If that number is divisible by 2048, then you're properly aligned. A misalignment usually only causes speed issues, not data destruction.
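The divisibility check is easy to do by hand, but here is a minimal Python sketch of it. It assumes 512-byte logical sectors (so 2048 sectors is 1 MiB); the function name and default values are my own, and you would plug in the starting sector your partition tool reports.

```python
def is_aligned(start_sector, sector_size=512, boundary_bytes=1024 * 1024):
    """Return True if the partition's starting offset falls on the boundary.

    Assumes 512-byte logical sectors by default, so a starting sector that
    is a multiple of 2048 lands on a 1 MiB boundary.
    """
    offset_bytes = start_sector * sector_size
    return offset_bytes % boundary_bytes == 0


print(is_aligned(2048))  # typical start sector for Vista-and-later partitioning
print(is_aligned(63))    # typical start sector for XP-era partitioning
```

A raw sector-to-sector clone keeps whatever starting sector the source partition had, so the result depends on how the original drive was partitioned.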

HermitGeek said:

I have since reformatted the drive and restored a backup to it. If it is an infant mortality issue, would I expect the problem to recur immediately?

Not necessarily. With MLC NAND memory (the type most likely used by the SSD), it may develop over a short time. Keep an eye on it.

HermitGeek · Jan 22, 2014

Until that occurs, is there a tool like CrystalDiskInfo that allows me to see if the SSD is "running" into trouble already, much like SMART for HDD?

Also, I just did some reading following your suggestion. It seems that the lack of proper alignment only "slows" the SSD performance and would not explain the corruption issue I experienced. Is this correct?

TheRollbackFrog · Jan 22, 2014

HermitGeek said:

Until that occurs, is there a tool like CrystalDiskInfo that allows me to see if the SSD is "running" into trouble already, much like SMART for HDD?

CrystalDiskInfo should tell you most of what you need to know, although some of its SMART values aren't very relevant. The important SMART value is ID#5, the "Retired Block Count." This should be ZERO or very low at this point in the SSD's life.

HermitGeek said:

Also, I just did some reading following your suggestion. It seems that the lack of proper alignment only "slows" the SSD performance and would not explain the corruption issue I experienced. Is this correct?

That is correct... that's what I mentioned at the end of my previous message.

HermitGeek · Jan 23, 2014

Would a high Retired Block Count support your theory of infant mortality (that is, a problem with the SSD itself) rather than Rollback Rx?

TheRollbackFrog · Jan 23, 2014

HermitGeek said:

Would a high Retired Block Count support your theory of infant mortality (that is, a problem with the SSD itself) rather than Rollback Rx?

That SMART value in an SSD keeps track of the NAND blocks (groups of pages) that have become unusable and had to be taken out of service. This number should go up over time (a long time) as all the NAND blocks (including spares) are put into use, eventually wear out, and are retired. Eventually all NAND blocks become "worn out," and at that point your SSD effectively becomes a READ ONLY device, since the worn blocks can no longer be written.

This process occurs over time, and the total time involved is proportional to the amount of writing done to your SSD device. In what I would call a "normal" system usage pattern, this process might take from 7-9 yrs. If your system is involved in a heavy WRITING environment (constant data conversion, compression... anything that causes lots of writing in the system), that time period will be significantly reduced (4-5 yrs?).
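To put rough numbers on that proportionality, here is a back-of-the-envelope Python sketch. The endurance figure and daily write volumes are hypothetical values chosen only for illustration; they are not specifications for any Intel or OCZ drive, and real wear also depends on write amplification and how the controller spreads writes.

```python
def estimated_years(rated_endurance_tb, writes_gb_per_day):
    """Estimate years until the rated write endurance is used up.

    rated_endurance_tb: hypothetical total data the drive is rated to absorb (TB)
    writes_gb_per_day:  hypothetical average host writes per day (GB)
    """
    if writes_gb_per_day <= 0:
        raise ValueError("writes_gb_per_day must be positive")
    days = rated_endurance_tb * 1000 / writes_gb_per_day
    return days / 365


# Hypothetical comparison: doubling the daily writes halves the estimate.
light = estimated_years(rated_endurance_tb=36.5, writes_gb_per_day=10)
heavy = estimated_years(rated_endurance_tb=36.5, writes_gb_per_day=20)
print(f"light use: about {light:.1f} years, heavy use: about {heavy:.1f} years")
```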

Since your SSD is fairly new, that number should probably be ZERO at this stage and not going up anytime soon... unless there's an internal problem with the device. If so, the device may be retiring those NAND blocks at a faster pace than should normally be expected.

Rollback Rx should have no effect on what's happening inside that SSD except for its questionable use of TRIM through the Windows system. This usage, if it's even happening at all (the product has been questioned significantly in this area since the introduction of SSDs, and the question has never been definitively answered), will not degrade your SSD.

HermitGeek · Jan 23, 2014

Do you recommend that I use SpinRite to do a diagnostic on my Intel SSD drive (the one that I have since restored to a backup and am currently using without any issue)? Would this allow SpinRite to find troublesome areas on the SSD and "lock them out"? If so, as the Intel SSD already has Rollback Rx 9.1 installed, do I need to remove Rollback before I can use SpinRite safely?

TheRollbackFrog · Jan 23, 2014

HermitGeek said:

Do you recommend that I use SpinRite to do a diagnostic on my Intel SSD drive (the one that I have since restored to a backup and am currently using without any issue)? Would this allow SpinRite to find troublesome areas on the SSD and "lock them out"? If so, as the Intel SSD already has Rollback Rx 9.1 installed, do I need to remove Rollback before I can use SpinRite safely?

I would not use SpinRite on an SSD, especially if you have it run the diagnostic mode where it re-writes all the sectors... this would age the SSD tremendously with unnecessary WRITE operations. A READ ONLY pass won't hurt anything, but make sure it's not trying to correct the data if the block being tested comes up with an error... it may show you an erroring drive, but that's all you should see with it. Any drive "reader" will do the same thing as far as exposing errors if the NAND cells are bad... it'll just give you a health indication, that's all.

SSDs manage themselves very well at the controller level, including necessary garbage collection (don't ask... it's a long explanation). If there are errors, the controller will react accordingly and properly.

You didn't mention what the SMART ID#5 data was telling you... is it ZERO?

HermitGeek · Jan 23, 2014

Yes, it is ZERO.

Having said that, this reading was taken just now, after I had reformatted the drive and restored the data to it, and long after the file corruption error that may or may not be related to Rollback Rx. Does this reading help with the troubleshooting and with determining whether or not it is a Rollback Rx issue?

TheRollbackFrog · Jan 23, 2014

HermitGeek said:

Yes, it is ZERO.

Having said that, this reading was taken just now, after I had reformatted the drive and restored the data to it, and long after the file corruption error that may or may not be related to Rollback Rx. Does this reading help with the troubleshooting and with determining whether or not it is a Rollback Rx issue?

This reading is valid from the beginning of the SSD's life to the end... no amount of formatting, reimaging, or reloading will cause this to change back to ZERO if it was non-ZERO... it's a LIFE number.

If it's ZERO, there's a good chance your drive is just fine and your problem is elsewhere, although I know not where.

Keep an eye on your system with that occasional ChkDsk operation to see if it's starting again. All I can say is that a problem matching your description has never been reported or discussed in any of the RBrx forums, nor has it been mentioned in any of the product's release notes. This leads me to believe that it may not be RBrx related, although no one can guarantee that.

Wish I could help more...

HermitGeek · Jan 24, 2014

Oops, TheRollbackFrog, it appears that I have been reading the wrong values.

In CrystalDiskInfo, under ID 05 (which describes it as Re-Allocated Sector Count), the current value is 100, not 0. The 0 is listed under Threshold.

Am I looking at the correct number?

Actually, to the contrary, TheRollbackFrog, I am immensely grateful for your advice about this problem.

TheRollbackFrog · Jan 24, 2014

HermitGeek said:

Oops, TheRollbackFrog, it appears that I have been reading the wrong values.

In CrystalDiskInfo, under ID 05 (which describes it as Re-Allocated Sector Count), the current value is 100, not 0. The 0 is listed under Threshold.

Am I looking at the correct number?

Sounds like an old version to me. Get a fresh copy of CrystalDiskInfo v6.0.4 Standard Edition and run it again. Check all the header/health information to be sure you're looking at the right disk.

ID#5 should be called "Retired Block Count" (unless you're running it on an HDD). SSD "thresholds" are usually around 3 or 4, and the RAW value should be 0 if things are well.
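Since the columns are easy to mix up, here is a small Python sketch of how one row of a SMART table might be read. The field names mirror the usual Current / Worst / Threshold / Raw columns; the three verdict labels and the rule of thumb for the raw value are my own simplification and only make sense for count-type attributes such as ID 05. The worst and raw values in the example are assumed, since they were not posted.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int    # attribute ID, e.g. 0x05
    name: str       # label shown by the tool
    current: int    # normalized value (often starts near 100)
    worst: int      # lowest normalized value seen so far
    threshold: int  # normalized value at or below which the drive flags failure
    raw: int        # vendor-specific raw counter


def assess(attr):
    """Rough verdict for a count-type attribute (a simplification)."""
    if attr.threshold > 0 and attr.current <= attr.threshold:
        return "failing"  # normalized value has dropped to the threshold
    if attr.raw > 0:
        return "watch"    # some blocks/sectors already retired
    return "ok"


# Current 100 and threshold 0 as reported; worst and raw are assumed values.
reallocated = SmartAttribute(0x05, "Re-Allocated Sector Count", 100, 100, 0, 0)
print(assess(reallocated))
```

The point is that a Current value of 100 is the healthy end of the normalized scale, while the count of retired blocks lives in the Raw column.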

HermitGeek · Jan 27, 2014

I was using this version already. Still, I redownloaded and reinstalled the client, just to make sure.

There is no row labeled "Retired Block Count". Here are the names of the first few rows:

ID Attribute Name
03 Spin Up Time
04 Start/Stop Count
05 Re-Allocated Sector Count
09 Power-On Hours Count
0C Power Cycle Count
AA Available Reserved Space
AB Program Fail Count
AC Erase Fail Count

Can you install this same version and check on this? Am I looking at the correct data columns and rows?

TheRollbackFrog · Jan 27, 2014

HermitGeek said:

I was using this version already. Still, I redownloaded and reinstalled the client, just to make sure.

There is no row labeled "Retired Block Count". Here are the names of the first few rows:

ID Attribute Name
03 Spin Up Time
04 Start/Stop Count
05 Re-Allocated Sector Count
09 Power-On Hours Count
0C Power Cycle Count
AA Available Reserved Space
AB Program Fail Count
AC Erase Fail Count

Can you install this same version and check on this? Am I looking at the correct data columns and rows?

Those are the SMART values you get when checking an HDD. Does your CrystalDiskInfo header correctly identify your SSD as the disk it's testing? The "Disk" TAB should allow you to test whichever disk you'd like... maybe it's mixed up.

What's the Manufacturer/Model of disk you're testing?

HermitGeek · Jan 28, 2014

Yes, it is correctly identifying the SSD in the info header. It is the right one.

I ran that test on another Intel SSD. It is an Intel 160GB SSD (330 Series).

TheRollbackFrog · Jan 28, 2014

HermitGeek said:

Yes, it is correctly identifying the SSD in the info header. It is the right one.

I ran that test on another Intel SSD. It is an Intel 160GB SSD (330 Series).

Sounds like CrystalDiskInfo is having a problem identifying certain SSDs.

When I run it on OCZ SSDs I see SSD-related SMART categories.

Can you post a screenshot of CDI's summary screen when you run it on your Intel SSD? I'd like to see the descriptions of all the fields and their RAW values.

HermitGeek · Jan 29, 2014

Here is a screencap of the Intel SSD. I have made certain that the header is correctly displaying the SSD so that the program is polling the correct SMART data:

http://imagizer.imageshack.us/v2/605x389q90/577/1ved.png

TheRollbackFrog · Jan 29, 2014

HG, turns out that Intel doesn't use anything even close to something "standardized" as far as SMART data is concerned... BUT, their E9 SMART value is valid, and yours says there's no wear at this time.

See this article for mention of this...

TheRollbackFrog · Jan 29, 2014

There are also descriptions of each ID value starting on Page #12 of this manual.
