Delayed write failed - Returnil ?

Discussion in 'Returnil releases' started by ecotech, Mar 9, 2009.

Thread Status:
Not open for further replies.
  1. ecotech
    Offline

    ecotech Registered Member

    Hello!

    Recently I replaced Deepfreeze with Returnil on all my public computers. For some time I was running Returnil on some, as testing, and since it worked perfectly, I implemented it on all of them. For now, it is the personal edition, until I get the budget for the big version.

    Since I had to get to every pc, I also did the necessary updates. Installed Returnil, and thought I was in the clear. Then, all hell broke loose.

    I started getting random "delayed write failed blabla bla bla" messages. Of course, I went through all the troubleshooting steps, and finally I got to the uneasy possibility that Returnil might not like some update from MS that was recently installed.

    Any of you have ideas in this direction?

    Software and hardware configuration is the same on all the problem computers:

    XP Home, with SP3, and all updates up to last week

    Openoffice, browsers, the usual public computer stuff.

    Hardware is Dell Optiplex GX270, 40 gb hdd, 512 mb ram.

    --------------------------------------------------------------------------

    For now, I am very willing to allow for the possibility that there is some malware that can generate "delayed write failed" as a side-effect, but I was unable to catch it or find relevant information regarding this.

    Bottom line: help?

    PS: I am unable to recreate the errors all the time. Sometimes it shows up pretty fast, then later it takes hours of computer use.
  2. crofttk
    Offline

    crofttk Registered Member

    Subscribing to tune in. I had exactly the same behavior but could not reproduce it easily enough to justify the time to get involved. In the meantime I have switched protection off while following the forum and waiting for a chance to address it. I can't promise I can take time to troubleshoot extensively, but I'll be interested in any suggestions Coldmoon or anyone else has.

    I also was not certain it wasn't malware but tend to think it's either Returnil itself OR a bad interaction with other software.

    Just as a note for Coldmoon, this is on a Dell 710m laptop with Sandboxie, NIS 2009, and TrojanHunter 5 running as security.
  3. ecotech
    Offline

    ecotech Registered Member

    Further testing showed that the delayed write error is much more likely to occur when Returnil is set to disk caching.

    It also has something to do with my sync software - I use SyncBack to send all files/folders created on the desktop back to a central repository every 3 minutes, via ftp. Disabling that also decreases the frequency of the error, but it does not disappear permanently, so there is still a bug somewhere.

    I am saying that it's probably some MS update that caused this because the computers that have Returnil and SyncBack, but are NOT up to date, have no errors whatsoever :) The problem is that not updating is not an option in the long run...
  4. Coldmoon
    Online

    Coldmoon Returnil Moderator

    Hi ecotech and crofttk,
    Can you post the exact text of the error message(s) and when they happen? Is there any correlation to a particular program or activity?

    Thanks
    Mike
  5. Coldmoon
    Online

    Coldmoon Returnil Moderator

    Can each of you try increasing the size of your disk cache and let me know if this helps?
  6. crofttk
    Offline

    crofttk Registered Member

    Thanks, Mike.

    I'll need some time to deinstall from my wife's laptop and install on mine (all same security and most other software) and then I can get back to active monitoring. If it doesn't occur on mine, I need more time to put back on hers and remonitor problem, but that's problematic as she needs her machine for school and I am forbidden to set her up with the least bit of complications!:doubt: Ironically for now, my ultimate goal is to do so by running Returnil protection on her machine.

    Also, I need to update to latest build. Soooo, I will keep up with the thread and contribute as time/resources permit.
  7. ecotech
    Offline

    ecotech Registered Member

    The error usually occurs - although this is not a rule -

    1. During normal use - word processor, browser, maybe some streaming audio

    2. When syncing with our central repo - via ftp, very small files, maybe a few megs of data, at the most

    3. Just sitting there, as in the picture below. Computer is not doing anything, just unlocked for use (these are public pc's), and then the error just pops up. Checked the exact time, in the Event Manager, it did not coincide with any other software doing anything.

    System cache was set to 2000 mb, I think that is enough for a computer with 512 mb ram, 2.8 ghz cpu.

    The errors started showing up since 2 weeks ago, after the last update cycle for the OS.

    If I disable write caching, the error shows up in C:\$Mft, but it's still there.

    Attached is an image of an actual error happening.


    Damn, the best part of it all is that the systems that I was too lazy to update, work perfectly :))

    Attached Files:

    • S8.PNG (54.3 KB)
    Last edited: Mar 11, 2009
  8. Coldmoon
    Online

    Coldmoon Returnil Moderator

    The error is indicating that the disk cache is full and as a result Windows cannot write to what it thinks is the actual system partition. Are you performing scanning or backup activities during this time?
  9. tekie
    Offline

    tekie Registered Member

    Where have I seen that error before? hmm.

    I've only seen it in 4 situations:

    1. Malicious software
    2. Micro$oft Update
    3. USB device was disconnected before it could finish writing data
    4. A bad sector on the hard drive

    What I would do?

    Run a disk check to make sure your file system is intact.

    I suppose you could uninstall the last couple of updates.
    Microsoft has been known to release "bad updates".
    They will release a "fixed update" in case this has happened. (eventually)
    (I'd say that this is probably the culprit, judging by the info you have given)

    Disconnect all USB devices to see if you still have the problem.

    I would run this program to check for malware:

    http://www.safer-networking.org/en/spybotsd/index.html
    Just install the "spybot program" - then update it .. after the scan/fix - uninstall it.

    Uninstall programs to find out which one is causing it. :shifty:

    Since you are experiencing this on multiple computers .. I'd say bad sectors are a very low possibility here.

    -

    EDIT:

    I just did some research on the web on "delayed write failed".

    A myriad of causes and effects .. oh my!

    Anyways .. there's a program I found that can clear and reset the "System Event Log" in the event that the file is corrupt.

    As always, use at your own risk:

    http://www.murphey.org/fixevt.html

    Just double-click on it and it's done. Make sure you do it in the real system and reboot.

    -
    Last edited: Mar 11, 2009
  10. ecotech
    Offline

    ecotech Registered Member


    Yes, there is a sync utility that sends via ftp all doc / img related files to a central repo. This is running on all computers, including those that do not have any errors.

    Also the error also occurs on systems that are doing absolutely nothing, so I can not tie it into some load issue.


    @Tekie
    The error is not hardware related: I changed hard drives, power units, cables...you know, the basic maintenance/troubleshooting routine. It is not malware either; I went through the system with everything I could think of. It's not USB related either.

    So, it's some damn M$ update. Which one...I cannot tell. The problem is that I really don't know what each update was about, so I can't pick the relevant one to uninstall...and going by trial and error would take a whole lot of time I don't have.

    Still, this issue should be looked into, since for me it's pretty obvious that disk cache + some obscure MS update + Returnil personal edition = trouble.
    Last edited: Mar 12, 2009
  11. Coldmoon
    Online

    Coldmoon Returnil Moderator

    We are looking into this and every report we get. In this case, however, we need to know whether increasing the cache size on the affected systems helps resolve or mitigate this.

    Thanks
    Mike
  12. tekie
    Offline

    tekie Registered Member

    ecotech,

    It's pretty obvious at this point that SysEvent.Evt is the problem.

    It's most likely corrupt.

    That log file contains all the system events that happened to your computer until you clear it. You can do that, either with the utility link I provided above, or do it manually.

    Right-Click on My Computer, left-click Manage, Event Viewer

    * Application
    * Security
    * System

    Right-Click on each of the above and choose "Clear all Events"

    Make sure you do this in the REAL SYSTEM and Reboot Computer.

    If that still doesn't get it - then you'll have to delete it manually using a Live CD.
  13. tekie
    Offline

    tekie Registered Member

    image for above post:

    cm.png
  14. ecotech
    Offline

    ecotech Registered Member

    What I did:

    I split the problem computers into three groups:

    First group: Sysevent cleared, all MS updates left into place. Got one error in about 24 hours.

    Second group: Sysevent left alone, MS updates for the last 3 weeks uninstalled. Got no errors in the last 24 hours.

    Third group: Sysevent cleared, MS updates uninstalled. Now, I'll reinstall them one by one, attempting to see if I can reproduce the problem, maybe see which one is to blame.
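
    The one-by-one reinstall planned for the third group can be sketched roughly as below. This is only an illustration: the update names (KB_A, KB_B, ...) are made-up placeholders, not real KB numbers, and simulated_check stands in for "run the machine for a day and watch for the error". Because updates are added cumulatively, the function returns the update that completes a bad combination, not necessarily a single guilty update.

```python
from typing import Callable, Optional, Sequence


def first_trigger(
    updates: Sequence[str],
    fails_with: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Reinstall updates cumulatively and return the one after which errors first appear."""
    installed = []
    for update in updates:
        installed.append(update)
        # Test the machine with everything installed so far.
        if fails_with(installed):
            return update
    return None  # no combination in this list reproduced the error


def simulated_check(installed: Sequence[str]) -> bool:
    # Hypothetical stand-in for a 24-hour observation run.
    # Assumption for the demo: the error needs both placeholder updates B and C.
    return {"KB_B", "KB_C"} <= set(installed)


culprit = first_trigger(["KB_A", "KB_B", "KB_C", "KB_D"], simulated_check)
print(culprit)
```

    Once a trigger is found, removing the earlier updates one at a time while keeping the trigger installed would show which of them are also part of the combination.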
  15. tekie
    Offline

    tekie Registered Member

    Thanks, we'll anxiously await your findings.

    -
  16. ecotech
    Offline

    ecotech Registered Member

    As I have tons of unused hardware on stock, slated for recycling, I expanded the "research" a little bit, since I wasn't doing anything of interest in the weekend anyway :) Dug up a few older Dell / HP / IBM computers from the garage, nice shiny XP on all of them, updated it to the very latest in everything, then installed Returnil and held my breath...

    Wouldn't you know it, the only one that screamed in pain was the Dell. And even those errors stopped when I uninstalled the last 3 weeks' worth of updates from MS.

    So, it looks like a pretty complex set of circumstances are needed to replicate this problem:

    1. Dell computer (remember, crofttk had the same issue on a Dell laptop)

    2. Some application to crunch on the hard drive (like a sync tool - although it's not causing the issue, it helps)

    3. Returnil set to disk caching (even though sometimes it freaks out with mem caching too, but it's 90% less frequent compared to disk mode)

    4. A weird set of updates from MS (as far as I've been able to tell, it's not any one update, but the collective of 3 or 4 - will come with the exact KB_xxxxxx numbers)

    5. ATA hard drive ( some of my "newer" GX models came with sata on board - so I stuck in them sata drives - wouldn't you know it, no errors with sata )

    Will update on this ....
  17. crofttk
    Offline

    crofttk Registered Member

    Oh, wow!

    Thank you for all the hard work, ecotech! Well, it's hard work for ME carving out the appropriate time for it. The "irony", let's call it, of this situation is she hasn't got the experience with these things that I do and needs unfailing reliability. So, I really want the security/stability that Returnil can provide between the times when I can update her security programs. With her though, it's once bitten, twice shy, so I need to wait for opportunities when she doesn't need the laptop.

    I checked and her hard drive is IDE, not SATA. A chance should come soon for me to reinstall latest Returnil build and I will then see if I can reproduce the problem.

    I'll report back with my observations too, once I get caught up here.
  18. Coldmoon
    Online

    Coldmoon Returnil Moderator

    Hi,
    Please watch your cache to see what might be happening while you have protection on. This could give an indication as to what may be filling the available space. To do this:

    1) Open the RVS GUI and click the Preferences button to open the settings menus
    2) Click the "Others" tab and then uncheck the option to hide tray icon hints if activated and close the GUI
    3) Hold one of your SHIFT keys down and then double click the tray icon. This will result in a bubble message from the tray icon with a message similar to the following:

    Watch this at predetermined time intervals and then see what processes are active and writing to the disk with either great frequency or large file sizes.
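
    As a rough illustration of that comparison, the sketch below ranks processes by how much they wrote between two readings. It assumes you have already collected per-process "bytes written" totals by some other means (for example from a task manager column); the process names and numbers are invented for the demo and are not output from RVS.

```python
from typing import Dict, List, Tuple


def top_writers(
    before: Dict[str, int],
    after: Dict[str, int],
    limit: int = 3,
) -> List[Tuple[str, int]]:
    """Rank processes by bytes written between two snapshots, largest first."""
    # A process missing from the first snapshot is treated as having started at zero.
    deltas = {name: after[name] - before.get(name, 0) for name in after}
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    # Drop processes that wrote nothing in the interval.
    return [(name, delta) for name, delta in ranked if delta > 0][:limit]


# Hypothetical cumulative write totals (bytes) taken a few minutes apart.
snapshot_1 = {"syncback.exe": 10_000, "soffice.bin": 50_000, "explorer.exe": 5_000}
snapshot_2 = {
    "syncback.exe": 4_010_000,
    "soffice.bin": 250_000,
    "explorer.exe": 5_000,
    "svchost.exe": 1_000,
}

for name, written in top_writers(snapshot_1, snapshot_2):
    print(f"{name}: {written} bytes written")
```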

    Mike
  19. crofttk
    Offline

    crofttk Registered Member

    OK, I finally got the subject laptop cleaned up and reinstalled the latest build of Returnil, enabling protection in disk cache mode, no virtual partition. It sat overnight, probably just background streaming updates from NIS2009 going on and not much else. Nothing of note in event logs.

    Before leaving to run errands this morning, I decided to stress the system by running a PerfectDisk2008 defrag on the system drive (I'm aware defrag is recommended to be done BEFORE install of Returnil) because I wanted to at least force the delayed write failure to occur.

    Sure enough, the failures were occurring repeatedly when I came back even though PD2008 had finished the defrag. Tons of entries in the system events log. I won't get into the specific verbiage of the messages yet unless Coldmoon wants them specifically at this point.

    Right now it is taking a very long time to shut down and reboot "nicely". That seems to make sense to me if a lot of (virtual?) disk writes have occurred which have to be processed somehow before shutdown.

    NEXT, I'll run awhile without any defrag or other stressing to make sure all is quiet (other than letting NIS2009 stream update). If all is quiet overnight, then I will try turning OFF write caching for this IDE harddrive and see if defrag gives same failures. This is just doing what I KNOW can induce the failures.

    If the outcome is, "Well, silly, don't defrag with protection on!" then I can understand and cope with that.

    Meantime, unless Coldmoon suggests other diagnostic steps, I will monitor availability of disk cache as Coldmoon has outlined above during my further steps and provide that info with subsequent reports.

    BTW, laptop finally completed shutdown and reboot as I finished writing this, about a 5 minute total to get shut down.
  20. Coldmoon
    Online

    Coldmoon Returnil Moderator

    If this is what you are doing, then I strongly recommend not doing this as it makes no sense in theory and can cause unexpected issues as you have noted. Never, ever attempt to defrag your system partition with RVS protection on.

    Mike
  21. crofttk
    Offline

    crofttk Registered Member

    OK, got it. I found out that a simple chkdsk /F reboot did not fix the corruption that resulted from the above "test": it was infinitely looping through reboot chkdsk and could NOT clear the drive's dirty flag. I found I had to boot from the WinXP setup CD, enter the recovery console, and run chkdsk /p to get the drive to flag clean.:( SOOOOO, yes, MY BAD... If I get bad marks for trying it, hopefully I get a little credit for fixing the mess before my wife got home.:ninja:

    This foolish behavior is NOT what caused the original problem in the first place, however. I know you had advised not to defrag after I first performed the original install, so I never did try to defrag the original install. This was just an ill-advised way for me to try to force it.

    SO, back to square one on my investigation. Returnil protection is now ON and has run another 3 hours with no problems or bad system events. I will just let things alone overnight now and check status tomorrow afternoon or so.
  22. crofttk
    Offline

    crofttk Registered Member

    OK, 24 hours have passed with protection ON. No delayed write failures. Disk cache availability shows 3090 MB / 4096 MB.

    Will this 3090 MB just continue to go down and then I have to reboot before it runs out?

    Unless Coldmoon tells me NONO!, I believe I'll go ahead and start WindowsLiveSync back up and see how that goes for next 24 hours.

    P.S. Only using WLiveSync for non-system partition files. I'll have to see if I get any problems from LiveSync AppData or registry entries being static on the system drive.
  23. Coldmoon
    Online

    Coldmoon Returnil Moderator

    Yes, the available space decreases until it reaches zero. At this point you will get a delayed write error or a low disk warning from Windows. Take a "pulse" (check space available in the cache as described previously in the thread) at regular intervals while the sync is running to determine:

    1) Is cache space being used?
    2) If #1 is yes, then check the available cache space with greater frequency
    3) Determine if you have allocated sufficient space in the disk cache

    Part of the learning curve with ISR involves watching what happens so you can tweak the cache to perform optimally in your specific environment, and this is the major reason why the cache size can be adjusted when needed.
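
    The "pulse" readings can be turned into a rough estimate of how long the cache will last. The sketch below is a simple linear extrapolation from the first and last reading. It uses the 3090 MB / 4096 MB figure reported after 24 hours earlier in the thread, and it assumes (this is not stated in the thread) that the cache was completely free when protection was switched on.

```python
from typing import List, Optional, Tuple


def hours_until_empty(samples: List[Tuple[float, float]]) -> Optional[float]:
    """Estimate hours until the cache is exhausted from (hour, free_mb) readings."""
    (t0, free0), (t1, free1) = samples[0], samples[-1]
    used_per_hour = (free0 - free1) / (t1 - t0)
    if used_per_hour <= 0:
        return None  # cache is not shrinking, so there is nothing to extrapolate
    return free1 / used_per_hour


# (hours since protection was enabled, free cache in MB)
# The first reading is an assumption; the second is the figure from the thread.
readings = [(0.0, 4096.0), (24.0, 3090.0)]
estimate = hours_until_empty(readings)
print(round(estimate, 1))
```

    With these two readings the cache shrinks by about 42 MB per hour, which leaves roughly 74 more hours before it runs out. Real usage is unlikely to be this linear, so a sync run or a large download can shorten that considerably.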

    Mike
  24. crofttk
    Offline

    crofttk Registered Member

    OK, thanks very much for that info, Mike. I think, considering that, I am better off using Returnil for my own purposes of fooling with "the dark side" then. For the purposes of fail-safing my wife's laptop, it looks like using FD-ISR in Freeze mode may be a better bet.
  25. ecotech
    Offline

    ecotech Registered Member

    My problems are pretty much fixed. Right now, in my case, I consider it resolved, since it shows up only in a unique combination of hardware/software/update condition.