Event ID:6004 - A driver packet received from the I/O subsystem was invalid.

Discussion in 'ESET NOD32 Antivirus' started by dwood, Jan 30, 2008.

Thread Status:
Not open for further replies.
  1. jhurrell

    jhurrell Registered Member

    Joined:
    Nov 2, 2006
    Posts:
    10
    I've just logged in to see if the v3 issue on DCs was fixed, and I'm surprised to see this long thread still going. I too saw the infamous Event ID 6004 errors, along with the inability to browse network shares and workstations being blocked from logging on.

    The final straw came when I had to power off the server following the network share blocking issue and on reboot the system hung at "applying settings" for 3 days (it would respond to pings, but would serve nothing else on the domain - logons, DNS, DHCP, etc.). In the end I bit the bullet and flattened the system and started from scratch - 2 very long days of work.

    I have rolled the server and workstations back to v2.7 and haven't had any more issues, which I am very glad about.

    The system is a Dell 1900 with 1x quad-core Xeon 2.3 GHz, 2 GB RAM, and SAS RAID controllers with 2x 500 GB drives in RAID 1, running Windows SBS 2003 with 10 workstations on Vista Business.

    I hope that a positive outcome is forthcoming soon for you guys still struggling with reboots etc... you have my sympathies.
     
  2. MidSpeck

    MidSpeck Registered Member

    Joined:
    Apr 24, 2007
    Posts:
    30
    I'm curious if those who have PM'ed Marcos and tried this have had any success.
     
  3. MidSpeck

    MidSpeck Registered Member

    Joined:
    Apr 24, 2007
    Posts:
    30
    I have four (4) Windows 2003 Servers running EAVBE 3.0. One of the four has the problem where it ends up locking up and becoming unresponsive when the real-time scanner is enabled. The other ones have not ever frozen entirely, but I do see some 6004 errors in the System event log from time to time.

    ********
    The server which locks up contains the following hardware:
    Windows Server 2003 R2 Standard SP2
    SATA drives with default Microsoft drivers dated 10/1/2002. There is no hardware RAID going on, but I have dynamic drives enabled doing some mirroring. NIC is a Broadcom NetXtreme Gigabit ethernet with Broadcom driver (10/31/2005). This server is a DC and serves files. No other major server applications running. I guess I should note that an instance of Data Deposit Box runs at night and the Veritas Backup Exec 9.1 Remote Agent is also running.
    ---
    A server that has not ever locked up on me completely, but has some 6004 errors:
    Windows Server 2003 Standard SP2
    Dell computer with SCSI disks attached to a PERC 4/SC RAID controller. Drivers from Dell dated 12/11/2003. No dynamic disks. NIC is a Broadcom NetXtreme 5721 Gigabit with a Broadcom driver (6/19/2004). This server is a DC, serves files, and printers. It also has several applications installed (not SQL). Runs a copy of NovaBACKUP.
    ---
    Another server that has not ever locked up on me completely, but has some 6004 System errors:
    Windows Server 2003 Standard SP2
    Dell computer with IDE disks (on Intel 82801EB) drivers (4/11/2003). Dynamic disks are used for mirroring partitions. NIC is an Intel PRO/1000 MT with Intel drivers dated 8/14/2003. This server is a DC, serves files, and printers. Runs Microsoft SQL 2005 Express Edition. No backup agent directly on the system (just gets files through the shares).
    ----
    Last server with no problems yet (not even 6004 errors -- it's only been in service for about 2 weeks):
    Windows Server 2003 R2 x64 Standard
    Has an NVIDIA nForce RAID controller in use with NVIDIA driver dated 7/10/2006. No dynamic disks set up in Windows. NIC is NVIDIA nForce (on motherboard) with the same NVIDIA driver date as the RAID. This server is a DC and serves files. No other applications or agents run on this system.
    ********

    So I guess I'm luckier than some, since only the first server I listed actually locks up. Just as a shot in the dark: how many of you with problem servers are running Backup Exec or its Remote Agent on them?
     
  4. CrookedBloke

    CrookedBloke Registered Member

    Joined:
    Oct 15, 2007
    Posts:
    110
    I've got another Veritas backup agent (NetBackup) running.

    I've been in touch with Marcos, and have sent ESET a couple of memory dumps and a log file generated by the special eamon.sys file they have made available for testing.

    I sincerely hope that those of you who are seeing these issues will take the trouble to work with ESET to gather memory dumps and eamon logs for them. I had to argue myself blue in the face to get our production people to allow me to take a server out of the loop strictly for testing purposes. It has severely affected production, but we feel we have no choice.

    The rest of our network is using 2.70.39 and is running perfectly. The version 3 EAV (3 different builds) were all disastrous for us. I hope ESET can fix it. They need your help.
     
  5. STI

    STI Registered Member

    Joined:
    Feb 25, 2008
    Posts:
    10
    All my blocked servers have Backup Exec 11d running...

    Also, no problems with v2.7x.
     
  6. techie007

    techie007 Registered Member

    Joined:
    Jan 2, 2008
    Posts:
    125
    Location:
    Ontario, Canada
    I installed NOD32 (642) onto two new systems last week. Both run Windows 2000 Pro SP4 and are similar in configuration (basic P4 workstations); one is a few months newer than the other.

    I'm getting 6004 errors on one of them, but not the other.

    What can I do to help figure this out? I'd like to know before I roll out the next 30+ machines. I hadn't seen this on any other install, server or otherwise, until this one system.
     
  7. CrookedBloke

    CrookedBloke Registered Member

    Joined:
    Oct 15, 2007
    Posts:
    110
    Hi,

    The Event ID 6004 error may be, or may not be, part of the more serious problem, the loss of various pieces of server functionality.

    On my servers the symptoms are:

    1. Loss of RDP sessions.
    2. Loss of physical console sessions.
    3. Eventual loss of clients' ability to connect to shares on the server.

    The only way to fix the symptoms is to force the server to shut down by manually holding in the power button, and then rebooting.

    The way you test a system to see if it has the problem is simply to log on to it, open a Windows Explorer window, and open a share on the local server by navigating to

    \\server_name\share_name

    I have noticed that the issue is much more likely to happen after having made changes in EAV settings of various kinds -- like Advanced Heuristics, etc.

    DO NOT do this on a production server that has active clients that are depending upon it for critical operations. If you do manage to cause the issue, the server may not be of any use to anyone until it has been rebooted.
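    The manual check described above (opening \\server_name\share_name in Explorer) can also be scripted, so a hung share shows up as a simple pass/fail instead of a frozen Explorer window. The following is a minimal Python sketch, not an ESET or Microsoft tool; the function name share_responds and the 15-second default timeout are hypothetical choices for illustration. It lists the share on a background thread and reports failure if the listing raises an error or does not finish in time. The same caution about production servers applies.

```python
import os
import threading


def share_responds(path: str, timeout: float = 15.0) -> bool:
    """Return True if listing `path` succeeds within `timeout` seconds.

    Hypothetical helper: a hang and an OSError (share gone, access
    denied, path missing) both count as "not responding".
    """
    outcome = {}

    def list_dir() -> None:
        try:
            os.listdir(path)
            outcome["ok"] = True
        except OSError:
            outcome["ok"] = False

    # Daemon thread so a listing that never returns cannot block interpreter exit.
    worker = threading.Thread(target=list_dir, daemon=True)
    worker.start()
    worker.join(timeout)
    # If the listing has not finished yet, `outcome` is still empty -> False.
    return outcome.get("ok", False)
```

    Usage would be something like share_responds(r"\\server_name\share_name"), run from the server itself or from a workstation. The thread-plus-timeout design is there because a listing against a hung share may simply never return.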

    If you do find that you are able to invoke the problem on one or more of your systems, then you should communicate with ESET. They can give you instructions (already posted in one or more threads on this forum) for using a logging eamon.sys driver and for forcing a memory dump from the keyboard after invoking the error condition. They have been trying to get end users like us to collect memory.dmp and eamon.log files for them so that they can find a resolution.

    In the meantime, I strongly recommend NOT installing version 3 of the software on any critical assets. I am using 2.70.39 on my production domain, and it works flawlessly.

    Good luck!

     
  8. Colditzz

    Colditzz Registered Member

    Joined:
    Mar 19, 2008
    Posts:
    46
    Hi guys,

    Well, I'm back after the long bank-holiday weekend to continue the work started on Thursday. I am rolling my servers back to 2.7, but I am going to try to leave the workstations at 3.0.642. The issue has completely gone away on the rolled-back servers so far, so I'm happy about that. I PM'd Marcos on Thursday about the AMON driver; is this the logging driver you mentioned, CrookedBloke? I have had no response yet, but there has been a four-day bank holiday weekend in the middle!

    EDIT: Also, whilst I remember! I have no Veritas/Symantec agents running on/against the servers in question.
     
  9. CrookedBloke

    CrookedBloke Registered Member

    Joined:
    Oct 15, 2007
    Posts:
    110
    Hello, Colditzz!

    Yup, the logging driver replaces the standard eamon.sys file. It creates an eamon.log file in the root of the system drive on the server. Marcos told me to expect it to slow the server down, but I noticed no difference in server behavior between the two different drivers.

    I thought it was kind of funny that you used the terms "on/against" when speaking of Veritas/Symantec agents. "Against" is definitely the term I'd use for their stuff these days.

    :D

    Let's hope ESET's developers are able to find some solutions. There are a LOT of people looking for replacements for Symantec (and other) anti-malware software.

     
  10. Colditzz

    Colditzz Registered Member

    Joined:
    Mar 19, 2008
    Posts:
    46
    I totally agree with that statement...

    Could you PM me the eamon.sys driver? I'll start the logging as soon as I can; I've still had no reply from Marcos... I've set up a mini test environment specifically for this issue, so I'll hammer away at it with robocopy/xcopy scripts and hope for a failure!!

    I hope they can fix it, as I don't wish to return to Symantec/Sophos, etc... and not just because of the scale of the re-deployment :eek:
     
  11. CrookedBloke

    CrookedBloke Registered Member

    Joined:
    Oct 15, 2007
    Posts:
    110
    Colditzz, you've got a PM.

    :)
     
  12. Colditzz

    Colditzz Registered Member

    Joined:
    Mar 19, 2008
    Posts:
    46
    Thank you CrookedBloke :)
     
  13. Colditzz

    Colditzz Registered Member

    Joined:
    Mar 19, 2008
    Posts:
    46
    I have just received notification that a module within 3.0.642 has been updated. I am assuming it is the anti-virus and anti-spyware module: 1108 (20080325). Is this the fix we have been waiting for? Marcos, or anyone else from ESET, please enlighten us...
     
  14. tomha

    tomha Registered Member

    Joined:
    Mar 25, 2008
    Posts:
    27
    Hi guys, another helping hand needed?

    My name is tomha, and I joined this forum because of similar problems with a customer's SBS 2003 server running NOD32 3.0.642.0 (losing network connectivity, unable to access shares, painfully slow logons).

    I couldn't send a PM to Marcos (the PM system is not available), so I am offering via this thread to collect a memory dump from the faulty server.
    You can contact me via email at thomas.hajek [at] medv.at

    Best Regards
    tomha
     
    Last edited by a moderator: Mar 25, 2008
  15. Marcos

    Marcos Eset Staff Account

    Joined:
    Nov 22, 2002
    Posts:
    14,456
    The problem has been narrowed down to the User Profile Hive Cleanup service and the way it works. We are trying to make a workaround for that. In the meantime, run regedt32.exe and try adding ekrn to the exclusion list by editing the appropriate registry key, as shown below:
     

    Attached file: uphc.png (screenshot of the registry edit)
  16. Colditzz

    Colditzz Registered Member

    Joined:
    Mar 19, 2008
    Posts:
    46
    Thank you for the update, Marcos; however... I have some bad news. We don't run that service on our servers, yet we still have the same issue. Just to double-check, I searched the location you specified for the registry key, but it does not exist...
     
  17. STI

    STI Registered Member

    Joined:
    Feb 25, 2008
    Posts:
    10
    This service normally runs only on terminal servers. It was not running on any of our crashed servers. o_O
     
  18. Colditzz

    Colditzz Registered Member

    Joined:
    Mar 19, 2008
    Posts:
    46
    That may explain it, then. I don't have any servers running Terminal Services; they are all configured for Remote Desktop access, but this is very rarely used. I am checking all of my servers when I have a few spare minutes here and there...
     
  19. tomha

    tomha Registered Member

    Joined:
    Mar 25, 2008
    Posts:
    27
    Maybe that's the wrong track. My customer's SBS 2003 server doesn't have the UPHClean service installed either.

    @Marcos: If you need another memory dump from a hung SBS 2003 server, I can provide one. Unfortunately, I cannot send you a PM ("The private messaging system is currently unavailable.").
     
  20. MidSpeck

    MidSpeck Registered Member

    Joined:
    Apr 24, 2007
    Posts:
    30
    Hello all,

    I hope your day is going well. I am writing to update my last post and possibly give a few more clues.

    My new server has begun having problems, but it differs from the others in the following ways, so these factors probably have nothing to do with the disappearing-shares problem:
    1) It does not have any of the Veritas daemons running on it.
    2) There are no 6004 errors or 3019 warnings in the System log.

    The system stops responding on its shares after heavy network load on those shares, just like my first server. Both of them are fine unless they get a lot of network usage.
    I don't have Terminal Services installed on any servers, but I do use Remote Desktop for Administration, so that's basically the same in that regard. Interesting note: once over the weekend, I tried to remote in to this server after it had "hung." I got the initial login screen, but after I typed in my credentials, it wouldn't go anywhere. I had some time, so I left it, and it eventually logged me in after 10 minutes or so. The shares were still down, but I was able to get it to do a clean reboot. The event log showed nothing out of the ordinary.

    Oh, as a side note responding to Marcos' post above: I haven't installed or run Microsoft's UPHClean service on any of the servers.

    I can't recreate the problem as easily as CrookedBloke can, but it seems to happen when 1) the real-time file scanner is on (default settings) and 2) there is heavy access to file shares on the machine.
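    Since heavy share traffic is the common trigger reported here, a load generator makes the test repeatable. As a rough stand-in for the robocopy/xcopy scripts mentioned earlier in the thread, the Python sketch below repeatedly writes, reads back, and deletes files in a target directory (for example a share on the test server) and returns the elapsed time. The name hammer_share and its default file count and size are hypothetical, and it is only a sketch; point it at a test server, never a production one.

```python
import os
import time


def hammer_share(target_dir: str, file_count: int = 50, file_size: int = 1024 * 1024) -> float:
    """Write, read back, and delete `file_count` files of `file_size` bytes
    in `target_dir`; return the elapsed wall-clock time in seconds.

    Hypothetical helper for generating share load on a TEST server only.
    """
    payload = os.urandom(file_size)  # random bytes, so the data is not trivially compressible
    start = time.monotonic()
    for i in range(file_count):
        name = os.path.join(target_dir, f"stress_{i:04d}.bin")
        with open(name, "wb") as fh:
            fh.write(payload)
        with open(name, "rb") as fh:
            if fh.read() != payload:
                raise OSError(f"content mismatch in {name}")
        os.remove(name)  # clean up so repeated runs don't fill the share
    return time.monotonic() - start
```

    For example, hammer_share(r"\\server_name\share_name", file_count=200) could be run in a loop from one or more workstations while someone browses the share from the server console. A steadily rising elapsed time, or a call that never returns, would be the symptom to look for.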
     
  21. CrookedBloke

    CrookedBloke Registered Member

    Joined:
    Oct 15, 2007
    Posts:
    110
    Just a quick note regarding what has been posted today in this thread.

    I'm thinking that UPHClean has little or nothing to do with the problem, though it is easy to see why it would appear to be involved. Microsoft designed this service to force badly behaving processes to relinquish their handles on the user profile hive. I do happen to have UPHClean installed on all of my servers, for two reasons: Microsoft recommends it very highly for servers configured like mine, and we had to use it with our previous antivirus software, Symantec AntiVirus Corporate Edition. If we ran SAV CE without UPHClean installed, we wound up with user profile hive issues on every logout from the servers.

    But, as is being reported here, the problem with servers becoming unresponsive occurs whether or not UPHClean is installed. I would expect a memory dump from any system with UPHClean installed to indicate an issue between EAV version 3 and UPHClean, probably because some function within EAV version 3 is not releasing the user profile hive properly when a logoff is attempted. That does NOT mean that UPHClean is the culprit. It means that EAV is not relinquishing its handles on the user profile hive at logoff time.

    My best guess.

    Disclaimer: I used to do a LOT of fairly sophisticated software development. Now for the part that proves I'm not really fully qualified to make my guesses. That was back in the day of machine language on 6502/65C02 and Zilog chips. Uh, 20+ years ago.

    :D

    But really, I'm thinking that what we're seeing is a LOT more basic than a problem with an add-on like UPHClean.
     
  22. Colditzz

    Colditzz Registered Member

    Joined:
    Mar 19, 2008
    Posts:
    46
    Good morning all. I have been trying the eamon logging driver on my test network for the last couple of days. I've tried pushing several hundred GB of data through it, from it, and to it, to no avail: I get a few 6004 errors, but not as many as I was experiencing on the live network :(. CrookedBloke, I think I have done exactly what you did to cause the system hang: connect via Remote Desktop, connect via UNC to the server itself, copy data, disconnect? If I can't cause this hang on the test network, I may have to consider reinstalling it on a live server temporarily. That's not my preferred option, but I think we need a memory dump from a server without the UPHClean service installed as well...
     
  23. CrookedBloke

    CrookedBloke Registered Member

    Joined:
    Oct 15, 2007
    Posts:
    110
    Actually I didn't even have to copy any files to/from the shares in order to cause the remote session hang. All I had to do was browse the shares!

    I did notice that it seemed as though a hang was more likely to occur after changing EAV settings, too. I'm not sure how this plays into the whole picture. It's very hard for me to get the time to do proper testing making single, incremental changes on the test server and retesting to see what changes in the system's behavior.

    Good luck with your continuing efforts to gather data for ESET. I'm going to see if I can get a little time to work on this today, too.
     
  24. Colditzz

    Colditzz Registered Member

    Joined:
    Mar 19, 2008
    Posts:
    46
    OK, thank you for the reply, hopefully I'll be able to cause the system to hang soon, there's only so many times I can throw data at shares!!
     
  25. shadowpuk

    shadowpuk Registered Member

    Joined:
    Mar 19, 2008
    Posts:
    4
    I just rolled back all our machines (servers and workstations) to v2.7.
    No more 6004 and 3019 errors :)... I hope my servers won't hang anymore (RDP, console, shares, etc.).

    I think heavy read/write activity on network-shared files caused the hangs.
     
    Last edited: Mar 31, 2008