Event ID:6004 - A driver packet received from the I/O subsystem was invalid.

Discussion in 'ESET NOD32 Antivirus' started by dwood, Jan 30, 2008.

Thread Status:
Not open for further replies.
  1. guest

    guest Guest

    Yes, I uncheck Network drives from the Media to scan section for real-time protection.
     
  2. CrookedBloke

    CrookedBloke Registered Member

    Joined:
    Oct 15, 2007
    Posts:
    110
    Hi, Marcos. You have a PM from me.

    Thanks, guest, for the additional information.
     
  3. goran_larsson

    goran_larsson Registered Member

    Joined:
    Jan 25, 2008
    Posts:
    51
    Location:
    Stockholm, Sweden
    Hi there mayt, sorry, but this problem still exists in 3.0.642, although I have had it installed since the second day it was released. So I wouldn't consider it fixed just yet. Many, if not all, of our client computers show the same message.

    I can't say I have really seen this message on a server yet, mostly on clients. Servers do, however, suffer from the mrxsmb problem if you are logged on to the server and using network shares; when clients access the server, the clients get the 6004 error message and most of the 3019s.


    Event Type: Error
    Event Source: EventLog
    Event Category: None
    Event ID: 6004
    Date: 2008-03-14
    Time: 18:51:48
    User: N/A
    Computer: XXX-YY-XX-YYY
    Description:
    A driver packet received from the I/O subsystem was invalid. The data is the packet.
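An entry like the one above can be tallied by Event ID when comparing clients against servers. This is a minimal sketch, assuming the events have been copied out of Event Viewer as plain "Key: value" text in the layout shown in this post. The names `parse_event` and `count_event_ids` are hypothetical helpers, not part of any ESET or Microsoft tool.

```python
from collections import Counter

# Field names as they appear in the pasted event text above.
FIELDS = {"Event Type", "Event Source", "Event Category", "Event ID",
          "Date", "Time", "User", "Computer", "Description"}


def parse_event(text):
    """Parse one pasted event block into a dict of field -> value."""
    event = {}
    in_description = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if sep and key.strip() in FIELDS and not in_description:
            event[key.strip()] = value.strip()
            # Everything after "Description:" belongs to the description.
            in_description = key.strip() == "Description"
        elif in_description:
            event["Description"] = (event["Description"] + " " + line).strip()
    return event


def count_event_ids(events):
    """Count how often each Event ID occurs in a list of parsed events."""
    return Counter(e.get("Event ID") for e in events)
```

Feeding it every 6004 and 3019 entry from one machine gives a quick per-machine count, which may help show whether the two IDs really track each other.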
     
    Last edited: Mar 17, 2008
  4. goran_larsson

    goran_larsson Registered Member

    Joined:
    Jan 25, 2008
    Posts:
    51
    Location:
    Stockholm, Sweden
    It's really difficult to get a memory dump out of this on a server because there's absolutely no warning (in the event log or in the NOD32 log) until it actually halts. We have had this on 3 servers so far; 2 of them were domain controllers and we were forced to uninstall ESET. The third was a file server only, and I did manage to collect 2 SysInspector files from it before I rebooted it. Since it is a file server in one of our branch offices, I really didn't want to force it to crash remotely to get a memory dump. My best guess is that NOD32 somehow slowly kills the RPC services: first shares become unavailable, then RDP, until the server is finally so hung that you cannot even access it from the local console.

    Also, the 6004 and 3019 events are probably not the same issue, as they appear mostly on clients and not on servers, unless you are actively logged on to that server and using network shares from it.

    Regards Göran
     
  5. mps_surcouf

    mps_surcouf Registered Member

    Joined:
    Mar 5, 2008
    Posts:
    33
    Hi Crookedbloke

    I have Windows 2003 SP2 here with 3.0.642 installed, as we talked about.

    2 test shares, each with 1 GB of test data.
    I have been browsing the shares locally using RDP for about 5 mins and getting the corresponding 3019 errors (normal for connecting to network shares locally). No effects so far.

    Any tips on how I can repro this? Do I have to do it for a long time?

    If we can repro this, ESET should be able to fix it ASAP.

    Cheers

    Mike
     
  6. CrookedBloke

    CrookedBloke Registered Member

    Joined:
    Oct 15, 2007
    Posts:
    110
    Hello, mps_surcouf.

    You may already have reproduced the problem without (yet) knowing it. The surprise comes -- if it's going to come -- when you try to log off of the session. At least that has been the behavior on my systems.

    As you say, the 3019 messages are warnings, not errors -- and most of them are normal. With 3.0.642 I have seen only the normal ones, not the ones generated when getting access to shares via means other than Explorer.

    The Event ID 6004 messages, on the other hand, are errors. Previous to this round of testing I saw those on all systems just prior to the loss of ability to communicate with the console, either locally or via RDP.

    When I tried 3.0.642 with new driver and dll files made available by Marcos for testing I saw no unusual 3019 messages, and no 6004 messages at all. But the system did refuse to let me log off the remote session. At this point I had to simply close the RDP session. Logging back on to the affected server was not possible until it was forced to shut down manually (power switch) and rebooted.

    That issue, right there, is why I haven't been able to get a memory dump. Once the condition has been reached, whether invoked locally or remotely, the system simply won't react to the keyboard. No way to make it blue screen.

    This has happened on every WS2000 SP4, WS2003 SP2, and WS2003 R2 SP2 server I've tried it on. It takes only a couple of instances of browsing to cause the effect.

    Back in December with earlier builds of the version 3 EAV software it took a couple of weeks before we saw the loss of control effect, but I saw the 6004 errors right from the beginning. May not be directly related.

    BTW, is your WS2003 SP2 system a member of a domain? Not sure whether or not domain membership has anything to do with the problem, but all of my systems have been members of an AD domain.
     
  7. mps_surcouf

    mps_surcouf Registered Member

    Joined:
    Mar 5, 2008
    Posts:
    33
    Just to clarify

    Yes, my machine is a member of a domain.
    I am testing with 3.0.642, no new driver.

    Can I confirm the logging-off hang occurs consistently with the new driver?

    Thanks

    Mike
     
  8. CrookedBloke

    CrookedBloke Registered Member

    Joined:
    Oct 15, 2007
    Posts:
    110
    Yes, the "hang" in communications with the console occurs consistently with the new driver and dll and with the standard ones, too.

    However, it took a couple of weeks -- back in the beginning -- for the behavior to start occurring. The behavior occurred, however, between the time 3.x was installed and the next time Microsoft updates were applied. This leads me to believe that the behavior is not related to Microsoft security updates. But over time the affected machines have become more and more sensitive to this issue. So that, by now, when I remove 2.70.39 and install any 3.x version the hangup will occur the very first time I log off (after browsing shares with Windows Explorer). It is quite consistent.

    And bizarre.
     
  9. mps_surcouf

    mps_surcouf Registered Member

    Joined:
    Mar 5, 2008
    Posts:
    33
    Hi

    Can't break it at the moment
    Will leave it in place for a while and try in another week.
    Will post back

    I have the feeling this issue may not be related to the errors in the event log.
    It sounds like a separate issue entirely (a more serious one).
    Could probably do with a thread on its own.

    Cheers

    Mike
     
  10. CrookedBloke

    CrookedBloke Registered Member

    Joined:
    Oct 15, 2007
    Posts:
    110
    That's actually very good news. At least it means that there are some server configurations that aren't affected. It has been hard to judge by responses on this forum because, of course, people are less likely to seek a support forum to express satisfaction than they are to express disgruntlement.

    That being said, I have to wonder if this is a driver conflict of some kind -- one that, perhaps, affects only certain types of hardware. All of my servers are Dell PowerEdge rack mount systems (2550 to 2850) with PERC controllers for the arrays. At least some of these designs have been known to exhibit issues with various software utilities, though it has been quite a while since I've heard of a new problem along those lines. They had to do a massive set of BIOS updates on some of these servers about 5 years ago (IIRC) to fix some of those issues.

    And I agree about this problem possibly not being related at all to the Event ID 6004 error, but that is exactly where better communication from Eset would have been helpful to those of us out here in the hinterlands. Unless I'm mistaken they have been loosely referring to a group of symptoms as "the network browsing issue" and had declared it "fixed" in 3.0.642. This problem could certainly be referred to as a "network browsing issue", but it also isn't fixed -- at least not for me and for several other people reporting the same problem here and elsewhere. Bug fixes should, IMO, be documented very precisely as to the exact nature of the bug and its symptoms and the exact nature of the fix.

    I have hopes that I may be able to find a way to get a memory dump on this, but it hasn't been easy to do so far -- with the non-responsiveness of the system apparently defeating the ability of the CrashOnCtrlScroll registry edit to force a crash -- though, to be fair, I'm wondering if this could also have been due to some fumbling I was doing with the KVM / keyboard at the time of the test.
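For anyone else trying to force a dump, here is a minimal sketch that only builds the text of a .reg file for the CrashOnCtrlScroll setting. It assumes a PS/2 keyboard handled by the i8042prt driver. A USB keyboard, or a KVM that presents itself as USB, is not covered by this key. The function name `crash_on_ctrl_scroll_reg` is hypothetical, and saving and importing the file is left to the reader.

```python
# Registry key used by the PS/2 keyboard port driver (assumption: PS/2 keyboard).
I8042_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet"
    r"\Services\i8042prt\Parameters"
)


def crash_on_ctrl_scroll_reg(enabled=True):
    """Return .reg file text that sets CrashOnCtrlScroll to 1 or 0."""
    value = 1 if enabled else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{I8042_KEY}]",
        f'"CrashOnCtrlScroll"=dword:{value:08x}',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(crash_on_ctrl_scroll_reg())
```

The setting takes effect only after a reboot. The crash is then triggered by holding the right Ctrl key and pressing Scroll Lock twice, so it still depends on the keyboard driver responding, which may be exactly what fails once the server is hung.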

    Thank you for your efforts on this. If you wouldn't mind doing so, I'd appreciate it if you'd post the general nature of the hardware you were using for the test. Maybe, if those who are having the problem and those who are not will post their general hardware specs, we might see some sort of pattern. It's a long shot, perhaps, but I'd rather leave no stone unturned. It would be VERY useful to me to be able to move to version 3 of the software before the next big rollout here.

     
  11. mps_surcouf

    mps_surcouf Registered Member

    Joined:
    Mar 5, 2008
    Posts:
    33
    Hi Crooked bloke

    I am testing on

    Dell PowerEdge 2650, PERC 3Di controller, dual Xeon CPUs with 2 GB RAM.
    The OS is Windows Server 2003 SP2 (slipstreamed for easy install in one go). No specific drivers are loaded; everything is using the drivers included with Windows Server 2003 SP2.

    The embedded NIC is a Broadcom NetXtreme Gigabit adapter; only one interface is connected, the other is disabled.
    The network driver is Microsoft 2.91.0.0 dated 01/10/2002.

    Did you try a different network driver to see if the behaviour changes? Might be worth a go and shouldn't be too much disruption (1 reboot). Forgive the suggestion if you already tried all this.

    Mike
     
  12. CrookedBloke

    CrookedBloke Registered Member

    Joined:
    Oct 15, 2007
    Posts:
    110
    Well, there goes my little theory. That system is probably identical to two of the servers I'm running that had the failure under Windows 2000 Standard Server SP4. Same array controller, same processors, same network adapter. None of my systems are multi-homed, and all have their unused network adapters disabled in DM. I have, indeed, tried different network drivers but am very happy to hear any suggestions you may have because I've done about everything I can think of to try to diagnose this. (Not that I've had a lot of opportunity to actually work on it because we have no "test" servers. Everything here is a production server.)

    The only difference, offhand, that I can see is that your installation was a slipstreamed installation. All of mine are old installations and had the SPs applied over time. (The WS2000 servers started at SP2.) As a matter of fact, the two WS2003 servers experienced some bizarre behavior when SP2 was applied. I had a hard time getting them back up afterward. One of them was demoted from DC to serve as a file server, the other is a DC and file server. It's the plain file server that I'm trying to use as a test system now, but it's darned hard to get any time from the production people.

    BTW, when I say "production" I mean exactly that. These things are running production lines. Most of them -- not the test system -- are running SQL 2000 (Don't get me started.) They are not doing e-mail and regular chores that most folks think of when they're talking about production servers. They are handling calls for gigabytes worth of data on an almost constant basis. They run smoothly and with no problems whatsoever when I'm running NOD32 2.70.39 on them. Matter of fact, I can abuse them in that configuration by eliminating all of the recommended exclusions and performing in-depth scans with NOD32 during production. But I can't even run EAV 3.x on them at all without seeing some pretty serious issues.

    I inherited this domain six years ago, and it had a lot of problems. I had to manually edit AD to eliminate issues caused by improper DC removals and stuff like that. I don't have any idea whether or not that might be having an impact on this issue.

     
  13. mps_surcouf

    mps_surcouf Registered Member

    Joined:
    Mar 5, 2008
    Posts:
    33
    Hi Crooked bloke

    I think the main difference is the volume of file serving that you are doing.
    Unfortunately it is going to be hard for me to replicate that without flooding my network with traffic. All I can do is see how the test system performs over time, but as it isn't doing any real work I may not be able to contribute much to the issue.

    Mike
     
  14. CrookedBloke

    CrookedBloke Registered Member

    Joined:
    Oct 15, 2007
    Posts:
    110
    I greatly appreciate your efforts and your input. And, I would point out that, though my problems with unresponsive servers didn't initially start until a couple of weeks after installing EAV 3.x, nowadays I seem to be able to invoke the problem within a few minutes of installing the software again -- and this would be on a server that is NOT serving files. It's just sitting there. It's almost as though there's a "leftover" effect from the previous installations and error conditions.

    Again, thanks for your interest. It is valuable to me to know that it may be possible for this version of EAV to become useable to me. I would much rather have its admin features available to me, as opposed to those of the previous version.

     
  15. Colditzz

    Colditzz Registered Member

    Joined:
    Mar 19, 2008
    Posts:
    46
    Hello all, I've recently registered on this forum (due to my recent issues with Nod32). I have 10,000 licenses for Nod32, my servers are mainly Windows 2003 Ent R2 SP1 x86 or x64 versions, with a few Windows 2003 R2/2000 Std x86 machines, roughly 100 DC's, fileservers providing DFS Replication/Namespaces, roaming profiles and several server specific shares to roughly 100 remote sites ranging from 40 to 2000 people per site.

    I have (unfortunately) recently upgraded all of our machines to v3 :mad:, rolling back is not an option, due to the time involved and the fact that the software, for some reason, will not automatically upgrade itself as it has in the past (any reason for this?). I was wondering, Marcos (or any other ESET employee), if you guys are actually getting close to a solution for this issue? I have several servers requiring physical reboots per day requiring me, or members of my team to travel to the site to perform the reboot. I have Network scanning disabled on all workstations & servers. Disabling real-time protection is not really an option (as CrookedBloke said) it would pretty much render the software pointless. I would be interested in trying the AMON driver you mention Marcos, and I will PM you to that effect.

    I moved to Nod32 after using SAV for many years (forced to due to licensing) and then trying Norton Enterprise AV. I found Nod32 to be everything it said on the tin, and most impressively, my users noticed the difference in workstation responsiveness. Please, please, please fix this issue ASAP and return Nod32 to its former glory...

    EDIT: Unfortunately the PM system is currently unavailable. I will check back later and attempt to PM you then, Marcos.
     
  16. CrookedBloke

    CrookedBloke Registered Member

    Joined:
    Oct 15, 2007
    Posts:
    110
    Hi, Colditzz.

    Welcome to our vale of tears. ;)

    I had a terrible time trying to collect a memory dump for the folks at ESET, but I finally managed it. The problem was due to a specific configuration issue with our KVM.

    Marcos has been extremely helpful in providing guidance to me, and we have worked through the initial process of getting data for the ESET developers. Please persist in your efforts to work with Marcos. The more data we can provide them the better.

    I have been surprised to see posts from a couple of folks now who have not been able to duplicate the issue. My problem has been that there are no instances of servers in my organization which were NOT having the responsiveness issue!

    There has to be some common thread among those of us who have this issue. We should, perhaps, start a separate thread to try to discover just what those commonalities might be. I know of people who have similar hardware and OS configurations to mine who do NOT see the problem. Right now I'm wondering if they are using Veritas NetBackup. That utility has been a pain for us, and doesn't work reliably on some of our servers. It's the only functional issue we have with any of these servers now that we're back on NOD32 version 2.

    I, too, was hugely relieved to switch from the Symantec product (SAV CE) to NOD32 2.70.39. The difference it made here was nothing less than astonishing. I, too, went forward with the version 3 rollout (TWICE!) for this domain because of what I surmised would be greatly improved admin features. But, unlike you, I've rolled back to 2.70.39 (also twice). I truly had no choice.

    If I have to stick with 2.70.39, I'll just try to grin and bear it. At least it works very well for its intended primary purpose, and I can count on it to not beat up my domain. I hope ESET will support it fully until they've got version 3 up to snuff.
     
  17. Colditzz

    Colditzz Registered Member

    Joined:
    Mar 19, 2008
    Posts:
    46
    Hi CrookedBloke,

    I'm glad you have managed to get a .dump file created to pass to ESET, and I hope they manage to find the issue... I have quite a varied hardware base, but I would say that even the newest servers we have are being affected in the same manner. These servers are 2 x Quad Xeons, 4/8GB RAM, Windows 2003 Enterprise R2 x86 SP1 version. I will gladly help as much as I can to get this issue resolved. I haven't put a call in to ESET, as I see Marcos et al. seem to respond here more often than the tech support at ESET!! Unfortunately, the PM system is still unavailable on this board, so I am unable to request the AMON driver from Marcos, but I will keep checking...
     
  18. CrookedBloke

    CrookedBloke Registered Member

    Joined:
    Oct 15, 2007
    Posts:
    110
    Hi,

    Just wanted to let you know that I'm not having any issues using the PM system this morning. What type of error (if any) are you seeing when you try to PM someone?

    I have had problems with it before when using really restrictive security settings in IE7 (on Vista) and just changed the allowances slightly. When I was having that problem I would get the prompt for a user name and password, if I wasn't already logged on to the forum, and then submission of the message would fail.

    Thank you again for taking interest in helping ESET with this issue. The market is absolutely flooded with TERRIBLE antivirus software right now. I was really hoping that ESET's share of the market would benefit from the woes of Symantec and others. But the current crop of problems with EAV 3 mean that, when I get queried for a recommendation, I can only specify 2.70.39. If ESET can fix this puppy there are a lot of companies wanting and needing to find alternatives to the software that is crippling their operations.
     
  19. shadowpuk

    shadowpuk Registered Member

    Joined:
    Mar 19, 2008
    Posts:
    4
    I'm happy to see that I'm not alone in this situation. I have the V3 version on all of our machines and I also have the 2 events everywhere (6004, 3019).
    Last Thursday, the servers became unresponsive and the console was unavailable (hanging!). I had to perform a forced shutdown and restart them... it was like a "SYN FLOOD" attack!
    I'll be watching this thread closely until a patch comes... from either Microsoft or ESET...
     
  20. Colditzz

    Colditzz Registered Member

    Joined:
    Mar 19, 2008
    Posts:
    46
    I'm getting a "The private messaging system is currently unavailable." response when I try to PM Marcos, or when selecting 'send a new message'... I am currently looking at a server which is refusing to allow access to SYSVOL & NETLOGON shares... But I can log on to the server remotely...

    Ok, I have just (literally whilst typing this) got my shares back on-line with Nod32 still installed and without rebooting the server. I killed the ekrn.exe process, which was using over 80MB of RAM. It immediately restarted itself (as expected) and RAM usage dropped to 20MB-ish, then climbed to 28MB when I attempted to browse \\%computername%\netlogon (purposely, to see if the local shares were responsive again). The files/folders were immediately presented without hanging the 'My Computer' window.
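The ekrn.exe memory figure mentioned above could be watched rather than spotted by chance. This is a minimal sketch, assuming the output of `tasklist /FO CSV /NH` has been captured as text on a system that ships tasklist. The function names and the sample rows in the usage note are made up for illustration, and the 80 MB limit is taken only from the observation in this post, not from any ESET guidance.

```python
import csv
import io


def parse_tasklist_csv(output):
    """Map lower-cased image name -> list of memory usages in KB.

    Expects rows of: image name, PID, session name, session number, mem usage.
    """
    usage = {}
    for row in csv.reader(io.StringIO(output)):
        if len(row) < 5:
            continue
        name, mem = row[0].lower(), row[4]
        # Mem usage looks like "83,456 K"; keep digits only so the
        # thousands separator of the local language does not matter.
        digits = "".join(ch for ch in mem if ch.isdigit())
        if digits:
            usage.setdefault(name, []).append(int(digits))
    return usage


def over_threshold(output, image="ekrn.exe", limit_kb=80 * 1024):
    """Return the memory usages (KB) of `image` that exceed `limit_kb`."""
    found = parse_tasklist_csv(output).get(image.lower(), [])
    return [kb for kb in found if kb > limit_kb]
```

For example, a captured row such as `"ekrn.exe","1234","Console","0","83,456 K"` is reported, while one showing `"20,480 K"` is not. Whether high ekrn.exe memory actually predicts the share hang is unconfirmed; this only makes the observation repeatable.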

    I will be back shortly, unfortunately this is a live server, so I am unable to force a dump file to be created.
     
  21. guest

    guest Guest

    Do you have scanning exclusions set? Microsoft recommends that several folders under SYSVOL be excluded from any real-time scanners.

    In case this info helps; All my servers are generation 4 or 5 HP ProLiant DL380's and DL360's, a few ML350's and ML570's all with Smart Array controllers.
     
  22. Colditzz

    Colditzz Registered Member

    Joined:
    Mar 19, 2008
    Posts:
    46
    Hi guest,

    Yes & no!! I did have exclusions set; I had set them at install. However, I have also added extra exclusions via a config update task, and for some reason this has removed all of my exclusions on all of my servers o_O so I now need to put them all back in again :'(. Am I right in assuming this works via a config update? Otherwise, I'm going to be busy! Yesterday, I updated two servers that were having performance issues to 3.0.642.0. The servers seem better behaved today and I am not currently receiving the 6004 errors, but the 3019 warnings remain, as expected. I now have the PM functionality available, so I will PM Marcos re the AMON driver in a few minutes.
     
  23. Colditzz

    Colditzz Registered Member

    Joined:
    Mar 19, 2008
    Posts:
    46
    Scrub my last comment, both servers have just dropped out again.....
     
  24. Colditzz

    Colditzz Registered Member

    Joined:
    Mar 19, 2008
    Posts:
    46
    Ok, servers are being rolled back as we speak. w/s staying at 3.0.642 for the moment. 5 Servers have needed impromptu reboots today...
     
  25. CrookedBloke

    CrookedBloke Registered Member

    Joined:
    Oct 15, 2007
    Posts:
    110
    I'm sorry to hear about this. What version are you rolling back to? I had to go back to 2.70.39 on my domain. I have kept one member server (not a DC or in critical use for now) with 3.0.642 on it for testing. I've sent two memory dumps and an eamon.log collected with the special eamon.sys logging driver. I hope you and others will be able to collect this type of data and send it to ESET, too. I'm really hoping they'll be able to fix it. I have a LOT of work pending that is awaiting some type of outcome on this. If I have to stay with 2.70.39 it is going to radically alter the way I proceed in setting up a very critical new domain.
     