Event ID:6004 - A driver packet received from the I/O subsystem was invalid.

Discussion in 'ESET NOD32 Antivirus' started by dwood, Jan 30, 2008.

Thread Status:
Not open for further replies.
  1. Marv Gordon (Registered Member; joined Nov 2, 2007; 59 posts)

    I'm also able to force the 3019 errors, but my symptoms are a little different. We just put a new file server into production in our VMware ESX environment.

    It's running NOD32 v3 (.642), the Backup Exec 11d agent, and the Vizioncore VSS agent. This is our main file server.

    Four times in the two days since the conversion, all clients on the network have lost connectivity. The server (W2K3 Enterprise R2, fully patched) is still reachable via the VMCenter interface, but the performance tracker shows no network traffic. If I try to restart the Server service, it times out.
    When I reboot the server (it's not totally locked up), all is well.

    I've removed V3 and will see what happens.

    Can we download and run V2 standalone with our username/password combo?

    Thanks
     
  2. CrookedBloke (Registered Member; joined Oct 15, 2007; 110 posts)

    Yup. Go to ESET's downloads location to get "purchased" software. Select the business option. Scroll down the list. You're looking for version 2.70.39. Your user name and password will work for downloading it and activating it.

    It's a good thing that version 2 is still available and supported. I think it's unlikely you'll have any problems with it.
     
  3. Colditzz (Registered Member; joined Mar 19, 2008; 46 posts)

    I am still unable to replicate this issue on my test network :( Are ESET getting any closer to a fix for this problem?
     
  4. CrookedBloke (Registered Member; joined Oct 15, 2007; 110 posts)

    I surely do wish I knew what difference in configuration between our systems (that's bound to be what it is, I suppose) accounts for the difference in our experience with this software. I've seen the problem on every single server I've installed 3.x on, and you can't get the problem to happen at all.

    It would also be interesting to know what percentage of WS2000/WS2003 users are seeing the problem.

    ESET folks have been curiously quiet -- haven't heard any more by PM or e-mail.
     
  5. Colditzz (Registered Member; joined Mar 19, 2008; 46 posts)

    I had the issue on some live servers (well, nearly all 120 of them), but I can't get it to happen on my test network! I did notice that when I took the version to 3.0.642, ESET applied an update to one of the scanning modules, and since that update went on, the workstations and servers on my test network have been behaving themselves better. The 6004 error does still appear, just not as often as before. If you'd like me to try your config, attach it to a PM (if that's allowed here) and I'll gladly try it. As a side note, I still have version 3.0.642 running on my 64-bit servers, as they have not (yet) thrown a wobbler!
     
  6. CrookedBloke (Registered Member; joined Oct 15, 2007; 110 posts)

    Now I find that to be very interesting. You see, my "test servers" have consisted of servers which were running as production servers, but which have been taken out of service and reserved for testing. And, of course, I saw the problem on my active production servers when I still had 3.x installed on them.

    This makes me wonder if there could be some sort of "threshold" phenomenon at work here -- as though the issue might require some sort of history of AD replication and/or DFS replication. (All of my servers have been, or are, members of distributed file systems.) But, frankly, the problems I've had have been many and varied -- so much so that it's darned hard to figure out what is going on. Very frustrating problem.

    When I was wondering about the significance of configuration, I was referring to the general hardware and software configuration of the servers, not to the configuration of the AV software. One reason I'm not particularly concerned about the EAV configuration is that I have successfully got the test server I'm using currently to show exactly the same symptoms with all of the aggressive settings turned off. In point of fact, the settings seem to make no difference whatsoever in the behavior -- other than the fact that turning off real-time protection entirely seems to reduce the likelihood of a server failure by a great deal, though I'm not absolutely sure that it results in a complete cessation of the problem.

     
  7. shadowpuk (Registered Member; joined Mar 19, 2008; 4 posts)

    Do you think it is related to the Windows Small Business Server edition?
    My file server is Windows SBS 2003, and I had the problem on every machine until I went back to v2.7. I see a lot of users in this thread who also have an SBS server. Maybe SBS is the common point? o_O
     
  8. jgsouthard (Registered Member; joined Jul 13, 2006; 10 posts)

    I've seen it on Windows XP Professional SP2, with peer-to-peer networking to file shares on a linux-based NAS box.
     
  9. CrookedBloke (Registered Member; joined Oct 15, 2007; 110 posts)

    It's not strictly an SBS issue. I had the problem on ALL of my production servers, and none of them are SBS. They are WS2000 SP4 through WS2003 R2 SP2.
     
  10. shadowpuk (Registered Member; joined Mar 19, 2008; 4 posts)

    OK then, forget it :)
    Thanks!
     
  11. STI (Registered Member; joined Feb 25, 2008; 10 posts)

    Yesterday I installed V3 on one of our servers that is not so important for our daily business. It seems to be the first server here without problems.
    One difference from the other, problematic servers seems to be the number of simultaneously open files: the servers that crashed all have several dozen files open.
    Could that be the reason? o_O
     
  12. m_ellis (Registered Member; joined Apr 4, 2008; 1 post)

    Okay,

    I have been watching these kinds of threads for some time now and I guess it's time I pitched in.

    I run 20-odd servers in a University department and we took the step of upgrading to v3 a while back (straight in at 3.0.621).

    Our servers are mostly the same hardware, but not all. They are all running Windows Server 2003 SP2 (R2 in some cases).

    I have had zero problems with the majority but terrible problems with three of them. The three in question DO have something in common: they have open file shares that are in constant use. These three machines exhibited the complete lock-up problem almost straight away.

    Two of them are part of a file-serving cluster and lasted no time at all. After a forced reboot, they would last another indeterminate length of time, depending mostly on the amount of remote file access going on. The other also has open file shares that are well used, though I would not class it as heavy use at all. It also exhibited the slowdown and then complete lock-up.

    All of the other servers are web servers, DNS servers, domain controllers, database servers etc. and do NOT exhibit the problem at all.

    The bulk of our machines are HP DL380 G5s (including two of the three). The other one of the three is an HP ML370 G4. All have the Symantec Backup Exec 12 agent on them (but this is common to the other servers also).

    I also have been waiting quite a while now for a solution to something that I thought was a show-stopper. Indeed, I can't quite see how this one got through testing as my own experience suggests that putting this version on a Windows file server and then hitting it with requests flattens the box in fairly short order...

    Mark
     
  13. CrookedBloke (Registered Member; joined Oct 15, 2007; 110 posts)

    Just for our information -- was this done with 3.0.642.0? 3.0.650.0?

    I can tell you that I have had a half-dozen "brainstorms" associated with various patterns I thought I was seeing in the behavior of my afflicted servers. How heavily the network shares on these systems were being accessed certainly did seem to be one of the determinants of how long it would take before I would see symptoms.

    But there also seem to be other factors in play. For instance, just changing the real-time protection settings (in either direction, more stringent or less stringent) seems to make loss of server responsiveness to console / remote sessions (my main symptom here) more likely to occur.

    Also, there seems to be what I think of as a "threshold" effect with respect to the history of a server. I have had to reconfigure all of my active production servers with version 2.7. But I still have one server that I reserve exclusively for testing this version 3 problem. That server was actually one of the least affected early on in the game -- probably because its configuration is the simplest (NOT a DC, simply set up as a member server running WS2003 SP2 with a few network shares which are NOT being accessed by normal client loads). Yet, as time goes on, it has become easier and easier to cause this server to become non-responsive. There was a brief reprieve from this downward slide when I installed 3.0.642.0. It became a little harder to cause symptoms. But causing loss of a remote desktop session has gradually become just a matter of browsing network shares on the same server from within the RDC connection -- or changing real-time protection settings in EAV. If I remove EAV v3 and install NOD32 v2.7 on it I can hammer that server as hard as any of the others with no ill effects. If I go back to version 3, I see the problems again immediately.
     
  14. CrookedBloke (Registered Member; joined Oct 15, 2007; 110 posts)

    Yup, that's the thing that really bothers me. It's almost as though testing consisted of installing EAV on a couple of clean Windows servers and watching briefly to see if they would crash. It's hard to believe that anyone actually tried this software out on a real world network that was doing actual -- you know -- work. Yet you report that many of your servers are not affected adversely. That actually makes me feel a little better. But it also tells me that ESET needs to be MUCH more aggressive about their testing. They can hardly expect guys like us to beta test antivirus software on our production domains. So, they have to do that sort of thing themselves -- or perhaps they will have to make special arrangements with long-term customers to help them out with the testing logistics in exchange for more favorable licensing terms.

    There are some mighty big players (like Symantec) that are putting out some AWFUL antivirus software right now. ESET needs to be shining right now if they want to be important in the enterprise market. They couldn't pick a better time to be looking good, and they couldn't pick a worse time to be looking bad.

    As for me, I'm not a brand loyalist. I'm a pragmatist. I want to use what works best right now. But I'm having a really hard time finding antivirus software that can run on these servers without bringing them to their knees. There are days when I consider running without any antivirus protection at all. And that's exactly what I'd be doing right now if support for 2.7 dried up suddenly.

    I have never had a single minute's downtime due to any type of malware infestation on this network, nor has malware ever been detected on any of its systems. But I've had a ton of downtime due to misbehaving antivirus applications, Symantec being the worst but with a (dis)honorable mention going to EAV v3. When the cure is worse than the disease, you take the disease.

    I'm really stupefied that we haven't seen something that looks like an all-out effort by ESET to deal with this. And, for all I know, there may be furious activity going on behind the scenes. But I've been baffled by their lack of forthright communication about these problems as much as I've been disappointed in the new product.
     
  15. goran_larsson (Registered Member; joined Jan 25, 2008; 51 posts; location: Stockholm, Sweden)

    Still, the 6004 error seems more frequent on clients than on servers. In our case we don't run Terminal Services, which means no user is typically logged on to the server while it does normal file-sharing tasks. I'm not sure the complete lockups are even related to this, since I have yet to see a client lock up completely the way a server or domain controller does!

    Fact: the 3019 and 6004 warnings and errors were never or rarely seen in the event logs of the clients or servers, while after installing NOD32 v3 they are the rule rather than the exception.
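
    One way to put numbers on that before/after comparison is to export each machine's System log to CSV and count the two event IDs. A minimal Python sketch, assuming the export has a header row with an "Event ID" column (that column name and the helper `count_watched_events` are hypothetical; adjust them to whatever your export actually produces):

```python
import csv
import io
from collections import Counter

# Event IDs discussed in this thread.
WATCHED_IDS = {"3019", "6004"}


def count_watched_events(csv_text, id_column="Event ID"):
    """Count how often each watched event ID appears in a CSV export.

    Assumption: the export has a header row containing id_column;
    rename it to match your actual export.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        event_id = (row.get(id_column) or "").strip()
        if event_id in WATCHED_IDS:
            counts[event_id] += 1
    return counts


# Illustrative rows only, not real log data.
sample = """Date,Event ID
2008-04-01,6004
2008-04-01,3019
2008-04-02,6004
2008-04-02,7036
"""

print(count_watched_events(sample))
```

    Running the same count on an export taken before installing v3 and another taken after would turn "the rule rather than the exception" into a concrete per-machine comparison.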

    Software that generates wrong or spurious warnings and errors is a problem in itself, because it makes me pay less attention to other problems on these systems, and that can turn into a real problem.

    Please fix this once and for all. We have had these issues for almost 5 months now, and we're still without any real-time protection on our servers because of the lockup problems. That makes me think I will eventually have to replace NOD32, since it cannot do the task it was supposed to do in the first place.

    Regards Göran
     
  16. STI (Registered Member; joined Feb 25, 2008; 10 posts)

    The server is running v3.0.643.0. So far it has had no problems. But it has no shares; it is only running two Squid processes.

    Has anyone tested the new version? o_O


     
  17. CrookedBloke (Registered Member; joined Oct 15, 2007; 110 posts)

    Just to be sure that there's no misunderstanding about what I'm reporting --

    The servers I'm talking about are not running in terminal server mode. The remote sessions I'm talking about are admin connections via RDC, and the Event ID 6004 errors were being generated when an admin browses network shares on the server to which he is connected via RDC. This is obviously not what comprises our normal operations but is, rather, a situation seen daily during routine maintenance. Our clients access shares the usual way.

    The 3.0.642.0 iteration of EAV has NOT caused any more 6004 error messages, BUT the loss of remote session connectivity, reduced server responsiveness to requests from clients, etc. has continued. It appears that EAV does not release the user profile hive correctly when logging off after certain sets of actions (like browsing network shares) have been taken. An ESET person has stated that they believe that the problem lies in an interaction between EAV and the UPHClean service. I am running UPHClean on my servers. However, there are folks reporting here who say that they do not have UPHClean installed on their servers, and that they are seeing the same symptoms.

    I, for one, have seen far fewer issues running EAV on client operating systems, though there have been some annoyances. I'm not seeing problems with any of my systems (except the test rig) right now because all of them are running NOD32 2.70.39, which does its job very well.

     
  18. CrookedBloke (Registered Member; joined Oct 15, 2007; 110 posts)

    STI: I am hoping to get a chance to test 3.0.650.0 on my test server this week. My testing of it on a client system showed me that the client side issues I have with it on that particular client system have not been fixed. Those issues are, however, unusual ones which are probably not going to affect most users. (For instance, EAV 3.x has no way that I can find to allow proper operation of Privoxy and Tor. NOT an issue for most corporate client systems -- or for most home users, for that matter.)
     
  19. Colditzz (Registered Member; joined Mar 19, 2008; 46 posts)

    Has anybody tried version 3.0.650 yet? I have just downloaded it, but I am a little nervous about installing it!

    Oops, I should have refreshed the page before posting! I'll try 3.0.650 on my workstation in a minute. I rolled it back to the default 3.0.642 with the standard drivers and eamon.sys file, and the 6004 errors returned immediately. I'll post back shortly with my findings on the client side with 3.0.650.
     
  20. ShadowInc (Registered Member; joined Apr 1, 2008; 9 posts)

    I've been following this thread and thought I'd chime in. I have version 3.0 installed on all 12 of my clients, running XP SP2, and on my SBS 2003 SP2 server. I have received a few Event ID 6004 errors, but I've had no lockup problems with advanced heuristics and network drives unchecked in the real-time system protection. Before I did this, I noticed strange CPU usage (25%) by ekrn.exe at odd times, but no lockups. I can browse network shares locally and have no problems with Remote Desktop connections. The server does everything: DNS, DHCP, file server, SQL Server, Exchange Server, domain controller, etc. It is not the RAS for NOD32; that is on a separate machine. I do not currently have an update mirror set up; all the clients update over the internet. Given the problems I've read about, I'm a little worried that I should roll back from 3.0.650 to 2.7, but I haven't had any noticeable problems since adjusting the real-time settings. I just thought I would describe my situation in the hope that it helps a solution be found :)
     
  21. ShadowInc (Registered Member; joined Apr 1, 2008; 9 posts)

    Also, I don't have UPHClean running on the server.
     
  22. guest (Guest)

    I have it on a 2003 R2 SP2 server, and I didn't get the usual errors/warnings in the event log. Then again, my server stopped logging anything except the Security log...
     
  23. Colditzz (Registered Member; joined Mar 19, 2008; 46 posts)

    Well, the 6004 errors are still there for me. I've read plenty of posts reporting BSODs after installing this version. It has been on my laptop since I posted earlier, and I've had no BSOD yet... but the errors are still present.
     
  24. mickhardy (Registered Member; joined May 16, 2005; 140 posts; location: Australia)

    Ditto.

    SBS 2003 with XP SP2 clients. All clients are running ESS 3.0.640 and all clients log this error regularly. My computer on the same Network is running ESS 3.0.650 and logs the error regularly. The Server is running Nod32 2.7 with XMON and does not log this error.

    I did try EAV V3 on the Server ages ago when it was first released and experienced a complete lockup at the time. There were ISA 2004 loopback errors, and after a small amount of research, when I realised XMON wasn't supported or available, I removed V3. I haven't had V3 anywhere near the Server since then.

    All clients update from an ERA mirror on the Server.
     
  25. CrookedBloke (Registered Member; joined Oct 15, 2007; 110 posts)

    SOME IMPROVEMENT?

    I tested EAV 3.0.650.0 on my test system today. Test system is a member server on an AD 2003 (R2) domain, with the OS of the server itself being WS2003 SP2 (NOT R2).

    This server has been used to test each of the iterations of version 3 of EAV, and it has been very easy, previously, to get it to show the symptoms of loss of RDP session by simply logging in remotely and browsing the server's own network shares via the UNC convention (\\servername\sharename). Early on in testing my systems would give Event ID 6004 errors in their system logs before losing the remote session. With the advent of 3.0.642.0 the 6004 errors stopped appearing, but the loss of the remote session and general loss of responsiveness of the server still occurred.

    Marcos had me gather some memory dumps at various stages, and then he asked me to perform a registry edit (adding ekern.exe to the user exclusion list parameter of uphclean) before testing again. With 3.0.642.0 my system still became unresponsive.

    However, today with 3.0.650.0 I was unable to cause the system to become unresponsive through any of the usual means -- until I removed ekern.exe from the exclusion list!

    I have not had a chance to really bang away at the server yet. I hope to get a chance to try again to evoke a failure this afternoon.

    This development is exciting, if a little confusing.

    1. IF the problem is caused by a conflict with UPHCLEAN, why are people who do not even have UPHCLEAN installed on their systems reporting the same problem?

    2. IF the problem is NOT caused by a conflict with UPHCLEAN, why am I NOT seeing the problem if I exclude ekern.exe in UPHCLEAN's parameters?

    3. Why are some people continuing to see Event ID 6004 with these later version 3 builds of EAV, while others (including me) no longer see that error? And why would everyone continue to see the remote session and share availability issues regardless of whether or not we see the error messages?

    This is really weird. I hope someone besides me can test this latest version of EAV on a DC without UPHCLEAN installed at all. I also hope someone can test EAV on a DC with UPHCLEAN installed, and with and without the registry exclusion for ekern.exe applied. I can't make my test system a DC right now due to network constraints, or I might try to do some of this additional testing. As it is, I'm just going to try to hammer the daylights out of it this afternoon in its current configuration (set up as a member server with UPHCLEAN installed but with the exclusion for ekern.exe added to the registry).
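
    The test matrix proposed above is small, but it is easy to skip a combination when testing by hand. A minimal Python sketch that enumerates it; the role names and the field names (`uphclean_installed`, `exclusion_applied`) are hypothetical labels for the configurations discussed in this thread, not ESET or UPHClean setting names:

```python
from itertools import product

# Hypothetical labels for the server roles discussed in this thread.
ROLES = ("domain controller", "member server")


def build_test_matrix():
    """Enumerate role x UPHClean x exclusion combinations.

    The registry exclusion only makes sense when UPHClean is installed,
    so the UPHClean-absent cases carry exclusion_applied=None.
    """
    cases = []
    for role, uphclean_installed in product(ROLES, (False, True)):
        exclusion_options = (False, True) if uphclean_installed else (None,)
        for exclusion_applied in exclusion_options:
            cases.append({
                "role": role,
                "uphclean_installed": uphclean_installed,
                "exclusion_applied": exclusion_applied,
            })
    return cases


for case in build_test_matrix():
    print(case)
```

    This yields six cases (two roles, each with no UPHClean, UPHClean without the exclusion, and UPHClean with the exclusion); the configuration described above is the member-server case with UPHClean installed and the exclusion applied.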
     