Evaluating AV Positives

Discussion in 'other anti-virus software' started by Diver, Jan 29, 2008.

Thread Status:
Not open for further replies.
  1. Diver

    Diver Registered Member

    Joined:
    Feb 6, 2005
    Posts:
    1,444
    Location:
    Deep Underwater
    Today I took a DVD with many thousands of files on it, collected over a two-year period. This collection was not of known malware, but it contained many files of dubious origin or likely hack-tool designation. It was scanned with 5 AV's set up on real and virtual machines.

    23 different files produced warnings.
    16 of these warnings were unique, that is, only one of the five AV's gave a warning.

    Of the 16 unique warnings, 8 were for unusual run-time packers. That does not mean the file is malware, only that it shares a characteristic with malware. One AV picked up 5 of these, another found 3. Notice there was no overlap between the two AV's on this issue.

    Three of the unique hits were hack tools or grayware, although I felt the descriptions were not adequately clear.
    That left 5 unique hits for which there is no explanation. They could be malware or false alarms, the latter being more likely in my mind.

    Of the 7 non-unique hits, only one was identified by 3 AV's; the remaining 6 by only 2 AV's. That means out of 23 detections, only one item was flagged by a majority of the 5 AV's!

    Two of the AV's found 2 items each; none of these were unique. The interesting thing here is that one of these two AV's is generally considered hot stuff and the other is considered mediocre around here.

    One found 7 items, 2 unique and 5 non-unique.

    One found 8 items, 6 unique and 2 non-unique, but 5 of the unique hits were packers, and this feature could have been turned off.

    One of the AV's had 13 hits, of which 8 were unique and 5 non-unique.

    The question is, with such discrepancies, how does one know whether a hit is malware or a false alarm? If you upload a file to Jotti's, what criteria do you use with highly inconsistent results? Should every file with an unusual run-time compressor be pitched? Are 5 AV's enough? I am starting to wonder.

    Based on this performance, could one draw any conclusions about the detection quality of any of these AV's? I am not completely sure that I can.
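    The unique/confirmed/majority bookkeeping above can be sketched in a few lines of Python. The scanner labels and file names below are invented for illustration; they are not the actual test data.

    ```python
    from collections import defaultdict

    # Hypothetical scan results: scanner -> set of flagged files.
    scans = {
        "AV1": {"a.exe", "b.exe", "c.exe"},
        "AV2": {"b.exe", "d.exe"},
        "AV3": {"b.exe", "e.exe"},
        "AV4": {"f.exe"},
        "AV5": set(),
    }

    # Count how many scanners flagged each file.
    counts = defaultdict(int)
    for flagged in scans.values():
        for f in flagged:
            counts[f] += 1

    unique = [f for f, n in counts.items() if n == 1]        # one scanner only
    confirmed = [f for f, n in counts.items() if n >= 2]     # at least two agree
    majority = [f for f, n in counts.items() if n > len(scans) / 2]

    print(f"{len(counts)} flagged, {len(unique)} unique, "
          f"{len(confirmed)} confirmed, {len(majority)} majority")
    ```

    With this toy data, 6 files are flagged, 5 are unique hits and only 1 is flagged by a majority of scanners, which mirrors the kind of skew described above.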
     
  2. s4u

    s4u Registered Member

    Joined:
    Oct 24, 2007
    Posts:
    441
    I'm guessing you are not going to tell us what AV's you were using?
     
  3. jrmhng

    jrmhng Registered Member

    Joined:
    Nov 4, 2007
    Posts:
    1,268
    Location:
    Australia
    This actually illustrates the issue with whitelisting: what system of trust can we use to ensure the files we are executing are not malware?
     
  4. Diver

    Diver Registered Member

    Joined:
    Feb 6, 2005
    Posts:
    1,444
    Location:
    Deep Underwater
    The 5 AV's were:
    Nod32
    Mcafee
    Symantec
    Bitdefender 10 Free
    Avira Free

    I am avoiding saying how each performed because of the A vs B policy around here.

    Remember, only one out of 23 suspicious files was recorded by 3 out of 5, and the remaining results looked very random to me.
     
  5. lucas1985

    lucas1985 Retired Moderator

    Joined:
    Nov 9, 2006
    Posts:
    4,047
    Location:
    France, May 1968
    For us non-virus-analysts, the hunting is certainly hard. But there are plenty of things you can do to obtain a pretty accurate assessment of a given file. How did it land on my disk? Does a Google search return a result (and what result) when looking for its checksum? Is it runtime-packed? What do ThreatExpert/Norman Sandbox say about it? Does it behave normally when loaded in a VM with tools (TCPView, Process Explorer, EULAlyzer, a classic HIPS, etc.)?
    What do Kaspersky's virus analysts say about it?
    Here is a little example :)
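    The checksum-search step mentioned above can be sketched like this; the sample path at the end is hypothetical.

    ```python
    import hashlib

    def file_checksums(path, chunk_size=65536):
        """Compute MD5 and SHA-1 of a file, reading in chunks so large
        samples do not have to fit in memory."""
        md5, sha1 = hashlib.md5(), hashlib.sha1()
        with open(path, "rb") as fh:
            while chunk := fh.read(chunk_size):
                md5.update(chunk)
                sha1.update(chunk)
        return md5.hexdigest(), sha1.hexdigest()

    # Search the web (or a vendor database) for these digests to see
    # whether the sample is a known, already-analysed file:
    # md5sum, sha1sum = file_checksums("suspect.exe")  # hypothetical path
    ```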
     
  6. Diver

    Diver Registered Member

    Joined:
    Feb 6, 2005
    Posts:
    1,444
    Location:
    Deep Underwater
    That is a very good summary. Believe me, all this stuff got to my disk via the midnight software service, so to speak. Nonetheless, the low level of overlap in detections leads me to believe there are a lot of false positives, many more than show up in published tests.
     
  7. lucas1985

    lucas1985 Retired Moderator

    Joined:
    Nov 9, 2006
    Posts:
    4,047
    Location:
    France, May 1968
    I tend to agree here. When you start to get out of the "common software base", FPs tend to appear anywhere.
    This also applies to behaviour blockers. I wonder how many FPs a hardcore gamer with lots of anti-cheats and trainers is seeing nowadays. Grayware is growing at an explosive rate.
    The test beds for FPs are too small, IMO.
     
  8. Diver

    Diver Registered Member

    Joined:
    Feb 6, 2005
    Posts:
    1,444
    Location:
    Deep Underwater
    Yesterday I ran the same 5 AV's with another DVD of non mainstream software. Perhaps a pattern is beginning to emerge.

    I would have to say that as a general rule the messages given in AV positives are cryptic at best. It varies. Symantec will identify grayware with greater clarity than the others, and it gives a check box to make an exception for a lot of it. Bitdefender Free gives lots of packer detections, but they are fairly easy to identify after you have seen a few. Grayware hits on Bitdefender Free were particularly cryptic. Avira allows unusual-packer detection to be switched on or off, a valuable feature IMO. Nod32 had by far the fewest detections each time. All were confirmed by at least one other AV. On the second round Nod32 had only one detection, confirmed by 3 other AV's. Either Eset has some particular genius for avoiding FP's, or they are missing a bunch of baddies.

    Product total detections/unconfirmed detections

    Nod32 1/0
    Symantec 6/1
    Mcafee 4/2
    Avira 6/1
    Bitdefender 7/4

    Total items: 14
    Items detected 4 times: 1
    Items detected 3 times: 3
    Items detected 2 times: 3
    Unconfirmed items: 7

    Notes: Symantec had 4 clearly identified grayware detections that were eliminated from the numbers above. Of Bitdefender's 4 unconfirmed detections, 3 were packers and one was grayware, but it was not clearly identified as not a virus. There was one archive containing 5 related detections by Symantec and 2 by Avira. I counted this as a single detection for each, as the additional detections were dll's that needed the potentially infected executable to run.

    Since I do not have the facilities to clearly determine if any detected file is malware, no inference may be drawn as to the quality of any of the 5 products.

    edited to correct data entry errors.
     
    Last edited: Jan 31, 2008
  9. solcroft

    solcroft Registered Member

    Joined:
    Jun 1, 2006
    Posts:
    1,639
    Both, actually.
     
  10. Coolio10

    Coolio10 Registered Member

    Joined:
    Sep 1, 2006
    Posts:
    1,124
    Thanks for running the test. It shows there is little point in detection if the product can't present it clearly.

    Each AV is taking a different approach, I suppose. Symantec is trying to detect a wider range of viruses, while most other AV's try to identify the most popular types of viruses. NOD32 has really changed its ways; it used to receive Advanced+ on AV-Comparatives and other tests, but now its scores are lower, even though its heuristics have improved greatly.

    According to your results, solcroft is correct. NOD32 had no FP's but also did not get many detections.

    McAfee and Bitdefender are in the same group, but Bitdefender obviously has better detection rates.

    Avira has extremely good detection rates, but the problems are false positives and the lack of detail about the detected virus.
     
  11. solcroft

    solcroft Registered Member

    Joined:
    Jun 1, 2006
    Posts:
    1,639
    Actually, his results tell us nothing regarding detection rates, since the samples aren't even known malware. He happens to be correct, but not because his test results showed us so.
     
  12. flyrfan111

    flyrfan111 Registered Member

    Joined:
    Jun 1, 2004
    Posts:
    1,224
    Yep, even a broken clock is right once a day, sometimes twice.
     
  13. Diver

    Diver Registered Member

    Joined:
    Feb 6, 2005
    Posts:
    1,444
    Location:
    Deep Underwater

    I don't have a large enough test set of known malware to make a valid detection-rate test. That is not the point. This is more about AV behavior, notification messages and the difficulty of evaluating positives. That is what the tests demonstrate. However, I did not expect such a large variation in results on this relatively small set, so the result is a bit of a surprise to me. No product rankings or detection rates were given, nor were any intended.

    Thanks for trolling me anyway.
     
    Last edited: Jan 31, 2008
  14. Diver

    Diver Registered Member

    Joined:
    Feb 6, 2005
    Posts:
    1,444
    Location:
    Deep Underwater
    This afternoon I ran AVG free against the second test set.

    Results were 12/6 detections/unconfirmed or unique detections. This also raised Avira from 7/1 to 7/0 and increased the total list to 20 items. One of the hits was the same one that Nod32 found, which gave that sample 5 detections out of 6 scanners. Nothing else was detected by more than 3 out of 6 scanners.

    One oddity of AVG is that it gives some positives a red warning and others a gray warning. The descriptions gave no other hint as to severity, and there was no way to correlate this with any other product. My conclusion is that red and gray warnings should be regarded equally. Bitdefender Free had the next-highest count of unconfirmed/unique detections, but they could be evaluated if one has the experience to know that Bitdefender flags many packed files without regard to what they actually do.
     
    Last edited: Jan 31, 2008
  15. Diver

    Diver Registered Member

    Joined:
    Feb 6, 2005
    Posts:
    1,444
    Location:
    Deep Underwater

    Another classic troll comment.
     
  16. Diver

    Diver Registered Member

    Joined:
    Feb 6, 2005
    Posts:
    1,444
    Location:
    Deep Underwater
    After taking another look at the data, I did notice there was a 5 out of 6 correlation between Symantec and Avira. There were no other correlations that I could see.

    At least for me, this process has raised more questions than it has answered. Lucas1985 made an interesting comment about FP's when one gets out of the common software base. Perhaps I want to see how close to the edge I can get without falling off.
     
  17. Frisk

    Frisk AV Old-Timer

    Joined:
    Jan 28, 2008
    Posts:
    31
    Location:
    Iceland
    The thing is, when you scan this type of "borderline" or "gray" software, you will get some alerts from AV programs that use heuristic scanning, but those results don't really say anything about the ability of the programs to detect "real" malware.

    Now, you did not include my F-prot, but I would guess that the number of "detections" it would have gotten would have increased as the heuristic level increased - which also correlates with the increase in false positives in general.
     
  18. Diver

    Diver Registered Member

    Joined:
    Feb 6, 2005
    Posts:
    1,444
    Location:
    Deep Underwater

    With Avira there was no change on the first DVD when going from medium (default) to high heuristics. There is a significant change when enabling detection of unusual packers.

    For me part of the evaluation process is dealing with known gray software. I don't mind if an AV picks up some well known password cracker provided it gives a clear indication that it is a "potentially unwanted program" or PUP rather than identifying it as a "trojan". Only Symantec gave a clear message on these. Where I see the problem is with stuff that is gray to begin with, say game cheats, but may have been infected to spread real malware in much the same way that many "free" screen savers are. The cheat will in fact work, but if your AV gives a false positive on it because it used some oddball packer, then you are in a fix over whether to use it or not. If the suspicious program is something that can be run in a sandbox, that is one way out.
     
  19. flyrfan111

    flyrfan111 Registered Member

    Joined:
    Jun 1, 2004
    Posts:
    1,224
    Those "oddball" packers, as you call them, are generally used to pack malware; they are chosen because the way they pack data makes it exceptionally difficult for AV software to scan. There is no shortage of legitimate packing/compression software, so legitimate software has very little reason to use such oddball packers, hence the reason they are not that common to begin with.
     
  20. Diver

    Diver Registered Member

    Joined:
    Feb 6, 2005
    Posts:
    1,444
    Location:
    Deep Underwater
    The oddballs are also used to prevent reverse engineering. It depends on what you mean by "legitimate". To me that means no privacy problems left undisclosed in layman's terms, and no unexpected actions. In that case Vista is not legitimate, as it phones home like crazy; XP calls sa.microsoft.com every time you search your machine; and Adobe Photoshop was just caught doing the same thing "to improve the customer experience". You can add InterVideo WinDVD to the list, and it will refuse to run if blocked by at least 2 firewalls that I know of. How about every program that packages the Google or Yahoo toolbar in its installer with the default action to install the crap, just in case you are in a hurry or not too careful? When the so-called legitimate players are doing things like that, it's no wonder we don't have laws to deal with all the nonsense that is going on.
     
  21. flyrfan111

    flyrfan111 Registered Member

    Joined:
    Jun 1, 2004
    Posts:
    1,224
    Another reason they are used almost exclusively by malware creators.

    I'm not sure how the worry that a crack or cheat may be infected compares to installing Google's toolbar; most products I see installing it clearly ask if you wish to install it. The short answer is to pay attention during software installations. Yes, there should be a few more laws regulating software; honesty and integrity in the workplace would also suffice, however.
     
  22. Diver

    Diver Registered Member

    Joined:
    Feb 6, 2005
    Posts:
    1,444
    Location:
    Deep Underwater
    I just want accurate detection without someone else's ideas of morality as an excuse for anything less. As it is, I have reason to believe that some software publishers are submitting non-malware keygens and patches to AV labs with the hope of getting these things into the signature databases. If I were in the software business, it's what I would do, among other countermeasures.
     
  23. lucas1985

    lucas1985 Retired Moderator

    Joined:
    Nov 9, 2006
    Posts:
    4,047
    Location:
    France, May 1968
    I agree here. I wonder if avoiding FPs so aggressively is really hurting ESET's detection rates.
    I believe grayware will be a huge headache for virus labs, if it isn't already.
     
  24. Diver

    Diver Registered Member

    Joined:
    Feb 6, 2005
    Posts:
    1,444
    Location:
    Deep Underwater
    I reran McAfee today and added Clam AV.

    McAfee's score changed to 3/0 (no unconfirmed hits) and PUP/PUA detections were well marked.

    Clam AV scored 13/8.

    I have been excluding clearly marked potentially unwanted programs/applications (PUP/PUA). Clam marks unusual packers PUA.Packed and stuff like password crackers PUA.Tool. I don't think this is enough to exclude the PUA.Tool category, but if you did, the score would be 11/6. If the two packers were also removed, 9/4. However, both of these PUA.Packed detections were confirmed. Clam has a new category called Exploit. These appeared to be FP's picking up the firewall signatures from various Symantec products. There were three of these.

    After including ClamAV, there were two items with 4 out of 6 scanners going positive.

    Bitdefender had 4 packer detections, none confirmed.

    The inconsistency in so-called packer detections is troubling.
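    The category-based exclusions described above (dropping PUA.Packed or PUA.Tool hits from a total/unconfirmed score) can be sketched as a small filter. The file names and most detection names below are invented; only the PUA.Packed/PUA.Tool prefixes follow Clam's naming convention.

    ```python
    # Hypothetical scan results: sample -> detection name.
    hits = {
        "s01": "Trojan.Agent",
        "s02": "PUA.Tool.PwCrack",
        "s03": "PUA.Packed.Upack",
        "s04": "PUA.Packed.Mew",
        "s05": "Exploit.Something",
    }
    # Samples also flagged by at least one other scanner.
    confirmed = {"s01", "s03", "s04"}

    def score(hits, confirmed, exclude_prefixes=()):
        """Return a 'total/unconfirmed' score after dropping any
        detections whose name starts with an excluded prefix."""
        kept = [f for f, name in hits.items()
                if not any(name.startswith(p) for p in exclude_prefixes)]
        unconfirmed = [f for f in kept if f not in confirmed]
        return f"{len(kept)}/{len(unconfirmed)}"

    print(score(hits, confirmed))                   # everything counted
    print(score(hits, confirmed, ("PUA.Tool",)))    # exclude PUA tools
    print(score(hits, confirmed, ("PUA.",)))        # exclude all PUA hits
    ```

    Recomputing the score with different exclusion prefixes makes explicit which categories are driving the unconfirmed count.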
     