AV-comparatives Retrospective/ProActive Test - November 2008

Discussion in 'other anti-virus software' started by killua24, Nov 30, 2008.

Thread Status:
Not open for further replies.
  1. killua24

    killua24 Registered Member

    Joined:
    Nov 24, 2008
    Posts:
    12
    Last edited: Nov 30, 2008
  2. doktornotor

    doktornotor Registered Member

    Joined:
    Jul 19, 2008
    Posts:
    2,047
    OK, so a product that catches 25% more malware but produces 6 more FPs (out of 45,000+ samples) is a lot worse... Hmmm, another day, another broken test. Have a nice day.

    :rolleyes:
     
  3. swagman

    swagman Registered Member

    Joined:
    Oct 12, 2007
    Posts:
    2
    Maybe Dr.Web could get a good score in the test.
     
  4. ASpace

    ASpace Guest

    Re the award - ESET can be unhappy too, because of the previous test (the August 2008 one). Just 0.3% below the 97% barrier, and with fewer FPs, yet it got just the Advanced award, while products with higher detection rates but more FPs got the Advanced+ award. In my opinion, if we were to judge this way, then in the August test only Symantec should have taken the Advanced+ award, because one can't have a high detection rate plus FPs and still take Advanced+, right? Symantec had a high detection rate and almost no FPs - that is what Advanced+ is for.


    So, finally and once again, AV-Comparatives "realised" how important it is to have a small number of FPs. :thumb:
     
  5. doktornotor

    doktornotor Registered Member

    Joined:
    Jul 19, 2008
    Posts:
    2,047
    I guess I need to make my point clearer... I don't care about a particular product's rating and ranking here, but about the general methodology. 7 and 14 FPs are "few", while 17 is "many"? And 62 is still just "many"? Why? Because AV-Comparatives said so? How about using percentage values to put this into proportion, instead of such arbitrary and misleading ratings? How exactly is a product worse than one that detects a quarter less of the malware (which translates into 10,000+ samples missed), just because of 3 more FPs? This doesn't reflect real-world usage patterns and experience in any way.
     
  6. ASpace

    ASpace Guest

    Pretty much no test reflects real world experience :thumb:

    This will again not show the real world. 1 FP more or less, or 1% more or less, could well happen to any product/vendor. E.g. a product has always done perfectly, but this time it scores 0.1% lower and falls into a lower category. Why? (Note: next time it will be some other vendor.)
     
  7. Osaban

    Osaban Registered Member

    Joined:
    Apr 11, 2005
    Posts:
    4,222
    They all seem to follow Virus Bulletin's pattern of assessing performance based on FPs, but then they expect Avira to clean up all the misses from the winners.
     
  8. ablatt

    ablatt Registered Member

    Joined:
    Nov 14, 2004
    Posts:
    128
    Location:
    Canada

    Doktornotor is right. How do they decide what constitutes many vs. few false positives? How can 14 be few while 17 is enough to put you out of the Advanced+ category? The effect the number of false positives has on the overall ranking has to be recalculated in a more realistic way, based on the number of samples and the number of detections.

    I would way rather have a FEW more false positives to get 25% better detection any day of the week.
     
  9. doktornotor

    doktornotor Registered Member

    Joined:
    Jul 19, 2008
    Posts:
    2,047
    Well, using % will not do anything about the "doesn't reflect the real world" issue, but it will give far more meaningful values for comparison - i.e., show that with the majority of products most users won't ever be affected by any FPs, and show that the difference between the current "few" and "many" ratings is just useless for comparing the products.

    No one with a clear mind should judge a product as inferior because it produced FPs on 0.000029% of some weird sample collection as opposed to 0.000023% for a competitor. Yet AV-Comparatives urges the reader to look at their overall ratings instead of the actual figures. :thumbd:
     
  10. Firecat

    Firecat Registered Member

    Joined:
    Jan 2, 2005
    Posts:
    7,927
    Location:
    The land of no identity :D
    For you and me, FPs aren't important. Most of the time we know what they are, whether a detection is false, and how and when to get it fixed.

    For the average user, that is not the case. His/her AV finds a "virus" and he/she chooses to delete or quarantine it. Then suddenly his/her favourite program doesn't work anymore. That causes undue confusion and inconvenience.

    It's not limited to that either - average users will not mind it too much - but there are some corporate settings where false positives absolutely must be minimized in order to maximize productivity. Mind you, there are other corporates where paranoid security is preferred, but if, say, there are mission-critical computers somewhere, they may not want to risk losing data or time. For such scenarios FPs should be considered, and the overall rating helps show how effective an AV is at heuristic/retrospective detection with minimum inconvenience.

    Maybe AV-comparatives should make this clearer, but in the end AV-comparatives also says that one should try the product and use whatever he/she likes. So I don't see a real issue here.
     
  11. Pedro

    Pedro Registered Member

    Joined:
    Nov 2, 2006
    Posts:
    3,502
    I can agree that percentages should be used in FPs. Absolute values are not ideal imo.
     
  12. Kosak

    Kosak Registered Member

    Joined:
    Jul 25, 2007
    Posts:
    711
    Location:
    Slovakia
    Heh, how can you say that e.g. 200 is 100% and 7 is 3.5%, when you don't know how many false positives an antivirus will announce? You only know how many files you put into the test package and how many detections turned out to be false positives.

    You cannot say that ADVANCED+ represents only proactive detection, because it represents a trade-off between proactive detection and false positives. ;)
     
  13. doktornotor

    doktornotor Registered Member

    Joined:
    Jul 19, 2008
    Posts:
    2,047
    o_O o_O o_O

    Code:
    # of FPs / # of harmless samples * 100 = FPs percentage rate
    That's a far more meaningful figure for comparing the products' FP rates, one that is actually useful for users, unlike this absolute-number nonsense.
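    For anyone who wants to try the formula above, here is a minimal Python sketch. The product names and sample counts are invented for illustration; they are not AV-Comparatives' actual figures.

```python
# Illustrative sketch of the FP percentage-rate formula:
#   # of FPs / # of harmless samples * 100 = FP percentage rate
# All numbers below are hypothetical.

def fp_percentage(num_fps, num_clean_samples):
    """Return false positives as a percentage of clean samples scanned."""
    return num_fps / num_clean_samples * 100

# Hypothetical products scanned against one shared clean-file set.
clean_set_size = 1_000_000
products = [("ProductA", 7), ("ProductB", 17), ("ProductC", 62)]

for name, fps in products:
    rate = fp_percentage(fps, clean_set_size)
    print(f"{name}: {fps} FPs = {rate:.4f}% of clean files")
```

    Expressed this way, the gap between "few" and "many" FPs shrinks to a few ten-thousandths of a percent, which is the point being argued.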
     
  14. Kosak

    Kosak Registered Member

    Joined:
    Jul 25, 2007
    Posts:
    711
    Location:
    Slovakia
    When you make an AV more sensitive, you will achieve better detection. When you weigh FPs against detection, you get the rating. Few, many... I think it is calculated according to the number of AVs in the test.

    You can read about false positives on ESET's website, in the Blog section.
     
    Last edited: Nov 30, 2008
  15. BlueZannetti

    BlueZannetti Administrator

    Joined:
    Oct 19, 2003
    Posts:
    6,590
    This is a situation in which numeric stats really don't cut it, and some type of clustering-based rank ordering is more in line with reality. In the original document, the rank ordering went: very few (1)/few (7-14)/many (17-62)/very many (117). Personally, I would have done a somewhat different split: very few (1-3)/few (4-9)/average (10-38)/many (39-50)/very many (50+), which is basically derived from a distance-based clustering around the "mean" (using all values except 117, which is treated as an outlier).

    However, even here the cluster cutoffs are rather arbitrary, so it's a bit of a toss-up as to how to break things down, as is the assessment of a penalty on the test to yield a final rating. For example, McAfee has never really given the impression of being a magnet for false positives, yet one they did experience a few years ago still sticks in my mind due to the absolute havoc it caused. Some types of false positives are a whole lot more of an issue than others....
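    A distance-from-the-mean bucketing of that sort can be sketched in a few lines of Python. The FP counts are the ones quoted in this thread; the bucket labels and cutoffs are illustrative guesses, not AV-Comparatives' (or anyone's) actual method.

```python
# Rough sketch of distance-based bucketing around the mean of the
# non-outlier FP counts. Cutoffs are arbitrary, for illustration only.
from statistics import mean

fp_counts = [1, 7, 14, 17, 62, 117]

# Treat extreme values as outliers before computing the centre.
OUTLIER_CUTOFF = 100
core = [x for x in fp_counts if x <= OUTLIER_CUTOFF]
centre = mean(core)  # mean of the non-outlier counts

def bucket(fps):
    """Assign a label based on signed distance from the core mean."""
    if fps > OUTLIER_CUTOFF:
        return "very many"        # outlier, ranked separately
    d = fps - centre
    if d < -15:
        return "very few"
    if d < 0:
        return "few"
    if d < 25:
        return "many"
    return "very many"

for fps in fp_counts:
    print(fps, "->", bucket(fps))
```

    The arbitrariness is easy to see here: nudge any threshold by a few units and products hop between buckets, which is exactly the cutoff problem being discussed.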

    Blue
     
  16. doktornotor

    doktornotor Registered Member

    Joined:
    Jul 19, 2008
    Posts:
    2,047
    Well, indeed they don't cut it for the ranking issue... See, I'm more after providing meaningful numbers to users so that they can draw their own conclusions from them (as opposed to the "ignore the numbers and look at the rating" suggestion by the AV-Comparatives crew). Still, the above suggestion (I suppose the rating scale would be recalculated on a per-test basis) would be better and less arbitrary.

    Yeah, the quality of the samples being tested this way is another issue with this kind of test, and their test lacks any sort of "severity" weight for particular FPs.
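    The "severity weight" idea could look something like the following sketch. The categories and weights are entirely made up to show the mechanics; no test actually uses these numbers.

```python
# Hypothetical severity-weighted FP score: instead of counting every
# false positive as 1, weight each by how disruptive it is.
# Categories and weights are invented for illustration.

SEVERITY_WEIGHTS = {
    "obscure_tool": 1,   # FP on software hardly anyone uses
    "popular_app": 5,    # FP on a widely used application
    "system_file": 20,   # FP that can break the OS itself
}

def weighted_fp_score(fps):
    """Sum severity weights over a list of FP category labels."""
    return sum(SEVERITY_WEIGHTS[kind] for kind in fps)

# Product A: 20 low-impact FPs; Product B: a single catastrophic one.
product_a = ["obscure_tool"] * 20
product_b = ["system_file"]

print("A:", weighted_fp_score(product_a))
print("B:", weighted_fp_score(product_b))
```

    With these example weights, 20 trivial FPs and 1 catastrophic FP score the same, which is the argument for weighting rather than raw counting.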
     
  17. De Hollander

    De Hollander Registered Member

    Joined:
    Sep 10, 2005
    Posts:
    718
    Location:
    Windmills and cows
    So NOD32 got the only Advanced+ because of fewer FPs.

    Looking at the 1- and 4-week detection: Avira, Kaspersky, and GData :thumb:
     
  18. bellgamin

    bellgamin Very Frequent Poster

    Joined:
    Aug 1, 2002
    Posts:
    5,648
    Location:
    Hawaii
    Obviously not a math major.

    IMO, what doktornotor refers to is a RATIO of FPs to the total number of items in the sample. Whatever you call it, I fully agree with his comments.

    It's NOT so nice if you get snagged by one of the undetected 46%. But -- even if you do get infected -- take comfort in the fact that it wasn't done by a nasty old False Positive. :cautious:
     
    Last edited: Nov 30, 2008
  19. C.S.J

    C.S.J Massive Poster

    Joined:
    Oct 16, 2006
    Posts:
    5,029
    Location:
    this forum is biased!
    I've always said the FP count should be scrapped from altering results, and percentages should be given instead.

    If the FP test is of a million files, then 10, 20, 100 or whatever are not that bad.

    The way it is now makes a lot of companies seem like FP machines, when truthfully I don't think any of them are - maybe Fortinet with heuristics on max would be, but all the others are respectable if tested against a million files or so.

    How many files are in the FP test? Have I missed it?
     
  20. dawgg

    dawgg Registered Member

    Joined:
    Jun 18, 2006
    Posts:
    817
    FPs should still be counted, IMO - it shows whether AVs go wild with detections or not... higher FPs = laxer detections = worse detection engine...

    What's better:
    an AV with high detections + low FPs, or
    an AV with high detections + high FPs?

    I think the main problem (and with this I agree to a certain extent) is AVC's boundaries for Very Few, Few, and Many FPs - why x is classed as Few while x+1 is classed as Many... (although of course, FPs should still be taken into account when giving a rating).

    At least AVC gives the raw numbers, and it's up to people whether they want to read into the depths of the information or not.


    I also agree with Blue... some types of false positives are a whole lot more of an issue than others...
    An AV may have 20 FPs which detect programs that hardly anyone uses, whereas another AV may have 1 FP which causes everything to go belly-up... which is worse?

    Then take time into consideration... the AV with 1 FP may fix the detection quickly (although the damage is already done to many users), whereas the AV with 20 FPs may not have fixed them yet, because it may not even know it has FPs and they may not have affected anyone...

    Which AV would you rather have? 1 FP and 1 major mess, or 20 FPs and 0 major messes?

    Of course, we all know AVs have FPs - some worse than others, some more than others, some both - but we can't rely on only looking at the numbers in AVC; it only gives FPs at one moment in time. Think outside the box. Even an AV with 0 FPs in AVC can easily screw up thousands of computers.

    Trust your instincts and remain cautious. A low FP rate in AVC doesn't mean users can rely on their AV, and a high FP rate doesn't mean the AV is more likely to cause damage. It may give an indication of quality control, but a FP can still seep through good quality control and cause problems.
     
  21. Pedro

    Pedro Registered Member

    Joined:
    Nov 2, 2006
    Posts:
    3,502
    It's always difficult to settle on ranking criteria like these, so I don't think I can actually criticize IBK for that.

    But I do think that when you look at percentages rather than absolute values, you get a much more meaningful perspective. Perhaps then it's easier to rank them?
     
  22. Sportscubs1272

    Sportscubs1272 Registered Member

    Joined:
    Apr 9, 2007
    Posts:
    340
    I believe AV-Comparatives should have a follow-up test on how fast each company fixes its false positives, and on how severe they are.
     
  23. maddawgz

    maddawgz Registered Member

    Joined:
    Aug 13, 2004
    Posts:
    1,276
    Location:
    Earth
    Antivir up there again :D
     
  24. acr1965

    acr1965 Registered Member

    Joined:
    Oct 12, 2006
    Posts:
    4,954
    So they rate Advanced+, Advanced, etc. relative to how the AVs did against each other?
     
  25. dw2108

    dw2108 Registered Member

    Joined:
    Jan 24, 2006
    Posts:
    480
    Will Twister and Rising AV WITHOUT HIPS enabled, jump into testing?

    Dave
     