Don't trust the AV-Comparatives results

Discussion in 'other anti-virus software' started by dr pan k, Mar 9, 2010.

Thread Status:
Not open for further replies.
  1. Kees1958

    Kees1958 Registered Member

    Joined:
    Jul 8, 2006
    Posts:
    5,857
    Guys,

     I know I started a thread which laughed at the stupidity of SOME of the YouTube testers (trying to shoot themselves in the foot and then complaining that they managed to do so).

     But let's give Dr Pan_k (intriguing nickname: am I supposed to guess the _ ?) some credit. An enthusiast made the same observation as an expert. The best answer for me was that the AVs are used with high heuristic settings. Some heuristics go crazy when they find just one (usually suspicious) characteristic of an executable, so that explains it for me.

     To the OP: next time, better to state that you noticed something and ask for others' experiences or an explanation. The responses are usually more open and friendly when you do not attach a conclusion to such a small test sample.

     Regards, Kees
     
  2. dr pan k

    dr pan k Registered Member

    Joined:
    Nov 22, 2007
    Posts:
    204
     Yesterday night I checked what are probably the most highly regarded AV comparative test results of 2009, and the gold winner on false positives had as few as 20 FPs out of a huge selection of samples. It's simply hard to believe those results when you bump into several FPs for that same product in everyday use of VirusTotal. How can they have only 20 FPs when I get several just by routinely checking the stuff I download? I got some innocuous files tagged as trojans and worms.

     Maybe AutoIt scripts in general are to blame, or maybe some of the engines are tuned low or high and in a real environment they perform a lot better than on the VT website. Either way, after this I simply don't trust the findings of the comparatives the way I used to.

     As for the nick, it actually represents who I am through a pronunciation game and pokes fun at the social role in it.

     Thanks to all for the replies.
     
  3. dawgg

    dawgg Registered Member

    Joined:
    Jun 18, 2006
    Posts:
    818
     Believe them or not, IMO they're accurate for the specific set of samples and settings used to perform the test at that specific time.
     Again, not necessarily the same settings and engine as those on VT.

     You just won't see the same results if you're going through newer malware samples, let's say, found in the last 24 hours. You need to look at dynamic tests to get a more realistic view of this - although the AVs also use their other protection modules in those tests rather than only the scanner, it's still the most realistic IMO.

     The details are there with the statistics; you need to read and analyse what the statistics show, not only the percentages.
     
  4. dr pan k

    dr pan k Registered Member

    Joined:
    Nov 22, 2007
    Posts:
    204
     The point is that I didn't go over the net to gather fresh-out-of-the-box samples. I simply checked the files as I was downloading them for personal use. This means they were more than a few days old, and I didn't try to manipulate them in any way; they weren't zipped or anything. If 98% detection with fewer than 20 or 30 FPs were true, or even close to reality, this wouldn't have happened.

     What you suggest is that I shouldn't take the static tests seriously and should pay attention only to the dynamic ones.
     In the dynamic test several products had more than 90% detection with zero or one FP. Though the data isn't as detailed as in other tests, this still isn't even close to my personal experience.

     And one last thing: if I'm not wrong, in this specific test the team of experts used some 100 samples. Statistically speaking, my 20 or so isn't so far away, though they concern only one category of apps.

     PS: I believe the specific AV testing team is doing a pretty decent job, since they usually publish complete stats; that was not the case for the dynamic test. I would be very interested to read what someone who's directly involved in this kind of testing has to say.
     
  5. kwismer

    kwismer Registered Member

    Joined:
    Jan 4, 2008
    Posts:
    240
     The test uses a more representative sample of all files, rather than just AutoIt scripts. I mentioned this before, but I guess the significance was overlooked. Your sample selection is incredibly biased when you only use AutoIt scripts. AutoIt scripts may be (I can't conclusively say they are) more prone to triggering false alarms than other file types. When you only use one type of file, you're virtually guaranteed to get a different false positive rate (either higher or lower), unless you happen by chance to choose a file type that magically has exactly the same false positive rate as the average.

     While it's true that all comparative tests need to be taken with a grain of salt, your experiences, and more importantly the interpretation of those experiences that you've presented here, are far more suspect.
     
  6. kwismer

    kwismer Registered Member

    Joined:
    Jan 4, 2008
    Posts:
    240
     Yes, it would. I'm sorry to say, but you are repeatedly demonstrating a complete lack of understanding of statistics and sampling bias.

     So I'm going to explain it in terms that everyone can understand.

     Let's say you have a field with 100 sheep in it: 50 of the sheep are black and 50 are white. The black sheep are all on the west side of the field and the white sheep are all on the east side. If I go to the west side and start counting out 20 sheep, I'm going to find only black ones. Should I then question the assertion that only 50% of the sheep are black when my experience shows 100% are black? No, because my sample is too limited - not necessarily in size, but in other factors that affect how closely (or not) my sample matches the entire population.

     By only using AutoIt scripts you have done precisely the same thing. False positives are not uniformly distributed across all file types - some get more than others, and by only using a single file type your sample is incredibly biased instead of being representative.
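     A minimal sketch of the same idea in Python (the field layout and counts are hypothetical, simply mirroring the sheep example above): counting 20 sheep from one side of the field gives a very different estimate than drawing 20 at random from the whole field.

     Code:
     import random

     # Hypothetical field from the example above: 100 sheep,
     # 50 black on the west side, 50 white on the east side.
     field = [("west", "black")] * 50 + [("east", "white")] * 50

     def fraction_black(sample):
         return sum(1 for _, colour in sample if colour == "black") / len(sample)

     # Biased sample: 20 sheep counted only on the west side.
     west_only = [s for s in field if s[0] == "west"]
     biased = random.sample(west_only, 20)

     # Representative sample: 20 sheep drawn at random from the whole field.
     representative = random.sample(field, 20)

     print("black fraction, west-side-only sample:", fraction_black(biased))          # always 1.0
     print("black fraction, whole-field sample:   ", fraction_black(representative))  # usually near 0.5, varies run to run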
     
  7. dr pan k

    dr pan k Registered Member

    Joined:
    Nov 22, 2007
    Posts:
    204
     Your example has nothing to do with the whole question. My post speaks clearly of AutoIt scripts and doesn't take other forms of possible malware into consideration. I am not running some AV comparative, and you simply don't want to get it. The AVs are tested against AutoIt scripts too, so the final results contain percentages for both AutoIt and non-AutoIt samples. I am not comparing my personal experience to a complete test. I simply realised, after a random check - because that is what it was - that the detection and FP percentages are far from being close to the percentages published, generally speaking.

     As for my stats knowledge, I prefer not to make any comments. You or anybody else can PM me and I will be delighted to give you some of the files I used so you can try them out yourselves.
     
  8. kwismer

    kwismer Registered Member

    Joined:
    Jan 4, 2008
    Posts:
    240
     The example was to explain why your false positives differ from those in the test. As such, it has nothing to do with any form of possible malware and only pertains to file types. The false positive testing in professional tests uses more than just AutoIt scripts; they use many other file types. Their results are supposed to generalise to the population of all clean files. On average you can expect X% false positives across the entire population, but for any particular sub-population of file types the actual false positive rate may be higher or lower than X%.
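     A quick numeric sketch of that last point (all the rates and proportions below are made up purely for illustration): a very low false positive rate over the whole clean-file population can coexist with a much higher rate inside one small sub-population of file types.

     Code:
     # Hypothetical figures, only to illustrate the averaging effect described above.
     fp_rate_autoit = 0.05     # assume 5% of clean AutoIt-compiled files trip a heuristic
     fp_rate_other  = 0.0001   # assume 0.01% for clean files of every other type
     share_autoit   = 0.002    # assume AutoIt files are 0.2% of all clean files

     overall = share_autoit * fp_rate_autoit + (1 - share_autoit) * fp_rate_other
     print(f"overall FP rate across all clean files:  {overall:.4%}")        # ~0.02%
     print(f"FP rate seen scanning only AutoIt files: {fp_rate_autoit:.2%}") # 5.00%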

     You are comparing your results (your FP results) to those in an AV comparative, and you simply don't want to see that doing so implies your results are comparable to an AV comparative.

     And my example demonstrates why your false positive rate is so different from the ones in the professional tests. Your 'random check' was biased - some would question whether it was even random at all, because it only contained AutoIt scripts. A random number generator that always produces numbers starting with 1 is a rather suspect random number generator.
     
  9. johnyjohn

    johnyjohn Registered Member

    Joined:
    Jan 2, 2010
    Posts:
    126
  10. kwismer

    kwismer Registered Member

    Joined:
    Jan 4, 2008
    Posts:
    240
  11. dr pan k

    dr pan k Registered Member

    Joined:
    Nov 22, 2007
    Posts:
    204
     This is an interesting conclusion from the above-mentioned article:

    If we get rid of static on-demand-tests with their mass of unvalidated samples, the copying of classifications will at least be significantly reduced, test results will correspond more closely to reality (even if that means saying good bye to 99.x% detection rates) and in the end everyone will benefit: the press, the users and of course us as well.
     
  12. NoIos

    NoIos Registered Member

    Joined:
    Mar 11, 2009
    Posts:
    607
    This thread has become a joke!
     
  13. biscuits

    biscuits Registered Member

    Joined:
    Feb 16, 2010
    Posts:
    113
    Dear dr pan k

     Dude, the service that VT provides is not the same as the service the actual AVs give. VT uses the engines and virus signatures, but you must also consider that all AVs have particular settings which VT doesn't have.

     Also, please understand how AV-Comparatives came up with their percentages and what they mean. They are testing a large number of samples of different types.

     Let's put it this way. AV-Comparatives' samples consist of x, y, and z files, while the files you uploaded over the past days are x1, x2, x3 files. In AV-Comparatives' test, an AV product detected the x file as a trojan while it's actually not (an FP); y and z were detected as malware (and they were real malware). That gives the AV product a rating of 66.66%. Now, if all the samples had been x files (which I think is your situation), the AV would most probably get a rating of 0%.
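     A tiny sketch of that arithmetic (the file labels and verdicts are the hypothetical ones from the paragraph above): the rating depends entirely on the mix of files in the sample set.

     Code:
     # Hypothetical verdicts from the example above: True = the AV judged the file correctly.
     mixed_set   = {"x": False, "y": True, "z": True}        # x is clean but flagged as a trojan (FP)
     x_type_only = {"x1": False, "x2": False, "x3": False}   # every sample is the troublesome file type

     def rating(results):
         return 100 * sum(results.values()) / len(results)

     print(f"mixed sample set:       {rating(mixed_set):.2f}%")    # ~66.7% (2 of 3 correct)
     print(f"single-type sample set: {rating(x_type_only):.2f}%")  # 0.00%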

     If I am not mistaken, you are pretty much uploading apps (apps that are very similar) that you downloaded from websites providing illegal copies of those apps, so you are using VT to find out whether the files are clean.
     
  14. dawgg

    dawgg Registered Member

    Joined:
    Jun 18, 2006
    Posts:
    818
     Seems to me we're going round in circles; we have tried to explain it, but have not got far.
     