AV-Comparatives: Whole product dynamic test

Discussion in 'other anti-virus software' started by Baz_kasp, Dec 18, 2009.

Thread Status:
Not open for further replies.
  1. Pleonasm

    Pleonasm Registered Member

    Joined:
    Apr 9, 2007
    Posts:
    1,201
    However, please keep in mind that these tests were conducted by different personnel, in different countries, at different times, with different malware samples, using PCs configured differently. Thus, I maintain that they are in fact “independent” in a statistical sense.

    “The condition for statistical independence is that the outcome of one event does not affect the outcome of the other” (see here). It’s difficult to imagine that the outcome of the Dennis Technology Lab test affected the outcome of the AV-Test assessment, or that the outcome of the AV-Test assessment affected the outcome of the AV-Comparatives test.

    To use an analogy, three tosses of a fair coin are statistically independent, even if using the same coin and even if tossed by the same person (and, even if you get the same result -- e.g., three “heads” -- in a row).
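
    As a minimal sketch of that definition (a hypothetical simulation, nothing to do with any actual test data): simulate a long run of fair-coin tosses and check that the empirical chance of heads on toss n+1 is the same whether toss n came up heads or tails.

    Code:
    import random

    random.seed(1)
    tosses = [random.choice("HT") for _ in range(100_000)]

    # Empirical P(heads on toss n+1), conditioned on the outcome of toss n.
    after_heads = [b for a, b in zip(tosses, tosses[1:]) if a == "H"]
    after_tails = [b for a, b in zip(tosses, tosses[1:]) if a == "T"]

    print("P(H | previous H) =", after_heads.count("H") / len(after_heads))
    print("P(H | previous T) =", after_tails.count("H") / len(after_tails))
    # Both print roughly 0.5: the previous outcome tells us nothing about the
    # next one, which is what statistical independence means here.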
     
  2. BlueZannetti

    BlueZannetti Registered Member

    Joined:
    Oct 19, 2003
    Posts:
    6,590
    With all due respect, you're missing a lot of the technical details.

    Let's revisit the coin toss. Every time I flip a fair coin, there's an equal probability of heads or tails being seen. Viewed in a time series format, state n+1 (heads/tails being seen) is independent of state n (heads was seen, for example). That independence of sequential outcomes actually doesn't hold for the case being discussed here.

    If I have a file and scan it with antimalware product X, I either get an alert or not. It's a simple two state result as with the coin. However, if I pass product X onto another individual and ask them to scan the same file, I know the outcome of that experiment with absolute certainty (assuming no updates, same settings employed, etc.). These test outcomes are not independent, but are completely correlated.

    Now, the actual AV product testers do not blithely pass around the same file set (i.e. results are not completely correlated), but if they are using widely circulating malware or a testbed that comprises a large fraction of the existing active malware files, results are expected to be highly correlated since the testbed sample sets will be highly correlated.

    If the testbed population overlap between distinct testers is negligible, you're quite right, the results may be independent in a statistical sense. Even here one needs to recognize that some types of sampling bias (e.g. large numbers of a reasonably homogeneous family of malware that are covered by some products and not others due to, for example, geographic localization) can have a significant impact on the results.

    However, if the testbeds employed display a high degree of overlap, results won't be independent. If I scan a set of files and you scan the same set of files, we will obtain identical results. If the file sets have (just to toss out a number) 90% of their membership in common, the results probably won't be identical, but they will be highly correlated.
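
    A rough sketch of that last point, with entirely hypothetical numbers (a 100,000-file population, two 1,000-file testbeds sharing 90% of their members, and a product whose per-file detection outcome is fixed, as described above):

    Code:
    import random

    random.seed(2)
    pool = list(range(100_000))            # hypothetical population of malware files
    shared = random.sample(pool, 900)      # files both testbeds happen to contain
    shared_set = set(shared)
    rest = [f for f in pool if f not in shared_set]
    unique = random.sample(rest, 200)
    testbed_a = shared + unique[:100]      # 1,000 files each, 90% common membership
    testbed_b = shared + unique[100:]

    # The product's coverage is a fixed property of each file: scanning the same
    # file twice gives the same outcome, exactly as in the two-tester example.
    detected = {f for f in pool if random.random() < 0.9}

    rate_a = sum(f in detected for f in testbed_a) / len(testbed_a)
    rate_b = sum(f in detected for f in testbed_b) / len(testbed_b)
    print(rate_a, rate_b, abs(rate_a - rate_b))
    # The 900 shared files contribute identically to both scores, so the two
    # results can differ only through the 100 unique files on each side --
    # at most ten percentage points, and typically a fraction of one.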

    I'm ignoring the time dependence, and do realize that it could serve to decorrelate the results. What neither of us really knows is how highly correlated the various testbeds are, nor do we know how well these testbeds represent an unbiased sampling of the complete population of malware files available, nor do we know the characteristic time required to completely decorrelate membership in the various testbeds employed.

    In other words, there are a lot of unknowns and the situation is fairly complicated. Given the signature database sizes reported (500,000 - 1,000,000 or so), and the number of files used in various comprehensive testbeds (often a similar range of magnitude), rough agreement between large scale tests should be expected since they are all directly characterizing a substantial fraction of the pertinent population.

    Blue
     
  3. Macstorm

    Macstorm Registered Member

    Joined:
    Mar 7, 2005
    Posts:
    2,642
    Location:
    Sneffels volcano
    ROTFL... and please don't forget the other "new king", Dennis "laboratories", the best AV testing organization ever created :rolleyes: :rolleyes:

    dennis dennis dennis and counting...
     
  4. Osaban

    Osaban Registered Member

    Joined:
    Apr 11, 2005
    Posts:
    5,618
    Location:
    Milan and Seoul
    Proactive Detection of New Samples (total 23,237):
    Avira= 17,282 detections
    Norton= 8,465 detections , it missed 8,817 samples

    ~ Removed Direct PDF Link as per AV-Comparatives Request - See Main-Tests page for the actual PDF ~

    Dynamic Test
    Norton 99/100
    Avira 97/100, it missed 2 samples

    ~ Removed Direct PDF Link as per AV-Comparatives Request - See Dynamic Test page for the actual PDF ~

    Yes, All hail the new lilliputian king! (Norton).
     
    Last edited by a moderator: Dec 22, 2009
  5. Fuzzfas

    Fuzzfas Registered Member

    Joined:
    Jun 24, 2007
    Posts:
    2,753
    I don't count Dennis labs (I can set up a "Fuzzfas labs" if Symantec is willing to pay me); I count AV-Comparatives and AV-Test.org, where Avira scored lower than 90%.



    1) I don't count on-demand tests anymore. They are so yesterday and not "real world".

    2) I don't count AV-Comparatives only; I count AV-Test.org results too. Add the results and you come up with a clear Norton win. Dennis labs is just a "bonus", but I leave that to Pleonasm.

    3) I love teasing those who get upset about test results. :D

    4) Merry Xmas!!! Who cares who's the king! I certainly don't! In the last 2 years, I've been using Twister and Vipre. Do I look like I care about, or believe in, tests enough to let them influence what I use? :argh:


    P.S: I don't like big corporations, including Norton. I like the dark horses.
     
  6. Osaban

    Osaban Registered Member

    Joined:
    Apr 11, 2005
    Posts:
    5,618
    Location:
    Milan and Seoul
    Merry Christmas to you too! If you think that what one states in one's signature reflects real life, you are a bit naive. The Internet is the land of the anonymous, where people's real selves come out undisturbed as a consequence of not having an identity.

    There are people here claiming to have 7 licences from a company they love to slander at every opportunity they get. It just doesn't make any sense, just as your post makes no sense at all, but then again you like teasing people.
     
    Last edited: Dec 23, 2009
  7. Pleonasm

    Pleonasm Registered Member

    Joined:
    Apr 9, 2007
    Posts:
    1,201
    This is the key assumption, so let’s look at it more closely...

    Correct: neither you nor I know the specific degree of overlap among the malware samples employed by each of the tests conducted by Dennis Technology Lab, AV-Comparatives and AV-Test.

    The report by Dennis Technology Lab does list the specific malware cases tested; AV-Comparatives describes the test case selection process as “{malicious} URLs were collected by using our own in-house crawler,” and AV-Test used “fresh threats.”

    However, with more than 5,000 new web-based malicious threats being created each day (see here), doesn’t it seem highly unlikely that there would exist substantial overlap between the 40 cases used by Dennis Technology Lab, the 100 cases used by AV-Comparatives and the 600 cases used by AV-Test -- especially given (1) the differences among the time periods during which the tests were conducted and (2) the geographic differences among the locations of each organization?

    Think about it this way. If each of 5,000 malware samples on any given day has an equal likelihood of being chosen by one of these three testing organizations, then what is the probability that any two organizations will select the same case? Answer: p = 0.000004%. What is the probability that all three organizations will select the same case? Answer: p = 0.0000000008%. Now, in fairness, some malware samples are more prevalent than others at any point in time, and have a disproportionately higher likelihood of being selected. Yet, even if you discount this analysis by a factor of 1,000 (or even by 10,000), the soundness of the argument remains.
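
    For reference, the arithmetic behind those two percentages, under the simplification stated above (a pool of 5,000 equally likely samples and a single draw per organization, with a match meaning the draws land on one particular sample); this is only a sketch of that calculation, not a model of how the actual testbeds of 40, 100 and 600 cases were built:

    Code:
    # Assumed pool of 5,000 equally likely new samples on a given day.
    pool = 5_000

    p_two_same = (1 / pool) ** 2     # two organizations both draw one particular sample
    p_three_same = (1 / pool) ** 3   # all three draw that same particular sample

    print(f"{p_two_same:.10%}")      # 0.0000040000%
    print(f"{p_three_same:.13%}")    # 0.0000000008000%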

    Again, it is possible that there exists substantial overlap in the cases tested by the three organizations, but it simply does not appear to be a realistic assumption given (1) the lack of any coordinated efforts across the three organizations coupled with (2) the massive population of malware from which the test cases were chosen. I could certainly be wrong, but honestly I would be quite surprised if there was even a 1% overlap of the malware cases tested across the three organizations.

    Correct, but remember that the tests conducted by Dennis Technology Lab, AV-Comparatives and AV-Test are not “large scale” (using sample sizes of 40, 100 and 600, respectively). Thus, they are not necessarily representative of the population of all malware, nor do the authors of those tests make that claim. What they do assert, however, is that the test simulates a user’s “real-world” experience and provides a fair comparison of the total protection performance across anti-malware products within the constraints of that simulation.

    P.S.: I enjoy our discussions! :)
     
  8. Fuzzfas

    Fuzzfas Registered Member

    Joined:
    Jun 24, 2007
    Posts:
    2,753
    That's true. I am not that naive! I will go further and say that some people may not be just enthusiasts, but may actually have an interest in "boosting" a product. But unlike many, I have posted screenshots of my PC several times. So I know that what I have in my signature does reflect what I run (well, except for the times I forget to update my signature). :D

    http://img190.imageshack.us/img190/5784/75462826.png

    As for my "Twister history":

    https://www.wilderssecurity.com/showpost.php?p=1252469&postcount=246

    And I have more licenses that I currently don't use (incl. Twister, for which I'm waiting for the day they add x64 support. That's the beauty of lifetime licenses).


    The bold part is the key. :D Not people in general. But when it comes to tests.

    I have myself attacked (I don't think it was slander) products for which I also have a license (although 1, not 7), with Rollback coming to mind as the latest, but that's because I wasn't happy with them. Of course, if you have 7 licenses, things may look different.

    Bottom line is I come to the forum to have some fun, not to push interests. I don't mind if I sound weird sometimes; it's when I'm having a laugh! It's Xmas, I couldn't care less who wins! But I find the whole story so funny. :argh:

    P.S: As I said, I don't like big companies, including Norton. They don't care as much about the customer as smaller ones do. Besides, without the small fish, the ocean would be full of nothing but whales. That's an ugly thing to imagine!

    But if you add AV-Comparatives and AV-Test.org, the damn yellow box comes first! Will I buy it?! Not a chance!

    Merry Xmas!
     
  9. Pleonasm

    Pleonasm Registered Member

    Joined:
    Apr 9, 2007
    Posts:
    1,201
    Yes, the more variety and competition among anti-malware vendors, the more innovation will occur; and, the more innovation that happens, the more protection against malware we will all enjoy. So, I too hope that the “small fish” continue to flourish....

    P.S.: Innovation can come from small, agile companies in an industry; but, it can equally well arise from small, focused teams working within a large enterprise, too.
     
  10. Fuzzfas

    Fuzzfas Registered Member

    Joined:
    Jun 24, 2007
    Posts:
    2,753
    I am all for competition, no question about it! And yes, innovation can come from big players too.

    My issue is that, for my needs, I don't need the "top dog" antivirus. I haven't seen a real infection for so long that from time to time I enter shadow mode and try a piece of malware that I know isn't too dangerous. Just for the pleasure of seeing my AV in action.

    If I want to do something dangerous or try some no-DVD patch that seems clean even on VirusTotal (but you can never be sure), I fire up Shadow Defender and see if something weird happens. When I find a properly stable ThreatFire version, I will probably add it.

    The same applies to most Wilders members. They don't need the "top antivirus". Some don't need an antivirus at all, and several are running "naked" with no ill effects.

    Myself, I prefer the one that seems to run lighter and whose layout I like, even if it is sub-par on detection. I even gave my free one-year F-Secure 2010 license to another Wilders member so that it wouldn't be wasted, and I run Vipre, which is probably less capable. In the case of Vipre, the price was low enough (Black Friday promo) to make me buy it. In the unlikely event that I get infected, I will restore a Paragon image. Unless, of course, I get a rootkit that I won't even know is there. :D

    For "average Joe", going with "top dogs" is most important. Because most likely, he hasn't ever heard of sandboxes, hips, etc and relies exclusively on antivirus for protection. Most people i know, don't even have a backup scanner, just for a second opinion. So it's a "life of death" situation. These are the users that Norton, Kaspersky, etc, have been targeting for years and made their reputation with. And the new Norton seems to be in great shape both in detection and in resource usage (many have claimed so, i 've no reason to doubt it).

    That said, when a friend of mine asks me, "If I decided to buy an antivirus, which one would you suggest?", I say "Avira". For the simple fact that in Europe it costs less than half the price of Norton. In the USA things are different, with all those "rebates" and continuous promotions. And because, if I recommend one of the "famous" AVs, I am not really helping competition.

    I've run a few tests of my own and I feel it's sound advice in terms of performance for the cost. That is, if someone doesn't want to go with one of the excellent freebies out there.
     
    Last edited: Dec 23, 2009
  11. Miyagi

    Miyagi Registered Member

    Joined:
    Mar 12, 2005
    Posts:
    426
    Location:
    None
    That's the beauty we see today. I don't want to mention specific names, but there are a couple already who have made an impact on the companies they work for. :)
     
  12. Macstorm

    Macstorm Registered Member

    Joined:
    Mar 7, 2005
    Posts:
    2,642
    Location:
    Sneffels volcano
    I don't deny the results of both those organizations' tests, but I do not agree with the scoring and rating "methodology" they employ. As a matter of fact, my confidence in them changed months ago (AV-Comparatives specifically) when they decided to "change" their own rules for the "final" score of the products. I know it's not so pleasant for the other giant AV competitors (and they push a lot of $ on testers) to always have the same "king".
     
  13. BlueZannetti

    BlueZannetti Registered Member

    Joined:
    Oct 19, 2003
    Posts:
    6,590
    There are a lot of dependencies here - whether all threats are unique vs. derivative, how well they reflect an unbiased sampling of the real population, and so on. Unfortunately, these details are difficult to know.

    This implies that all are circulating at the same level. Given the various family outbreaks reported over the past couple of years, this is unlikely.
    At various times over the past year or so, I've popped over to www.shadowserver.org to take a peek at detection statistics. The step granularity displayed by distinct products (i.e. not using the same embedded engines) was oftentimes quite surprising. For example, on multiple occasions, one suggestion from the pattern of detection statistics was that a handful of malware families dominated the results. That's the type of situation in which substantial overlap with even small sets would be obtained. Does it occur in practice? I have no idea...
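
    As a toy illustration of that scenario (the numbers are entirely made up): if a handful of families account for most of what is circulating, two small testbeds drawn in proportion to prevalence will share those families almost every time, even though each testbed is tiny relative to the full population.

    Code:
    import random

    random.seed(3)

    # Hypothetical prevalence: 5 dominant families carry 80% of in-the-wild
    # encounters, 995 minor families share the remaining 20%.
    families = list(range(1000))
    weights = [0.80 / 5] * 5 + [0.20 / 995] * 995

    def draw_testbed(n):
        """Draw n samples by prevalence and record which families they fall into."""
        return set(random.choices(families, weights=weights, k=n))

    bed_a = draw_testbed(100)
    bed_b = draw_testbed(100)
    print(len(bed_a), len(bed_b), len(bed_a & bed_b))
    # The 5 dominant families land in both testbeds essentially every run, so
    # family-level overlap is substantial despite the small sample sizes.
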
    There is one inexorable reality - tests that attempt to mimic real world situations will invariably need to employ a limited sample set. It's a logistics issue of running the test. There are ways to perform some level of test validation to get a handle on test performance, but that type of result QC is generally not pursued. One of the consequences of attempting to put error bars around the numbers is that the perception of differences tends to get smeared out.
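
    To make that last point concrete, here is a rough sketch using normal-approximation binomial error bars and the 99/100 and 97/100 dynamic-test scores quoted earlier in the thread; with samples of this size the 95% intervals overlap heavily, so the apparent two-point gap sits well within the noise.

    Code:
    import math

    def interval(hits, n, z=1.96):
        """Normal-approximation 95% confidence interval for a detection rate."""
        p = hits / n
        half = z * math.sqrt(p * (1 - p) / n)
        return max(0.0, p - half), min(1.0, p + half)

    for name, hits, n in [("Norton (dynamic)", 99, 100), ("Avira (dynamic)", 97, 100)]:
        lo, hi = interval(hits, n)
        print(f"{name}: {hits}/{n} -> {lo:.1%} .. {hi:.1%}")

    # Norton (dynamic): 99/100 -> 97.0% .. 100.0%
    # Avira (dynamic): 97/100 -> 93.7% .. 100.0%
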
    Likewise!

    Cheers,

    Blue
     