AV-Comparatives: Whole product dynamic test

Discussion in 'other anti-virus software' started by Baz_kasp, Dec 18, 2009.

Thread Status:
Not open for further replies.
  1. Escalader

    Escalader Registered Member

    Joined:
    Dec 12, 2005
    Posts:
    3,710
    Location:
    Land of the Mooses
    Hello Thread:

    Having read the report, I like the concept of a whole-product test. It is as if our classic layered defences are being tested as a complete security system, which is really what we all have.

    Their own report acknowledges that the sample size is small. I agree it is too small, and if it were up to me I would NOT have granted scores and award levels (Advanced+, etc.) on this basis alone. I suspect that if we could get a statistician to assess the results, they could show that the sample is so small that the differences from one product to the next are within statistical error. But to their credit, AV-C is getting some help on quality control and sample building for their next report.
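
    As a rough back-of-the-envelope illustration of that statistical-error point, here is a quick Python sketch. The 100-sample test size and the two example block rates are assumptions for illustration only, not figures taken from the AV-C report.

        # Sketch: how wide the uncertainty on a blocking rate is with ~100 samples.
        # All numbers below are hypothetical.
        import math

        def wald_ci(blocked, n, z=1.96):
            """Approximate 95% confidence interval for a blocking rate (normal approximation)."""
            p = blocked / n
            half_width = z * math.sqrt(p * (1 - p) / n)
            return p - half_width, p + half_width

        n = 100                    # hypothetical number of malware samples
        for blocked in (96, 93):   # two hypothetical products, 3 points apart
            lo, hi = wald_ci(blocked, n)
            print(f"{blocked}/{n} blocked -> 95% CI about {lo:.1%} to {hi:.1%}")

    The two intervals overlap heavily, which is exactly the "within statistical error" concern: on a sample this small, a few points of difference between products may not be meaningful.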

    The other issue I have with this report is the way firewall warnings were handled. They treat a warning from a firewall as a failure, on the basis that the malware must have already penetrated the setup. This is wrong, since a FW warning may be only that: a warning of an attempted connection that has been blocked. That is NOT a failure. AV-C will no doubt clarify all this in 2010.

    IMHO it is very important to use our OS and browser features as part of our defence. Blue has made this point already.

    As to the products included and excluded, there are two that are followed widely here but seem to be missing: Online Armor and Outpost. There must be a business reason why they are not present; both have suites with AV functions, so that cannot be the reason they are absent. Maybe they will appear later on.

    Happy New Year to all!:D
     
  2. Pleonasm

    Pleonasm Registered Member

    Joined:
    Apr 9, 2007
    Posts:
    1,201
    Or, what leads the pack today could also lead the pack tomorrow. It isn’t necessarily the case that today’s leaders will be tomorrow’s laggards.

    In fact, in the absence of any information to the contrary, it seems to make the most sense to predict that those anti-malware solutions exhibiting strong performance will extend that lead in the future, because they will enjoy increased market share, more resources, a larger user community for in-the-cloud reputation analysis, etc. Personally, I foresee further consolidation of the marketplace of security vendors, in which the strong get stronger (and the weak disappear or are acquired). Time will tell.
     
  3. gaslad

    gaslad Registered Member

    Joined:
    Feb 18, 2007
    Posts:
    117
    Location:
    Toronto, Ontario
    I think dynamic tests such as these are long overdue, but reading the methodology one appreciates the complexity of the task, and how narrow a slice of reality it represents. Extrapolating the results to justify one's own choice of AV as superior is a tad iffy, IMO.

    Firstly, it was mainly the paid suites that were tested. I've never been a fan of suites, which seldom offer the best of all the security components available, all variables considered. I would much have preferred that the standalone AVs had been tested, as Avast and MSE were, alongside the Windows firewall. AV-C might as well change their name to Suites-Comparatives, as things stand.

    Secondly, the test beds used IE7, because it is one of the most used browsers. It is certainly not the most secure. I would be cautious about applying these results if you use IE8, Firefox, etc.; they would likely be quite different.

    Finally, I was impressed with how well most products did in these tests, and particularly that the free products (not usually tested by AV-C in the past) scored so well.
     
  4. Pleonasm

    Pleonasm Registered Member

    Joined:
    Apr 9, 2007
    Posts:
    1,201
    However, it’s important to note the alignment in the pattern of results found by Dennis Technology Lab, AV-Comparatives and AV-Test, each of which used different sample sizes (N=40, N=100 and N=600, respectively) with similar methodologies. This outcome -- achieved by different independent testing organizations -- lends, in my opinion, a very high degree of credence to the findings when considered in total.

    Yes, every test has limitations since, by definition, a test is a controlled experiment. The advantage of control is increasing confidence in the observed differences between products, because all are treated similarly. The disadvantage, however, is that it is an abstraction of reality. The trick is to achieve the right balance.

    Actually, I support this methodological approach. Conducting the test with PCs configured in the most representative fashion seems to be the right way to simulate real-world performance.

    True, no product exhibited dismal performance. Yet, it’s key to also note the considerable differences between the performance levels of the top-tier solutions versus the remainder.
     
  5. subset

    subset Registered Member

    Joined:
    Nov 17, 2007
    Posts:
    825
    Location:
    Austria
    Don't forget the test from anti-malware-test.com.
    http://www.anti-malware-test.com/?q=node/93

    Even if the results will not fit your needs. ;)

    Cheers
     
  6. Pleonasm

    Pleonasm Registered Member

    Joined:
    Apr 9, 2007
    Posts:
    1,201
    It is unfortunate that some of the products tested (e.g., BitDefender, Norton) were not the most recent 2010 editions, making a comparison of the results for those products to the findings from Dennis Technology Lab, AV-Comparatives and AV-Test impossible.
     
  7. Pleonasm

    Pleonasm Registered Member

    Joined:
    Apr 9, 2007
    Posts:
    1,201
    One dichotomous way to think about this issue is as follows. For the sake of argument, let’s say that each test has an 80% chance of correctly identifying the top performing anti-malware product and a 20% chance of making an error (i.e., incorrectly identifying the top performer). Then, what is the probability that all three independent assessments would be erroneous? The answer: p = (0.20)*(0.20)*(0.20) = 0.008 = 0.8%; or, stated differently, there is a 99.2% confidence in the collective pattern of results. Thus, even when using very liberal assumptions about the size of the error, the likelihood of all three being wrong in the same way is quite small, indeed.
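
    For the arithmetic-minded, the calculation above is just the product rule for independent events; a tiny Python sketch, using the post's assumed 20% per-test error rate (an illustrative figure, not a measured one):

        # Joint probability that all three tests err, assuming the errors are
        # independent and each test has a 20% chance of being wrong.
        error_per_test = 0.20
        tests = 3

        p_all_wrong = error_per_test ** tests     # 0.2 * 0.2 * 0.2 = 0.008
        confidence = 1 - p_all_wrong              # 0.992

        print(f"P(all {tests} tests wrong) = {p_all_wrong:.3f}")   # 0.008
        print(f"Collective confidence      = {confidence:.1%}")    # 99.2%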
     
  8. subset

    subset Registered Member

    Joined:
    Nov 17, 2007
    Posts:
    825
    Location:
    Austria
    What on earth makes you think that these assessments or testers are independent o_O
    Take a look at who pays whom, the stream of cash: investor -> profiteer

    AV-C: vendors -> AV-C
    AV-Test: vendors (through advertisement) -> publishers/magazines -> AV-Test
    Dennis Technology Labs: Symantec -> Dennis Technology Labs

    And these testers are supposed to be independent?
    Then the earth is a flat disc.
    An independent tester is in no way (directly or indirectly) financially dependent on the vendors of the products tested.
    Especially for Dennis Technology Labs I would replace the word 'independent' with 'puppets on strings'.
    AV-C and AV-Test are just very, very industry-friendly and publisher-friendly testers for a good reason - their revenues.

    Cheers
     
  9. Vladimyr

    Vladimyr Registered Member

    Joined:
    Feb 11, 2009
    Posts:
    461
    Location:
    Australia
    Great fun reading everyone.

    @ Escalader

    Have I misinterpreted you here?
    Warning of an outbound connection from a firewall is not a malware detection. If the firewall "knew" it was malware it wouldn't need to ask the user for advice. If it's malware that has become active and is now calling home, how has it not "penetrated the setup"?

    cheers
     
  10. Macstorm

    Macstorm Registered Member

    Joined:
    Mar 7, 2005
    Posts:
    2,642
    Location:
    Sneffels volcano
    And I find it even funnier that some people constantly hammer away, bringing up this supposedly "reliable and independent" test at every occasion and trying to put it on the same level as the other well-respected AV-Comparatives and AV-Test.org tests.

    Go figure :cautious:
     
  11. Fajo

    Fajo Registered Member

    Joined:
    Jun 13, 2008
    Posts:
    1,814
    The only thing I do see, though, is that the test results from Dennis Technology Labs and the results from AV-Comparatives and AV-Test all seem to pretty much mirror one another. Independent or not, the tests come out with close to the same results. Again, this kind of testing is somewhat new, so the results are hard to swallow for some. It shook up a lot of ideas about which AVs were at the top and which were not. It also blurred the line even more between free and paid.

    If this is going to be the new way of testing, and I for one hope it is, it's going to change how people see AVs, and it should push more companies to work harder in areas where they have obviously been slacking. All around it seems like a damn good change from a consumer / tech point of view. It's no longer just about detection, boys; now it's more about the product and how it functions as a whole.
     
  12. Fuzzfas

    Fuzzfas Registered Member

    Joined:
    Jun 24, 2007
    Posts:
    2,753
    Well, give them time. They didn't come to think that some products were the top winners without any reason. :D At least AV-Comparatives made a more natural transition. Put yourself in the shoes of someone who bought Avira last month based on the AV tests and then found it rated average in the new AV-Test.org test. You can't totally blame the user. He is just a sheep waiting to be guided.


    I find it soooo funny!

    To tell you the truth, I prefer this kind of more realistic testing (YouTube testers were the pioneers in this) to the old "on demand" test of a gazillion malware samples, some of which could have been out of circulation or semi-zoo specimens, and which certainly didn't exercise all of a product's capabilities.

    Larger samples are good, but they take too much time. For me, I've no problem with them giving awards. I've been saying in this forum for over a year that the previous tests were unrealistic because they didn't simulate real-world situations, and, what do you know, now all the testers say the same thing and their overall rankings differ from before in many cases.

    Smaller samples are a problem. However, this is relative; I mean, it depends on what you expect from a test. If you expect a test to tell you, like gospel, who's the absolute best, then it's a big problem. That problem is most easily seen if you compare the results of AV-Comparatives with AV-Test.org's. They have some serious differences.

    If, however, you use the results as one input for drawing your own conclusions, things are much better now than with the on-demand tests. I mean, AV-Test.org now calls it "REAL WORLD" malware protection, basically admitting that the previous method wasn't real world. So, between a gazillion samples in unrealistic conditions and fewer samples in more realistic conditions, I would pick the latter.

    Do I trust these tests completely? Of course not! There's $ involved. If I had to trust them, I would have to believe that Avira is a 98% and an 87.7% product at the same time (depending on the test).

    I also think that with larger samples that are "fresh", none would come remotely close to 98%. I mean, do you people still remember the "zero day" results?

    But they are useful for keeping an opinion on relative strength and whether performance remains stable. (And of course they are useful for vendors.)
     
    Last edited: Dec 22, 2009
  13. Fuzzfas

    Fuzzfas Registered Member

    Joined:
    Jun 24, 2007
    Posts:
    2,753

    You're wasting your time. If you had come to this forum 2 months ago and posted an AV ranking like AV-Test.org's (with Avira and F-Secure middle to low), everyone would have called you a moron/nuts/high on drugs. Now it has become the "natural" evolution of things, instead of making people think, "Wait a minute! So all these years I was reading unrealistic product performance and believing it?!"

    Or do you remember the products that withdrew from tests complaining the tests were unrealistic, and were scorned by users saying they were just afraid? Now AV-Test comes out with "real world" tests. Oops. I guess the scorned ones were right about that. But who cares.

    That's human nature. That's why vendors pay to get tested. If you manage to get the logo, it means cash for you. Even if the test was unrealistic, who cares! People don't care about realism. They care about feeling more secure and tests are the best thing for that.
     
  14. Fuzzfas

    Fuzzfas Registered Member

    Joined:
    Jun 24, 2007
    Posts:
    2,753
    Anyway, the king (Avira) is dead, as of this month. All hail the new king! (Norton). :D
     
  15. Fuzzfas

    Fuzzfas Registered Member

    Joined:
    Jun 24, 2007
    Posts:
    2,753
    The rule of thumb in these cases is "the best test is the one that scores your favourite AV product highest". :D

    "Real World" detection is new kid in the block. Magazines will soon follow i suspect. It's natural to have initially different results. The randomness and low number of samples, allow for big variation. It's a bit of a russian roulette.

    Now, this variation is bad for business, of course, because it raises exactly the question you asked. If I were in the business, I would contact colleagues to find a way to produce a more "commonly accepted" final ranking, like there was before. It would benefit everyone. I know, I am a diabolical creature without morality. :D But you can bet I'd work to find a way to eliminate these big differences between the various tests. They're bad for testers, they're bad for vendors, they're bad for business! If I rank someone 3rd and you rank it 12th and another ranks it 6th and another 15th, then who's going to believe us, and who's going to continue to pay us hoping that people believe us? Now, if I rank one 3rd, another ranks it 4th, another 3rd and another 4th, we're all happy. People don't need to know about standard deviation calculations and the influence of sampling on results. They want to see consistent results so they can trust and buy an AV. And I'd give them just that. :D
     
  16. andyman35

    andyman35 Registered Member

    Joined:
    Nov 2, 2007
    Posts:
    2,336
    I tend to disagree;
    The kings (all traditional AV products) are dead. All hail the new kings! DefenseWall + Sandboxie :p
     
  17. elapsed

    elapsed Registered Member

    Joined:
    Apr 5, 2004
    Posts:
    7,076
    I'd rather have a 64-bit king myself ;)
     
  18. andyman35

    andyman35 Registered Member

    Joined:
    Nov 2, 2007
    Posts:
    2,336
    You've just pretty much advocated the AMTSO methodology there; let's hope it becomes the standard soon.
     
  19. andyman35

    andyman35 Registered Member

    Joined:
    Nov 2, 2007
    Posts:
    2,336
    It'll come one way or another, dictated by market forces. ;)
     
  20. Fajo

    Fajo Registered Member

    Joined:
    Jun 13, 2008
    Posts:
    1,814
    Not true for DefenseWall, since the dev can't pull his head out of his (@#* and make a 64-bit version.
     
  21. Pleonasm

    Pleonasm Registered Member

    Joined:
    Apr 9, 2007
    Posts:
    1,201
    In this context, I am using the word “independent” in a statistical sense. Events are independent if the outcome of one does not affect the outcome of another. When dealing with independent events, the joint probability of two or more occurring is the product of the individual probabilities.

    However, I would also argue that the three tests (Dennis Technology Labs, AV-Comparatives and AV-Test) are independent in the sense that the findings of each are not a result of the “influence” of any anti-malware vendor. Personally, I think it is ludicrous to believe, for example, that Kaspersky “bribed” all three organizations to ensure that it would score well.

    Correct -- the prior methods were not “real-world,” but only a very rudimentary attempt to very roughly estimate real-world performance, not even taking into consideration all of the functionality of the anti-malware products tested.

    Well said. Given the option of having a high degree of precision on a meaningless outcome (e.g., an on-demand static test with thousands of samples) or a lower degree of precision on a meaningful outcome (e.g., a dynamic real-world test with dozens to hundreds of samples), the choice is clear. It is my impression that users want to know how well a product will perform in protecting them against malware in their day-to-day environment, and only the latter class of tests are relevant to that objective.

    While the alignment among the best three dynamic real-world tests to date (Dennis Technology Lab, AV-Comparatives and AV-Test) isn’t perfect across the entire continuum, there is a high degree of consistency among the top performers (e.g., Symantec and Kaspersky) and a fair degree of consistency for the low performers (e.g., McAfee), with the most variability occurring in the “murky middle,” as one would expect.

    Indeed! :)
     
  22. andyman35

    andyman35 Registered Member

    Joined:
    Nov 2, 2007
    Posts:
    2,336
    Perhaps not, but just as in natural selection, developers will have to evolve or die. I realize that there are fundamental issues with getting certain software to run on the 64-bit platform, but these will have to be overcome, especially for an already niche product with a dwindling 32-bit market.
     
  23. BlueZannetti

    BlueZannetti Registered Member

    Joined:
    Oct 19, 2003
    Posts:
    6,590
    Actually, since the large-scale tests employ the same products and largely the same pool of malware, the results are not independent in a purely statistical sense. They should be highly correlated if the same metric is being probed, and, largely, they are. You really can't do that joint probability calculation because of this latent correlation.
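
    To put a rough number on how much that latent correlation can matter, here is a small Monte Carlo sketch. All of the probabilities in it are made-up illustration values, not estimates of the real tests; the only point is that a shared sample pool can make "all three wrong" far more likely than the naive 0.2 x 0.2 x 0.2 figure suggests.

        # Illustrative simulation: three tests that share a malware pool.
        # If the shared pool happens to be skewed, all three tend to err together.
        import random

        random.seed(0)
        trials = 100_000
        shared_bias = 0.15     # assumed chance the common sample pool is skewed
        err_if_biased = 0.8    # per-test error probability when the pool is skewed
        err_if_clean = 0.1     # per-test error probability otherwise

        all_wrong = 0
        for _ in range(trials):
            skewed = random.random() < shared_bias
            p_err = err_if_biased if skewed else err_if_clean
            if all(random.random() < p_err for _ in range(3)):
                all_wrong += 1

        # Each test's marginal error rate is 0.15*0.8 + 0.85*0.1, i.e. about 20%,
        # yet the chance that all three err together comes out near 8%, roughly
        # ten times the 0.8% obtained by multiplying independent probabilities.
        print(f"P(all three tests wrong) ~ {all_wrong / trials:.3f}")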

    As for the results of the dynamic test, I did try earlier to provide a somewhat objective and as technically sound as feasible assessment of the noise in the www.av-comparatives.org test protocol, in AV Comparatives Detection Statistics - A Crude Meta-analysis.

    Based on the results discussed in the thread just mentioned (I know - that's only partially pertinent to the current test) and the expectation that intrinsic measurement noise will increase as the dataset shrinks, I really believe a somewhat less granular discrimination is suggested - either 2 or 3 levels (a rough grouping sketch follows the list):
    • For 2 levels: basically above and below 85% (i.e. Norman and Kingsoft on the lower tier with all other tested products on the upper tier)
    • For 3 levels: Norman and Kingsoft once again on the lowest tier (< 85%), followed by BitDefender, eScan, Trustport, AVG, and McAfee on the second tier (86-91%), with Symantec, Kaspersky, Avira, MSE, Avast, G-Data, F-Secure, and Eset (> 95%) rounding out the top group.
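
    As a rough sketch of that coarser, 2-level view (the names and scores here are placeholder values, not the actual test figures):

        # Bucket products into two tiers around an ~85% blocking rate.
        # Placeholder names and scores for illustration only.
        def split_two_tiers(scores, cutoff=0.85):
            upper = {name: s for name, s in scores.items() if s >= cutoff}
            lower = {name: s for name, s in scores.items() if s < cutoff}
            return upper, lower

        example_scores = {"product_a": 0.97, "product_b": 0.90, "product_c": 0.82}
        upper, lower = split_two_tiers(example_scores)
        print("upper tier:", sorted(upper))   # ['product_a', 'product_b']
        print("lower tier:", sorted(lower))   # ['product_c']
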
    Although it makes for much less exciting reading, my own impression leans towards the 2-level result. You can get a sense of where other commercial products tend to fall in these two categories by examining the rank-ordered results at (just as an example) Shadowserver.org - but do embrace a very healthy level of skepticism when looking at the results available there, and treat it more as a transient, simple rank-ordering exercise at best. Even at this level, there are clear sample granularity issues that emerge from time to time if you look closely at the numbers, i.e. restricted families seemingly dominating the results. That's not all bad, since reality seems to play out that way as well, but it does render the outcome much more volatile.

    At least that's my read on it...

    Blue
     
  24. Escalader

    Escalader Registered Member

    Joined:
    Dec 12, 2005
    Posts:
    3,710
    Location:
    Land of the Mooses
    Thanks for your post, Blue! :D

    I'm noting that the testers themselves are working with a university to improve the sampling and the quality of their test methodology, as well as the sample size.

    IF they were happy with the current sample sizes, why would they do that? It will cost $ to do the work.

    I'm happy to wait until 2010 for their results with larger samples. We can all second-guess the meaning of those when they emerge. It's a techie blood sport!

    Enjoy!
     
  25. BlueZannetti

    BlueZannetti Registered Member

    Joined:
    Oct 19, 2003
    Posts:
    6,590
    The validity of these tests lives or dies on the sampling. One is really looking at a fairly complex set of tradeoffs.

    First, as the test bed size increases, there are purely logistical problems with assessing the validity of the samples (is it really malware, is it functional, is it a functional but not itself malicious module, and so on). This is a lot easier to do with a small sample set. However, as the sample set size decreases, the likelihood of inducing bias in the test bed increases. For example, what if one family branch that's a minor factor globally dominates a small sample set? The results would be dominated by malware that is not reflective of the extant threats.

    This is why some time/effort/money might go to QC and methodology assessments. These are also a couple of reasons why the YouTube test videos one can view aren't really tests in the normal sense of the word. They can be useful as a functionality and behavior profile of specific products. That's not a quantitative test, but more a look-and-feel usability evaluation - which I personally believe is of value as well.

    Blue
     