AV-Comparatives: Whole product dynamic test

Discussion in 'other anti-virus software' started by Baz_kasp, Dec 18, 2009.

Thread Status:
Not open for further replies.
  1. Escalader

    Escalader Registered Member

    Joined:
    Dec 12, 2005
    Posts:
    3,710
    Location:
    Land of the Mooses
    Hello Thread:

    Having read the report, I like the concept of a whole-product test. It is as if our classic layered defences are being tested as a complete security system, which is really what we all have.

    Their own report acknowledges that the sample size is small. I agree it is too small, and if it were up to me I would NOT have granted scores and award levels (Advanced+, etc.) on this basis alone. I suspect that if we could get a statistician to assess the results, they could show that the sample is so small that the differences from one product to the next are within statistical error. But to their credit, AV-C is getting some help on quality control and sample building for their next report.
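
    As a rough back-of-the-envelope illustration of that statistical-error point, here is a quick Python sketch. The 100-sample test size and the two example block rates are assumptions for illustration only, not figures taken from the AV-C report.

        # Sketch: how wide the uncertainty on a blocking rate is with ~100 samples.
        # All numbers below are hypothetical.
        import math

        def wald_ci(blocked, n, z=1.96):
            """Approximate 95% confidence interval for a blocking rate (normal approximation)."""
            p = blocked / n
            half_width = z * math.sqrt(p * (1 - p) / n)
            return p - half_width, p + half_width

        n = 100                    # hypothetical number of malware samples
        for blocked in (96, 93):   # two hypothetical products, 3 points apart
            lo, hi = wald_ci(blocked, n)
            print(f"{blocked}/{n} blocked -> 95% CI about {lo:.1%} to {hi:.1%}")

    The two intervals overlap heavily, which is exactly the "within statistical error" concern: on a sample this small, a few points of difference between products may not be meaningful.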

    The other issue I have with this report is the way firewall warnings were handled. They treat a warning from a firewall as a failure, on the basis that the malware must have already penetrated the setup. This is wrong, since a FW warning may be only that: a warning of an attempted connection that has been blocked. That is NOT a failure. AV-C will no doubt clarify all this in 2010.

    IMHO it is very important to use our OS and browser features as part of our defence. Blue has made this point already.

    As to the products included and excluded, there are two that are followed widely here but seem to be missing: Online Armor and Outpost. There must be a business reason why they are not present; both have suites with AV functions, so that cannot be the reason they are absent. Maybe they will appear later on.

    Happy New Year to all!:D
     
  2. Pleonasm

    Pleonasm Registered Member

    Joined:
    Apr 9, 2007
    Posts:
    1,201
    Or, what leads the pack today could also lead the pack tomorrow. It isn’t necessarily the case that today’s leaders will be tomorrow’s laggards.

    In fact, in the absence of any information to the contrary, it seems to make the most sense to predict that those anti-malware solutions exhibiting strong performance will extend that lead in the future, because they will enjoy increased market share, more resources, a larger user community for in-the-cloud reputation analysis, etc. Personally, I foresee further consolidation of the marketplace of security vendors, in which the strong get stronger (and the weak disappear or are acquired). Time will tell.
     
  3. gaslad

    gaslad Registered Member

    Joined:
    Feb 18, 2007
    Posts:
    117
    Location:
    Toronto, Ontario
    I think dynamic tests such as these are long overdue, but reading the methodology one appreciates the complexity of the task, and how narrow a slice of reality it represents. Extrapolating the results to justify one's own choice of AV as superior is a tad iffy, IMO.

    Firstly, it was mainly the paid suites that were tested. I've never been a fan of suites, which seldom offer the best of all the security components available, all variables considered. I would much have preferred that the standalone AVs had been tested, as Avast and MSE were, alongside the Windows firewall. AV-C might as well change their name to Suites-Comparatives, as things stand.

    Secondly, the test beds used IE7, because it is one of the most used browsers. It is certainly not the most secure. I would be cautious about applying these results if you use IE8, Firefox, etc.; they would likely be quite different.

    Finally, I was impressed with how well most products did in these tests, and particularly that the free products (not usually tested by AV-C in the past) scored so well.
     
  4. Pleonasm

    Pleonasm Registered Member

    Joined:
    Apr 9, 2007
    Posts:
    1,201
    However, it’s important to note the alignment in the pattern of results found by Dennis Technology Lab, AV-Comparatives and AV-Test, each of which used different sample sizes (N=40, N=100 and N=600, respectively) with similar methodologies. This outcome -- achieved by different independent testing organizations -- lends, in my opinion, a very high degree of credence to the findings when considered in total.

    Yes, every test has limitations since, by definition, a test is a controlled experiment. The advantage of control is increasing confidence in the observed differences between products, because all are treated similarly. The disadvantage, however, is that it is an abstraction of reality. The trick is to achieve the right balance.

    Actually, I support this methodological approach. Conducting the test with PCs configured in the most representative fashion seems to be the right way to simulate real-world performance.

    True, no product exhibited dismal performance. Yet, it’s key to also note the considerable differences between the performance levels of the top-tier solutions versus the remainder.
     
  5. subset

    subset Registered Member

    Joined:
    Nov 17, 2007
    Posts:
    825
    Location:
    Austria
    Don't forget the test from anti-malware-test.com.
    http://www.anti-malware-test.com/?q=node/93

    Even if the results will not fit your needs. ;)

    Cheers
     
  6. Pleonasm

    Pleonasm Registered Member

    Joined:
    Apr 9, 2007
    Posts:
    1,201
    It is unfortunate that some of the products tested (e.g., BitDefender, Norton) were not the most recent 2010 editions, making a comparison of the results for those products to the findings from Dennis Technology Lab, AV-Comparatives and AV-Test impossible.
     
  7. Pleonasm

    Pleonasm Registered Member

    Joined:
    Apr 9, 2007
    Posts:
    1,201
    One dichotomous way to think about this issue is as follows. For the sake of argument, let’s say that each test has an 80% chance of correctly identifying the top performing anti-malware product and a 20% chance of making an error (i.e., incorrectly identifying the top performer). Then, what is the probability that all three independent assessments would be erroneous? The answer: p = (0.20)*(0.20)*(0.20) = 0.008 = 0.8%; or, stated differently, there is a 99.2% confidence in the collective pattern of results. Thus, even when using very liberal assumptions about the size of the error, the likelihood of all three being wrong in the same way is quite small, indeed.
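
    For the arithmetic-minded, the calculation above is just the product rule for independent events; a tiny Python sketch, using the post's assumed 20% per-test error rate (an illustrative figure, not a measured one):

        # Joint probability that all three tests err, assuming the errors are
        # independent and each test has a 20% chance of being wrong.
        error_per_test = 0.20
        tests = 3

        p_all_wrong = error_per_test ** tests     # 0.2 * 0.2 * 0.2 = 0.008
        confidence = 1 - p_all_wrong              # 0.992

        print(f"P(all {tests} tests wrong) = {p_all_wrong:.3f}")   # 0.008
        print(f"Collective confidence      = {confidence:.1%}")    # 99.2%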
     
  8. subset

    subset Registered Member

    Joined:
    Nov 17, 2007
    Posts:
    825
    Location:
    Austria
    What on earth makes you think that these assessments or testers are independent o_O
    Take a look at who pays whom, the stream of cash: investor -> profiteer

    AV-C: vendors -> AV-C
    AV-Test: vendors (through advertisement) -> publishers/magazines -> AV-Test
    Dennis Technology Labs: Symantec -> Dennis Technology Labs

    And these testers are supposed to be independent?
    Then the earth is a flat disc.
    An independent tester is in no way (directly or indirectly) financially dependent on the vendors of the products tested.
    Especially for Dennis Technology Labs I would replace the word 'independent' with 'puppets on strings'.
    AV-C and AV-Test are just very, very industry-friendly and publisher-friendly testers for a good reason - their revenues.

    Cheers
     
  9. Vladimyr

    Vladimyr Registered Member

    Joined:
    Feb 11, 2009
    Posts:
    461
    Location:
    Australia
    Great fun reading everyone.

    @ Escalader

    Have I misinterpreted you here?
    Warning of an outbound connection from a firewall is not a malware detection. If the firewall "knew" it was malware it wouldn't need to ask the user for advice. If it's malware that has become active and is now calling home, how has it not "penetrated the setup"?

    cheers
     
  10. Macstorm

    Macstorm Registered Member

    Joined:
    Mar 7, 2005
    Posts:
    2,642
    Location:
    Sneffels volcano
    And I find it even funnier that some people constantly hammer away, bringing up this supposedly "reliable and independent" test at every occasion and trying to put it on the same level as the other well-respected AV-Comparatives and AV-Test.org tests.

    Go figure :cautious:
     
  11. Fajo

    Fajo Registered Member

    Joined:
    Jun 13, 2008
    Posts:
    1,814
    The only thing I do see, though, is that the test results from Dennis Technology Labs and the results from AV-Comparatives and AV-Test all seem to pretty much mirror one another. Independent or not, the tests come out with close to the same results. Again, this kind of testing is somewhat new, so the results are hard to swallow for some. It shook up a lot of ideas about which AVs were at the top and which were not. It also blurred the line even more between free and paid.

    If this is going to be the new way of testing, and I for one hope it is, it's going to change how people see AVs, and it should push more companies to work harder in areas where they have obviously been slacking. All around it seems like a damn good change from a consumer / tech point of view. It's no longer just about detection, boys; now it's more about the product and how it functions as a whole.
     
  12. Fuzzfas

    Fuzzfas Registered Member

    Joined:
    Jun 24, 2007
    Posts:
    2,753
    Well, give them time. They didn't come to think that some products were the top winners without any reason. :D At least AV-Comparatives made a more natural transition. Put yourself in the shoes of someone who bought Avira last month based on the AV tests and then found it rated average in the new AV-Test.org test. You can't totally blame the user. He is just a sheep waiting to be guided.


    I find it soooo funny!

    To tell you the truth, I prefer this kind of more realistic testing (YouTube testers were the pioneers in this) to the old "on demand" test of a gazillion malware samples, some of which could have been out of circulation or semi-zoo specimens, and which certainly didn't exercise all of a product's capabilities.

    Larger samples are good, but they take too much time. For me, I've no problem with them giving awards. I've been saying in this forum for over a year that the previous tests were unrealistic because they didn't simulate real-world situations, and, what do you know, now all the testers say the same thing and their overall rankings differ from before in many cases.

    Smaller samples are a problem. However, this is relative; I mean, it depends on what you expect from a test. If you expect a test to tell you, like gospel, who's the absolute best, then it's a big problem. That problem is most easily seen if you compare the results of AV-Comparatives with AV-Test.org's. They have some serious differences.

    If, however, you use the results as one input for drawing your own conclusions, things are much better now than with the on-demand tests. I mean, AV-Test.org now calls it "REAL WORLD" malware protection, basically admitting that the previous method wasn't real world. So, between a gazillion samples in unrealistic conditions and fewer samples in more realistic conditions, I would pick the latter.

    Do I trust these tests completely? Of course not! There's $ involved. If I had to trust them, I would have to believe that Avira is a 98% and an 87.7% product at the same time (depending on the test).

    I also think that with larger samples that are "fresh", none would come remotely close to 98%. I mean, do you people still remember the "zero day" results?

    But they are useful for keeping an opinion on relative strength and whether performance remains stable. (And of course they are useful for vendors.)
     
    Last edited: Dec 22, 2009
  13. Fuzzfas

    Fuzzfas Registered Member

    Joined:
    Jun 24, 2007
    Posts:
    2,753

    You're wasting your time. If you had come to this forum 2 months ago and posted an AV ranking like AV-Test.org's (with Avira and F-Secure middle to low), everyone would have called you a moron/nuts/high on drugs. Now it has become the "natural" evolution of things, instead of making people think, "Wait a minute! So all these years I was reading unrealistic product performance and believing it?!"

    Or do you remember the products that withdrew from tests complaining the tests were unrealistic, and were scorned by users saying they were just afraid? Now AV-Test comes out with "real world" tests. Oops. I guess the scorned ones were right about that. But who cares.

    That's human nature. That's why vendors pay to get tested. If you manage to get the logo, it means cash for you. Even if the test was unrealistic, who cares! People don't care about realism. They care about feeling more secure and tests are the best thing for that.
     
  14. Fuzzfas

    Fuzzfas Registered Member

    Joined:
    Jun 24, 2007
    Posts:
    2,753
    Anyway, the king (Avira) is dead, as of this month. All hail the new king! (Norton). :D
     
  15. Fuzzfas

    Fuzzfas Registered Member

    Joined:
    Jun 24, 2007
    Posts:
    2,753
    The rule of thumb in these cases is "the best test is the one that scores your favourite AV product highest". :D

    "Real World" detection is new kid in the block. Magazines will soon follow i suspect. It's natural to have initially different results. The randomness and low number of samples, allow for big variation. It's a bit of a russian roulette.

    Now, this variation is bad for business, of course, because it raises exactly the question you asked. If I were in the business, I would contact colleagues to find a way to produce a more "commonly accepted" final ranking, like there was before. It would benefit everyone. I know, I am a diabolical creature without morality. :D But you can bet I'd work to find a way to eliminate these big differences between the various tests. They're bad for testers, they're bad for vendors, they're bad for business! If I rank someone 3rd and you rank it 12th and another ranks it 6th and another 15th, then who's going to believe us, and who's going to continue to pay us hoping that people believe us? Now, if I rank one 3rd, another ranks it 4th, another 3rd and another 4th, we're all happy. People don't need to know about standard deviation calculations and the influence of sampling on results. They want to see consistent results so they can trust and buy an AV. And I'd give them just that. :D
     
  16. andyman35

    andyman35 Registered Member

    Joined:
    Nov 2, 2007
    Posts:
    2,336
    I tend to disagree;
    The kings (all traditional AV products) are dead. All hail the new kings! DefenseWall + Sandboxie :p
     
  17. elapsed

    elapsed Registered Member

    Joined:
    Apr 5, 2004
    Posts:
    7,076
    I'd rather have a 64-bit king myself ;)
     
  18. andyman35

    andyman35 Registered Member

    Joined:
    Nov 2, 2007
    Posts:
    2,336
    You've just pretty much advocated the AMTSO methodology there; let's hope it becomes the standard soon.
     
  19. andyman35

    andyman35 Registered Member

    Joined:
    Nov 2, 2007
    Posts:
    2,336
    It'll come one way or another, dictated by market forces. ;)
     
  20. Fajo

    Fajo Registered Member

    Joined:
    Jun 13, 2008
    Posts:
    1,814
    Not true for DefenseWall, since the dev can't pull his head out of his (@#* and make a 64-bit version.
     
  21. Pleonasm

    Pleonasm Registered Member

    Joined:
    Apr 9, 2007
    Posts:
    1,201
    In this context, I am using the word “independent” in a statistical sense. Events are independent if the outcome of one does not affect the outcome of another. When dealing with independent events, the joint probability of two or more occurring is the product of the individual probabilities.

    However, I would also argue that the three tests (Dennis Technology Labs, AV-Comparatives and AV-Test) are independent in the sense that the findings of each are not a result of the “influence” of any anti-malware vendor. Personally, I think it is ludicrous to believe, for example, that Kaspersky “bribed” all three organizations to ensure that it would score well.

    Correct -- the prior methods were not “real-world,” but only a very rudimentary attempt to very roughly estimate real-world performance, not even taking into consideration all of the functionality of the anti-malware products tested.

    Well said. Given the option of having a high degree of precision on a meaningless outcome (e.g., an on-demand static test with thousands of samples) or a lower degree of precision on a meaningful outcome (e.g., a dynamic real-world test with dozens to hundreds of samples), the choice is clear. It is my impression that users want to know how well a product will perform in protecting them against malware in their day-to-day environment, and only the latter class of tests are relevant to that objective.

    While the alignment among the best three dynamic real-world tests to date (Dennis Technology Lab, AV-Comparatives and AV-Test) isn’t perfect across the entire continuum, there is a high degree of consistency among the top performers (e.g., Symantec and Kaspersky) and a fair degree of consistency for the low performers (e.g., McAfee), with the most variability occurring in the “murky middle,” as one would expect.

    Indeed! :)
     
  22. andyman35

    andyman35 Registered Member

    Joined:
    Nov 2, 2007
    Posts:
    2,336
    Perhaps not, but just as in natural selection, developers will have to evolve or die. I realize that there are fundamental issues with getting certain software to run on the 64-bit platform, but these will have to be overcome, especially for an already niche product with a dwindling 32-bit market.
     
  23. BlueZannetti

    BlueZannetti Registered Member

    Joined:
    Oct 19, 2003
    Posts:
    6,590
    Actually, since the large-scale tests employ the same products and largely the same pool of malware, the results are not independent in a purely statistical sense. They should be highly correlated if the same metric is being probed, and, largely, they are. You really can't do that joint probability calculation because of this latent correlation.
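
    To put a rough number on how much that latent correlation can matter, here is a small Monte Carlo sketch. All of the probabilities in it are made-up illustration values, not estimates of the real tests; the only point is that a shared sample pool can make "all three wrong" far more likely than the naive 0.2 x 0.2 x 0.2 figure suggests.

        # Illustrative simulation: three tests that share a malware pool.
        # If the shared pool happens to be skewed, all three tend to err together.
        import random

        random.seed(0)
        trials = 100_000
        shared_bias = 0.15     # assumed chance the common sample pool is skewed
        err_if_biased = 0.8    # per-test error probability when the pool is skewed
        err_if_clean = 0.1     # per-test error probability otherwise

        all_wrong = 0
        for _ in range(trials):
            skewed = random.random() < shared_bias
            p_err = err_if_biased if skewed else err_if_clean
            if all(random.random() < p_err for _ in range(3)):
                all_wrong += 1

        # Each test's marginal error rate is 0.15*0.8 + 0.85*0.1, i.e. about 20%,
        # yet the chance that all three err together comes out near 8%, roughly
        # ten times the 0.8% obtained by multiplying independent probabilities.
        print(f"P(all three tests wrong) ~ {all_wrong / trials:.3f}")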

    As for the results of the dynamic test, I did try earlier to provide a somewhat objective and as technically sound as feasible assessment of the noise in the www.av-comparatives.org test protocol, in AV Comparatives Detection Statistics - A Crude Meta-analysis.

    Based on the results discussed in the thread just mentioned (I know - that's only partially pertinent to the current test) and the expectation that intrinsic measurement noise will increase as the dataset shrinks, I really believe a somewhat less granular discrimination is suggested - either 2 or 3 levels (a rough grouping sketch follows the list):
    • For 2 levels: basically above and below 85% (i.e. Norman and Kingsoft on the lower tier with all other tested products on the upper tier)
    • For 3 levels: Norman and Kingsoft once again on the lowest tier (< 85%), followed by BitDefender, eScan, Trustport, AVG, and McAfee on the second tier (86-91%), with Symantec, Kaspersky, Avira, MSE, Avast, G-Data, F-Secure, and Eset (> 95%) rounding out the top group.
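
    As a rough sketch of that coarser, 2-level view (the names and scores here are placeholder values, not the actual test figures):

        # Bucket products into two tiers around an ~85% blocking rate.
        # Placeholder names and scores for illustration only.
        def split_two_tiers(scores, cutoff=0.85):
            upper = {name: s for name, s in scores.items() if s >= cutoff}
            lower = {name: s for name, s in scores.items() if s < cutoff}
            return upper, lower

        example_scores = {"product_a": 0.97, "product_b": 0.90, "product_c": 0.82}
        upper, lower = split_two_tiers(example_scores)
        print("upper tier:", sorted(upper))   # ['product_a', 'product_b']
        print("lower tier:", sorted(lower))   # ['product_c']
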
    Although it makes for much less exciting reading, my own impression leans towards the 2-level result. You can get a sense of where other commercial products tend to fall in these two categories by examining the rank-ordered results at (just as an example) Shadowserver.org - but do embrace a very healthy level of skepticism when looking at the results available there, and treat it more as a transient, simple rank-ordering exercise at best. Even at this level, there are clear sample granularity issues that emerge from time to time if you look closely at the numbers, i.e. restricted families seemingly dominating the results. That's not all bad, since reality seems to play out that way as well, but it does render the outcome much more volatile.

    At least that's my read on it...

    Blue
     
  24. Escalader

    Escalader Registered Member

    Joined:
    Dec 12, 2005
    Posts:
    3,710
    Location:
    Land of the Mooses
    Thanks for your post, Blue! :D

    I'm noting that the testers themselves are working with a university to improve the sampling and the quality of their test methodology, as well as the sample size.

    IF they were happy with the current sample sizes, why would they do that? It will cost $ to do the work.

    I'm happy to wait until 2010 for their results with larger samples. We can all second-guess the meaning of those when they emerge. It's a techie blood sport!

    Enjoy!
     
  25. BlueZannetti

    BlueZannetti Registered Member

    Joined:
    Oct 19, 2003
    Posts:
    6,590
    The validity of these tests lives or dies on the sampling. One is really looking at a fairly complex set of tradeoffs.

    First, as the test bed size increases, there are purely logistical problems with assessing the validity of the samples (is it really malware, is it functional, is it a functional but not itself malicious module, and so on). This is a lot easier to do with a small sample set. However, as the sample set size decreases, the likelihood of inducing bias in the test bed increases. For example, what if one family branch that's a minor factor globally dominates a small sample set? The results would be dominated by malware that is not reflective of the extant threats.

    This is why some time/effort/money might go to QC and methodology assessments. These are also a couple of reasons why the YouTube test videos one can view aren't really tests in the normal sense of the word. They can be useful as a functionality and behavior profile of specific products. That's not a quantitative test, but more a look-and-feel usability evaluation - which I personally believe is of value as well.

    Blue
     