AV-Comparatives: Whole product dynamic test

Discussion in 'other anti-virus software' started by Baz_kasp, Dec 18, 2009.

Thread Status:
Not open for further replies.
  1. Pleonasm

    Pleonasm Registered Member

    Joined:
    Apr 9, 2007
    Posts:
    1,201
    However, please keep in mind that these tests were conducted by different personnel, in different countries, at different times, with different malware samples, using PCs configured differently. Thus, I maintain that they are in fact “independent” in a statistical sense.

    “The condition for statistical independence is that the outcome of one event does not affect the outcome of the other” (see here). It’s difficult to imagine that the outcome of the Dennis Technology Lab test affected the outcome of the AV-Test assessment, or that the outcome of the AV-Test assessment affected the outcome of the AV-Comparatives test.

    To use an analogy, three tosses of a fair coin are statistically independent, even if using the same coin and even if tossed by the same person (and, even if you get the same result -- e.g., three “heads” -- in a row).
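
    As a minimal sketch of that definition (a hypothetical simulation, nothing to do with any actual test data): simulate a long run of fair-coin tosses and check that the empirical chance of heads on toss n+1 is the same whether toss n came up heads or tails.

    Code:
    import random

    random.seed(1)
    tosses = [random.choice("HT") for _ in range(100_000)]

    # Empirical P(heads on toss n+1), conditioned on the outcome of toss n.
    after_heads = [b for a, b in zip(tosses, tosses[1:]) if a == "H"]
    after_tails = [b for a, b in zip(tosses, tosses[1:]) if a == "T"]

    print("P(H | previous H) =", after_heads.count("H") / len(after_heads))
    print("P(H | previous T) =", after_tails.count("H") / len(after_tails))
    # Both print roughly 0.5: the previous outcome tells us nothing about the
    # next one, which is what statistical independence means here.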
     
  2. BlueZannetti

    BlueZannetti Registered Member

    Joined:
    Oct 19, 2003
    Posts:
    6,590
    With all due respect, you're missing a lot of the technical details.

    Let's revisit the coin toss. Every time I flip a fair coin, there's an equal probability of heads or tails being seen. Viewed in a time series format, state n+1 (heads/tails being seen) is independent of state n (heads was seen, for example). That independence of sequential outcomes actually doesn't hold for the case being discussed here.

    If I have a file and scan it with antimalware product X, I either get an alert or not. It's a simple two state result as with the coin. However, if I pass product X onto another individual and ask them to scan the same file, I know the outcome of that experiment with absolute certainty (assuming no updates, same settings employed, etc.). These test outcomes are not independent, but are completely correlated.

    Now, the actual AV product testers do not blithely pass around the same file set (i.e. results are not completely correlated), but if they are using widely circulating malware or a testbed that comprises a large fraction of the existing active malware files, results are expected to be highly correlated since the testbed sample sets will be highly correlated.

    If the testbed population overlap between distinct testers is negligible, you're quite right, the results may be independent in a statistical sense. Even here one needs to recognize that some types of sampling bias (e.g. large numbers of a reasonably homogeneous family of malware that are covered by some products and not others due to, for example, geographic localization) can have a significant impact on the results.

    However, if the testbeds employed display a high degree of overlap, results won't be independent. If I scan a set of files and you scan the same set of files, we will obtain identical results. If the file sets have (just to toss out a number) 90% of their membership in common, the results probably won't be identical, but they will be highly correlated.
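
    A rough sketch of that last point, with entirely hypothetical numbers (a 100,000-file population, two 1,000-file testbeds sharing 90% of their members, and a product whose per-file detection outcome is fixed, as described above):

    Code:
    import random

    random.seed(2)
    pool = list(range(100_000))            # hypothetical population of malware files
    shared = random.sample(pool, 900)      # files both testbeds happen to contain
    shared_set = set(shared)
    rest = [f for f in pool if f not in shared_set]
    unique = random.sample(rest, 200)
    testbed_a = shared + unique[:100]      # 1,000 files each, 90% common membership
    testbed_b = shared + unique[100:]

    # The product's coverage is a fixed property of each file: scanning the same
    # file twice gives the same outcome, exactly as in the two-tester example.
    detected = {f for f in pool if random.random() < 0.9}

    rate_a = sum(f in detected for f in testbed_a) / len(testbed_a)
    rate_b = sum(f in detected for f in testbed_b) / len(testbed_b)
    print(rate_a, rate_b, abs(rate_a - rate_b))
    # The 900 shared files contribute identically to both scores, so the two
    # results can differ only through the 100 unique files on each side --
    # at most ten percentage points, and typically a fraction of one.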

    I'm ignoring the time dependence, and do realize that it could serve to decorrelate the results. What neither of us really knows is how highly correlated the various testbeds are, nor do we know how well these testbeds represent an unbiased sampling of the complete population of malware files available, nor do we know the characteristic time required to completely decorrelate membership in the various testbeds employed.

    In other words, there are a lot of unknowns and the situation is fairly complicated. Given the signature database sizes reported (500,000 - 1,000,000 or so), and the number of files used in various comprehensive testbeds (often a similar range of magnitude), rough agreement between large scale tests should be expected since they are all directly characterizing a substantial fraction of the pertinent population.

    Blue
     
  3. Macstorm

    Macstorm Registered Member

    Joined:
    Mar 7, 2005
    Posts:
    2,642
    Location:
    Sneffels volcano
    ROTFL... and please don't forget the other "new king", Dennis "laboratories", the best AV testing organization ever created :rolleyes: :rolleyes:

    dennis dennis dennis and counting...
     
  4. Osaban

    Osaban Registered Member

    Joined:
    Apr 11, 2005
    Posts:
    5,618
    Location:
    Milan and Seoul
    Proactive Detection of New Samples (total 23,237):
    Avira= 17,282 detections
    Norton= 8,465 detections , it missed 8,817 samples

    ~ Removed Direct PDF Link as per AV-Comparatives Request - See Main-Tests page for the actual PDF ~

    Dynamic Test
    Norton 99/100
    Avira 97/100, it missed 2 samples

    ~ Removed Direct PDF Link as per AV-Comparatives Request - See Dynamic Test page for the actual PDF ~

    Yes, All hail the new lilliputian king! (Norton).
     
    Last edited by a moderator: Dec 22, 2009
  5. Fuzzfas

    Fuzzfas Registered Member

    Joined:
    Jun 24, 2007
    Posts:
    2,753
    I don't count Dennis labs (I can set up a "Fuzzfas labs" if Symantec is willing to pay me); I count AV-Comparatives and AV-Test.org, where Avira scored lower than 90%.



    1) I don't count on-demand tests anymore. They are so yesterday and not "real world".

    2) I don't count AV-Comparatives only; I count AV-Test.org results too. Add the results and you come up with a clear Norton win. Dennis labs is just a "bonus", but I leave that to Pleonasm.

    3) I love teasing those who get upset about test results. :D

    4) Merry Xmas!!! Who cares who's the king! I certainly don't! In the last 2 years, I've been using Twister and Vipre. Do I look like I care about, or believe in, tests enough to let them influence what I use? :argh:


    P.S: I don't like big corporations, including Norton. I like the dark horses.
     
  6. Osaban

    Osaban Registered Member

    Joined:
    Apr 11, 2005
    Posts:
    5,618
    Location:
    Milan and Seoul
    Merry Christmas to you too! If you think that what one states in one's signature reflects real life, you are a bit naive. The Internet is the land of the anonymous, where people's real selves come out undisturbed as a consequence of not having an identity.

    There are people here claiming to have 7 licences from a company they love to slander at every opportunity they get. It just doesn't make any sense, just as your post makes no sense at all, but then again you like teasing people.
     
    Last edited: Dec 23, 2009
  7. Pleonasm

    Pleonasm Registered Member

    Joined:
    Apr 9, 2007
    Posts:
    1,201
    This is the key assumption, so let’s look at it more closely...

    Correct: neither you nor I know the specific degree of overlap among the malware samples employed by each of the tests conducted by Dennis Technology Lab, AV-Comparatives and AV-Test.

    The report by Dennis Technology Lab does list the specific malware cases tested; AV-Comparatives describes the test case selection process as “{malicious} URLs were collected by using our own in-house crawler,” and AV-Test used “fresh threats.”

    However, with more than 5,000 new web-based malicious threats being created each day (see here), doesn’t it seem highly unlikely that there would exist substantial overlap between the 40 cases used by Dennis Technology Lab, the 100 cases used by AV-Comparatives and the 600 cases used by AV-Test -- especially given (1) the differences among the time periods during which the tests were conducted and (2) the geographic differences among the locations of each organization?

    Think about it this way. If each of 5,000 malware samples on any given day has an equal likelihood of being chosen by one of these three testing organizations, then what is the probability that any two organizations will select the same case? Answer: p = 0.000004%. What is the probability that all three organizations will select the same case? Answer: p = 0.0000000008%. Now, in fairness, some malware samples are more prevalent than others at any point in time, and have a disproportionately higher likelihood of being selected. Yet, even if you discount this analysis by a factor of 1,000 (or even by 10,000), the soundness of the argument remains.
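
    For reference, the arithmetic behind those two percentages, under the simplification stated above (a pool of 5,000 equally likely samples and a single draw per organization, with a match meaning the draws land on one particular sample); this is only a sketch of that calculation, not a model of how the actual testbeds of 40, 100 and 600 cases were built:

    Code:
    # Assumed pool of 5,000 equally likely new samples on a given day.
    pool = 5_000

    p_two_same = (1 / pool) ** 2     # two organizations both draw one particular sample
    p_three_same = (1 / pool) ** 3   # all three draw that same particular sample

    print(f"{p_two_same:.10%}")      # 0.0000040000%
    print(f"{p_three_same:.13%}")    # 0.0000000008000%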

    Again, it is possible that there exists substantial overlap in the cases tested by the three organizations, but it simply does not appear to be a realistic assumption given (1) the lack of any coordinated efforts across the three organizations coupled with (2) the massive population of malware from which the test cases were chosen. I could certainly be wrong, but honestly I would be quite surprised if there was even a 1% overlap of the malware cases tested across the three organizations.

    Correct, but remember that the tests conducted by Dennis Technology Lab, AV-Comparatives and AV-Test are not “large scale” (using sample sizes of 40, 100 and 600, respectively). Thus, they are not necessarily representative of the population of all malware, nor do the authors of those tests make that claim. What they do assert, however, is that the test simulates a user’s “real-world” experience and provides a fair comparison of the total protection performance across anti-malware products within the constraints of that simulation.

    P.S.: I enjoy our discussions! :)
     
  8. Fuzzfas

    Fuzzfas Registered Member

    Joined:
    Jun 24, 2007
    Posts:
    2,753
    That's true. I am not that naive! I will go further and say that some people may not be just enthusiasts, but may actually have an interest in "boosting" a product. But unlike many, I have posted screenshots of my PC several times. So I know that what I have in my signature does reflect what I run (well, except for the times I forget to update my signature). :D

    http://img190.imageshack.us/img190/5784/75462826.png

    As for my "Twister history":

    https://www.wilderssecurity.com/showpost.php?p=1252469&postcount=246

    And I have more licenses that I currently don't use (incl. Twister, for which I'm waiting for the day they add x64 support. That's the beauty of lifetime licenses).


    The bold part is the key. :D Not people in general. But when it comes to tests.

    I have myself attacked (I don't think it was slander) products for which I also have a license (although 1, not 7), with Rollback coming to mind as the latest, but that's because I wasn't happy with them. Of course, if you have 7 licenses, things may look different.

    Bottom line is I come to the forum to have some fun, not to push interests. I don't mind if I sound weird sometimes; it's when I'm having a laugh! It's Xmas, I couldn't care less who wins! But I find the whole story so funny. :argh:

    P.S: As I said, I don't like big companies, including Norton. They don't care as much about the customer as smaller ones do. Besides, without the small fish, the ocean would be full of nothing but whales. That's an ugly thing to imagine!

    But if you add AV-Comparatives and AV-Test.org, the damn yellow box comes first! Will I buy it?! Not a chance!

    Merry Xmas!
     
  9. Pleonasm

    Pleonasm Registered Member

    Joined:
    Apr 9, 2007
    Posts:
    1,201
    Yes, the more variety and competition among anti-malware vendors, the more innovation will occur; and, the more innovation that happens, the more protection against malware we will all enjoy. So, I too hope that the “small fish” continue to flourish....

    P.S.: Innovation can come from small, agile companies in an industry; but, it can equally well arise from small, focused teams working within a large enterprise, too.
     
  10. Fuzzfas

    Fuzzfas Registered Member

    Joined:
    Jun 24, 2007
    Posts:
    2,753
    I am all for competition, no question about it! And yes, innovation can come from big players too.

    My issue is that, for my needs, I don't need the "top dog" antivirus. I haven't seen a real infection for so long that from time to time I enter shadow mode and try a piece of malware that I know isn't too dangerous. Just for the pleasure of seeing my AV in action.

    If I want to do something dangerous or try some no-DVD patch that seems clean even on VirusTotal (but you can never be sure), I fire up Shadow Defender and see if something weird happens. When I find a properly stable ThreatFire version, I will probably add it.

    The same applies to most Wilders members. They don't need the "top antivirus". Some don't need an antivirus at all, and several are running "naked" with no ill effects.

    Myself, I prefer the one that seems to run lighter and whose layout I like, even if it is sub-par on detection. I even gave my free one-year F-Secure 2010 license to another Wilders member so that it wouldn't be wasted, and I run Vipre, which is probably less capable. In the case of Vipre, the price was low enough (Black Friday promo) to make me buy it. In the unlikely event that I get infected, I will restore a Paragon image. Unless, of course, I get a rootkit that I won't even know is there. :D

    For "average Joe", going with "top dogs" is most important. Because most likely, he hasn't ever heard of sandboxes, hips, etc and relies exclusively on antivirus for protection. Most people i know, don't even have a backup scanner, just for a second opinion. So it's a "life of death" situation. These are the users that Norton, Kaspersky, etc, have been targeting for years and made their reputation with. And the new Norton seems to be in great shape both in detection and in resource usage (many have claimed so, i 've no reason to doubt it).

    That said, when a friend of mine asks me, "If I decided to buy an antivirus, which one would you suggest?", I say "Avira". For the simple fact that in Europe it costs less than half the price of Norton. In the USA things are different, with all those "rebates" and continuous promotions. And because, if I recommend one of the "famous" AVs, I am not really helping competition.

    I've run a few tests of my own and I feel it's sound advice in terms of performance for the cost. That is, if someone doesn't want to go with one of the excellent freebies out there.
     
    Last edited: Dec 23, 2009
  11. Miyagi

    Miyagi Registered Member

    Joined:
    Mar 12, 2005
    Posts:
    426
    Location:
    None
    That's the beauty we see today. I don't want to mention specific names, but there are a couple already who have made an impact on the companies they work for. :)
     
  12. Macstorm

    Macstorm Registered Member

    Joined:
    Mar 7, 2005
    Posts:
    2,642
    Location:
    Sneffels volcano
    I don't deny the results of both those organizations' tests, but I do not agree with the scoring and rating "methodology" they employ. As a matter of fact, my confidence in them changed months ago (AV-Comparatives specifically) when they decided to "change" their own rules for the "final" score of the products. I know it's not so pleasant for the other giant AV competitors (and they push a lot of $ on testers) to always have the same "king".
     
  13. BlueZannetti

    BlueZannetti Registered Member

    Joined:
    Oct 19, 2003
    Posts:
    6,590
    There are a lot of dependencies here - whether all threats are unique vs. derivative, how well they reflect an unbiased sampling of the real population, and so on. Unfortunately, these details are difficult to know.

    This implies that all are circulating at the same level. Given the various family outbreaks reported over the past couple of years, this is unlikely.
    At various times over the past year or so, I've popped over to www.shadowserver.org to take a peek at detection statistics. The step granularity displayed by distinct products (i.e. not using the same embedded engines) was oftentimes quite surprising. For example, on multiple occasions, one suggestion from the pattern of detection statistics was that a handful of malware families dominated the results. That's the type of situation in which substantial overlap with even small sets would be obtained. Does it occur in practice? I have no idea...
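
    As a toy illustration of that scenario (the numbers are entirely made up): if a handful of families account for most of what is circulating, two small testbeds drawn in proportion to prevalence will share those families almost every time, even though each testbed is tiny relative to the full population.

    Code:
    import random

    random.seed(3)

    # Hypothetical prevalence: 5 dominant families carry 80% of in-the-wild
    # encounters, 995 minor families share the remaining 20%.
    families = list(range(1000))
    weights = [0.80 / 5] * 5 + [0.20 / 995] * 995

    def draw_testbed(n):
        """Draw n samples by prevalence and record which families they fall into."""
        return set(random.choices(families, weights=weights, k=n))

    bed_a = draw_testbed(100)
    bed_b = draw_testbed(100)
    print(len(bed_a), len(bed_b), len(bed_a & bed_b))
    # The 5 dominant families land in both testbeds essentially every run, so
    # family-level overlap is substantial despite the small sample sizes.
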
    There is one inexorable reality - tests that attempt to mimic real world situations will invariably need to employ a limited sample set. It's a logistics issue of running the test. There are ways to perform some level of test validation to get a handle on test performance, but that type of result QC is generally not pursued. One of the consequences of attempting to put error bars around the numbers is that the perception of differences tends to get smeared out.
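
    To make that last point concrete, here is a rough sketch using normal-approximation binomial error bars and the 99/100 and 97/100 dynamic-test scores quoted earlier in the thread; with samples of this size the 95% intervals overlap heavily, so the apparent two-point gap sits well within the noise.

    Code:
    import math

    def interval(hits, n, z=1.96):
        """Normal-approximation 95% confidence interval for a detection rate."""
        p = hits / n
        half = z * math.sqrt(p * (1 - p) / n)
        return max(0.0, p - half), min(1.0, p + half)

    for name, hits, n in [("Norton (dynamic)", 99, 100), ("Avira (dynamic)", 97, 100)]:
        lo, hi = interval(hits, n)
        print(f"{name}: {hits}/{n} -> {lo:.1%} .. {hi:.1%}")

    # Norton (dynamic): 99/100 -> 97.0% .. 100.0%
    # Avira (dynamic): 97/100 -> 93.7% .. 100.0%
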
    Likewise!

    Cheers,

    Blue
     