Evaluating large AV-tests!

Discussion in 'other anti-virus software' started by Firefighter, Mar 29, 2003.

Thread Status:
Not open for further replies.
  1. Firefighter

    Firefighter Registered Member

    Joined:
    Oct 28, 2002
    Posts:
    1,670
    Location:
    Finland
    Hi everybody! There are many discussions here about which AV-test is the real one. Before we start searching for which samples are real live viruses, it is better to make a histogram analysis of the outcome data. Large AV-tests that cover more than some 20 antivirus programs can be evaluated first by histogram analysis.

    Histogram analysis clarifies whether the test was made under statistical control. There are two important measurements in that analysis, kurtosis and skewness.

    Kurtosis is a measurement of the flatness or peakedness of a distribution: the histogram pattern displays a spread of data where the peak is lower or higher than the normal bell-shaped curve. If the kurtosis is near 3, then the data is considered to come from a "normal" distribution.

    A histogram pattern that displays occurrences "piled up" away from the center is referred to as "skewed". If the data is piled up on the right the measurement is negative, and if the data is piled up on the left the measurement is positive. The bigger the absolute value, the further the data is shifted from the normal distribution's center point.

    There are many statistical programs on the market that calculate those "strange" quantities; the histogram bars and the statistical calculations are the final outcome of those programs.

    When we are making antivirus programs the main goal is a detection rate of 100%, and in this evaluation (= histogram analysis) it is the only goal.

    Since av-programs have only this one goal, the skewness will be a little bit less than 0 in the ideal case and the kurtosis should be as near 3 as possible.

    The curve of the histogram is skewed in the direction of the goal. All calculated points have to be between -3 and +3 sigma. When the results are far away from that, there is something in the test that disturbs the final outcome, and the test is unacceptable.
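
    As a rough illustration of the sign convention described above, here is a minimal Python sketch. It assumes the plain moment-based definitions (skewness = m3 / m2^1.5, kurtosis = m4 / m2^2, with moments divided by n); the three samples are invented purely for illustration and are not taken from any test.

```python
def central_moment(values, k):
    """k-th central moment of the sample, dividing by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (zero for a symmetric sample)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Moment-based kurtosis: m4 / m2**2 (about 3 for normally distributed data)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Hypothetical detection rates, invented only to show the sign convention.
piled_right = [99.0, 98.0, 97.0, 95.0, 90.0, 70.0]  # most values close to 100 %
piled_left = [100.0 - v for v in piled_right]        # mirror image of the above
symmetric = [80.0, 85.0, 90.0, 95.0, 100.0]          # evenly spread around 90

print("piled right:", skewness(piled_right))  # negative skewness
print("piled left: ", skewness(piled_left))   # positive skewness
print("symmetric:  ", skewness(symmetric), kurtosis(symmetric))
```

    The evenly spread sample has zero skewness but a kurtosis well below 3, which is what "flatter than the bell curve" means here.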

    First of all, here are the VirusBulletin WinXP 2002 combined On-Demand test results, calculated manually.

    Antivirus Zoo test in VirusBulletin 6-2002; WinXP:

    The Zoo test is a summary of three categories: Macro - 4 056 objects, Polymorphic - 15 011 objects and Standard - 1 585 objects. The sum of each category was calculated manually from the list on this site:

    http://www.virusbtn.com/old/comparatives/WinXP/2002/test_sets.html


    Product                   Detected   Objects missed
                              (% of 20 652 objects)

    Eset NOD32                100.0000                0
    GDATA AntiVirusKit         99.9952                1
    Kaspersky KAV              99.9952                1
    CA eTrust Antivirus        99.9855                3
    F-Secure Anti-Virus        99.9806                4
    McAfee VirusScan*          99.9564                9
    NAI VirusScan              99.9564                9
    Symantec NAV               99.9322               14
    DrWeb 4.28                 99.8305               35
    GeCAD RAV                  99.7385               54
    CA Vet Anti-Virus          99.6320               76
    Sophos Anti-Virus          99.5448               94
    Command AntiVirus          99.4916              105
    Frisk F-Prot               99.4722              109
    VirusBuster                99.2737              150
    Alwil Avast32              99.2640              152
    SOFTWIN BitDefender        99.0170              203
    Trend PC-cillin            98.6926              270

    Grisoft AVG                97.9227              429
    Norman Virus Control       96.8381              653
    Panda Antivirus            94.6010            1 115
    Leprechaun VirusBuster     91.1437            1 829

    HAURI ViRobot              43.3324           11 703
    CAT Quickheal              35.2460           13 373

    *) McAfee results were corrected using the VB August issue, taken from its On-Demand test.
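
    The manual calculation behind the table can be sketched in a few lines of Python. The category sizes and missed-object counts are copied from the post; the function and variable names are only illustrative.

```python
# Category sizes as quoted in the post.
CATEGORY_SIZES = {"Macro": 4056, "Polymorphic": 15011, "Standard": 1585}
TOTAL_OBJECTS = sum(CATEGORY_SIZES.values())  # 20 652 objects in the combined Zoo set


def detection_percent(missed, total=TOTAL_OBJECTS):
    """Share of the test objects that were detected, in percent."""
    return 100.0 * (total - missed) / total


# A few rows from the table: product -> objects missed.
missed_objects = {
    "Eset NOD32": 0,
    "Kaspersky KAV": 1,
    "Grisoft AVG": 429,
    "Panda Antivirus": 1115,
}

for product, missed in missed_objects.items():
    print(f"{product:<20} {detection_percent(missed):9.4f} {missed:>6}")
```

    Rounded to four decimals, these rows come out the same as the figures in the table.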

    Here are the results of 3 other antivirus tests: the AV-test.org Zoo test 11-2001, VirusP 11-2002 and finally the "Saso Badovinac" av-test 22 from www.grc.com.

    http://www.av-test.org/sites/tests.php3?lang=en

    http://www.virus.gr/english/fullxml/default.asp?id=31&mnu=31

    https://grc.com/x/news.exe?cmd=article&group=grc.security.software&item=84294&utag=

    Finally, here are the histograms with statistical calculations for the 4 different av-tests. First I left Hauri and Quickheal out of the VB histogram analysis, because it was obvious that they were too far from the common distribution.

    Histogram 5.-10. November 2002: VirusP AV-test

    Total number of objects 47 204

    General Statistics: (Ungrouped sample data)
    Pts Plotted = 33 Offscale Pts = 0
    Mean = 75.67303 Std Dev (Sample) =18.67772
    Kurtosis = 2.14768 Skewness = -0.64119
    3 Sigma Limits: 19.63986 TO 131.70621

    Process Capability Indices: (based on +/- 3 sigma)
    Process Capability = 112.06634
    USL = 100.
    CPU = 0.43415
    Z (USL) = 1.30246
    9.64% will be over the USL value of 100.
    Based on standard normal distribution (derived from sample values).
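
    The capability figures in these printouts appear to follow the usual one-sided formulas, since those formulas reproduce the numbers: process capability = 6 sigma, CPU = (USL - mean) / (3 sigma), Z = (USL - mean) / sigma, and the share over the USL taken from the upper tail of a normal curve. A minimal Python sketch under that assumption (the function name and dictionary keys are illustrative, not the program's own):

```python
import math


def upper_capability(mean, std_dev, usl=100.0):
    """One-sided capability figures for an upper specification limit (USL)."""
    z = (usl - mean) / std_dev
    return {
        "process_capability": 6.0 * std_dev,  # width of the +/- 3 sigma band
        "cpu": z / 3.0,                       # (USL - mean) / (3 sigma)
        "z_usl": z,                           # distance to the USL in sigmas
        # Upper tail of the standard normal distribution: 1 - Phi(z).
        "percent_over_usl": 100.0 * 0.5 * math.erfc(z / math.sqrt(2.0)),
    }


# Mean and sample standard deviation from the VirusP printout above.
result = upper_capability(mean=75.67303, std_dev=18.67772)
for name, value in result.items():
    print(f"{name:<20} {value:.5f}")
```

    The "percent over USL" is a property of the fitted normal curve only; a real detection rate cannot exceed 100 %.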

    Histogram Mar-22-2003: "Saso Badovinac" AV-test

    Total number of objects over 100 000

    General Statistics: (Ungrouped sample data)
    Pts Plotted = 20 Offscale Pts = 0
    Mean = 76.77129 Std Dev (Sample) =17.41526
    Kurtosis = 2.67307 Skewness = -0.70072
    3 Sigma Limits: 24.52551 TO 129.01706

    Process Capability Indices: (based on +/- 3 sigma)
    Process Capability = 104.49155
    USL = 100.
    CPU = 0.4446
    Z (USL) = 1.33381
    9.11% will be over the USL value of 100.
    Based on standard normal distribution (derived from sample values).

    Histogram Nov-1-2001: AV-test.org AV-test

    Total number of objects 33 617

    General Statistics: (Ungrouped sample data)
    Pts Plotted = 20 Offscale Pts = 0
    Mean = 96.30886 Std Dev (Sample) =4.053
    Kurtosis = 3.51577 Skewness = -1.1743
    3 Sigma Limits: 84.14987 TO 108.46785

    Process Capability Indices: (based on +/- 3 sigma)
    Process Capability = 24.31798
    USL = 100.
    CPU = 0.30357
    Z (USL) = 0.91072
    18.12% will be over the USL value of 100.
    Based on standard normal distribution (derived from sample values).

    Histogram Jun-1-2002: VirusBulletin WinXP 2002, 22 best AV:s

    Total number of objects 20 652

    General Statistics: (Ungrouped sample data)
    Pts Plotted = 22 Offscale Pts = 2
    Mean = 98.83018 Std Dev (Sample) =2.14387
    Kurtosis = 9.07663 Skewness = -2.58717
    3 Sigma Limits: 92.39856 TO 105.2618

    Process Capability Indices: (based on +/- 3 sigma)
    Process Capability = 12.86323
    USL = 100.
    CPU = 0.18189
    Z (USL) = 0.54566
    29.27% will be over the USL value of 100.
    Based on standard normal distribution (derived from sample values).

    We can see from the av-tables that those 3 av-tests are very similar and acceptable, but the fourth test, the VirusBulletin WinXP 2002 On-Demand test, is skewed too much toward the 100 % line and there are not many antivirus programs on the left side of the curve. When we look at the kurtosis and skewness values the result is the same: VirusBulletin's values are too far from the ideal value!

    Finally I made the biggest test from the VirusBulletin data that passed the histogram analysis, and it contained the 18 best av-programs. You can look at the results here.

    Histogram Jun-1-2002: VirusBulletin WinXP 2002, best 18 AV:s

    Total number of objects 20 652

    General Statistics: (Ungrouped sample data)
    Pts Plotted = 18 Offscale Pts = 0
    Mean = 99.65324 Std Dev (Sample) =0.38853
    Kurtosis = 3.14462 Skewness = -1.02386
    3 Sigma Limits: 98.48764 TO 100.81885

    Process Capability Indices: (based on +/- 3 sigma)
    Process Capability = 2.33121
    USL = 100.
    CPU = 0.29749
    Z (USL) = 0.89247
    18.61% will be over the USL value of 100.
    Based on standard normal distribution (derived from sample values).

    I think that VirusBulletin does not have a real Zoo test, because there are too many AV:s which are capable of finding all or almost all objects in their test. Personally I am the last to condemn those 3 AV-tests, because they are under statistical control and they do not share the same top five, which is what belongs to a free competition. The second thing is why there are so many AV:s in those 3 tests that are capable of finding over 95 % of those objects!

    I am curious to see the reasons why only the VB WinXP 2002 test is so far away from the other tests.

    It seems to me that there are people here who can't stand the truth!

    PS. Can you tell me briefly (with pictures if possible) how I can add those GIF pictures as attachments to this comment, please?

    "The truth is out there, but it hurts!"

    Best Regards,
    Firefighter!
     


  2. Tinribs

    Tinribs Registered Member

    Joined:
    Mar 14, 2002
    Posts:
    734
    Location:
    England
    I'll read your post again when I'm sober and more with it, Firefighter. After all that it's ME that feels skewed, with more Kurtosis than is good for a normal person....... is it just me or was that very heavy going?

    I do enjoy your posts Firefighter, but sometimes I do worry that you're a bit too wrapped up in antivirus programmes. They are merely there to protect you from nasties, not to run your entire online life ;)
     
  3. Madsen DK

    Madsen DK Registered Member

    Joined:
    Nov 23, 2002
    Posts:
    324
    Location:
    Denmark
  4. Madsen DK

    Madsen DK Registered Member

    Joined:
    Nov 23, 2002
    Posts:
    324
    Location:
    Denmark
    Sorry FF
    I didn't mean to laugh at you, and I enjoy your posts too, but I agree with Tinribs.
    Regards
    Ole
     
  5. Smokey

    Smokey Registered Member

    Joined:
    Apr 1, 2002
    Posts:
    1,513
    Location:
    Annie's Pub
    Hi Firefighter!

    Maybe you need some holidays? :eek:

    Take a break, it will upgrade your entire life........
     
  6. Technodrome

    Technodrome Security Expert

    Joined:
    Feb 13, 2002
    Posts:
    2,140
    Location:
    New York
  7. solarpowered candle

    solarpowered candle Registered Member

    Joined:
    Jan 9, 2003
    Posts:
    1,181
    Location:
    new zealand
    MMMmm. And I thought it was the acid wearing off. So it's not me that's bent, but the VB tests. Is that what you have said, Firefighter?
     
  8. Madsen DK

    Madsen DK Registered Member

    Joined:
    Nov 23, 2002
    Posts:
    324
    Location:
    Denmark
    A little support for FF.
    I have to admit that it's easy to get carried away regarding security matters, and if it's your hobby, that's okay.
    A few years ago I couldn't care less about the name of my AV.
    It came pre-installed on my PC, and sometimes I even forgot to update it.
    Today I certainly DO care, and I read almost everything about every little virus & worm that occurs.
    Perhaps it's a sort of evolution. :D
    But the bottom line is:
    It's not important what you do, but that you do something that you like.
    Regards
    Ole
     
  9. Tinribs

    Tinribs Registered Member

    Joined:
    Mar 14, 2002
    Posts:
    734
    Location:
    England
    Agreed, it's my hobby too, but I think it's possible to get a bit too involved.

    I would like to read a condensed version of the post though. I have no doubt it's informative, but I feel my head's about to explode after the second paragraph.

    Maybe FF can summarise when he returns.

    :)
     
  10. Madsen DK

    Madsen DK Registered Member

    Joined:
    Nov 23, 2002
    Posts:
    324
    Location:
    Denmark
    Tinribs said: "Agreed, it's my hobby too, but I think it's possible to get a bit too involved."

    Totally agree with you.
    Regards Ole
     
  11. This post is a pill!!! :D :D :D :D :D :D

    If I may ask, Firefighter.. Just three questions...

    1. What is your point? ( I am not trying to be sarcastic..)
    2. What Anti Virus program, after your analysis, is the winner of your award?
    3. Have you taken a vacation lately? (Just Kidding...) :D :D :D :D

    I can just imagine it right now.. I meet a girl online, and we finally meet face to face at a donut shoppe after a year and a half of writing "addled" instant messages to each other...

    "What is your hobby?" she asks me...

    "Oh, evaluating antivirus programs", I smartly say. Then I whip out a printed copy of this whole post and say with a wink, "For the past 4 months, at EVERY spare moment I have, I have been trying to figure out the meaning of this 'histogram related antiviral summary'"...

    Just kidding, Firefighter.. I even "applauded" you right now.. You're okay man... Seriously, though, could you just summarize and tell us what your final opinion is, in maybe one paragraph?

    One thing I did pick up on is that Rodzilla was right when he said that 2% missed viruses could mean about 150 or so, to paraphrase him...

    Thanks,

    Shooter...
     
  12. Madsen DK

    Madsen DK Registered Member

    Joined:
    Nov 23, 2002
    Posts:
    324
    Location:
    Denmark
    "Oh, evaluating antivirus programs", I smartly say. Then I whip out a printed copy of this whole post and say with a wink, "For the past 4 months, at EVERY spare moment I have, I have been trying to figure out the meaning of this 'histogram related antiviral summary'"...
    :D :D :D :D :D :D :D

    Straight Shooter!!!!
    I really like your humour :D :cool:
    Regards
    Ole
     
  13. Firefighter

    Firefighter Registered Member

    Joined:
    Oct 28, 2002
    Posts:
    1,670
    Location:
    Finland
    Reply to everyone from Firefighter!

    What I said earlier might have confused most of the Forum readers. Statistical Process Control (SPC) is not a very simple issue.

    If you are a nuclear physicist, not many people ask you what nuclear physics is, because if the answer were that easy, there would be no nuclear physicists either. It is the same with Statistical Process (or Quality) Control; it is its own branch of science. If you really want a quick overview of this stuff, a good start may be to read, for example, the pocket guide from this site:

    http://www.qualitycoach.net/1879364441.htm

    For further study I recommend Juran's Quality Handbook (5th Edition).

    http://www.knovel.com/knovel2/Toc.jsp?BookID=623

    When you have read and understood all of its some 2 000 pages, you are becoming a pro in this branch of science.

    The main point is still quite simple. Almost every free process in the world produces a normal distribution curve as its outcome when the process runs between two specification limits. The outcome is then a symmetric bell-shaped curve where the skewness is about zero and the kurtosis about 3.

    When we have only one specification limit to run against, the outcome of the measurements will be a skewed bell-shaped curve. The kurtosis value will still be near 3 and the skewness usually somewhere between -1 and +1, but a little bit away from zero, depending on the position of the specification limit relative to the distribution's mean.

    There are measurement limits at ± 3 sigma (the strange letter in those pictures, which I can't find on this site). In one sample there should be no measurement values outside those limits, and between the limits is the "normal distribution" curve, which was the lilac bell-shaped curve in the pictures I showed earlier. If there still are some measurement values outside those two limits, the phrase is "the process is not in control": there is a special (known or main) reason why those values exist, and it can be found by scrutinizing the case more closely. But if there are no such values, the situation is normal and the differences belong to natural variation, just as we people are not all as tall as each other but there is no (= single, main) known reason for it.

    Those ± 3 sigma limits can be calculated very easily with the programs I mentioned earlier, just by entering the measurements straight into the program.

    A histogram bar is in this case the number of av:s inside the same detection % range (at regular intervals like 60-65, 65-70, 70-75 and so on). There are certain rules for how many bars are acceptable in the analysis, and that depends on the sample size (the number of av:s in this case). The shape of the bars is the pattern of the study, and it must look like the normal distribution curve.
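
    To make the bars concrete, here is a minimal Python sketch that sorts the "Saso Badovinac" percentages (listed in the PS at the end of this post, with decimal commas written as points) into 5 % intervals and prints a text histogram. The fixed 5 % bin width is taken from the example intervals above; it is not the binning rule of any particular statistical program.

```python
from collections import Counter

# "Saso Badovinac" Zoo test results, % of files detected (from the PS of this post).
rates = [
    99.26414, 98.44877, 96.66934, 90.94085, 88.49767,
    88.39807, 88.13495, 85.32385, 84.64077, 83.64997,
    81.69216, 80.39365, 75.24566, 65.57999, 62.58306,
    61.59301, 61.06453, 54.85513, 53.27045, 35.17965,
]

BIN_WIDTH = 5  # percentage points per bar, e.g. 60-65, 65-70, 70-75


def bin_start(rate, width=BIN_WIDTH):
    """Lower edge of the interval that a detection rate falls into."""
    return int(rate // width) * width


counts = Counter(bin_start(rate) for rate in rates)

# One row per interval; each '#' stands for one av-program.
# (A rate of exactly 100 would need one extra interval; none occurs here.)
for start in range(30, 100, BIN_WIDTH):
    bar = "#" * counts.get(start, 0)
    print(f"{start:3d}-{start + BIN_WIDTH:<3d} {bar}")
```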

    What I am saying now is not exactly the whole truth, but it is a good estimate to start with. When the measurement curve is more skewed and the skewness value is somewhere between -1 and +1, as it was in those 3 tests, you need at least 50 - 100 measurements (= av-programs in this case) before the first value outside the ± 3 sigma limits is acceptable.

    When we look at the VirusBulletin WinXP 2002 histogram pattern, there were 6 values outside the - 3 sigma limit, so by statistical rules it is a totally unacceptable study.
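
    One reading that reproduces the count of six is to take the ± 3 sigma limits of the 18 best av-programs and check all 24 VirusBulletin results against them. That reading is an assumption, not something stated in the post. A minimal Python sketch using the missed-object counts from the first post:

```python
import statistics

TOTAL_OBJECTS = 20652

# Objects missed by each of the 24 products, best first (from the VB table).
missed = [
    0, 1, 1, 3, 4, 9, 9, 14, 35, 54, 76, 94,
    105, 109, 150, 152, 203, 270,      # the 18 best
    429, 653, 1115, 1829,              # AVG, Norman, Panda, Leprechaun
    11703, 13373,                      # Hauri, Quickheal
]
rates = [100.0 * (TOTAL_OBJECTS - m) / TOTAL_OBJECTS for m in missed]

best_18 = rates[:18]
mean = statistics.mean(best_18)
std_dev = statistics.stdev(best_18)    # sample standard deviation (n - 1)

lower_limit = mean - 3.0 * std_dev
upper_limit = mean + 3.0 * std_dev

# Results below the lower limit of the 18-product histogram.
below = [rate for rate in rates if rate < lower_limit]

print(f"mean = {mean:.5f}, std dev = {std_dev:.5f}")
print(f"3 sigma limits: {lower_limit:.5f} to {upper_limit:.5f}")
print(f"values below the -3 sigma limit: {len(below)}")
```

    The mean, standard deviation and limits agree with the "best 18 AV:s" printout to about four decimals, the small differences coming from rounding of the inputs.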

    In short, the histogram pattern (the shape of the bars) is the quickest way to tell whether a study is acceptable by statistical rules, which is a scientific fact and not an opinion or a feeling. Whoever proves it false may surely have the next "Nobel prize".

    Somebody asked which was the best av in my mind. First I have to say that the Zoo criterion is only one of many criteria, like In the Wild detection, memory consumption, false positives, the capability to read packed or archived files, ease of use, the resources of your PC or whatever. But if we forget those other issues, I have to say, as my post says, that I need more large av-test results (let's say some 20): because the results are outcomes of independent test occasions where natural variation rules, the winner can't always be the same program. After that I will probably be sure which is the best one. For now you can just follow the top 5.

    PS. If you can't open the "Saso Badovinac" link, here are the results of that page.

    Subject:
    Antivirus Zoo test:

    https://grc.com/x/news.exe?cmd=article&group=grc.security.software&item=84294&utag=

    Date:
    Sat, 22 Mar 2003 13:45:36 +0100
    From:
    "saso badovinac" <sbadov@volja.net>

    Here are the results of the latest Zoo test. Some things to note:

    -this Zoo test is based on more than 100 000 infected files, so in my opinion it is a good one :)

    -this is a Zoo test and cannot be compared to an itw test; for an itw test I recommend the

    http://www.virusbtn.com/vb100/

    -although it includes several trojans and worms, it cannot be used as a comparison for trojan detection

    What is this test good for? It shows very well how many of "all" viruses an antivirus detects. So if in addition to this list you take a look at

    http://www.virusbtn.com/vb100/

    (which is a test for the latest, most active viruses) then you get a very good picture of how good your antivirus is (again, note that trojans are in a special category, and if someone knows of a good trojan comparison test I would love to hear about it).

    Results (the number is the % of the files (viruses) detected):

    mcafee 99,26414
    kaspersky 98,44877
    f-prot 96,66934
    trend 90,94085
    symantec 88,49767
    drweb 88,39807
    sophos 88,13495
    alwil 85,32385
    rav 84,64077
    eset 83,64997
    inoculateit 81,69216
    panda 80,39365
    h+bedv 75,24566
    avg 65,57999
    avxc 62,58306
    virus buster 61,59301
    vet 61,06453
    ikarus 54,85513
    mks 53,27045
    hauri 35,17965
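
    These 20 values are enough to recompute the "Saso Badovinac" printout from the first post. A minimal Python sketch, with decimal commas written as points and under the assumption that the program uses the sample (n - 1) standard deviation together with plain moment-based skewness and kurtosis (moments divided by n); with that assumption the printed figures are reproduced:

```python
import statistics

# % of files detected, in the order listed above.
rates = [
    99.26414, 98.44877, 96.66934, 90.94085, 88.49767,
    88.39807, 88.13495, 85.32385, 84.64077, 83.64997,
    81.69216, 80.39365, 75.24566, 65.57999, 62.58306,
    61.59301, 61.06453, 54.85513, 53.27045, 35.17965,
]


def central_moment(values, k):
    """k-th central moment of the sample, dividing by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n


mean = statistics.mean(rates)
std_dev = statistics.stdev(rates)  # sample standard deviation (n - 1)
skewness = central_moment(rates, 3) / central_moment(rates, 2) ** 1.5
kurtosis = central_moment(rates, 4) / central_moment(rates, 2) ** 2

print(f"Mean = {mean:.5f}  Std Dev (Sample) = {std_dev:.5f}")
print(f"Kurtosis = {kurtosis:.5f}  Skewness = {skewness:.5f}")
print(f"3 Sigma Limits: {mean - 3 * std_dev:.5f} TO {mean + 3 * std_dev:.5f}")
```

    Other programs may use bias-corrected estimators for skewness and kurtosis, which give slightly different values on a sample of 20.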


    "The truth is out there, but it hurts"

    Best Regards,
    Firefighter
     
  14. Technodrome

    Technodrome Security Expert

    Joined:
    Feb 13, 2002
    Posts:
    2,140
    Location:
    New York
    Here is a good trojan Test. It may help you with your research thing! ;)


    http://members.lycos.co.uk/scheinsicherheit/scanner.htm



    Technodrome
     
  15. Firefighter

    Firefighter Registered Member

    Joined:
    Oct 28, 2002
    Posts:
    1,670
    Location:
    Finland
    To Technodrome from Firefighter!

    Thanx for the link and props too! :D ;)


    Best Regards,
    Firefighter!
     
  16. Technodrome

    Technodrome Security Expert

    Joined:
    Feb 13, 2002
    Posts:
    2,140
    Location:
    New York
  17. Firefighter

    Firefighter Registered Member

    Joined:
    Oct 28, 2002
    Posts:
    1,670
    Location:
    Finland
    To everybody again from Firefighter!

    Hi everybody, I have to clarify my statement about the best av:s in the Zoo test.

    You can keep an eye on the top 5 scanning engines for now and wait until the long testing tournament is over. We have seen only 3 "Grand Prix Tours" but we still have some 17 left, so we have to be patient, and let's hope there will be an exciting and evenly matched competition. :D


    "The truth is out there, but it hurts!"

    Best Regards,
    Firefighter!
     