# Evaluating large AV-tests!

Discussion in 'other anti-virus software' started by Firefighter, Mar 29, 2003.

Hi everybody! There are many discussions here about which AV-test is the real one. Before we go searching for which samples are real live viruses, it is better to make a histogram analysis of the outcome data. Large AV-tests that cover more than about 20 antivirus programs can be evaluated first by histogram analysis.

Histogram analysis clarifies whether the test was made under statistical control. There are two important measurements in that analysis, kurtosis and skewness.

Kurtosis describes a histogram pattern whose peak is lower or higher than that of the normal bell-shaped curve; it is a measurement of the flatness or peakedness of a distribution. If the kurtosis is near 3, the data is considered to come from a "normal" distribution.

A histogram pattern that displays occurrences "piled up" away from the center is referred to as "skewed". If the data is piled up on the right, the measurement is negative, and if the data is piled up on the left, the measurement is positive. The bigger the value, the further the data lies from the normal distribution's center point.
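For anyone who wants to try the two measurements without a statistics package, here is a minimal sketch. It assumes the plain moment definitions (third and fourth central moments divided by powers of the second), which may not be exactly what every statistical program uses. The two samples are made-up numbers for illustration only, not results from any test.

```python
def central_moment(values, k):
    """k-th central moment, dividing by n (population form)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)

def skewness(values):
    """Negative when the long tail is on the left, positive when on the right."""
    m2 = central_moment(values, 2)
    return central_moment(values, 3) / m2 ** 1.5

def kurtosis(values):
    """Non-excess kurtosis: a normal distribution gives about 3."""
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2

# Hypothetical detection rates (%), invented for this example.
symmetric = [60.0, 70.0, 80.0, 90.0, 100.0]
left_tailed = [40.0, 85.0, 90.0, 95.0, 97.0, 99.0]

print("symmetric:  ", skewness(symmetric), kurtosis(symmetric))
print("left-tailed:", skewness(left_tailed), kurtosis(left_tailed))
```

The symmetric sample gives a skewness of zero, and the sample with most values piled up near 100 % and one low straggler gives a negative skewness, which is the sign convention described above.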

There are many statistical programs on the market to calculate those "strange" figures; histogram bars and statistical calculations are the final outcome of those programs.

When antivirus programs are made, the main goal is a detection rate of 100 %, and in this evaluation (= histogram analysis) it is the only goal.

Since av-programs have only this one goal, the skewness will be a little less than 0 in the ideal case, and the kurtosis should be as near 3 as possible.

The curve of the histogram is skewed in the direction of the goal. All calculated points have to be between -3 and +3 sigma. If the calculations are far away from that, something in the test disturbs the final outcome and the test is unacceptable.

First of all, here are the VirusBulletin WinXP 2002 combined On-Demand test results, calculated manually.

Antivirus Zoo test in VirusBulletin 6-2002; WinXP:

The Zoo test is a summary of three categories, Macro - 4 056 objects, Polymorphic - 15 011 objects and finally Standard - 1 585 objects. The sum of each category was calculated manually from the list on this site:

http://www.virusbtn.com/old/comparatives/WinXP/2002/test_sets.html

Product, detected (% of 20 652 objects), objects missed

Eset NOD32 100.0000 0
GDATA AntiVirusKit 99.9952 1
Kaspersky KAV 99.9952 1
CA eTrust Antivirus 99.9855 3
F-Secure Anti-Virus 99.9806 4
McAfee VirusScan* 99.9564 9
NAI VirusScan 99.9564 9
Symantec NAV 99.9322 14
DrWeb 4.28 99.8305 35
CA Vet Anti-Virus 99.6320 76
Sophos Anti-Virus 99.5448 94
Command AntiVirus 99.4916 105
Frisk F-Prot 99.4722 109
VirusBuster 99.2737 150
Alwil Avast32 99.2640 152
SOFTWIN BitDefender 99.0170 203
Trend PC-cillin 98.6926 270

Grisoft AVG 97.9227 429
Norman Virus Control 96.8381 653
Panda Antivirus 94.6010 1 115
Leprechaun VirusBuster 91.1437 1 829

HAURI ViRobot 43.3324 11 703
CAT Quickheal 35.2460 13 373

*) The McAfee results were corrected using the VB August issue, taken there from the On-Demand test.
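The percentages in the table follow directly from the number of missed objects. A small sketch of that arithmetic, using the three category sizes quoted above (the function name is only for this example):

```python
# Test-set sizes quoted in the post (Macro, Polymorphic, Standard).
MACRO, POLYMORPHIC, STANDARD = 4056, 15011, 1585
TOTAL = MACRO + POLYMORPHIC + STANDARD  # 20 652 objects

def detection_rate(missed, total=TOTAL):
    """Detected share in percent, given the number of missed objects."""
    return 100.0 * (total - missed) / total

# Three rows from the table, as (product, objects missed).
for name, missed in [("GDATA AntiVirusKit", 1),
                     ("Grisoft AVG", 429),
                     ("HAURI ViRobot", 11703)]:
    print(f"{name}: {detection_rate(missed):.4f} %")
```

Rounded to four decimals this gives the table values for those rows, so a single missed object already costs about 0.005 percentage points in a set of this size.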

Here then are the results of 3 other antivirus tests: the AV-test.org Zoo test 11-2001, VirusP 11-2002 and finally the "Saso Badovinac" av-test 22 from www.grc.com.

http://www.av-test.org/sites/tests.php3?lang=en

http://www.virus.gr/english/fullxml/default.asp?id=31&mnu=31

https://grc.com/x/news.exe?cmd=article&group=grc.security.software&item=84294&utag=

Finally, here are the histograms with statistical calculations for the 4 different av-tests. First I removed Hauri and Quickheal from the VB histogram analysis, because it was too obvious that they were too far from the common distribution.

Histogram 5.-10. November 2002: VirusP AV-test

Total number of objects 47 204

General Statistics: (Ungrouped sample data)
Pts Plotted = 33 Offscale Pts = 0
Mean = 75.67303 Std Dev (Sample) =18.67772
Kurtosis = 2.14768 Skewness = -0.64119
3 Sigma Limits: 19.63986 TO 131.70621

Process Capability Indices: (based on +/- 3 sigma)
Process Capability = 112.06634
USL = 100.
CPU = 0.43415
Z (USL) = 1.30246
9.64% will be over the USL value of 100.
Based on standard normal distribution (derived from sample values).
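The capability figures in this block look consistent with the usual one-sided definitions, so here is a rough sketch that recomputes them from the mean and standard deviation. The formulas are my assumption about what the statistics program does (Z = (USL - mean) / sigma, CPU = Z / 3, "Process Capability" = 6 sigma, and the share over the USL taken from the standard normal curve); the function name is only for this example.

```python
import math

def upper_capability(mean, std_dev, usl=100.0):
    """One-sided capability figures against an upper specification limit."""
    z = (usl - mean) / std_dev                 # distance to USL in sigmas
    cpu = z / 3.0                              # upper capability index
    spread = 6.0 * std_dev                     # width of the +/- 3 sigma band
    # Upper tail of the standard normal distribution beyond z, in percent.
    pct_over = 100.0 * 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, cpu, spread, pct_over

# Mean and sample standard deviation from the VirusP block above.
z, cpu, spread, pct_over = upper_capability(75.67303, 18.67772)
print(f"Z (USL) = {z:.5f}")
print(f"CPU = {cpu:.5f}")
print(f"Process Capability = {spread:.5f}")
print(f"{pct_over:.2f}% over the USL")
```

Note that the "% over the USL" line is a property of the fitted normal curve only; a detection rate can never really exceed 100 %.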

Histogram: "Saso Badovinac" AV-test

Total number of objects over 100 000

General Statistics: (Ungrouped sample data)
Pts Plotted = 20 Offscale Pts = 0
Mean = 76.77129 Std Dev (Sample) =17.41526
Kurtosis = 2.67307 Skewness = -0.70072
3 Sigma Limits: 24.52551 TO 129.01706

Process Capability Indices: (based on +/- 3 sigma)
Process Capability = 104.49155
USL = 100.
CPU = 0.4446
Z (USL) = 1.33381
9.11% will be over the USL value of 100.
Based on standard normal distribution (derived from sample values).

Histogram Nov-1-2001: AV-test.org AV-test

Total number of objects 33 617

General Statistics: (Ungrouped sample data)
Pts Plotted = 20 Offscale Pts = 0
Mean = 96.30886 Std Dev (Sample) =4.053
Kurtosis = 3.51577 Skewness = -1.1743
3 Sigma Limits: 84.14987 TO 108.46785

Process Capability Indices: (based on +/- 3 sigma)
Process Capability = 24.31798
USL = 100.
CPU = 0.30357
Z (USL) = 0.91072
18.12% will be over the USL value of 100.
Based on standard normal distribution (derived from sample values).

Histogram Jun-1-2002: VirusBulletin WinXP 2002, 22 best AV:s

Total number of objects 20 652

General Statistics: (Ungrouped sample data)
Pts Plotted = 22 Offscale Pts = 2
Mean = 98.83018 Std Dev (Sample) =2.14387
Kurtosis = 9.07663 Skewness = -2.58717
3 Sigma Limits: 92.39856 TO 105.2618

Process Capability Indices: (based on +/- 3 sigma)
Process Capability = 12.86323
USL = 100.
CPU = 0.18189
Z (USL) = 0.54566
29.27% will be over the USL value of 100.
Based on standard normal distribution (derived from sample values).

We can see from the tables that those 3 av-tests are very similar and acceptable, but the fourth test, the VirusBulletin WinXP 2002 On-Demand test, is skewed too much towards the 100 % line and there are not many antiviruses on the left side of the curve. Judging by the kurtosis and skewness values, the result is the same: VirusBulletin's values are too far from the ideal values!

Finally, I built the largest subset of the VirusBulletin data that passed the histogram analysis; it contained the 18 best av-programs. You can see the results here.

Histogram Jun-1-2002: VirusBulletin WinXP 2002, best 18 AV:s

Total number of objects 20 652

General Statistics: (Ungrouped sample data)
Pts Plotted = 18 Offscale Pts = 0
Mean = 99.65324 Std Dev (Sample) =0.38853
Kurtosis = 3.14462 Skewness = -1.02386
3 Sigma Limits: 98.48764 TO 100.81885

Process Capability Indices: (based on +/- 3 sigma)
Process Capability = 2.33121
USL = 100.
CPU = 0.29749
Z (USL) = 0.89247
18.61% will be over the USL value of 100.
Based on standard normal distribution (derived from sample values).

I think that VirusBulletin does not have a real Zoo test, because there are too many AVs that are capable of finding all or almost all objects in it. Personally I am the last to doom the other 3 AV-tests, because they are under statistical control and they do not all have the same top five, which is what belongs to a free competition. The second thing is why there are so many AVs in those 3 tests that are capable of finding over 95 % of the objects!

I am curious to see the reasons why only the VB WinXP 2002 test is so far away from the other tests.

It seems to me that there are people here who can't stand the truth!

PS. Can you tell me briefly (with pictures if possible) how I can attach those GIF pictures to this comment, please?

"The truth is out there, but it hurts!"

Best Regards,
Firefighter!

I'll read your post again when I'm sober and more with it, Firefighter. After all that, it's ME that feels skewed and has more kurtosis than is good for a normal person... is it just me or was that very heavy going?

I do enjoy your posts, Firefighter, but sometimes I do worry that you're a bit too wrapped up in antivirus programmes. They are merely there to protect you from nasties, not to run your entire online life.

Sorry FF
I didn't mean to laugh at you, and I enjoy your posts too, but I agree with Tinribs.
Regards
Ole

Hi Firefighter!

Maybe you need some holidays?

MMMmm, and I thought it was the acid wearing off. So it's not me that's bent, but the VB tests. Is that what you have said, Firefighter?

A little support for FF.
I have to admit that it's easy to get carried away regarding security matters, and if it's your hobby, that's okay.
A few years ago I couldn't care less about the name of my AV.
It came pre-installed on my PC, and sometimes I even forgot to update it.
Today, I certainly DO care, and I read almost everything about every little virus & worm that occurs.
Perhaps it's a sort of evolution.
But the bottom line is:
It's not important what you do, but that you do something that you like.
Regards
Ole

Agreed, it's my hobby too, but I think it's possible to get a bit too involved.

I would like to read a condensed version of the post though. I have no doubt it's informative, but I feel my head's about to explode after the second paragraph.

Maybe FF can summarise when he returns.

Agreed, it's my hobby too, but I think it's possible to get a bit too involved.
Tinribs said.

Totally agree with you.
Regards Ole

This post is a pill!!!

If I may ask, Firefighter.. Just three questions...

1. What is your point? ( I am not trying to be sarcastic..)
2. What Anti Virus program, after your analysis, is the winner of your award?
3. Have you taken a vacation lately? (Just Kidding...)

I can just imagine it right now.. I meet a girl online, and we finally meet face to face at a donut shoppe after a year and a half of writing "addled" instant messages to each other...

"Oh, Evaluating antivirus programs", I smartly say. Then I whip out a printed copy of this whole post and say with a wink, "For the past 4 months, at EVERY spare moment I have, I have been trying to figure out the meaning of this "histogram related antivirul summary"...

Just kidding, Firefighter.. I even "applauded" you right now.. You're okay man... Seriously, though, could you just summarize and tell us what your final opinion is, in maybe one paragraph?

One thing I did pick up on is that Rodzilla was right when he said that 2% missed viruses could mean about 150 or so, to paraphrase him...

Thanks,

Shooter...

Oh, Evaluating antivirus programs", I smartly say. Then I whip out a printed copy of this whole post and say with a wink, "For the past 4 months, at EVERY spare moment I have, I have been trying to figure out the meaning of this "histogram related antivirul summary"...

Straight Shooter!!!!
Regards
Ole

What I said earlier might have confused most of the forum readers. Statistical Process Control (SPC) is not a very simple subject.

If you are a nuclear physicist, not many people ask you what nuclear physics is, because if the answer were that easy, there would be no nuclear physicists either. It is the same with Statistical Process (or Quality) Control; it is a branch of science of its own. If you really want a quick overview of this stuff, a good start may be to read, for example, the pocket guide from this site:

http://www.qualitycoach.net/1879364441.htm

For further study I recommend Juran's Quality Handbook (5th Edition).

http://www.knovel.com/knovel2/Toc.jsp?BookID=623

When you have read and understood all of its some 2 000 pages, you will be a pro in this branch of science.

The main point is still quite simple. Almost every free process in the world produces a normally distributed outcome when the process runs between two specification limits. The outcome is then a symmetric bell-shaped curve where the skewness is about zero and the kurtosis about 3.

When there is only one specification limit, the measurements form a skewed bell-shaped curve; the kurtosis will still be near 3 and the skewness usually somewhere between -1 and +1, but a little away from zero, depending on the position of the specification limit relative to the distribution's mean.

There are measurement limits called ± 3 sigma (sigma is the strange letter in those pictures, which I can't find on this site). In one sample there should be no measurement values outside those limits, and between the limits lies the "normal distribution" curve, which was the lilac bell-shaped curve in the pictures I showed earlier. If some measurement values still fall outside those two limits, the phrase is "the process is not in control": there is a special (known or main) reason why those values exist, and it can be found by scrutinizing the case more closely. But if there are no such values, the situation is normal and the differences belong to natural variation, just as we people are not all equally tall without there being any one (main) known reason for it.

Those ± 3 sigma limits can be calculated very easily with the programs I mentioned earlier, just by entering the measurements straight into the program.
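If you do not have such a program, the limits are not hard to compute by hand either. The sketch below is only an illustration: it takes the 20 percentages of the "Saso Badovinac" test quoted in this thread (with decimal points instead of commas), assumes the sample standard deviation (n - 1 in the divisor), and lists any values outside the limits.

```python
import statistics

# Detection rates (%) from the "Saso Badovinac" results in this thread.
rates = [99.26414, 98.44877, 96.66934, 90.94085, 88.49767,
         88.39807, 88.13495, 85.32385, 84.64077, 83.64997,
         81.69216, 80.39365, 75.24566, 65.57999, 62.58306,
         61.59301, 61.06453, 54.85513, 53.27045, 35.17965]

mean = statistics.mean(rates)
std_dev = statistics.stdev(rates)   # sample standard deviation (n - 1)

lower = mean - 3 * std_dev          # -3 sigma limit
upper = mean + 3 * std_dev          # +3 sigma limit

# Values outside the limits would suggest a process "not in control".
outside = [r for r in rates if r < lower or r > upper]

print(f"Mean = {mean:.5f}  Std Dev (Sample) = {std_dev:.5f}")
print(f"3 Sigma Limits: {lower:.5f} TO {upper:.5f}")
print(f"Values outside the limits: {outside}")
```

With these numbers the mean, standard deviation and limits agree with the statistics block for the 100 000-object test in my first post, and no value falls outside the limits.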

A histogram bar is in this case the number of av:s inside the same detection-rate interval (regular intervals such as 60-65 %, 65-70 %, 70-75 % and so on). There are certain rules for how many bars are acceptable in the analysis, depending on the sample size (the number of av:s in this case). The shape of the bars is the pattern of the study, and it must resemble the normal distribution curve.
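To show what such bars look like, here is a small sketch that counts av:s per 5 % interval and prints a text histogram. The bar width of 5 is only the example width mentioned above, not a rule, and the data are again the 20 "Saso Badovinac" percentages quoted in this thread.

```python
from collections import Counter

# Detection rates (%) from the "Saso Badovinac" results in this thread.
rates = [99.26414, 98.44877, 96.66934, 90.94085, 88.49767,
         88.39807, 88.13495, 85.32385, 84.64077, 83.64997,
         81.69216, 80.39365, 75.24566, 65.57999, 62.58306,
         61.59301, 61.06453, 54.85513, 53.27045, 35.17965]

BIN_WIDTH = 5  # assumed interval width in percentage points

def bin_start(rate, width=BIN_WIDTH):
    """Lower edge of the interval [start, start + width) containing rate."""
    return int(rate // width) * width

counts = Counter(bin_start(r) for r in rates)

# One row per interval, from the lowest occupied bar to the highest.
for start in range(min(counts), max(counts) + BIN_WIDTH, BIN_WIDTH):
    print(f"{start:3d}-{start + BIN_WIDTH:<3d} {'#' * counts[start]}")
```

Each `#` is one av-program; empty rows are intervals in which no program landed. A rate of exactly 100 would fall into a 100-105 bar with this simple rule, so a real analysis would need to decide how to treat the upper edge.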

What I am saying now is not exactly the whole truth, but it is a good estimate to start with. When the measurement curve is skewed and the skewness value is somewhere between -1 and +1, as it was in those 3 tests, you need at least 50 - 100 measurements (= av-programs in this case) before the first value outside the ± 3 sigma limits is acceptable.

When we look at the VirusBulletin WinXP 2002 histogram pattern, there were 6 values outside the -3 sigma limit, so by statistical rules it is a totally unacceptable study.

In short, the histogram pattern (the shape of the bars) is the quickest way to tell whether a study is acceptable by statistical rules, which is a scientific fact and not an opinion or a feeling. Whoever proves it false may surely have the next "Nobel prize".

Somebody asked which av is the best in my mind. First I have to say that the Zoo criterion is only one of many criteria, such as In the Wild detection, memory consumption, false positives, the capability to read packed or archived files, ease of use, the resources of your PC or whatever. But if we forget those other issues, I have to say, as my writing says, that I need more large av-test results (let's say some 20). Because the results are the outcome of independent test occasions where natural variation rules, the winner can't always be the same program. After that I will probably be sure which is the best one. For now you can just follow the top 5.

PS. If you can't open the "Saso Badovinac" link, here are the results of that page.

Subject:
Antivirus Zoo test:

https://grc.com/x/news.exe?cmd=article&group=grc.security.software&item=84294&utag=

Date:
Sat, 22 Mar 2003 13:45:36 +0100
From:

Here are the results of the latest Zoo test. Some things to note:

-this Zoo test is based on more than 100 000 infected files so in my opinion it is a good one

-this is a Zoo test and cannot be compared to an itw test, for an itw test I recommend the

http://www.virusbtn.com/vb100/

-although it includes several trojans and worms it cannot be used as a comparison for trojan detection

What is this test good for? It shows very well how many of "all" viruses an antivirus detects. So if, in addition to this list, you take a look at

http://www.virusbtn.com/vb100/

(which is a test for the latest most active viruses) then you get a very good picture of how good your antivirus is (again, note that trojans are a special category, and if someone knows of a good trojan comparison test I would love to hear about it).

Results (the number is the % of the files (viruses) detected):

mcafee 99,26414
kaspersky 98,44877
f-prot 96,66934
trend 90,94085
symantec 88,49767
drweb 88,39807
sophos 88,13495
alwil 85,32385
rav 84,64077
eset 83,64997
inoculateit 81,69216
panda 80,39365
h+bedv 75,24566
avg 65,57999
avxc 62,58306
virus buster 61,59301
vet 61,06453
ikarus 54,85513
mks 53,27045
hauri 35,17965

"The truth is out there, but it hurts"

Best Regards,
Firefighter

http://members.lycos.co.uk/scheinsicherheit/scanner.htm

Technodrome

To Technodrome from Firefighter!

Thanx for the link and props too!

Best Regards,
Firefighter!

To everybody again from Firefighter!

Hi everybody, I have to clarify my statement about the best av:s in the Zoo test.

You can keep an eye on the top 5 scanning engines now and wait until the long testing tournament is over. We have seen only 3 "Grand Prix Tours" but we still have some 17 left, so we have to be patient, and let's hope there will be an exciting and equally matched competition.

"The truth is out there, but it hurts!"

Best Regards,
Firefighter!