Interesting new AV review by Andreas Clementi

Discussion in 'other anti-virus software' started by -_-, May 28, 2004.

Thread Status:
Not open for further replies.
  1. IBK

    IBK AV Expert

    Joined:
    Dec 22, 2003
    Posts:
    1,886
    Location:
    Innsbruck (Austria)
    Thank you! :)
     
  2. Backslash

    Backslash Registered Member

    Joined:
    May 29, 2004
    Posts:
    5
    Great and professional test! If only Mr. Marx could achieve this level, the anti-virus user community would benefit greatly.

    slash
     
  3. Couldn't agree more!
     
  4. Tweakie

    Tweakie Registered Member

    Joined:
    Feb 28, 2004
    Posts:
    90
    Location:
    E.U.
    Hello,

    Very interesting retrospective test.

    I have several remarks, though:

    First of all, I think there should always be a false-positive test associated with a proactive detection test. The reason is simple: anybody could write a program that detects 100% of viruses... and 100% of clean files. More practically, heuristic detection is a statistical problem, and reducing the false-positive rate by 0.01% may cost 5% of the detection rate.
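
    For illustration only (the scores below are made up, not measured from any product), the tradeoff can be pictured as a single detection threshold swept over heuristic scores: tightening the threshold removes false positives and, at the same time, drops genuine detections.

        # Toy numbers, purely illustrative: how one threshold trades detection rate
        # against false-positive rate for a score-based heuristic.
        clean_scores = [0.05, 0.10, 0.20, 0.35, 0.55]      # heuristic scores of clean files
        malware_scores = [0.30, 0.50, 0.60, 0.80, 0.95]    # heuristic scores of malware samples

        for threshold in (0.25, 0.40, 0.60):
            detected = sum(s >= threshold for s in malware_scores) / len(malware_scores)
            false_pos = sum(s >= threshold for s in clean_scores) / len(clean_scores)
            print(f"threshold {threshold:.2f}: detection {detected:.0%}, false positives {false_pos:.0%}")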

    Then, at the end of the test, it says "Keep in mind that you should take care of ItW-samples and not of Zoo-samples that you will most probably NEVER encounter". While this is true for worm and virus samples, it is clearly not true for trojan horses and backdoors. Some backdoors are very widespread (the Optix family, Beast, etc.) and should be considered ItW (although they are not listed in the WildList), while others are really "zoo". Moreover, each month there are many more new trojans and backdoors created or modified than worms and viruses. Hence, proactive detection of non-viral malware is of primary importance.

    Finally, I regret that Norman Virus Control has not been added to the set of tested products. Since it seems to use a heuristic technique that is quite similar to NOD32's, I'd have liked to see a comparison of both products. I myself tested it against a small test set mainly composed of backdoors/trojan horses, and noticed a very good heuristic detection rate for web-downloader trojan horses and a fair detection rate for the other categories.

    I'd also like to share some reflections on proactive detection tests.

    From a theoretical point of view, retrospective testing is actually the only way of efficiently testing proactive detection capabilities. However, nowadays heuristic engines are regularly updated, without the need to update the anti-virus engine itself. This way, AVs try to anticipate new variants of existing malware. From the end-user point of view, the only interesting question is: "does my AV catch this malware when it is released?" Nobody cares about detecting a malware 3 months before it is released. As a result, I think each malware should be tested using versions of AVs updated 1 week before its discovery. Something like a "sliding time window".
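
    A minimal sketch of that "sliding time window" idea (the names, dates and the 7-day window below are assumptions, just to make the selection rule concrete): for each sample, pick the newest signature update released at least one week before the sample's discovery date, and scan the sample with that update only.

        # Hypothetical harness fragment for the "sliding time window" test.
        from dataclasses import dataclass
        from datetime import date, timedelta

        @dataclass
        class Update:            # one signature/engine update of a given AV
            released: date
            version: str

        @dataclass
        class Sample:            # one malware sample with its discovery date
            name: str
            discovered: date

        def update_for(sample, updates, window_days=7):
            """Newest update released at least `window_days` before discovery, or None."""
            cutoff = sample.discovered - timedelta(days=window_days)
            eligible = [u for u in updates if u.released <= cutoff]
            return max(eligible, key=lambda u: u.released) if eligible else None

        updates = [Update(date(2004, 4, 1), "sig-0401"), Update(date(2004, 4, 20), "sig-0420")]
        sample = Sample("W32/Example.A", date(2004, 4, 25))
        print(update_for(sample, updates))   # sig-0401: the 2004-04-20 update is too recent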

    IMHO, AV testers should as far as possible try to distinguish between new "variants" and brand new malwares in such tests. I'd also like to know how many samples are *identified* by the AVs (new or modified variants of... repacked version of.., W32/something.gen probably detected using generic signatures) and how many are detected by pure heuristics and named generically (I know that such a distinction may not be possible with all AVs).
     
  5. _0__0_

    _0__0_ Guest

    @Tweakie

    " As a result, I think each malware should be tested using versions of AVs updated 1 week before its discovery. "

    I think this could be quite difficult. With respect to new "public" trojans you will usually know their release date. However, the heuristic detection of trojans is frequently not worth mentioning.

    It is much harder to determine when a worm, hijacker or virus has been released/discovered for the first time. But maybe it would be possible to conduct such a "1 week before discovery" test with respect to ItW samples which are contained in the WildList.



    In addition, I also have a comment. Some testers are accused of using inappropriate malware samples. For example, a malware sample could be considered inappropriate if it is ...

    a DOS virus or a Linux sample and the scanner runs on a Windows machine,
    a harmless trojan client instead of a dangerous trojan server,
    a hacktool, which does not endanger the user of the hacktool but third parties,
    a sample which is contained in an archive/installer package and which will be detected by the on-access scanner and not by the on-demand scanner,
    a sample which is defective and does not run properly, etc.

    I wonder whether it would be helpful to prepare documentation regarding the test set. It could be expressly stated that certain kinds of malware samples are or are not included in the test set. Then everybody could make up their own mind whether such samples are important/appropriate etc.
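
    Purely as a hypothetical example of what such per-sample documentation could record (the field names and values below are made up, not taken from any actual test set):

        # One entry of a hypothetical test-set manifest; fields are illustrative only.
        test_set_manifest = [
            {
                "sample": "backdoor_xyz.exe",                      # file name in the test set
                "category": "backdoor (server part, configured)",  # client parts excluded
                "platform": "Win32",                               # no DOS/Linux-only samples
                "packaging": "plain PE, not inside an archive or installer",
                "verified_to_run": True,                           # not a corrupted sample
            },
        ]
        print(len(test_set_manifest), "documented sample(s)")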
     
  6. Tweakie

    Tweakie Registered Member

    Joined:
    Feb 28, 2004
    Posts:
    90
    Location:
    E.U.
    Why is that so? I think heuristic detection of trojans is an important challenge. There are many more new trojan horses than viruses ( http://www.dials.ru/english/inf/news.php?id=754 ). Some products are trying to detect them heuristically (e.g. the latest version of AntiVir PE). Even if heuristic detection of trojan horses is weak for most of the products, it should be mentioned.

    You're right. Considering only the wildlist, this is much more feasible, and it would certainly be interesting.

    I agree for all these examples.

    I do not completely agree with this one. If you are scanning, let's say, a file server, you would like your on-demand scanner to detect this kind of malware. You may not want to rely on the on-access component running on each individual workstation. Moreover, there are users (like me*) who do not use on-access scanning (or use it only for tests).

    I agree. Even more: most recent backdoor servers need to be "configured" to run properly (this is particularly true for backdoors that "phone home"). All tested samples should IMHO be "configured" properly.

    An interesting alternative is to split the test set into several categories: the "infectious" one and the "others" one. It seems that this is the way Andreas Clementi proceeded for some samples: the last category of his test, named "other samples", contains all the "intended" and "hacktool" stuff. But you're right: it could be better documented (e.g., I don't know in which category the client parts of the trojan horses are included).

    --
    *I do not advise anybody to do the same.
     
  7. IBK

    IBK AV Expert

    Joined:
    Dec 22, 2003
    Posts:
    1,886
    Location:
    Innsbruck (Austria)
    About the false alarm test: such a test is done (not officially) in order to weed out paranoid scanners that pick up everything when using the best possible settings. Anyway, my FA test-sets are small, and in my eyes FA tests are also not always objective, as I do not know which files you have on your system (I mean, in my test a scanner could have 0.001% false alarms in 100,000 files, while on your system you could have 1% of FA based on 10,000 files; well, hope not :p). As I said, in my very little FA test-set I got no FAs.
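
    To put those two percentages into absolute numbers (using exactly the figures quoted above):

        # Worked arithmetic for the comparison above; figures taken from the post.
        fa_in_test = 100_000 * 0.001 / 100   # 0.001% of 100,000 files -> 1 false alarm
        fa_on_user = 10_000 * 1 / 100        # 1% of 10,000 files     -> 100 false alarms
        print(fa_in_test, fa_on_user)        # 1.0 100.0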

    _0__0_ is right in his statements.

    About Norman: Norman would (probably) take months to scan my test-sets (if it did not crash before) when using its best possible settings (and anyway the test results would not be really high [based on my test-sets]). For example, NOD32 with /AH takes just some hours and has better results. Being able to scan the databases within a reasonable time is a condition, so I did not test Norman or some other AVs (as I do it for free and testing AVs is not a full-time job for me - I have to finish my studies). Maybe I will include 2 other AVs in the future, but I am at the moment not sure about that. Also, I cannot make deeper tests as my time is limited; all tests are done mainly to satisfy my own curiosity ;-).

    Of course a test like you propose would be more realistic, but it is only possible to do with ItW-samples (it's like the outbreak-response tests that Marx does); with Zoo-samples it is (IMO) not possible: there would not be enough samples, the % could not be compared between the scanners, AVs could cheat, or the results would be influenced by the samples used. Such a test would not be objective, and because I want to deliver independent test results, I will/can not make such a test.
     
  8. _0__0_

    _0__0_ Guest

    @Tweakie

    "Why that so ? I think heuristic detection of trojans is an important challenge."

    Yes. 100% agreed.

    "Some products are trying to detect it heuristically (e.g. the latest version of Antivir PE). Even if heuristic detection of trojan horses is weak for most of the products, it should be mentionned."

    The test figures do indeed show that the heuristic trojan detection of most AV scanners is quite weak.

    I believe that, for example, the new AntiVir heuristic, the TDS-3 heuristic and, last but not least, the forthcoming a2 v2 IDS do/will do a much better job. With respect to these products a "1 week before discovery" test relating to "public" trojans may make sense.
     
  9. Tweakie

    Tweakie Registered Member

    Joined:
    Feb 28, 2004
    Posts:
    90
    Location:
    E.U.
    OK. I think some testers include in their FA test-set legitimate software that is known to have some functionality roughly similar to malware behaviour: utilities to kill processes or format hard drives, programs that open a listening service without displaying any UI, executable packers, demos written in asm, legitimate remote-control utilities...

    Maybe you could try making up such a test set, if/when you have time to do so.

    I'd say around 10 hours for your ~8000 samples (just a guess).

    Did you see it crash recently? I'm surprised.

    IMHO, you have to test it before saying such things :p (OK, I trust you, but I'm just curious).

    I'd really like to know how NOD32 works. Emulation must be really fast. Norman says it can emulate over 3M instructions/second on a P4 @ 2 GHz, which makes it (very roughly) 700 times slower than native execution. I remember having read quite similar figures about AVG 6 (it was 1000 times slower than execution at that time). I'd be interested in knowing the emulation speed of NOD32 (but well, I don't really expect Eset to communicate on that topic).
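
    A rough check of that "700 times slower" figure, assuming (as a simplification) that the CPU retires on the order of one instruction per clock cycle when running code natively:

        # Back-of-the-envelope only; the 1 instruction/cycle native rate is an assumption.
        native_ips = 2.0e9     # ~2 GHz P4, assumed ~1 instruction per cycle
        emulated_ips = 3.0e6   # figure quoted by Norman for its emulator
        print(native_ips / emulated_ips)   # ~667, i.e. roughly 700x slower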

    Very good. And congratulations : they also satisfy my own curiosity :)

    Agreed. I think that this kind of test, performed over ITW samples, would be meaningful.
     
  10. IBK

    IBK AV Expert

    Joined:
    Dec 22, 2003
    Posts:
    1,886
    Location:
    Innsbruck (Austria)
    If I include a scanner in the retrospective test, it must also be included in the other tests. E.g. 10 hours for 1 scanner is too much if other scanners need just some minutes. I tried last year to test Norman; as you can imagine, it takes ages to scan over 300,000 samples (and this possibly several times, to be sure that everything it can detect is detected) with the best possible settings, compared to the others. In that test it crashed. Whether it still does so as often as last year, I do not know. I think you understand that I do not want to deliver the test results 1 month later just because 1 scanner needs much time to scan the databases; most scanners are able to scan the full databases with the best possible settings within 6 hours. I set a time limit of 36 hours in the conditions (in August the time limit will be 48 hours). The test/reference machine is a Pentium 4 HT 2.8 GHz with 512 MB RAM.
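
    For a sense of scale, the throughput those figures imply (values taken from the post, nothing measured here):

        # Samples per second needed to finish 300,000 samples within each time budget.
        samples = 300_000
        for hours in (6, 36, 48):
            print(f"{hours:>2} h -> {samples / (hours * 3600):.1f} samples/second")
        #  6 h -> 13.9 samples/second (what most scanners manage)
        # 36 h ->  2.3 samples/second (current time limit)
        # 48 h ->  1.7 samples/second (limit from August)
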
    I am quite sure that on my test-sets Norman would still not reach 85% (in August/September 2003 it was around 81% in my in-lab test, so that figure is not official).
     
  11. Tweakie

    Tweakie Registered Member

    Joined:
    Feb 28, 2004
    Posts:
    90
    Location:
    E.U.
    I do not really see why this is necessary, but well, you decide...

    Honestly, I don't understand why you are keeping this huge DOS virus collection. It's becoming more and more meaningless for evaluating AVs. By dropping it, you would undoubtedly reduce the scanning time considerably.

    On your computer, I expect Norman to be faster than previously "guessed" (honestly, the estimate was made supposing that each and every one of your samples would be emulated inside the Sandbox for more than 4 seconds on average, which is probably an overestimate).
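
    That earlier "around 10 hours" guess simply comes from multiplying those two figures:

        # Figures from the posts above: ~8000 samples, assumed ~4 s of emulation each.
        samples = 8000
        seconds_each = 4
        print(samples * seconds_each / 3600)   # ~8.9 hours, rounded up to "around 10 hours"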

    I'm not particularly surprised, especially if it does not score well in the backdoor/trojan categories. But since they are using quite an original technique for detecting new malware, and since they communicate a lot about it, it would be particularly interesting to see whether or not this "Sandbox technology" is efficient compared with other AVs' techniques. That's why I'd find it particularly interesting to test its proactive detection capabilities.
     
  12. IBK

    IBK AV Expert

    Joined:
    Dec 22, 2003
    Posts:
    1,886
    Location:
    Innsbruck (Austria)
    Scanning the DOS collection is very much faster than scanning e.g. the (much smaller) trojan or backdoor collection.
    Well, as I said, my capabilities are limited, so, for the moment, I am not going to test it.
     
  13. Tweakie

    Tweakie Registered Member

    Joined:
    Feb 28, 2004
    Posts:
    90
    Location:
    E.U.
    Understood.

    Thank you for the test, anyway !
     
  14. Dazed_and_Confused

    Dazed_and_Confused Registered Member

    Joined:
    Mar 4, 2004
    Posts:
    1,831
    Location:
    USA
    What do the results say about NOD32? My experience so far has been great, but I keep seeing tests that show them underperforming. Is that what I'm seeing here?
     
  15. IBK

    IBK AV Expert

    Joined:
    Dec 22, 2003
    Posts:
    1,886
    Location:
    Innsbruck (Austria)
    No. :p
     
  16. Kobra

    Kobra Registered Member

    Joined:
    May 11, 2004
    Posts:
    129
    I'm unsure why people always leave out products like AVK-US from expendia.. Looking at that list, AVK, with the KAV+RAV engines, probably would have scored at the top, depending on how much redundancy there was.
     
  17. VikingStorm

    VikingStorm Registered Member

    Joined:
    Jun 7, 2003
    Posts:
    387
    Read the preconditions: only single-engine products are tested (I would assume in order to benchmark each individual company's own technology progression).
     