Interesting new AV review by Andreas Clementi

Discussion in 'other anti-virus software' started by -_-, May 28, 2004.

Thread Status:
Not open for further replies.
  1. IBK

    IBK AV Expert

    Joined:
    Dec 22, 2003
    Posts:
    1,886
    Location:
    Innsbruck (Austria)
    Thank you! :)
     
  2. Backslash

    Backslash Registered Member

    Joined:
    May 29, 2004
    Posts:
    5
    Great and professional test! If only Mr. Marx could achieve this level, the anti-virus user community would benefit greatly.

    slash
     
  3. Couldn't agree more!
     
  4. Tweakie

    Tweakie Registered Member

    Joined:
    Feb 28, 2004
    Posts:
    90
    Location:
    E.U.
    Hello,

    Very interesting retrospective test.

    I have several remarks, though:

    First of all, I think there should always be a false-positive test associated with a proactive detection test. The reason is simple: anybody could write a program that detects 100% of viruses... and 100% of clean files. More practically, heuristic detection is a statistical problem, and reducing the false-positive rate by 0.01% may cost 5% of the detection rate.
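
    For illustration only (the scores below are made up, not measured from any product), the tradeoff can be pictured as a single detection threshold swept over heuristic scores: tightening the threshold removes false positives and, at the same time, drops genuine detections.

        # Toy numbers, purely illustrative: how one threshold trades detection rate
        # against false-positive rate for a score-based heuristic.
        clean_scores = [0.05, 0.10, 0.20, 0.35, 0.55]      # heuristic scores of clean files
        malware_scores = [0.30, 0.50, 0.60, 0.80, 0.95]    # heuristic scores of malware samples

        for threshold in (0.25, 0.40, 0.60):
            detected = sum(s >= threshold for s in malware_scores) / len(malware_scores)
            false_pos = sum(s >= threshold for s in clean_scores) / len(clean_scores)
            print(f"threshold {threshold:.2f}: detection {detected:.0%}, false positives {false_pos:.0%}")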

    Then, at the end of the test, it says "Keep in mind that you should take care of ItW-samples and not of Zoo-samples that you will most probably NEVER encounter". While this is true for worm and virus samples, it is clearly not true for trojan horses and backdoors. Some backdoors are very widespread (the Optix family, Beast, etc.) and should be considered ItW (although they are not listed in the WildList), while others are really "zoo". Moreover, each month there are many more new trojans and backdoors created or modified than worms and viruses. Hence, proactive detection of non-viral malware is of primary importance.

    Finally, I regret that Norman Virus Control has not been added to the set of tested products. Since it seems to use a heuristic technique that is quite similar to NOD32's, I'd have liked to see a comparison of both products. I myself tested it against a small test set mainly composed of backdoors/trojan horses, and noticed a very good heuristic detection rate for web-downloader trojan horses and a fair detection rate for the other categories.

    I'd also like to share some reflections on proactive detection tests.

    From a theoretical point of view, retrospective testing is actually the only way of efficiently testing proactive detection capabilities. However, nowadays heuristic engines are regularly updated, without the need to update the anti-virus engine itself. This way, AVs try to anticipate new variants of existing malware. From the end-user point of view, the only interesting question is: "does my AV catch this malware when it is released?" Nobody cares about detecting a malware 3 months before it is released. As a result, I think each malware should be tested using versions of AVs updated 1 week before its discovery. Something like a "sliding time window".
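
    A minimal sketch of that "sliding time window" idea (the names, dates and the 7-day window below are assumptions, just to make the selection rule concrete): for each sample, pick the newest signature update released at least one week before the sample's discovery date, and scan the sample with that update only.

        # Hypothetical harness fragment for the "sliding time window" test.
        from dataclasses import dataclass
        from datetime import date, timedelta

        @dataclass
        class Update:            # one signature/engine update of a given AV
            released: date
            version: str

        @dataclass
        class Sample:            # one malware sample with its discovery date
            name: str
            discovered: date

        def update_for(sample, updates, window_days=7):
            """Newest update released at least `window_days` before discovery, or None."""
            cutoff = sample.discovered - timedelta(days=window_days)
            eligible = [u for u in updates if u.released <= cutoff]
            return max(eligible, key=lambda u: u.released) if eligible else None

        updates = [Update(date(2004, 4, 1), "sig-0401"), Update(date(2004, 4, 20), "sig-0420")]
        sample = Sample("W32/Example.A", date(2004, 4, 25))
        print(update_for(sample, updates))   # sig-0401: the 2004-04-20 update is too recent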

    IMHO, AV testers should as far as possible try to distinguish between new "variants" and brand new malwares in such tests. I'd also like to know how many samples are *identified* by the AVs (new or modified variants of... repacked version of.., W32/something.gen probably detected using generic signatures) and how many are detected by pure heuristics and named generically (I know that such a distinction may not be possible with all AVs).
     
  5. _0__0_

    _0__0_ Guest

    @Tweakie

    " As a result, I think each malware should be tested using versions of AVs updated 1 week before its discovery. "

    I think this could be quite difficult. With respect to new "public" trojans you will usually know their release date. However, the heuristic detection of trojans is frequently not worth mentioning.

    It is much harder to determine when a worm, hijacker or virus has been released/discovered for the first time. But maybe it would be possible to conduct such a "1 week before discovery" test with respect to ItW samples which are contained in the WildList.



    In addition, I also have a comment. Some testers are accused of using inappropriate malware samples. For example, a malware sample could be considered inappropriate if it is ...

    a DOS virus or a Linux sample and the scanner runs on a Windows machine,
    a harmless trojan client instead of a dangerous trojan server,
    a hacktool, which does not endanger the user of the hacktool but third parties,
    a sample which is contained in an archive/installer package and which will be detected by the on-access scanner and not by the on-demand scanner,
    a sample which is defective and does not run properly, etc.

    I wonder whether it would be helpful to prepare documentation regarding the test set. It could be expressly stated that certain kinds of malware samples are or are not included in the test set. Then everybody could make up their own mind whether such samples are important/appropriate etc.
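
    Purely as a hypothetical example of what such per-sample documentation could record (the field names and values below are made up, not taken from any actual test set):

        # One entry of a hypothetical test-set manifest; fields are illustrative only.
        test_set_manifest = [
            {
                "sample": "backdoor_xyz.exe",                      # file name in the test set
                "category": "backdoor (server part, configured)",  # client parts excluded
                "platform": "Win32",                               # no DOS/Linux-only samples
                "packaging": "plain PE, not inside an archive or installer",
                "verified_to_run": True,                           # not a corrupted sample
            },
        ]
        print(len(test_set_manifest), "documented sample(s)")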
     
  6. Tweakie

    Tweakie Registered Member

    Joined:
    Feb 28, 2004
    Posts:
    90
    Location:
    E.U.
    Why is that so? I think heuristic detection of trojans is an important challenge. There are many more new trojan horses than viruses ( http://www.dials.ru/english/inf/news.php?id=754 ). Some products are trying to detect them heuristically (e.g. the latest version of AntiVir PE). Even if heuristic detection of trojan horses is weak for most of the products, it should be mentioned.

    You're right. Considering only the wildlist, this is much more feasible, and it would certainly be interesting.

    I agree for all these examples.

    I do not completely agree with this one. If you are scanning, let's say, a file server, you would like your on-demand scanner to detect this kind of malware. You may not want to rely on the on-access component running on each individual workstation. Moreover, there are users (like me*) who do not use on-access scanning (or use it only for tests).

    I agree. Even more: most recent backdoor servers need to be "configured" to run properly (this is particularly true for backdoors that "phone home"). All tested samples should IMHO be "configured" properly.

    An interesting alternative is to split the test set into several categories: the "infectious" one and the "others" one. It seems that this is the way Andreas Clementi proceeded for some samples: the last category of his test, named "other samples", contains all the "intended" and "hacktool" stuff. But you're right: it could be better documented (e.g., I don't know in which category the client parts of the trojan horses are included).

    --
    *I do not advise anybody to do the same.
     
  7. IBK

    IBK AV Expert

    Joined:
    Dec 22, 2003
    Posts:
    1,886
    Location:
    Innsbruck (Austria)
    About the false alarm test: such a test is done (not officially) in order to weed out paranoid scanners that pick up everything when using the best possible settings. Anyway, my FA test-sets are small, and in my eyes FA tests are also not always objective, as I do not know which files you have on your system (I mean, in my test a scanner could have 0.001% false alarms in 100,000 files, while on your system you could have 1% of FA based on 10,000 files; well, hope not :p). As I said, in my very little FA test-set I got no FAs.
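
    To put those two percentages into absolute numbers (using exactly the figures quoted above):

        # Worked arithmetic for the comparison above; figures taken from the post.
        fa_in_test = 100_000 * 0.001 / 100   # 0.001% of 100,000 files -> 1 false alarm
        fa_on_user = 10_000 * 1 / 100        # 1% of 10,000 files     -> 100 false alarms
        print(fa_in_test, fa_on_user)        # 1.0 100.0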

    _0__0_ is right in his statements.

    About Norman: Norman would (probably) take months to scan my test-sets (if it did not crash before) when using its best possible settings (and anyway the test results would not be really high [based on my test-sets]). For example, NOD32 with /AH takes just some hours and has better results. Being able to scan the databases within a reasonable time is a condition, so I did not test Norman or some other AVs (as I do it for free and testing AVs is not a full-time job for me - I have to finish my studies). Maybe I will include 2 other AVs in the future, but I am at the moment not sure about that. Also, I cannot make deeper tests as my time is limited; all tests are done mainly to satisfy my own curiosity ;-).

    Of course a test like you propose would be more realistic, but it is only possible to do with ItW-samples (it's like the outbreak-response tests that Marx does); with Zoo-samples it is (IMO) not possible: there would not be enough samples, the % could not be compared between the scanners, AVs could cheat, or the results would be influenced by the samples used. Such a test would not be objective, and because I want to deliver independent test results, I will/can not make such a test.
     
  8. _0__0_

    _0__0_ Guest

    @Tweakie

    "Why that so ? I think heuristic detection of trojans is an important challenge."

    Yes. 100% agreed.

    "Some products are trying to detect it heuristically (e.g. the latest version of Antivir PE). Even if heuristic detection of trojan horses is weak for most of the products, it should be mentionned."

    The test figures do indeed show that the heuristic trojan detection of most AV scanners is quite weak.

    I believe that, for example, the new AntiVir heuristic, the TDS-3 heuristic and, last but not least, the forthcoming a2 v2 IDS do/will do a much better job. With respect to these products a "1 week before discovery" test relating to "public" trojans may make sense.
     
  9. Tweakie

    Tweakie Registered Member

    Joined:
    Feb 28, 2004
    Posts:
    90
    Location:
    E.U.
    OK. I think some testers include in their FA test-set legitimate software that is known to have some functionality roughly similar to malware behaviour: utilities to kill processes or format hard drives, programs that open a listening service without displaying any UI, executable packers, demos written in asm, legitimate remote-control utilities...

    Maybe you could try making up such a test set, if/when you have time to do so.

    I'd say around 10 hours for your ~8000 samples (just a guess).

    Did you see it crash recently? I'm surprised.

    IMHO, you have to test it before saying such things :p (OK, I trust you, but I'm just curious).

    I'd really like to know how NOD32 works. Emulation must be really fast. Norman says it can emulate over 3M instructions/second on a P4 @ 2 GHz, which makes it (very roughly) 700 times slower than native execution. I remember having read quite similar figures about AVG 6 (it was 1000 times slower than execution at that time). I'd be interested in knowing the emulation speed of NOD32 (but well, I don't really expect Eset to communicate on that topic).
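
    A rough check of that "700 times slower" figure, assuming (as a simplification) that the CPU retires on the order of one instruction per clock cycle when running code natively:

        # Back-of-the-envelope only; the 1 instruction/cycle native rate is an assumption.
        native_ips = 2.0e9     # ~2 GHz P4, assumed ~1 instruction per cycle
        emulated_ips = 3.0e6   # figure quoted by Norman for its emulator
        print(native_ips / emulated_ips)   # ~667, i.e. roughly 700x slower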

    Very good. And congratulations : they also satisfy my own curiosity :)

    Agreed. I think that this kind of test, performed over ITW samples, would be meaningful.
     
  10. IBK

    IBK AV Expert

    Joined:
    Dec 22, 2003
    Posts:
    1,886
    Location:
    Innsbruck (Austria)
    If I include a scanner in the retrospective test, it must also be included in the other tests. E.g. 10 hours for 1 scanner is too much if other scanners need just some minutes. I tried last year to test Norman; as you can imagine, it takes ages to scan over 300,000 samples (and this possibly several times, to be sure that everything it can detect is detected) with the best possible settings, compared to the others. In that test it crashed. Whether it still does so as often as last year, I do not know. I think you understand that I do not want to deliver the test results 1 month later just because 1 scanner needs much time to scan the databases; most scanners are able to scan the full databases with the best possible settings within 6 hours. I set a time limit of 36 hours in the conditions (in August the time limit will be 48 hours). The test/reference machine is a Pentium 4 HT 2.8 GHz with 512 MB RAM.
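
    For a sense of scale, the throughput those figures imply (values taken from the post, nothing measured here):

        # Samples per second needed to finish 300,000 samples within each time budget.
        samples = 300_000
        for hours in (6, 36, 48):
            print(f"{hours:>2} h -> {samples / (hours * 3600):.1f} samples/second")
        #  6 h -> 13.9 samples/second (what most scanners manage)
        # 36 h ->  2.3 samples/second (current time limit)
        # 48 h ->  1.7 samples/second (limit from August)
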
    I am quite sure that on my test-sets Norman would still not reach 85% (in August/September 2003 it was around 81% in my in-lab test, so that figure is not official).
     
  11. Tweakie

    Tweakie Registered Member

    Joined:
    Feb 28, 2004
    Posts:
    90
    Location:
    E.U.
    I do not really see why this is necessary, but well, you decide...

    Honestly, I don't understand why you are keeping this huge DOS virus collection. It's becoming more and more meaningless for evaluating AVs. By dropping it, you would undoubtedly reduce the scanning time considerably.

    On your computer, I expect Norman to be faster than previously "guessed" (honestly, the estimate was made supposing that each and every one of your samples would be emulated inside the Sandbox for more than 4 seconds on average, which is probably an overestimate).
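
    That earlier "around 10 hours" guess simply comes from multiplying those two figures:

        # Figures from the posts above: ~8000 samples, assumed ~4 s of emulation each.
        samples = 8000
        seconds_each = 4
        print(samples * seconds_each / 3600)   # ~8.9 hours, rounded up to "around 10 hours"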

    I'm not particularly surprised, especially if it does not score well in the backdoor/trojan categories. But since they are using quite an original technique for detecting new malware, and since they communicate a lot about it, it would be particularly interesting to see whether or not this "Sandbox technology" is efficient compared with other AVs' techniques. That's why I'd find it particularly interesting to test its proactive detection capabilities.
     
  12. IBK

    IBK AV Expert

    Joined:
    Dec 22, 2003
    Posts:
    1,886
    Location:
    Innsbruck (Austria)
    Scanning the DOS collection is very much faster than scanning e.g. the (much smaller) trojan or backdoor collection.
    Well, as I said, my capabilities are limited, so, for the moment, I am not going to test it.
     
  13. Tweakie

    Tweakie Registered Member

    Joined:
    Feb 28, 2004
    Posts:
    90
    Location:
    E.U.
    Understood.

    Thank you for the test, anyway !
     
  14. Dazed_and_Confused

    Dazed_and_Confused Registered Member

    Joined:
    Mar 4, 2004
    Posts:
    1,831
    Location:
    USA
    What do the results say about NOD32? My experience so far has been great, but I keep seeing tests that show them underperforming. Is that what I'm seeing here?
     
  15. IBK

    IBK AV Expert

    Joined:
    Dec 22, 2003
    Posts:
    1,886
    Location:
    Innsbruck (Austria)
    No. :p
     
  16. Kobra

    Kobra Registered Member

    Joined:
    May 11, 2004
    Posts:
    129
    I'm unsure why people always leave out products like AVK-US from expendia.. Looking at that list, AVK, with the KAV+RAV engines, probably would have scored at the top, depending on how much redundancy there was.
     
  17. VikingStorm

    VikingStorm Registered Member

    Joined:
    Jun 7, 2003
    Posts:
    387
    Read the preconditions: only single-engine products are tested (I would assume in order to benchmark each individual company's own technology progression).
     