Retrospective/ProActive Test May 2006

Discussion in 'other anti-virus software' started by TradeMark, Apr 2, 2006.

Thread Status:
Not open for further replies.
  1. lifehacker

    lifehacker Registered Member

    Joined:
    Feb 23, 2006
    Posts:
    44
    1- NOD32
    2- Vba32
    3- BitDefender
    4- AntiVir
    5- KAV
     
  2. IBK

    IBK AV Expert

    Joined:
    Dec 22, 2003
    Posts:
    1,886
    Location:
    Innsbruck (Austria)
    forget it, it is impossible to guess the outcome.
     
  3. Franklin

    Franklin Registered Member

    Joined:
    May 12, 2005
    Posts:
    2,517
    Location:
    West Aussie
    1-Defensewall
    2-Online Armour
    3-Bufferzone
    4-Sandboxie
    5-LOL:D
     
  4. RejZoR

    RejZoR Lurker

    Joined:
    May 31, 2004
    Posts:
    6,426
    IBK, I assume results will be available at the beginning of June, right?
     
  5. pykko

    pykko Registered Member

    Joined:
    Apr 27, 2005
    Posts:
    2,236
    Location:
    Romania...and walking to heaven
    I bet so. :D
     
  6. IBK

    IBK AV Expert

    Joined:
    Dec 22, 2003
    Posts:
    1,886
    Location:
    Innsbruck (Austria)
    yes, the 1st of June as usual.
     
  7. Tweakie

    Tweakie Registered Member

    Joined:
    Feb 28, 2004
    Posts:
    90
    Location:
    E.U.
    Yeah, another pointless wild guess:

    1. Nod32
    2. Trustport
    3. AVK
    4. Bitdefender
    5. Norman
    ...

    Don't know about VBA32, but from what I've heard I would expect a high number of false positives if its paranoid heuristics are used.

    But IMHO, it would be more interesting to talk about the methods used by AVs for variant detection, since AV companies do not publish much on this issue, and since it is currently a big differentiator. I can imagine three families of methods, but all of these are probably too slow to be used in AV software.

    - Static analysis:
    ------------------

    . Global binary pattern analysis:

    Very broadly speaking, compression algorithms such as the one used for .zip files first create an optimized "dictionary" linking the patterns that appear most frequently in a given file to shorter bitstrings, then re-encode the file using these bitstrings, and finally append the dictionary to the end of the file. It is a mix of LZ77 (for finding repeating patterns) and Huffman coding (for optimizing the size of the bitstrings).

    Applied to variant detection, the principle of this technique is the following: a "dictionary" is created from a known malware A (or a significant part of this malware), and applied to a suspicious file B. If file B is compressed (almost) as efficiently as file A using this dictionary, then it means that A and B are similar.

    An antivirus would then need to store a set of such dictionaries (heavy) and the corresponding optimal compression ratios (computed on the original malware). Note also that once A and B have been identified as two minor variants belonging to the same family, a new "dictionary" can be created from (A plus B).

    A simple experiment along these lines is presented here, using the gzip compression tool:

    http://www.google.com/url?sa=D&q=http://homepages.cwi.nl/~wehner/worms/index.html

    The principle is described here:

    http://arxiv.org/PS_cache/cs/pdf/0312/0312044.pdf

    But this technique is probably too demanding.
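    The idea above can be sketched in a few lines of Python. The linked paper trains a dictionary per malware family; as a rough stand-in, this hypothetical sketch uses the normalized compression distance (NCD) with zlib, which captures the same intuition: two files that compress well together are similar.

    ```python
    import zlib

    def ncd(a: bytes, b: bytes) -> float:
        """Normalized compression distance: close to 0 for
        near-identical inputs, close to 1 for unrelated ones."""
        ca = len(zlib.compress(a))
        cb = len(zlib.compress(b))
        cab = len(zlib.compress(a + b))
        return (cab - min(ca, cb)) / max(ca, cb)

    # Two hypothetical "variants" sharing most of their bytes
    # compress well together; an unrelated byte stream does not.
    variant_a = b"push ebp; mov ebp, esp; call CreateFileA; " * 50
    variant_b = variant_a + b"xor eax, eax; ret; "
    unrelated = bytes(range(256)) * 20

    assert ncd(variant_a, variant_b) < ncd(variant_a, unrelated)
    ```

    A real engine would precompute the dictionaries in the lab and ship only those, but the distance computation itself is the expensive part, which is why the post calls this technique too demanding.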

    . Static topological analysis (call graphs):

    Considering an executable, the principle is to represent the function calls by a graph: each function is represented by a node while edges represent calls from one function to another (Sobig is here: http://www.f-secure.com/2003/sobig_f_2.pdf). Then, the similarity between two executables is deduced from the topological distance between their graphs.

    Halvar Flake (sabre security) presented this method:

    http://www.google.com/url?sa=D&q=ht.../presentations/win-usa-04/bh-win-04-flake.pdf

    Ero Carrera applied it to malware classification:

    http://www.f-secure.com/weblog/archives/carrera_erdelyi_VB2004.pdf

    Once again, this method seems too heavy (in terms of storage, CPU and memory requirements) to be implemented in AV software. It is more a tool used in AV labs for classifying malware more easily.
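    As a much-simplified sketch of the call-graph idea: the real methods (Flake, Carrera) use structural graph matching, but even comparing the two call graphs as sets of (caller, callee) edges with a Jaccard similarity conveys the principle. All function names below are hypothetical, standing in for what a disassembler would extract.

    ```python
    def call_graph_similarity(edges_a: set, edges_b: set) -> float:
        """Jaccard similarity over (caller, callee) edges -- a crude
        stand-in for real topological graph matching."""
        if not edges_a and not edges_b:
            return 1.0
        return len(edges_a & edges_b) / len(edges_a | edges_b)

    # Hypothetical call graphs, as a disassembler might recover them:
    known = {("main", "init_smtp"), ("main", "scan_disk"),
             ("scan_disk", "harvest_addrs"), ("init_smtp", "send_copy")}
    variant = known | {("main", "kill_av")}   # one function added
    benign = {("main", "parse_args"), ("main", "print_help")}

    assert call_graph_similarity(known, variant) > 0.7   # likely a variant
    assert call_graph_similarity(known, benign) < 0.2    # unrelated
    ```

    The expensive steps in practice are disassembly and node matching when function names are stripped, which is why this stays a lab tool rather than an on-access scanner component.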

    - Dynamic analysis - Behavioral signatures :
    ----------------------------------------------

    AVs do emulate (this includes dynamic translation) executables to find "suspicious behaviours". This is the basis of BitDefender's HiVE, Norman's Sandbox and probably NOD's Advanced Heuristics. The purpose of this emulation is to observe safely how the executable interacts with the system; hence, the calls to Windows API functions are monitored.
    Currently only a few suspicious API calls are actively monitored (by HiVE or the Sandbox) or used for detection: those used for writing to the registry or the file system, opening a network connection, injecting a thread into another process, etc.

    A fundamental characteristic of an emulated environment is that it stays (or can stay) exactly the same: running processes, files on disk, random generator seed and system time never change. This means that a file run in such an environment will always behave exactly the same way, and that the same API functions will be called in exactly the same order and with the same parameters. Hence, (almost) all of these API calls are meaningful, since they can be used to build a behavioral signature of the program. However, only some of their parameters are meaningful (meaningful: will not vary from one variant to another): the name of a registry key, of a window or of a process can be meaningful. The permissions used for opening a file or manipulating processes can be important. The name of a created file or of a new mutex may vary from one malware to another within the same family. The value of a handle or a memory address is in general not characteristic of anything.

    In plain text, a behavioral signature would look like:

    Kernel32.dll->GetCommandLineA
    Kernel32.dll->GetStartupInfo (*)
    Kernel32.dll->GetVersion
    Kernel32.dll->GetThreadLocale
    Kernel32.dll->GetLocaleInfoA (*,*,*,*)
    Kernel32.dll->InitializeCriticalSection (*)
    Kernel32.dll->GetCurrentThreadId
    Kernel32.dll->GetModuleFileNameA (*,*,*)
    Kernel32.dll->GetModuleFileNameA (*,*,*)
    AdvApi32.dll->RegOpenKeyExA (*,"Software\Borland\Locales",*,*,*)
    AdvApi32.dll->RegOpenKeyExA (*,"Software\Borland\Locales",*,*,*)
    Kernel32.dll->lstrcat ("C:\WINDOWS\SYSTEM","\")
    Kernel32.dll->lstrcat ("C:\WINDOWS\SYSTEM\","schost.exe")
    Kernel32.dll->_lopen ("C:\WINDOWS\SYSTEM\schost.exe",*)
    (...)
    User32.dll->LoadStringA (*,*,*,*)
    (...)
    Kernel32.dll->WinExec ("C:\WINDOWS\SYSTEM\schost.exe",*)

    Notice that such a behavioral signature does not need to correspond to "bad" behavior. It may contain only legitimate actions. It just needs to be specific enough to avoid false positives (and generic enough at the same time).

    Then, I suggest the following technique:

    In the lab:
    -----------
    - Create a catalog of Windows API functions. By default, none of their parameters are meaningful.
    - Examine each function and define the set of parameters that could be characteristic of the behavior of a malware.
    - For every known malware (excluding viruses):
    . Find its original entry point (or unpack it as a pre-processing if needed)
    . From this entry point (or a predefined number of calls later, to skip the boilerplate added by some compilers), record the first 100 (for example) API calls, with their meaningful parameters (if any). This record is the behavioral signature for this malware.

    On the customer's PC:
    ---------------------
    - For every suspicious file:
    . Find its original entry point (or unpack it as a pre-processing if needed)
    . From this entry point (or a predefined number of calls later, to skip the boilerplate added by some compilers), record the first 100 (for example) API calls, with their meaningful parameters (if any). Compare this record with existing behavioral signatures. If a match is found, then we have a (minor) "variant of ...".

    This should be faster than HiVE/Sandbox since the emulation can be stopped after the first few calls.
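    The matching step described above can be sketched as follows. This is a hypothetical illustration, not any vendor's implementation: a recorded trace of (API, parameters) pairs is compared against a stored signature, where "*" marks parameters catalogued as non-meaningful (handles, addresses) that must be ignored. The API names and paths echo the plain-text signature shown earlier.

    ```python
    def matches(trace, signature):
        """A trace matches a signature if the first len(signature) calls
        line up: same API name, and every meaningful (non-'*')
        parameter equal."""
        if len(trace) < len(signature):
            return False
        for (api, args), (sig_api, sig_args) in zip(trace, signature):
            if api != sig_api:
                return False
            for arg, sig_arg in zip(args, sig_args):
                if sig_arg != "*" and arg != sig_arg:
                    return False
        return True

    # Stored in the lab: wildcards for handles and flag values.
    signature = [
        ("AdvApi32.dll!RegOpenKeyExA", ("*", r"Software\Borland\Locales", "*")),
        ("Kernel32.dll!lstrcat", (r"C:\WINDOWS\SYSTEM", "\\")),
        ("Kernel32.dll!WinExec", (r"C:\WINDOWS\SYSTEM\schost.exe", "*")),
    ]
    # Recorded on the customer's PC: concrete handles and flags present.
    trace = [
        ("AdvApi32.dll!RegOpenKeyExA", ("0x80000002", r"Software\Borland\Locales", "0x0")),
        ("Kernel32.dll!lstrcat", (r"C:\WINDOWS\SYSTEM", "\\")),
        ("Kernel32.dll!WinExec", (r"C:\WINDOWS\SYSTEM\schost.exe", "0x5")),
    ]
    assert matches(trace, signature)
    ```

    In a real engine the comparison would likely allow gaps or reordering between calls, since even minor variants can interleave extra API calls, but exact prefix matching is enough to show the shape of the idea.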
     
  8. Tandra

    Tandra Registered Member

    Joined:
    May 11, 2006
    Posts:
    5
    what is that av comparative?
     
  9. CJsDad

    CJsDad Registered Member

    Joined:
    Jan 22, 2006
    Posts:
    618