Retrospective/ProActive Test May 2006

Discussion in 'other anti-virus software' started by TradeMark, Apr 2, 2006.

Thread Status:
Not open for further replies.
  1. lifehacker

    lifehacker Registered Member

    Joined:
    Feb 23, 2006
    Posts:
    44
    1- NOD32
    2- Vba32
    3- BitDefender
    4- AntiVir
    5- KAV
     
  2. IBK

    IBK AV Expert

    Joined:
    Dec 22, 2003
    Posts:
    1,886
    Location:
    Innsbruck (Austria)
    forget it, it is impossible to guess the outcome.
     
  3. Franklin

    Franklin Registered Member

    Joined:
    May 12, 2005
    Posts:
    2,517
    Location:
    West Aussie
    1-Defensewall
    2-Online Armour
    3-Bufferzone
    4-Sandboxie
    5-LOL:D
     
  4. RejZoR

    RejZoR Lurker

    Joined:
    May 31, 2004
    Posts:
    6,426
    IBK, I assume results will be available at the beginning of June, right?
     
  5. pykko

    pykko Registered Member

    Joined:
    Apr 27, 2005
    Posts:
    2,236
    Location:
    Romania...and walking to heaven
    I bet so. :D
     
  6. IBK

    IBK AV Expert

    Joined:
    Dec 22, 2003
    Posts:
    1,886
    Location:
    Innsbruck (Austria)
    yes, the 1st of June as usual.
     
  7. Tweakie

    Tweakie Registered Member

    Joined:
    Feb 28, 2004
    Posts:
    90
    Location:
    E.U.
    Yeah, another pointless wild guess:

    1. Nod32
    2. Trustport
    3. AVK
    4. Bitdefender
    5. Norman
    ...

    Don't know about VBA32, but from what I've heard I would expect a high number of false positives if its paranoid heuristics are used.

    But IMHO, it would be more interesting to talk about the methods used by AVs for variant detection, since AV companies do not publish much on this issue, and since it is currently a big differentiator. I can imagine three families of methods, but all of these are probably too slow to be used in AV software.

    - Static analysis:
    ------------------

    . Global binary pattern analysis:

    Very broadly speaking, compression algorithms such as the one used for .zip files first create an optimized "dictionary" linking the patterns that appear most frequently in a given file to shorter bitstrings, then re-encode the file using these bitstrings, and finally append the dictionary to the end of the file. It is a mix of LZ77 (for finding repeating patterns) and Huffman coding (for optimizing the size of the bitstrings).

    Applied to variant detection, the principle of this technique is the following: a "dictionary" is created from a known malware A (or a significant part of this malware), and applied to a suspicious file B. If file B is compressed (almost) as efficiently as file A using this dictionary, then it means that A and B are similar.

    An antivirus would then need to store a set of such dictionaries (heavy) and the corresponding optimal compression ratios (computed on the original malware). Note also that once A and B have been identified as two minor variants belonging to the same family, a new "dictionary" can be created from (A plus B).

    A simple experiment along these lines is presented here, using the gzip compression tool:

    http://www.google.com/url?sa=D&q=http://homepages.cwi.nl/~wehner/worms/index.html

    The principle is described here:

    http://arxiv.org/PS_cache/cs/pdf/0312/0312044.pdf

    But this technique is probably too demanding.
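    The idea above can be sketched in a few lines of Python. The linked paper trains a dictionary per malware family; as a rough stand-in, this hypothetical sketch uses the normalized compression distance (NCD) with zlib, which captures the same intuition: two files that compress well together are similar.

    ```python
    import zlib

    def ncd(a: bytes, b: bytes) -> float:
        """Normalized compression distance: close to 0 for
        near-identical inputs, close to 1 for unrelated ones."""
        ca = len(zlib.compress(a))
        cb = len(zlib.compress(b))
        cab = len(zlib.compress(a + b))
        return (cab - min(ca, cb)) / max(ca, cb)

    # Two hypothetical "variants" sharing most of their bytes
    # compress well together; an unrelated byte stream does not.
    variant_a = b"push ebp; mov ebp, esp; call CreateFileA; " * 50
    variant_b = variant_a + b"xor eax, eax; ret; "
    unrelated = bytes(range(256)) * 20

    assert ncd(variant_a, variant_b) < ncd(variant_a, unrelated)
    ```

    A real engine would precompute the dictionaries in the lab and ship only those, but the distance computation itself is the expensive part, which is why the post calls this technique too demanding.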

    . Static topological analysis (call graphs):

    Considering an executable, the principle is to represent the function calls by a graph: each function is represented by a node while edges represent calls from one function to another (Sobig is here: http://www.f-secure.com/2003/sobig_f_2.pdf). Then, the similarity between two executables is deduced from the topological distance between their graphs.

    Halvar Flake (sabre security) presented this method:

    http://www.google.com/url?sa=D&q=ht.../presentations/win-usa-04/bh-win-04-flake.pdf

    Ero Carrera applied it to malware classification:

    http://www.f-secure.com/weblog/archives/carrera_erdelyi_VB2004.pdf

    Once again, this method seems too heavy (in terms of storage, CPU and memory requirements) to be implemented in AV software. It is more a tool used in AV labs for classifying malware more easily.
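    As a much-simplified sketch of the call-graph idea: the real methods (Flake, Carrera) use structural graph matching, but even comparing the two call graphs as sets of (caller, callee) edges with a Jaccard similarity conveys the principle. All function names below are hypothetical, standing in for what a disassembler would extract.

    ```python
    def call_graph_similarity(edges_a: set, edges_b: set) -> float:
        """Jaccard similarity over (caller, callee) edges -- a crude
        stand-in for real topological graph matching."""
        if not edges_a and not edges_b:
            return 1.0
        return len(edges_a & edges_b) / len(edges_a | edges_b)

    # Hypothetical call graphs, as a disassembler might recover them:
    known = {("main", "init_smtp"), ("main", "scan_disk"),
             ("scan_disk", "harvest_addrs"), ("init_smtp", "send_copy")}
    variant = known | {("main", "kill_av")}   # one function added
    benign = {("main", "parse_args"), ("main", "print_help")}

    assert call_graph_similarity(known, variant) > 0.7   # likely a variant
    assert call_graph_similarity(known, benign) < 0.2    # unrelated
    ```

    The expensive steps in practice are disassembly and node matching when function names are stripped, which is why this stays a lab tool rather than an on-access scanner component.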

    - Dynamic analysis - Behavioral signatures :
    ----------------------------------------------

    AVs do emulate (this includes dynamic translation) executables to find "suspicious behaviours". This is the basis of BitDefender's HiVE, Norman's Sandbox and probably NOD's Advanced Heuristics. The purpose of this emulation is to observe safely how the executable interacts with the system; hence, the calls to Windows API functions are monitored.
    Currently only a few suspicious API calls are actively monitored (by HiVE or the Sandbox) or used for detection: those used for writing to the registry or the file system, opening a network connection, injecting a thread into another process, etc.

    A fundamental characteristic of an emulated environment is that it stays (or can stay) exactly the same: running processes, files on disk, random generator seed and system time never change. This means that a file run in such an environment will always behave exactly the same way, and that the same API functions will be called in exactly the same order and with the same parameters. Hence, (almost) all of these API calls are meaningful, since they can be used to build a behavioral signature of the program. However, only some of their parameters are meaningful (meaningful: will not vary from one variant to another): the name of a registry key, of a window or of a process can be meaningful. The permissions used for opening a file or manipulating processes can be important. The name of a created file or of a new mutex may vary from one malware to another within the same family. The value of a handle or a memory address is in general not characteristic of anything.

    In plain text, a behavioral signature would look like:

    Kernel32.dll->GetCommandLineA
    Kernel32.dll->GetStartupInfo (*)
    Kernel32.dll->GetVersion
    Kernel32.dll->GetThreadLocale
    Kernel32.dll->GetLocaleInfoA (*,*,*,*)
    Kernel32.dll->InitializeCriticalSection (*)
    Kernel32.dll->GetCurrentThreadId
    Kernel32.dll->GetModuleFileNameA (*,*,*)
    Kernel32.dll->GetModuleFileNameA (*,*,*)
    AdvApi32.dll->RegOpenKeyExA (*,"Software\Borland\Locales",*,*,*)
    AdvApi32.dll->RegOpenKeyExA (*,"Software\Borland\Locales",*,*,*)
    Kernel32.dll->lstrcat ("C:\WINDOWS\SYSTEM","\")
    Kernel32.dll->lstrcat ("C:\WINDOWS\SYSTEM\","schost.exe")
    Kernel32.dll->_lopen ("C:\WINDOWS\SYSTEM\schost.exe",*)
    (...)
    User32.dll->LoadStringA (*,*,*,*)
    (...)
    Kernel32.dll->WinExec ("C:\WINDOWS\SYSTEM\schost.exe",*)

    Notice that such a behavioral signature does not need to correspond to "bad" behavior. It may contain only legitimate actions. It just needs to be specific enough to avoid false positives (and generic enough at the same time).

    Then, I suggest the following technique:

    In the lab:
    -----------
    - Create a catalog of Windows API functions. By default, none of their parameters are meaningful.
    - Examine each function and define the set of parameters that could be characteristic of the behavior of a malware.
    - For every known malware (excluding viruses):
    . Find its original entry point (or unpack it as a pre-processing if needed)
    . From this entry point (or a predefined number of calls later, to skip the boilerplate added by some compilers), record the first 100 (for example) API calls, with their meaningful parameters (if any). This record is the behavioral signature for this malware.

    On the customer's PC:
    ---------------------
    - For every suspicious file:
    . Find its original entry point (or unpack it as a pre-processing if needed)
    . From this entry point (or a predefined number of calls later, to skip the boilerplate added by some compilers), record the first 100 (for example) API calls, with their meaningful parameters (if any). Compare this record with existing behavioral signatures. If a match is found, then we have a (minor) "variant of ...".

    This should be faster than HiVE/Sandbox since the emulation can be stopped after the first few calls.
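    The matching step described above can be sketched as follows. This is a hypothetical illustration, not any vendor's implementation: a recorded trace of (API, parameters) pairs is compared against a stored signature, where "*" marks parameters catalogued as non-meaningful (handles, addresses) that must be ignored. The API names and paths echo the plain-text signature shown earlier.

    ```python
    def matches(trace, signature):
        """A trace matches a signature if the first len(signature) calls
        line up: same API name, and every meaningful (non-'*')
        parameter equal."""
        if len(trace) < len(signature):
            return False
        for (api, args), (sig_api, sig_args) in zip(trace, signature):
            if api != sig_api:
                return False
            for arg, sig_arg in zip(args, sig_args):
                if sig_arg != "*" and arg != sig_arg:
                    return False
        return True

    # Stored in the lab: wildcards for handles and flag values.
    signature = [
        ("AdvApi32.dll!RegOpenKeyExA", ("*", r"Software\Borland\Locales", "*")),
        ("Kernel32.dll!lstrcat", (r"C:\WINDOWS\SYSTEM", "\\")),
        ("Kernel32.dll!WinExec", (r"C:\WINDOWS\SYSTEM\schost.exe", "*")),
    ]
    # Recorded on the customer's PC: concrete handles and flags present.
    trace = [
        ("AdvApi32.dll!RegOpenKeyExA", ("0x80000002", r"Software\Borland\Locales", "0x0")),
        ("Kernel32.dll!lstrcat", (r"C:\WINDOWS\SYSTEM", "\\")),
        ("Kernel32.dll!WinExec", (r"C:\WINDOWS\SYSTEM\schost.exe", "0x5")),
    ]
    assert matches(trace, signature)
    ```

    In a real engine the comparison would likely allow gaps or reordering between calls, since even minor variants can interleave extra API calls, but exact prefix matching is enough to show the shape of the idea.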
     
  8. Tandra

    Tandra Registered Member

    Joined:
    May 11, 2006
    Posts:
    5
    what is that av comparative?
     
  9. CJsDad

    CJsDad Registered Member

    Joined:
    Jan 22, 2006
    Posts:
    618