TDS-3 CRC32-test: comments/questions

Discussion in 'Trojan Defence Suite' started by FanJ, Feb 12, 2003.

Thread Status:
Not open for further replies.
  1. FanJ

    FanJ Guest

    I have posted a thread with some guidelines for the CRC32-test in TDS-3.

    You can find that thread here:

    http://www.wilderssecurity.com/showthread.php?t=7200


    In this thread you could post comments/questions.
     
  2. Pieter_Arntz

    Pieter_Arntz Spyware Veteran

    Joined:
    Apr 27, 2002
    Posts:
    13,491
    Location:
    Netherlands
    Very nice, FanJ :)

    ~applaud~

    Regards,

    Pieter
     
  3. jvmorris

    jvmorris Registered Member

    Joined:
    Feb 9, 2002
    Posts:
    618
    Looks good to me! ;)
     
  4. puff-m-d

    puff-m-d Registered Member

    Joined:
    Feb 13, 2002
    Posts:
    5,703
    Location:
    North Carolina, USA
    FanJ,

    I knew this was on your "to-do" list......
    GREAT JOB :D !!!
    Have a karma cookie on me!!

    Regards,
    Kent
     
  5. FanJ

    FanJ Guest

    Thanks Pieter, Joseph and Kent ;)
     
  6. jvmorris

    jvmorris Registered Member

    Joined:
    Feb 9, 2002
    Posts:
    618
    FanJ,

    I was simply responding to your write-up. However, reliance on CRC-32 hashes has been compromisable for at least ten years now. I would be much happier if TDS (and KAV, for that matter) would at least upgrade to MD5 or SHA1 hashes.

    Yes, that means that authentication would take more time. But, what are we interested in here? The time to accomplish authentication or the credibility of the authentication? Me, I vote for the credibility.
     
  7. Wayne - DiamondCS

    Wayne - DiamondCS Security Expert

    Joined:
    Jul 19, 2002
    Posts:
    1,533
    Location:
    Perth, Oz
    CRC32 is an excellent 32-bit checksum - it's small, very fast, and the chance of two files having the same CRC32 checksum is 1 in 2^32 (4,294,967,296). All file integrity checkers are trying to determine whether one or more bits in the file have changed - you don't need 128-bit algorithms to do this; it's actually overkill. Even a 16-bit checksum would suffice for this kind of simple task, but 32-bit is ideal.
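
    As a rough illustration of the check described above, here is a minimal Python sketch using the standard library's zlib.crc32. It is not TDS-3's actual code; the helper names file_crc32 and flip_bit and the sample data are made up for this example. It shows that inverting a single bit of the data changes the checksum.

```python
import zlib

def file_crc32(path, chunk_size=65536):
    """Return the CRC32 of a file as an unsigned 32-bit integer (hypothetical helper)."""
    crc = 0
    with open(path, "rb") as handle:
        # Feed the file through zlib.crc32 in chunks so large files need little memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF

def flip_bit(data, bit_index):
    """Return a copy of data with exactly one bit inverted."""
    changed = bytearray(data)
    changed[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(changed)

# Made-up sample content standing in for a small startup file.
original = b"@echo off\r\npath c:\\windows;c:\\windows\\command\r\n"
tampered = flip_bit(original, 13)

crc_original = zlib.crc32(original)
crc_tampered = zlib.crc32(tampered)
print(f"{crc_original:08X} -> {crc_tampered:08X}")
```

    The sketch only covers accidental or naive changes; whether a deliberately crafted change can keep the checksum intact is a separate question.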

    For a trojan to modify a CRC32-checked file (e.g. c:\autoexec.bat, so it could add an autostart entry for itself) it would have to continuously modify the file (in memory), CRCing it each time, up to 4,294,967,296 times, until it found a match for the original. The file might be huge (up to 4 gigabytes) and the random data may corrupt the existing file. 4 billion CRC calculations will take a fair while as far as trojans go, generating the random data also slows it down, and as the other file with the same CRC could be anything up to 4GB in size, the trojan would have no choice but to use disk space, and disk I/O is always slow. Is all of that slow brute-forcing really worth hiding a modification in one file? :)

    Anyway all that aside, nice work Jan :)

    Best regards,
    Wayne
     
  8. Jooske

    Jooske Registered Member

    Joined:
    Feb 12, 2002
    Posts:
    9,713
    Location:
    Netherlands, EU near the sea
    He did it again! One less on the "to-do list"!
    Ehhhmmmm i suppose it would be toooo much to ask to include several test formats in TDS4, which the operator can choose at wish (maybe as an additional plugin, whatever.....)
     
  9. jvmorris

    jvmorris Registered Member

    Joined:
    Feb 9, 2002
    Posts:
    618
    Well, you got the first part right -- CRC32 is indeed an excellent 32-bit checksum; if it wasn't, it would have been superseded by some other 32-bit hashing function. :D

    Unfortunately, you got the second part wrong. What you displayed is actually the probability that, if you randomly generate a 32-bit checksum and then select a file (randomly), the file will match that arbitrary checksum.

    Well, that's not the issue.

    We ran into the same false logic when developing the FBI's DNA database -- this is exactly what they wanted to do. Unfortunately, it doesn't quite work that way. In that instance, the REAL question was: what's the probability that a DNA profile based on forensic evidence from a crime scene may be matched by two or more individuals' DNA? Now, given that approximately 1% of Caucasians are in fact identical twins, the naive estimate obviously isn't even close to being correct. (There are a lot of other complications in DNA due to genetic inheritance, but we don't need to go there.)

    There are two fundamental problems with your second statement.

    First, CRC32 is actually only a pseudorandom generator (albeit a very nice one), but the odds aren't quite up to 1 in 2^32. And, in actuality, given that most files tend to be of finite size, they don't even approximate this level.

    At best, the question is: What is the probability that two (or more) files on a machine will generate the same CRC32 value (purely randomly)? This probability is much higher. (Indeed, it follows the same logic as the 'Birthday problem': What's the probability that two individuals in a class will have the same date of birth (excluding year considerations)? If we assume that there are only 365 possible birthdates in a year, the odds become even once the class size reaches 23 individuals. It surpasses 90% if there are approximately 40 people in the class.)

    Now, I can't speak for others, but I have over 10,000 executable files on this clunky old Win 98 SE machine. The probability that two (or more) of them would generate the same CRC32 hash is far higher than what you indicate. Indeed, this is part of the reason that even Zone Alarm went to MD5 (128-bit hash algorithm) and that Microsoft went even further to SHA1 (160-bit hash algorithm). And, very shortly, I expect that we will be seeing routine use of 256-bit and 320-bit hash algorithms to establish authenticity.

    Quite frankly, a lot of searches for file duplicates (regardless of how they may be named or what file extension they may use) use these higher-order hash algorithms for precisely this reason.

    Still, not finished. All we've been talking about (so far) is whether two perfectly innocuous files are likely to generate the same CRC32. Now, let's talk about someone trying to maliciously 'match' a file by giving it the same filename/fileext and CRC32 hash. Last time I checked (must have been about two years ago), that could be done in about two hours on a desktop PC. Well, this is what malicious code is all about, now, isn't it? They want to pass the 'duck test'. And, at this point, all the nice fine statistical analyses about randomly generated hashes go right out the window.

    In all likelihood, Wayne, I assume you check far more than the hash under any circumstances, as does KAV.

    Bad scenario, Wayne. Nobody in their right mind is going to try to 'alias' an individual user's Autoexec.bat file. They're going to go after something like explorer.exe, iexplore.exe, kernel32.dll, mprexe.exe, msimn.exe, msmgs.exe, mstask.exe, outlook.exe, pstores.exe, rnapp.exe, rpcss.exe, rundll32.exe (just to name some options on Win 98 SE, which I'm using as I write this). I think you can easily guess which ones I'd target -- the ones that the user doesn't typically see and thereby notice that it doesn't quite function normally anymore. They can do this before they distribute their little nasty, no need to run the junking algorithm on the target machine. (Exactly how they get it on the end-user's box is an entirely different matter, but I think we both know how this could be done.)

    The problem that they confront in successfully accomplishing this is directly related to whether they have to anticipate CRC32 (32-bit), MD5 (128-bit), or even SHA1 (160-bit). Let's ignore the matter of more sophisticated hashes for the moment. As I noted in my original posting, methods of generating an app with a CRC32 identical to that of an existing valid application have been known for at least ten years; even MD5 has been cracked since (what?) 1996 or 1997.
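
    To illustrate why matching a CRC32 does not have to rely on brute force, here is a minimal Python sketch. It is not necessarily the method referred to above; it simply uses the well-known fact that CRC32 is affine over GF(2), so the four bytes needed to steer the checksum to a chosen value can be found by solving a small linear system. The function name forge_suffix and the sample data are hypothetical.

```python
import zlib

def forge_suffix(data, target_crc):
    """Return 4 bytes such that zlib.crc32(data + suffix) == target_crc."""
    base = zlib.crc32(data + b"\x00\x00\x00\x00")
    # Effect of each single suffix bit on the final CRC. Because CRC32 is
    # affine over GF(2), the effect of several bits is the XOR of these.
    effects = []
    for bit in range(32):
        patch = (1 << bit).to_bytes(4, "little")
        effects.append(zlib.crc32(data + patch) ^ base)

    # Gaussian elimination over GF(2): basis maps a leading bit position to
    # (vector, mask of suffix bits whose effects XOR to that vector).
    basis = {}
    for bit, vec in enumerate(effects):
        combo = 1 << bit
        while vec:
            top = vec.bit_length() - 1
            if top not in basis:
                basis[top] = (vec, combo)
                break
            other_vec, other_combo = basis[top]
            vec ^= other_vec
            combo ^= other_combo

    # Express the required change as a combination of the basis vectors.
    need = target_crc ^ base
    suffix_bits = 0
    while need:
        top = need.bit_length() - 1
        vec, combo = basis[top]
        need ^= vec
        suffix_bits ^= combo
    return suffix_bits.to_bytes(4, "little")

original = b"contents of the legitimate file"
tampered = b"contents of the modified file!!"
forged = tampered + forge_suffix(tampered, zlib.crc32(original))
print(zlib.crc32(forged) == zlib.crc32(original))
```

    The sketch only appends four bytes to arbitrary data; producing a modified executable that still runs is a different and harder problem that it does not address.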

    Upgrading the hashing to MD5 (or even SHA1) is not that much of a performance hit. On my Win XP and Win 2000 Pro machines, I can re-compute the hash for all of the approximately 10,000 executables in five or six minutes (and I'm not using anything to do it that's all that sophisticated). Yes, it's a bitch on this clunky old Win 98 SE box with its slow hard drive, but that's an entirely different issue. (I can still generate the hash on an individual file in about 0.1 seconds.)
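
    For reference, computing MD5 or SHA1 over a file takes only a few lines with Python's standard hashlib module. This is a minimal sketch, not the tool used for the timings quoted above; the helper name file_digest and the temporary sample file are made up for the example.

```python
import hashlib
import os
import tempfile

def file_digest(path, algorithm="sha1", chunk_size=65536):
    """Return the hex digest of a file, read in chunks (hypothetical helper)."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

# Write a tiny sample file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    sample_path = tmp.name
try:
    md5_hex = file_digest(sample_path, "md5")
    sha1_hex = file_digest(sample_path, "sha1")
finally:
    os.remove(sample_path)

print(md5_hex)
print(sha1_hex)
```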
     
  10. jvmorris

    jvmorris Registered Member

    Joined:
    Feb 9, 2002
    Posts:
    618
    Oops, forgot to include the relevant algorithm:

    I believe that it runs as follows:

    Given that a particular hash algorithm can truly and randomly generate n distinct hashes (2^32 for CRC32, 2^128 for MD5, and 2^160 for SHA1), and that there are k executable files on a particular machine, then the probability that two or more executables will exhibit the same hash is:

    1 - n!/((n-k)! * n^k)

    For the reasons noted above, this is a very conservative estimate, but it is far greater than 1/n in most practical instances. (Don't try running that formula on a standard PC; you'll get overflow and horrendous rounding errors; this is just the mathematical formula.)
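
    One way around the overflow is to work with logarithms: n!/((n-k)! * n^k) is the product of (1 - i/n) for i = 0 .. k-1, so its logarithm is a plain sum. The Python sketch below does that, under the same assumption of perfectly random hashes; the function name collision_probability is made up for this example. For 365 birthdays and 23 people it gives about 0.507, and for a 32-bit hash and 10,000 files roughly 1%.

```python
import math

def collision_probability(n, k):
    """Probability that at least two of k items share one of n equally likely values."""
    if k > n:
        return 1.0  # pigeonhole: a collision is certain
    # log( n! / ((n-k)! * n^k) ) = sum over i of log(1 - i/n), for i = 0 .. k-1
    log_no_collision = sum(math.log1p(-i / n) for i in range(k))
    # 1 - exp(x), computed accurately even when x is extremely close to 0
    return -math.expm1(log_no_collision)

p_birthday = collision_probability(365, 23)
p_crc32 = collision_probability(2**32, 10_000)
p_md5 = collision_probability(2**128, 10_000)

print(f"birthday, 23 people:    {p_birthday:.4f}")
print(f"32-bit hash, 10k files: {p_crc32:.4f}")
print(f"128-bit hash, 10k files: {p_md5:.3e}")
```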
     
  11. FanJ

    FanJ Guest

    Some additional remarks:

    I talked about the abbreviation %WINDIR% for C:\WINDOWS
    If you find that abbreviation a little confusing, it is perfectly fine to use C:\WINDOWS
    I will show an example:
    I have put a shortcut to that very nice program jv16 PowerTools on my desktop.
    Then I added the full path to that shortcut to my file crcfiles.txt.
    Instead of using:
    %WINDIR%\Desktop\jv16 PowerTools.lnk
    I now used:
    C:\WINDOWS\Desktop\jv16 PowerTools.lnk

    Then I ran the CRC32-test manually:

    00:36:07 [CRC32] Started - verifying 123 files ...
    00:36:12 [CRC32] Test finished.

    Then I removed that shortcut from my desktop, and I ran the CRC32-test manually again.
    TDS-3 gave me this alert:

    00:37:14 [CRC32] Started - verifying 123 files ...
    00:37:15 [CRC32] File doesn't exist: C:\WINDOWS\Desktop\jv16 PowerTools.lnk
    00:37:19 [CRC32] Test finished.

    So this shows that it is perfectly fine to use C:\WINDOWS instead of %WINDIR%

    BTW: I was talking here about Windows 98 SE, which is the system I use.


    Another remark:
    I have told you that the CRC32-test will tell you whether any file that is listed in crcfiles.txt has been changed or removed.
    However it will NOT tell you:
    1. whether such a file is MOVED TO ANOTHER DIRECTORY. In that case it will simply tell you that the file doesn’t exist (which is true because it doesn’t exist at THAT place on your system anymore).
    2. whether a NEW FILE is placed on your system. Remember: only files that are listed in crcfiles.txt will be checked, and if a new file is put on your system and you have not added it to crcfiles.txt, the CRC32-test will not check that new file.
    If you want to be warned in such circumstances, you need to use a more specialized File Integrity Checker such as NISFileCheck or Adinf32.
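
    The behaviour described above can be mimicked with a small Python sketch. This is only an illustration of a list-driven checker, not how TDS-3 works internally: the names crc32_of and verify, the baseline dictionary standing in for crcfiles.txt, and the "File changed" wording are all assumptions ("File doesn't exist" is the wording from the log above).

```python
import os
import tempfile
import zlib

def crc32_of(path):
    """CRC32 of a whole file as an unsigned 32-bit integer (hypothetical helper)."""
    with open(path, "rb") as handle:
        return zlib.crc32(handle.read()) & 0xFFFFFFFF

def verify(baseline):
    """baseline maps each listed path to its expected CRC32; returns alert strings."""
    alerts = []
    for path, expected in baseline.items():
        if not os.path.isfile(path):
            alerts.append(f"File doesn't exist: {path}")
        elif crc32_of(path) != expected:
            alerts.append(f"File changed: {path}")
    return alerts

with tempfile.TemporaryDirectory() as folder:
    watched = os.path.join(folder, "watched.txt")
    with open(watched, "w") as handle:
        handle.write("v1")
    baseline = {watched: crc32_of(watched)}
    clean = verify(baseline)               # nothing to report

    # A new, unlisted file is never looked at.
    with open(os.path.join(folder, "new.txt"), "w") as handle:
        handle.write("unlisted")
    after_new_file = verify(baseline)

    # A change to a listed file is reported.
    with open(watched, "w") as handle:
        handle.write("v2")
    after_change = verify(baseline)

    # A moved file is reported only as missing from its old place.
    os.rename(watched, os.path.join(folder, "moved.txt"))
    after_move = verify(baseline)

print(clean, after_new_file, after_change, after_move, sep="\n")
```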
     
  12. xor

    xor Guest

    The database used by the TDS checker should be protected from unauthorized modifications: an intruder who can change the database can subvert the entire integrity checking scheme.
    So make it read-only from boot time onward if it is to be secure.

    [xor]
     
  13. Jason_DiamondCS

    Jason_DiamondCS Former DCS Moderator

    Joined:
    Nov 11, 2002
    Posts:
    1,046
    Location:
    Perth, Western Australia
    You provide a lot of good points, Joseph. I just want to provide a bit more information in regards to MD5 being "cracked". Given a perfect HASH for a given bit size, let's say a perfect 128-bit hash, the number of collisions in an amount of data is

    2 ^ ( sizeof(DATA)-sizeof(HASH) )bits

    A collision occurs when some DATA has the same HASH as another piece of DIFFERENT DATA. So if we have 17 bytes (136 bits) of DATA and we are hashing it with a 16-byte (128-bit) hash, the number of collisions is 256 (136-128 = 8, hence 2^8). So if we used a perfect 128-bit hash on a 17-byte password there are 255 OTHER (<=17 byte) passwords with the same hash. If the password was 18 bytes long then there are 65536 (2^16) other passwords matching the same hash. This is the basis of BRUTE-FORCE cracking any HASH to find DATA which will match a specified hash. The time taken to BRUTE-FORCE a HASH is significantly high because there are 2^(sizeof(HASH)bits) different combinations to try; of course there are various optimizations to this limited attack on a hash.
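
    The arithmetic above can be written out as a tiny Python sketch. It assumes an ideal hash that spreads inputs perfectly evenly over all hash values, and it counts inputs of exactly the given length that share one hash value (the original input included); the function name colliding_inputs is made up for this example.

```python
def colliding_inputs(data_bits, hash_bits):
    """Number of data_bits-long inputs sharing each value of an ideal hash_bits-bit hash."""
    if data_bits < hash_bits:
        raise ValueError("fewer possible inputs than hash values")
    return 2 ** (data_bits - hash_bits)

seventeen_bytes = colliding_inputs(17 * 8, 128)  # 2^8  = 256
eighteen_bytes = colliding_inputs(18 * 8, 128)   # 2^16 = 65536
print(seventeen_bytes, eighteen_bytes)
```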

    A lot of HASHes have been cracked in such a way that you don't need to BRUTE FORCE to find a collision, or that the number of combinations is reduced to such an extent that they can be cracked very quickly. Given that any HASH or encryption (excluding one-time pads) can be brute forced given enough time, I wouldn't call a brute force attempt a crack as such. To my knowledge MD5 hasn't been FULLY broken in a way that lets you find any collision for a given length without brute forcing it.

    It will always be the case that once 128-bit hashes (MD5, etc.) can be easily brute forced, everyone should be using the next best thing. This is similar to what is happening with RSA encryption at the moment: computers are getting quicker and quicker, so we keep moving up the number of BITS we encrypt with. Though BRUTE FORCING prime numbers can be done much more quickly these days than brute forcing HASHes, since you don't have to test EVERY single combination, unlike with good HASHes.

    You can work out the odds of two different files having the same HASH given the number of collisions :)

    -Jason-
     
  14. FanJ

    FanJ Guest

    Hi all,

    My proposal is to continue the more general discussion about the security of HASH algorithms (and related general topics about those algorithms) in a new thread in the NIS File Check forum.

    In that way we could keep this existing thread purely for questions/remarks with respect to the CRC32-test in TDS-3, and have the new thread in the NIS File Check forum for the more general issues that are not aimed at TDS-3 in particular.

    I hope you all can agree with that.

    I will try to copy the more general part of the postings about HASH-algorithms from this thread to the new one.

    I hope that this is the right decision and that I'll do it in the right way. If I did wrong, I really apologize to you all.

    Best Regards, Jan.
     
  15. FanJ

    FanJ Guest

    You can find the new thread here:

    http://www.wilderssecurity.com/showthread.php?t=7276

    Let's use that new thread for the more general discussion about HASHES, and let's use this thread "TDS-3 CRC32-test: comments/questions" for the questions related to TDS-3.
     
  16. FanJ

    FanJ Guest

    See also this thread about files with attr h :

    http://www.wilderssecurity.com/showthread.php?t=7287
     