Security software using MD5

Discussion in 'other anti-malware software' started by BoerenkoolMetWorst, Aug 12, 2013.

Thread Status:
Not open for further replies.
  1. BoerenkoolMetWorst

    BoerenkoolMetWorst Registered Member

    Joined:
    Dec 22, 2009
    Posts:
    4,867
    Location:
    Outer space
    A lot of security software relies on hashing: AVs, for example, use it to identify trusted files and avoid rescanning them, and HIPS use it to create rules for a specific file. Most still use the insecure MD5, which was deemed vulnerable years ago:
    http://en.wikipedia.org/wiki/Md5

    I always thought they still used MD5 because it's a lot faster and thus lighter. However, here is a graph showing that while SHA-512 and SHA-384 are a lot slower, SHA-1 is actually a little faster than MD5, and SHA-256 is a lot faster:

    ~~ image removed (copyright) - it's available at the link below. ~~

    http://www.not-implemented.com/comparing-hash-algorithms-md5-sha1-sha2/
     
    Last edited by a moderator: Aug 14, 2013
  2. Hermescomputers

    Hermescomputers Registered Member

    Joined:
    Jan 9, 2006
    Posts:
    1,069
    Location:
    Toronto, Ontario, Canada, eh?
    The issue with encryption failure as described is inconsequential when it comes to simple check-sum signature verification. It is a solid and effective identification method...
    Also, once an MD5 hash is minted, its counterparts (Sha1/Sha256) can be used to confirm validity (internally). You can mess with an MD5 cypher by breaking it and then use this as the basis for decryption, but you cannot change a file's signature check-sum without changing the file itself at bit level, which in turn changes its check-sum (an MD5 signature check-sum is a simple bit-level calculation), meaning MD5 in this context is still safe...

    However using MD5 as the basis for an encryption algorithm is where the issues are potentially negative. A distinction must be made between an MD5 signature Check-sum and an MD5 Encryption algorithm cypher.
    This matters since security tools makers don't use this encryption to "protect" via encryption, but to "identify" static binaries...

    Another point is that size really is a significant variable, as cypher processing speed is not the only thing that matters.
    For example, my own Advanced Process Analysis and Identification System https://hermes-computers.ca//apais_1.php is currently composed of multiple large databases, several of which hold well over 3,000,000 checksums apiece. One of these large checksum containers is 160 Megs (x3), and that is with MD5 checksums. Now switch to SHA512 and multiply that byte size (relative to the length of each signature string) and, well, you get the picture...

    For Example:
    Take the byte count and tally it up (hint: 1 byte = 8 bits), then multiply this by the millions of signatures most products contain.
    MD5 (32 Bytes)
    d4941f3843e8795c7e7417dcac49a152
    SHA1 (40 Bytes)
    cf4a820c10b79ef602438014c5c9d19b2cda118a
    SHA256 (64 Bytes; the string below is one character short as posted)
    1883b9822db58ecea8e206c6bb1639e6557d9a243d8da9329a4f124d3c9dde8
    SHA512 (128 Bytes)
    e82b473420fc1398f027bf0612f86e3335ca2b83e2946f3088d8118f408e5458bbcc2c43d68aca843c41db3cff9b9b870f409e1ac30c688b8901f6420f8a8a0d
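
The size argument above can be checked with a few lines of Python. This is only a rough sketch: it assumes the "Bytes" figures in the list refer to hex text (two characters per raw digest byte), and it uses the 3,000,000 entry count from the post as a round number. The helper name `digest_sizes` is made up for this example.

```python
import hashlib

# Entry count taken from the post ("well over 3,000,000"); a round figure only.
SIGNATURES = 3_000_000

def digest_sizes(name):
    """Return (raw_bytes, hex_chars) for a hashlib algorithm name."""
    raw = hashlib.new(name).digest_size
    return raw, raw * 2  # each raw byte is written as two hex characters

for name in ("md5", "sha1", "sha256", "sha512"):
    raw, hex_chars = digest_sizes(name)
    raw_mb = raw * SIGNATURES / 1_000_000
    hex_mb = hex_chars * SIGNATURES / 1_000_000
    print(f"{name:7s} raw={raw:3d} B  hex={hex_chars:3d} chars  "
          f"{raw_mb:.0f} MB raw / {hex_mb:.0f} MB as hex text")
```

Storing the raw digest instead of the hex string halves every figure, whichever algorithm is used.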

    Simply put, the "More" secure Sha512 for example will likely never be used to identify malware... It simply is too frigging big!
    And it makes no sense at all to even consider it as the benefits over MD5 check-sums are nil.

    For those curious, the signatures listed above are from a serious piece of malware. A very nasty Trojan...
    You can check what it is by cutting and pasting into here: https://www.virustotal.com/en/#search

    Or you can try and test your skills, and do a little reverse behavior analysis and see if you can even see the malware elements:
    https://malwr.com/analysis/NjNlOTU0ZDhmNmZjNGEzOWIzMDJmYWU5MzE0OTRlOWE/
     
    Last edited: Aug 13, 2013
  3. vojta

    vojta Registered Member

    Joined:
    Feb 26, 2010
    Posts:
    830
    Great post, Hermescomputers.

    It's true, md5 is an abomination when used to encrypt passwords, for example, but is fine for identifying single files. They are absolutely different things.
     
  4. ELWIS1

    ELWIS1 Registered Member

    Joined:
    Sep 29, 2010
    Posts:
    60
  5. BoerenkoolMetWorst

    BoerenkoolMetWorst Registered Member

    Joined:
    Dec 22, 2009
    Posts:
    4,867
    Location:
    Outer space
    Thanks for your informative reply. However, per Wikipedia, creating different files with the same MD5 checksum was already possible 9 years ago.
    Of course, creating a different file with the same checksum that also does what you want it to do is something else entirely and probably practically impossible, but a lot of new techniques can be discovered in 9 years, and the processing power available to calculate collisions has improved exponentially, especially when you put GPUs to work.
     
  6. Hermescomputers

    Hermescomputers Registered Member

    Joined:
    Jan 9, 2006
    Posts:
    1,069
    Location:
    Toronto, Ontario, Canada, eh?
    Well, if someone wants to impersonate malware, which is kind of a strange idea to begin with, then let them go right ahead! They can create some type of MD5 impersonation (collision) and get detected as malware pronto! Suits me just fine...

    The primary difference is perhaps (if even remotely possible) the use of impersonation against white-listing systems that use MD5, and that again as stated would be difficult if not impossible and highly impractical...

    This would require the use of highly sophisticated polymorphic impersonation technology dynamically actuated in real time or a purpose built binary to impersonate a specific application on someone's platform, perhaps possible but highly unlikely anyone would go to this trouble as it's a lot easier to simply create a brand new binary never before identified as malware...

    This is why polymorphic malware is so prevalent. Otherwise everyone would be infected and even white-listing would be ineffective...
    (What is Polymorphic Code? https://en.wikipedia.org/wiki/Polymorphic_virus)

    The alternative to signature-based identification is heuristics technology, perhaps the only currently effective method for identifying variants in real time (Heuristics: https://en.wikipedia.org/wiki/Antivirus#Heuristics), or something like the Google Code Yara Project, which uses binary code patterns to sniff out malware based on code chunks instead of heuristic behavioral scopes and check-sums. This method is more like a form of real-time, albeit limited, reverse engineering. (Yara Project https://code.google.com/p/yara-project/)

    A typical malware identification technology will combine most of the methods stated above. I use my own custom system (Primary Risk Analysis and Advanced Risk Analysis, and a few sophisticated behavior scopes), so MD5 signatures are certainly not the only way we identify hostile elements...

    Besides these, many systems (internally) utilize other types of check-sum verification, some proprietary (in-house) and some public, like CRC32 (Cyclic Redundancy Code, or its variant name "check"), which can prove useful to reduce database size but is insecure...
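
To give a feel for why a 32-bit checksum is a weak identifier, here is a minimal sketch that finds two different inputs with the same CRC32 by plain birthday search. It uses Python's standard `zlib.crc32`; the function name and the seeded random inputs are choices made for this example only. With only 2^32 possible values, a match is expected after roughly tens of thousands of inputs. This is a generic brute-force search and is not the technique used in the published MD5 collision attacks.

```python
import random
import zlib

def find_crc32_collision(seed=0):
    """Birthday search: checksum random inputs until two share a CRC32."""
    rng = random.Random(seed)   # seeded so the run is repeatable
    seen = {}                   # crc value -> input that produced it
    attempts = 0
    while True:
        data = rng.getrandbits(128).to_bytes(16, "big")
        attempts += 1
        crc = zlib.crc32(data)
        if crc in seen and seen[crc] != data:
            return seen[crc], data, crc, attempts
        seen[crc] = data

first, second, crc, attempts = find_crc32_collision()
print(f"{first.hex()} and {second.hex()} share CRC32 {crc:08x} "
      f"after {attempts} inputs")
```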

    MD5 remains the prime choice until something better appears. It's also why many products are attempting to move away from check-sum identification, since the data-set size is really beginning to be prohibitive...

    For example, my entire data-set (around 6,000,000 malware signatures) is but a fraction of currently known malware. It gets even worse if you take into consideration malware designed to affect defunct operating systems like MS-DOS and Windows 3.1 / 95 / 98 / ME etc...

    Among many other tricks of the trade this effectively forces developers to try, and selectively filter out of the data the most unlikely malware to appear, and attempt to predict the malware with the highest probability of currency for inclusion in an attempt to reduce check-sum database size...

    This enormous bloating of malware databases can be directly attributed to the appearance of polymorphic capabilities in malware. A single Trojan can now become 10,000 different Trojans in a matter of days, if not less, simply by dynamically modifying its check-sum signatures in real time at each propagation. (Keep in mind that this also changes the file at bit level.) As polymorphic malware can change its signatures dynamically, it creates a nightmare for signature-based products...
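
The point that any bit-level change yields a new signature is easy to demonstrate. The sketch below flips a single bit in a dummy buffer (a harmless stand-in, not real malware) and compares the two MD5 values; a signature database keyed on the first value would simply not recognise the second.

```python
import hashlib

original = b"MZ" + bytes(1022)   # harmless 1 KiB stand-in for a binary
variant = bytearray(original)
variant[-1] ^= 0x01              # flip one bit in the last byte

md5_a = hashlib.md5(original).hexdigest()
md5_b = hashlib.md5(bytes(variant)).hexdigest()

# Count how many of the 32 hex digits changed between the two signatures.
differing = sum(a != b for a, b in zip(md5_a, md5_b))
print(md5_a)
print(md5_b)
print(f"{differing} of 32 hex digits differ")
```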

    Another reason MD5 is used so extensively is standardization. It's now more or less the most compatible whitelist/malware signature sharing method. Everyone understands MD5 and almost everyone uses it....

    CRC32 checksum
    http://www.accuhash.com/what-is-crc32.html
     
    Last edited: Aug 15, 2013
  7. BoerenkoolMetWorst

    BoerenkoolMetWorst Registered Member

    Joined:
    Dec 22, 2009
    Posts:
    4,867
    Location:
    Outer space
    Thanks for your extensive reply :) I did indeed mean using hash collision to get the same hash as a whitelisted/trusted file, but like you said, even if it is possible, there are tons of other ways to infect someone which are way easier.
     
  8. Hermescomputers

    Hermescomputers Registered Member

    Joined:
    Jan 9, 2006
    Posts:
    1,069
    Location:
    Toronto, Ontario, Canada, eh?
    You are welcome!

    However, I have the same concern about white-listing as discussed; this is why one must take context into account during verification.

    It is another reason why second guessing is engineered to be so easy in my tools as you can simply verify the white listed or any other executable under analysis via VT, or any number of other systems with a single click... This now also includes using other check-sums like Sha1 for example.

    Discrepancies (path, binary size, file date, etc.) can be easily tagged, and blacklisting a previously locally or globally white-listed item is just one click!

    For example, globally white-listed items are registered applications and will never be located in a /Temp/ directory; if one is, my Primary Risk Analysis technology will pick that up and raise the alarm...

    Another example is my temporal signature tracking, which documents the check-sum of each file under analysis each time it is analyzed. It also adds the full path and the file revision. This data aggregates over time. Any modification to this data can be used to identify impersonation, even if it's discovered on a different PC. So for signature impersonation to succeed, it must also be exact in every aspect, or it will be detected. This data is easily accessible and visible to the technician/user...
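
As a rough illustration of the tracking idea described above (not the actual A.P.A.I.S. implementation, which is not shown in this thread), the sketch below records a check-sum, path and revision for every analysis and reports when the check-sum changes. The class names, field names and the example path are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class Observation:
    seen_at: str
    path: str
    revision: str
    md5: str

@dataclass
class SignatureHistory:
    # Keyed by lower-cased path; each value is the list of observations so far.
    records: dict = field(default_factory=dict)

    def observe(self, path, revision, content):
        """Record one analysis and return any discrepancies against
        the most recent earlier observation of the same path."""
        obs = Observation(
            seen_at=datetime.now(timezone.utc).isoformat(),
            path=path,
            revision=revision,
            md5=hashlib.md5(content).hexdigest(),
        )
        history = self.records.setdefault(path.lower(), [])
        issues = []
        if history and history[-1].md5 != obs.md5:
            if history[-1].revision == obs.revision:
                issues.append("checksum changed but revision did not")
            else:
                issues.append("checksum and revision changed (possible update)")
        history.append(obs)
        return issues

history = SignatureHistory()
history.observe(r"C:\Tools\example.exe", "1.0.0", b"original bytes")
print(history.observe(r"C:\Tools\example.exe", "1.0.0", b"tampered bytes"))
```

A real tool would persist the history and likely track more fields (size, date); the in-memory dictionary here only keeps the example short.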

    Also, the premise of my eclectic methodology is to fully engage the technicians, whereas full automation attempts to exclude them from the analytical and decision-making process.

    Also, I focus on one file at a time. Full spectrum. Which really brings it into focus. Anyone fully utilizing the resources and methodology included in my Advanced Process Analysis and Identification System (A.P.A.I.S.) will easily flush out hostiles.

    This engagement can lead to identifying previously unidentified malware, as well as to noticing discrepancies, which often leads to a rapid yet correct infection diagnosis.
     
    Last edited: Aug 18, 2013
  9. Noob

    Noob Registered Member

    Joined:
    Nov 6, 2009
    Posts:
    6,491
    This turned into a very technical and informative thread. :)

    So in fewer words, MD5 is still fine as one of the detection methods used by AVs, but it's not recommended for encryption.
     
    Last edited: Aug 17, 2013
  10. erikloman

    erikloman Developer

    Joined:
    Jun 4, 2009
    Posts:
    3,152
    Location:
    Hengelo, The Netherlands
    It's quite easy to create two files with the same MD5 hash (this was already the case in 2006):
    http://www.mscs.dal.ca/~selinger/md5collision/

    This was the main reason for us to go with SHA-256.

    In terms of performance, choosing SHA-256 or MD5 makes no difference when hashing files from disk: the disk is way slower than the hashing algorithms (we hash concurrently: read block 1, then hash block 1 while reading block 2, etc.). The CPU usually sits idle, waiting for data.
    Also, you can accelerate SHA-256 up to ~11.5 CPU cycles per byte using the SSE4 and AVX instruction sets:
    http://www.intel.com/content/dam/ww...hite-papers/sha-256-implementations-paper.pdf
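
The "hash block 1 while reading block 2" approach mentioned above could look roughly like the following. This is a sketch in Python for illustration only, not the implementation described in the post; the function name, block size and queue depth are arbitrary choices. A background thread reads ahead while the main thread hashes.

```python
import hashlib
import os
import queue
import tempfile
import threading

BLOCK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch

def sha256_overlapped(path):
    """Hash a file while a background thread reads the next block."""
    blocks = queue.Queue(maxsize=2)  # reader stays at most two blocks ahead
    errors = []

    def reader():
        try:
            with open(path, "rb") as handle:
                while True:
                    block = handle.read(BLOCK_SIZE)
                    blocks.put(block)
                    if not block:      # empty bytes marks end of file
                        break
        except OSError as exc:
            errors.append(exc)
            blocks.put(b"")            # unblock the hashing loop

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    digest = hashlib.sha256()
    while True:
        block = blocks.get()
        if not block:
            break
        digest.update(block)           # overlaps with the next read
    thread.join()
    if errors:
        raise errors[0]
    return digest.hexdigest()

# Small self-check against a plain, single-threaded hash of the same data.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(3 * BLOCK_SIZE + 123))
try:
    overlapped = sha256_overlapped(tmp.name)
    with open(tmp.name, "rb") as handle:
        plain = hashlib.sha256(handle.read()).hexdigest()
    print(overlapped == plain)
finally:
    os.remove(tmp.name)
```

Whether the overlap helps in practice depends on the storage device and on file caching, which matches the observation above that the CPU mostly waits for data.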

    In the near future, Intel will add CPU instructions to further accelerate SHA-256 hashing:
    http://software.intel.com/en-us/articles/intel-sha-extensions

    In terms of storage, MD5 is indeed smaller. But as we store a lot more info on the millions of files we receive (classification, history, file info, behavior, statistics etc), storing MD5 or SHA-256 makes ~2% difference in total DB storage. But if you store the hashes plain in a file, it becomes twice the size when using SHA-256 instead of MD5.
     
  11. vojta

    vojta Registered Member

    Joined:
    Feb 26, 2010
    Posts:
    830
  12. BoerenkoolMetWorst

    BoerenkoolMetWorst Registered Member

    Joined:
    Dec 22, 2009
    Posts:
    4,867
    Location:
    Outer space
  13. Hermescomputers

    Hermescomputers Registered Member

    Joined:
    Jan 9, 2006
    Posts:
    1,069
    Location:
    Toronto, Ontario, Canada, eh?
  14. Hermescomputers

    Hermescomputers Registered Member

    Joined:
    Jan 9, 2006
    Posts:
    1,069
    Location:
    Toronto, Ontario, Canada, eh?
    Speaking of Collisions... Impersonation
    Here is an example you can taste....

    17/09/2013 14:43:36
    C:\Windows\system32\kernel32.dll
    Date : 01/08/2013
    Size : 1114112
    Version: 6.1.7601.18229
    MD5 : 365A5034093AD9E04F433046C4CDF6AB
    SHA1 : 7244AE695F8E5A730857781635ACB2969F15C594


    and another here:

    17/09/2013 14:48:13
    C:\Windows\system32\KERNELBASE.dll
    Date : 01/08/2013
    Size : 274944
    Version: 6.1.7601.18229
    MD5 : 1B7343C3765638D4D17CB925F84F8ABE
    SHA1 : B001F04386EBE09DDAC86297FA7B18AF37ABAFFF


    This is how you test...

    First check the MD5 here: https://www.virustotal.com/en/#search

    Then check the SHA-1 same way but in another window...

    Then compare all the signatures.... and Poof Impersonation discovered!

    ...It's a simple hack by highly funded and technically adept professionals...

    They can spoof almost anything... but there is a catch!
    They can't spoof the two in tandem!

    They can spoof the MD5 or the Sha-1 but not both...
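
For anyone who wants to run the comparison described in this post by script rather than by eye, here is a minimal sketch. It computes MD5, SHA-1 and SHA-256 in one pass, adds size and date in the same style as the records above, and lists which fields disagree with a reference record. The reference is assumed to be copied by hand from a third-party service; no online lookup is performed, and the function names are made up for this example.

```python
import hashlib
import os
import tempfile
from datetime import datetime

def file_record(path):
    """Compute MD5, SHA-1 and SHA-256 in a single pass, plus size and date."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            for hasher in hashers.values():
                hasher.update(block)
    info = os.stat(path)
    record = {name.upper(): h.hexdigest().upper() for name, h in hashers.items()}
    record["Size"] = info.st_size
    record["Date"] = datetime.fromtimestamp(info.st_mtime).strftime("%d/%m/%Y")
    return record

def mismatches(local, reference):
    """List the fields present in both records whose values disagree."""
    return [key for key in local if key in reference and local[key] != reference[key]]

# Demo on a throwaway file; a real check would point at the file under analysis.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"stand-in file contents")
try:
    local = file_record(tmp.name)
finally:
    os.remove(tmp.name)

reference = dict(local)
reference["SHA1"] = "0" * 40   # pretend the third party reported another SHA-1
print(mismatches(local, reference))
```

A mismatch only says the two records describe different bytes; a different file version or update produces the same result, so the flagged fields still need a human look.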

    So get vigilant and do some comparison and you will identify all of their attempts... Some will even blow your mind!

    Once you identify the impersonation, you then simply use the target to reverse engineer and acquire the code, method etc...

    You can use this tool to find the impersonations: Advanced Process Analysis and Identification System Technician's Edition

    (It's what I use: https://hermes-computers.ca/apais_1.php)

    Cheers! :-*

    Guy
     
    Last edited: Sep 20, 2013
  15. noone_particular

    noone_particular Registered Member

    Joined:
    Aug 8, 2008
    Posts:
    3,798
    Regarding collisions on MD5 and other hashes, this utility might be of interest. It adds this tab to the properties menu.
    hashes.gif
    How many different hashes would you like?
    http://www.febooti.com/downloads/
     
  16. BoerenkoolMetWorst

    BoerenkoolMetWorst Registered Member

    Joined:
    Dec 22, 2009
    Posts:
    4,867
    Location:
    Outer space

  17. Snoop3

    Snoop3 Registered Member

    Joined:
    Jan 2, 2011
    Posts:
    474
    yeah, that's one of my first adds to a new install. there's also HashTab, which is similar. i like those programs that add more info to the properties menu.
     
  18. Hermescomputers

    Hermescomputers Registered Member

    Joined:
    Jan 9, 2006
    Posts:
    1,069
    Location:
    Toronto, Ontario, Canada, eh?
    Great Input, BoerenkoolMetWorst and noone_particular!

    However having the ability to view the signatures is only half the job...
    You really need to cross validate with a third party to pickup discrepancies...

    That is why you need to check with any service that allows you to view their checksums, like Virus Total. I already explained the proper methodology in a previous post...

    Here are a few URLs to help with this:
    https://www.virustotal.com/en/#search
    http://virusscan.jotti.org/hashsearch.php

    Unfortunately, the only thing currently proving effective is a visual inspection to identify collisions and, more importantly, impersonations.
    And you need to independently check both MD5 and Sha1 with a third party to pick out the particulars.

    You also need to confirm the file data, i.e. file inception date, size and static directory location. Then you can correctly pick out exact discrepancies.... Also report them to V.T.; it really helps when everyone else can also see this...

    It's really important because even VT doesn't know if the files it has have collisions or are impersonated. Leave a note/comment and explain the difference you identified. It really will go a long way in addressing this issue and make it less attractive...

    And maybe if enough of us do this, they will build some type of algorithm to flush these out and make it visible in their analysis report.
     
    Last edited: Sep 23, 2013