Security software using MD5

BoerenkoolMetWorst · Aug 14, 2013

A lot of security software relies on hashing: AVs, for example, use it to identify trusted files and avoid rescanning them, and HIPS use it to create rules for a specific file. Most still use the insecure MD5, which was already deemed vulnerable years ago:

In 1996, a flaw was found with the design of MD5, and while it was not a clearly fatal weakness, cryptographers began recommending the use of other algorithms, such as SHA-1—which has since been found to be vulnerable as well. In 2004, more serious flaws were discovered in MD5, making further use of the algorithm for security purposes questionable—specifically, a group of researchers described how to create a pair of files that share the same MD5 checksum.[4][5] Further advances were made in breaking MD5 in 2005, 2006, and 2007.[6] In December 2008, a group of researchers used this technique to fake SSL certificate validity,[7][8] and CMU Software Engineering Institute now says that MD5 "should be considered cryptographically broken and unsuitable for further use",[9] and most U.S. government applications now require the SHA-2 family of hash functions.[10]

http://en.wikipedia.org/wiki/Md5

I always thought they still use MD5 because it's a lot faster and thus lighter. However, here is a graph showing that, in that particular benchmark, SHA-512 and SHA-384 are a lot slower, while SHA-1 is actually a little faster than MD5 and SHA-256 is a lot faster:

~~ image removed (copyright) - it's available at the link below. ~~

http://www.not-implemented.com/comparing-hash-algorithms-md5-sha1-sha2/
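
Relative hash speed depends heavily on the implementation and the CPU, so it is worth measuring locally rather than relying on a single graph. Below is a minimal sketch, assuming Python's standard hashlib; the function name `benchmark` and the buffer size are illustrative choices, not taken from the linked article.

```python
import hashlib
import time


def benchmark(algorithms, size_mb=16, repeats=3):
    """Return best-of-N throughput in MB/s for each hashlib algorithm name."""
    data = b"\x00" * (size_mb * 1024 * 1024)  # in-memory buffer, so no disk I/O is measured
    results = {}
    for name in algorithms:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            hashlib.new(name, data).digest()
            best = min(best, time.perf_counter() - start)
        results[name] = size_mb / best
    return results


if __name__ == "__main__":
    names = ["md5", "sha1", "sha256", "sha384", "sha512"]
    for name, mbps in benchmark(names).items():
        print(f"{name:>7}: {mbps:8.1f} MB/s")
```

The numbers will differ from machine to machine, which is probably why published comparisons disagree with each other.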

Hermescomputers · Aug 13, 2013

The issue with encryption failure as described is inconsequential when it comes to simple check-sum signature verification. It is a solid and effective identification method...
Also, once an MD5 hash is minted, its counterparts (SHA-1/SHA-256) can be used to confirm validity (internally). You can mess with an MD5 cypher by breaking it and then use this as the basis for decryption, but you cannot change a file's signature check-sum without changing the file itself at bit level, thus changing its check-sum (an MD5 signature check-sum is a simple bit-level calculation), meaning MD5 in this context is still safe...

However, using MD5 as the basis for an encryption algorithm is where the issues are potentially negative. A distinction must be made between an MD5 signature check-sum and an MD5 encryption algorithm cypher.
This matters since security tool makers don't use MD5 to "protect" via encryption, but to "identify" static binaries...

Another point is that size really is a significant variable, as cypher processing speed is not the only thing that matters.
For example, my own Advanced Process Analysis and Identification System https://hermes-computers.ca//apais_1.php is currently composed of multiple large databases, several of which hold well over 3,000,000 checksums apiece. One of these large checksum containers is 160 MB (x3), and that is with MD5 checksums. Now switch to SHA-512 and multiply that byte size (relative to each signature string) and, well, you get the picture...

For example:
Take the byte count and tally it up (hint: 1 byte = 8 bits), then multiply it by the millions of signatures most products contain. The sizes below are for the hex text form; the raw binary digests are half that.
MD5 (32 Bytes)
d4941f3843e8795c7e7417dcac49a152
SHA1 (40 Bytes)
cf4a820c10b79ef602438014c5c9d19b2cda118a
SHA256 (64 Bytes)
1883b9822db58ecea8e206c6bb1639e6557d9a243d8da9329a4f124d3c9dde8
SHA512 (128 Bytes)
e82b473420fc1398f027bf0612f86e3335ca2b83e2946f3088d8118f408e5458bbcc2c43d68aca843c41db3cff9b9b870f409e1ac30c688b8901f6420f8a8a0d

Simply put, the "more" secure SHA-512, for example, will likely never be used to identify malware... It is simply too frigging big!
And it makes no sense at all to even consider it, as the benefits over MD5 check-sums are nil.
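
The size arithmetic above can be sketched roughly as follows. This counts only the digests themselves (no separators, indexes, or metadata), and the function name `database_size_mb` is a hypothetical helper, not part of any product mentioned here.

```python
import hashlib


def database_size_mb(algorithm, signatures, hex_text=True):
    """Approximate storage in MB (10**6 bytes) for `signatures` digests."""
    raw_bytes = hashlib.new(algorithm).digest_size      # 16 for MD5, 64 for SHA-512
    per_signature = raw_bytes * 2 if hex_text else raw_bytes  # hex text doubles the size
    return per_signature * signatures / 1_000_000


if __name__ == "__main__":
    for name in ("md5", "sha1", "sha256", "sha512"):
        as_hex = database_size_mb(name, 3_000_000)
        as_raw = database_size_mb(name, 3_000_000, hex_text=False)
        print(f"{name:>6}: {as_hex:6.1f} MB as hex text, {as_raw:6.1f} MB raw")
```

For 3,000,000 signatures this gives 96 MB for MD5 as hex text against 384 MB for SHA-512, and half of each when the digests are stored in raw binary form.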

For those curious, the signatures listed above are from a serious piece of malware: a very nasty Trojan...
You can check what it is by cutting and pasting into here: https://www.virustotal.com/en/#search

Or you can try and test your skills, and do a little reverse behavior analysis and see if you can even see the malware elements:
https://malwr.com/analysis/NjNlOTU0ZDhmNmZjNGEzOWIzMDJmYWU5MzE0OTRlOWE/

vojta · Aug 14, 2013

Great post, Hermescomputers.

It's true, MD5 is an abomination when used to encrypt passwords, for example, but it is fine for identifying single files. They are absolutely different things.

ELWIS1 · Aug 14, 2013

I agree with Hermes.

If we are talking about identifying malware with the help of MD5 plus the file size, it is fast and safe.

AV companies use MD5 and SHA-256.
SHA-256 is slow.

I would add that, for example, SHA-512 is safe today, but tomorrow it may not be.

http://atodorov.org/blog/2013/02/05/performance-test-md5-sha1-sha256-sha512/

BoerenkoolMetWorst · Aug 14, 2013

Hermescomputers said:

but you cannot change a file's signature check-sum without changing the file itself at bit level, thus changing its check-sum (an MD5 signature check-sum is a simple bit-level calculation), meaning MD5 in this context is still safe...

Thanks for your informative reply. However, per Wikipedia, creating different files with the same MD5 checksum was already possible 9 years ago:

In 2004, more serious flaws were discovered in MD5, making further use of the algorithm for security purposes questionable—specifically, a group of researchers described how to create a pair of files that share the same MD5 checksum.

Of course, creating a different file with the same checksum that also does what you want it to do is something else entirely and probably practically impossible, but a lot of new techniques can be discovered in 9 years, and the processing power available to calculate collisions has improved exponentially, especially when you put GPUs to work.

Hermescomputers · Aug 15, 2013

BoerenkoolMetWorst said:

Of course, creating a different file with the same checksum that also does what you want it to do is something else entirely and probably practically impossible, but a lot of new techniques can be discovered in 9 years, and the processing power available to calculate collisions has improved exponentially, especially when you put GPUs to work.

Well, if someone wants to impersonate malware, which is kind of a strange idea to begin with, then let them go right ahead! They can create some type of MD5 impersonation (collision) and get detected as malware pronto! Suits me just fine...

The primary difference is perhaps (if even remotely possible) the use of impersonation against white-listing systems that use MD5, and that again as stated would be difficult if not impossible and highly impractical...

This would require the use of highly sophisticated polymorphic impersonation technology dynamically actuated in real time, or a purpose-built binary to impersonate a specific application on someone's platform. It is perhaps possible, but highly unlikely that anyone would go to this trouble, as it's a lot easier to simply create a brand new binary never before identified as malware...

This is why polymorphic malware is so prevalent. Otherwise everyone would be infected and even white-listing would be ineffective...
(What is Polymorphic Code? https://en.wikipedia.org/wiki/Polymorphic_virus)

The alternative to signature-based identification is heuristics technology, perhaps the only currently effective method for identifying variants in real time (Heuristics: https://en.wikipedia.org/wiki/Antivirus#Heuristics), or something like the Google Code Yara Project, which uses binary code patterns to sniff out malware based on code chunks instead of heuristic behavioral scopes and check-sums. This method is more like a form of real-time, albeit limited, reverse engineering. (Yara Project https://code.google.com/p/yara-project/)

A typical malware identification technology will combine most of the methods stated above. I use my own custom system (Primary Risk Analysis and Advanced Risk Analysis, plus a few sophisticated behavior scopes), so MD5 signatures are certainly not the only way we identify hostile elements...

Besides these, many systems (internally) utilize other types of check-sum verification, some proprietary (in-house) and some public, like CRC32 (Cyclic Redundancy Code, or its variant name "Check"), which can prove useful for reducing database size but is insecure...
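
To illustrate why a 32-bit check-sum such as CRC32 is compact but insecure, here is a rough sketch that finds two different random inputs with the same CRC32 by plain brute force (a birthday search). It uses Python's standard zlib.crc32; the function name, the seed, and the 16-byte message length are arbitrary assumptions for the demonstration.

```python
import random
import zlib


def find_crc32_collision(seed=0, max_tries=5_000_000):
    """Return two different 16-byte messages that share a CRC32 value."""
    rng = random.Random(seed)          # seeded only to make the demo repeatable
    seen = {}                          # crc value -> first message that produced it
    for _ in range(max_tries):
        msg = rng.getrandbits(128).to_bytes(16, "big")
        crc = zlib.crc32(msg)
        other = seen.get(crc)
        if other is not None and other != msg:
            return other, msg
        seen[crc] = msg
    raise RuntimeError("no collision found within max_tries")


if __name__ == "__main__":
    a, b = find_crc32_collision()
    print(a.hex(), b.hex(), f"crc32={zlib.crc32(a):08x}")
```

With only 2^32 possible values, a collision is expected after tens of thousands of random inputs, which is why CRC32 only suits quick pre-filtering and not identification on its own.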

MD5 remains the prime choice until something better appears. It's also why many products are attempting to move away from check-sum identification, since the data-set size is really beginning to be prohibitive...

For example, my entire data-set (around 6,000,000 malware signatures) is but a fraction of currently known malware. It gets even worse when taking into consideration malware designed to affect defunct operating systems like MS-DOS and Windows 3.1 / 95 / 98 / ME etc...

Among many other tricks of the trade, this effectively forces developers to selectively filter out the malware least likely to appear, and to predict which malware has the highest probability of being current, in an attempt to reduce check-sum database size...

This enormous bloating of malware databases can be directly attributed to the appearance of polymorphic capabilities in malware. A single Trojan can now become 10,000 different Trojans in a matter of days, if not less, simply by dynamically modifying its check-sum signatures in real time at each propagation. (Keep in mind that this also changes the file at bit level.) But as polymorphic malware can change its signatures dynamically, it creates a nightmare for signature-based products...
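
The point that any bit-level change produces a different check-sum can be shown with a small sketch. The byte string below is a made-up stand-in for a binary, not real malware, and `flip_bit` is a hypothetical helper.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


# Hypothetical stand-in for an executable image.
original = b"MZ" + b"\x90" * 62
variant = flip_bit(original, 100)

if __name__ == "__main__":
    print("original:", hashlib.md5(original).hexdigest())
    print("variant: ", hashlib.md5(variant).hexdigest())
```

A polymorphic engine does far more than flip one bit, but the effect on a signature database is the same: every variant needs its own entry.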

Another reason MD5 is used so extensively is standardization. It's now more or less the most compatible technology for sharing whitelist/malware signatures. Everyone understands MD5 and almost everyone uses it...

CRC32 checksum
http://www.accuhash.com/what-is-crc32.html

BoerenkoolMetWorst · Aug 15, 2013

Thanks for your extensive reply. I did indeed mean using a hash collision to get the same hash as a whitelisted/trusted file, but like you said, even if it is possible, there are tons of other ways to infect someone which are way easier.

Hermescomputers · Aug 18, 2013

BoerenkoolMetWorst said:

Thanks for your extensive reply. I did indeed mean using a hash collision to get the same hash as a whitelisted/trusted file, but like you said, even if it is possible, there are tons of other ways to infect someone which are way easier.

You are welcome!

However, I have the same concern about white-listing as discussed; this is why one must take context into account during verification.

It is another reason why second-guessing is engineered to be so easy in my tools, as you can simply verify the white-listed executable, or any other executable under analysis, via VT or any number of other systems with a single click... This now also includes using other check-sums, like SHA-1 for example.

Discrepancies can be easily tagged (path, binary size, file date, etc.), and blacklisting a previously locally or globally white-listed item is just one click!

For example, globally white-listed items are registered applications and will never be located in a /Temp/ directory; if one is, my Primary Risk Analysis technology will pick that up and raise the alarm...

Another example is my temporal signature tracking, which documents the check-sum of each file under analysis each time it is analyzed. It also adds the full path and the file revision. This data aggregates over time. Any modification to this data can be used to identify impersonation, even if it's discovered on a different PC. So, to be successful, a signature impersonation also has to be exact in every aspect, or it will be detected. This data is easily accessible and visible to the technician/user...
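
As a rough illustration of the idea of tracking signatures over time (not the actual A.P.A.I.S. implementation, whose internals are not described here), a history store could look something like this. The class and field names are invented for the sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Observation:
    path: str        # full path of the analyzed file
    revision: str    # file version string
    md5: str         # check-sum recorded at analysis time


@dataclass
class SignatureHistory:
    by_file: dict = field(default_factory=dict)   # (path, revision) -> set of md5 values
    by_hash: dict = field(default_factory=dict)   # md5 -> set of (path, revision)

    def record(self, obs):
        """Store an observation and return any discrepancies with earlier ones."""
        key = (obs.path.lower(), obs.revision)
        md5 = obs.md5.lower()
        issues = []
        hashes = self.by_file.setdefault(key, set())
        if hashes and md5 not in hashes:
            issues.append(f"{obs.path} rev {obs.revision}: check-sum changed")
        files = self.by_hash.setdefault(md5, set())
        if files and key not in files:
            issues.append(f"{md5}: previously seen under a different path or revision")
        hashes.add(md5)
        files.add(key)
        return issues
```

Each call to `record` returns an empty list when the observation agrees with the history, so a technician only has to look at the non-empty results.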

Also, the premise of my eclectic methodology is to fully engage the technicians, whereas full automation attempts to exclude them from the analytical and decision-making process.

Also, I focus on one file at a time. Full spectrum. Which really brings it into focus. Anyone fully utilizing the resources and methodology included in my Advanced Process Analysis and Identification System (A.P.A.I.S.) will easily flush out hostiles.

This engagement can lead to identifying previously unidentified malware, as well as to noticing discrepancies, which often leads to a rapid yet correct infection diagnosis.

Noob · Aug 17, 2013

This turned into a very technical and informative thread.

So in fewer words, MD5 is still fine as one of the detection methods used by AVs, but it's not recommended for encryption.

erikloman · Aug 17, 2013

BoerenkoolMetWorst said:

Thanks for your extensive reply. I did indeed mean using a hash collision to get the same hash as a whitelisted/trusted file, but like you said, even if it is possible, there are tons of other ways to infect someone which are way easier.

It's quite easy to create files with the same MD5 hash (and it already was in 2006):
http://www.mscs.dal.ca/~selinger/md5collision/

This was the main reason for us to go with SHA-256.

In terms of performance, choosing between SHA-256 and MD5 is of no importance when hashing files from disk: the disk is waaaay slower than the hashing algorithms (we hash concurrently: read block 1, then hash block 1 while reading block 2, etc.). The CPU usually sits idle, waiting for data.
Also, you can accelerate SHA-256 to about 11.5 CPU cycles per byte using the SSE4 and AVX instruction sets:
http://www.intel.com/content/dam/ww...hite-papers/sha-256-implementations-paper.pdf
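
The overlapped scheme described above (read block 1, then hash block 1 while block 2 is being read) might be sketched like this. It is only an illustration using a reader thread and a bounded queue; the function name, block size, and queue depth are assumptions, not the poster's actual code.

```python
import hashlib
import queue
import threading


def hash_file_overlapped(path, algorithm="sha256", block_size=1 << 20, depth=4):
    """Hash a file while a background thread keeps reading the next blocks."""
    blocks = queue.Queue(maxsize=depth)   # bounded, so the reader cannot run far ahead

    def reader():
        try:
            with open(path, "rb") as fh:
                while True:
                    block = fh.read(block_size)
                    blocks.put(block)      # an empty block marks end of file
                    if not block:
                        return
        except OSError as exc:
            blocks.put(exc)                # hand I/O errors to the hashing side

    threading.Thread(target=reader, daemon=True).start()
    digest = hashlib.new(algorithm)
    while True:
        block = blocks.get()
        if isinstance(block, Exception):
            raise block
        if not block:
            return digest.hexdigest()
        digest.update(block)               # hashlib releases the GIL for large buffers
```

Because the hashing side mostly waits on the queue, swapping "sha256" for "md5" should change total time very little on a disk-bound workload, which is the point being made.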

In the near future, Intel will add CPU instructions to further accelerate SHA-256 hashing:
http://software.intel.com/en-us/articles/intel-sha-extensions

In terms of storage, MD5 is indeed smaller. But as we store a lot more info on the millions of files we receive (classification, history, file info, behavior, statistics etc), storing MD5 or SHA-256 makes ~2% difference in total DB storage. But if you store the hashes plain in a file, it becomes twice the size when using SHA-256 instead of MD5.

vojta · Aug 17, 2013

As an example of what has been said, Microsoft "would in six months restrict the use of digital certificates with MD5 hashes issued under roots in the Microsoft root certificate program":

http://threatpost.com/microsoft-starts-countdown-on-eliminating-md5/101994

BoerenkoolMetWorst · Aug 18, 2013

Hi Erik, thanks for the reply and good to see your input here

vojta said:

As an example of what has been said, Microsoft "would in six months restrict the use of digital certificates with MD5 hashes issued under roots in the Microsoft root certificate program":

http://threatpost.com/microsoft-starts-countdown-on-eliminating-md5/101994

Yeah, quite late from MS..
You can already download the patch by yourself btw:
https://www.wilderssecurity.com/showthread.php?t=351881

Hermescomputers · Aug 19, 2013

BoerenkoolMetWorst said:

You can already download the patch by yourself btw:
https://www.wilderssecurity.com/showthread.php?t=351881

That MS August 13 post rattled my brain box!
Seems to me someone is accelerating the demise of MD5

Hermescomputers · Sep 20, 2013

Speaking of Collisions... Impersonation
Here is an example you can taste....

17/09/2013 14:43:36
C:\Windows\system32\kernel32.dll
Date : 01/08/2013
Size : 1114112
Version: 6.1.7601.18229
MD5 : 365A5034093AD9E04F433046C4CDF6AB
SHA1 : 7244AE695F8E5A730857781635ACB2969F15C594

and another here:

17/09/2013 14:48:13
C:\Windows\system32\KERNELBASE.dll
Date : 01/08/2013
Size : 274944
Version: 6.1.7601.18229
MD5 : 1B7343C3765638D4D17CB925F84F8ABE
SHA1 : B001F04386EBE09DDAC86297FA7B18AF37ABAFFF

This is how you test...

First, check the MD5 here: https://www.virustotal.com/en/#search

Then check the SHA-1 same way but in another window...

Then compare all the signatures... and poof, impersonation discovered!

...It's a simple hack by highly funded and technically adept professionals...

They can spoof almost anything... but there is a catch!
They can't spoof the two in tandem!

They can spoof the MD5 or the SHA-1, but not both...

So get vigilant and do some comparisons, and you will identify all of their attempts... Some will even blow your mind!
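
The tandem check can be sketched as follows: compute MD5 and SHA-1 in a single pass over the file and compare each against a reference pair obtained from a third party. This is an assumption-laden sketch; `digests` and `matches_reference` are hypothetical helper names, and where the reference values come from is up to you.

```python
import hashlib


def digests(path, algorithms=("md5", "sha1"), block_size=1 << 20):
    """Compute several digests of one file in a single read pass."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            for hasher in hashers.values():
                hasher.update(block)
    return {name: hasher.hexdigest().upper() for name, hasher in hashers.items()}


def matches_reference(path, reference):
    """Map each algorithm in `reference` to True/False depending on a match."""
    actual = digests(path, tuple(reference))
    return {name: actual[name] == expected.upper()
            for name, expected in reference.items()}
```

A result such as `{"md5": True, "sha1": False}` is the pattern described above: one digest agrees with the reference while the other does not, which deserves a closer look.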

Once you identify the impersonation, you then simply use the target to reverse engineer and acquire the code, method etc...

You can use this tool to find the impersonations: Advanced Process Analysis and Identification System Technician's Edition

(It's what I use: https://hermes-computers.ca/apais_1.php)

Cheers!

Guy

noone_particular · Sep 20, 2013

Regarding collisions on MD5 and other hashes, this utility might be of interest. It adds this tab to the properties menu.

How many different hashes would you like?
http://www.febooti.com/downloads/

BoerenkoolMetWorst · Sep 22, 2013

noone_particular said:

Regarding collisions on MD5 and other hashes, this utility might be of interest. It adds this tab to the properties menu.
How many different hashes would you like?
http://www.febooti.com/downloads/

Many?

Snoop3 · Sep 23, 2013

noone_particular said:

Regarding collisions on MD5 and other hashes, this utility might be of interest. It adds this tab to the properties menu.
How many different hashes would you like?
http://www.febooti.com/downloads/

yeah, that's one of my first adds to a new install. there's also HashTab that is similar. i like those programs that add more info to the properties menu.

Hermescomputers · Sep 23, 2013

Great Input, BoerenkoolMetWorst and noone_particular!

However, having the ability to view the signatures is only half the job...
You really need to cross-validate with a third party to pick up discrepancies...

That is why you need to check with any service that allows you to view their checksums, like Virus Total. I already explained the proper methodology in a previous post...

Here are a few URLs to help with this:
https://www.virustotal.com/en/#search
http://virusscan.jotti.org/hashsearch.php

Unfortunately, the only thing currently proving effective is a visual inspection to identify collisions and, more importantly, impersonations.
And you need to independently check both MD5 and SHA-1 with a third party to pick out the particulars.

Also, you need to confirm the file data, i.e. file inception date, size and static directory location. Then you can correctly pick out exact discrepancies... Also report them to V.T.; it really helps when everyone else can see this too...
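
Collecting the file data mentioned above alongside the check-sums, and listing where it differs from a reference record, could be sketched like this. The field names are invented, and the modification time is used as a stand-in for the "inception date", which is an assumption since creation time is not available the same way on every platform.

```python
import hashlib
import os
from datetime import datetime, timezone


def file_record(path):
    """Gather location, size, date and check-sums for one file."""
    st = os.stat(path)
    with open(path, "rb") as fh:
        data = fh.read()
    return {
        "directory": os.path.dirname(os.path.abspath(path)),
        "size": st.st_size,
        # Assumption: modification date used in place of the creation date.
        "modified": datetime.fromtimestamp(st.st_mtime, timezone.utc).date().isoformat(),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


def discrepancies(local, reference):
    """Return the sorted field names whose values differ between two records."""
    return sorted(k for k in reference if k in local and local[k] != reference[k])
```

The list returned by `discrepancies` is the kind of detail worth including in a comment when reporting a suspicious file.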

It's really important because even VT doesn't know if the files it has have collisions or are impersonated. Leave a note/comment and explain the difference you identified. It really will go a long way toward addressing this issue and making it less attractive...

And maybe if enough of us do this, they will build some type of algorithm to flush these out and make it visible in their analysis report.
