
What is MD5 and Why Is It Broken?

MD5 was once the gold standard for cryptographic hashing. Today it's considered cryptographically broken for security use — but it's still everywhere. Here's why, and what to use instead.

November 12, 2024 · 6 min read

What is MD5?

MD5 (Message Digest Algorithm 5) is a cryptographic hash function designed by Ronald Rivest in 1991. It takes any input — a word, a file, a database — and produces a fixed 128-bit (32 hexadecimal character) output called a hash or digest.

For example, the MD5 hash of the string "hello" is always:

5d41402abc4b2a76b9719d911017c592

Change even a single character — say, "Hello" — and the output changes completely:

8b1a9953c4611296a827abf8c47804d7

This dramatic change from a tiny input difference is called the avalanche effect, and it's a core property of any good hash function.
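
The two digests above can be reproduced with Python's standard `hashlib` module. The sketch below also counts how many of the 128 output bits differ between them. The helper names `md5_hex` and `bit_difference` are illustrative choices for this example, not part of any library.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of a UTF-8 string as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def bit_difference(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


lower = md5_hex("hello")
upper = md5_hex("Hello")

print(lower)  # 5d41402abc4b2a76b9719d911017c592
print(upper)
print(bit_difference(lower, upper), "of 128 bits differ")
```

For a well-mixed hash, about half of the output bits are expected to flip for any change to the input, so the count should land somewhere in the neighborhood of 64.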

How hashing works

Cryptographic hash functions have four essential properties:

Deterministic — the same input always produces the same output.
One-way — given the hash, you can't reverse-engineer the original input.
Fast to compute — hashing should be quick for the legitimate user.
Collision-resistant — it should be computationally infeasible to find two different inputs that produce the same hash.

MD5 satisfies the first three properties well. It's the fourth one — collision resistance — where it catastrophically fails.
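
The first and third properties are easy to observe directly. This minimal sketch, again using `hashlib`, hashes the same input twice and then hashes inputs of very different sizes to show that the digest length never changes. The helper name `md5_of_bytes` and the chosen input sizes are arbitrary.

```python
import hashlib


def md5_of_bytes(data: bytes) -> str:
    """Return the MD5 digest of raw bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


# Deterministic: hashing the same bytes twice gives the same digest.
print(md5_of_bytes(b"hello") == md5_of_bytes(b"hello"))  # True

# Fixed-size output: the digest is 32 hex characters whether the
# input is empty, a few bytes, or a megabyte.
for size in (0, 5, 1_000_000):
    digest = md5_of_bytes(b"a" * size)
    print(f"{size:>9} input bytes -> {len(digest)} hex characters")
```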

Why MD5 is broken

In 2004, Chinese cryptographer Xiaoyun Wang and her colleagues demonstrated a practical collision attack against MD5. They showed that it was possible to find two different inputs that produce the identical MD5 hash, reportedly in about an hour on an IBM p690 cluster. Within two years, improved techniques could find collisions in about a minute on an ordinary notebook PC.

By 2008, researchers had used MD5 collisions to create a rogue certificate authority, fooling major browsers into trusting fake HTTPS certificates. This was no longer a theoretical vulnerability — it was a real-world attack vector.

Collision attacks explained

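A collision occurs when two different inputs hash to the same output. Here's a simplified analogy: imagine a hash function as a filing cabinet with 100 drawers. If you have 101 items to file, at least two must share a drawer. That guarantee is the pigeonhole principle, and it means collisions must exist for any hash function with a fixed-size output. The birthday paradox is the related but more surprising result: when items land in drawers at random, a shared drawer becomes likely long before the cabinet is full, after roughly as many items as the square root of the number of drawers (about a dozen items for 100 drawers).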

MD5 produces 128-bit hashes, meaning there are 2¹²⁸ possible outputs. By the birthday bound, you'd expect to need around 2⁶⁴ inputs before finding a collision by chance. Wang's 2004 attack and the refinements that followed cut that dramatically: published estimates range from roughly 2³⁹ down to 2²⁴ MD5 computations (about 550 billion down to about 17 million), breaking the security assumption by many orders of magnitude.
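
The square-root estimate for chance collisions can be checked on a scaled-down hash. The sketch below keeps only the first 3 bytes (24 bits) of each MD5 digest and hashes successive counter strings until two of them share a truncated digest. This is a generic brute-force birthday search on a toy output size, not Wang's attack, and the function names and the 3-byte truncation are assumptions made for illustration.

```python
import hashlib


def truncated_md5(data: bytes, num_bytes: int) -> bytes:
    """Return only the first num_bytes bytes of the MD5 digest."""
    return hashlib.md5(data).digest()[:num_bytes]


def find_truncated_collision(num_bytes: int = 3):
    """Hash counter strings until two share a truncated digest.

    Returns (first_input, second_input, number_of_hashes_computed).
    """
    seen = {}  # truncated digest -> input that produced it
    attempt = 0
    while True:
        message = str(attempt).encode("ascii")
        tag = truncated_md5(message, num_bytes)
        if tag in seen:
            return seen[tag], message, attempt + 1
        seen[tag] = message
        attempt += 1


first, second, attempts = find_truncated_collision(3)  # 24-bit toy hash
print(f"{first!r} and {second!r} collide after {attempts} hashes")
print(f"output space: 2**24 = {2**24}, square root: 2**12 = {2**12}")
```

The number of hashes needed should be on the order of a few thousand, close to 2¹² rather than 2²⁴. Scaling the same reasoning up to the full 128-bit digest gives the 2⁶⁴ figure.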

⚠ Key point

An attacker who can find two files with the same MD5 hash can substitute one for the other — signing a legitimate contract with the same hash as a malicious one, for example.

Real-world impact

The most famous real-world exploit using MD5 collisions is the Flame malware (2012), a sophisticated piece of cyberespionage software believed to be state-sponsored. Flame targeted certificates issued through Microsoft's Terminal Server Licensing Service, which were still signed using MD5, and used a collision to forge a code-signing certificate that chained back to Microsoft. That let it pass itself off as a legitimate Windows update.

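Flame was a targeted espionage tool rather than a mass outbreak: public estimates put the number of infected machines at around a thousand, concentrated in the Middle East. Its forged certificate was a direct consequence of MD5's broken collision resistance, which was still relied on in production infrastructure long after the vulnerability was publicly known.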

When MD5 is still fine to use

Not every use of MD5 is dangerous. Collision resistance only matters when an attacker can choose or manipulate the inputs being hashed. In many contexts, that's not the threat model:

File integrity checksums for distribution verification — if you're checking that a file wasn't corrupted in transit (not tampered with by an attacker), MD5 is acceptable.
Non-cryptographic deduplication — identifying duplicate files in a database where no adversary is involved.
Internal logging and correlation IDs — where uniqueness (not security) is the goal.
Legacy system compatibility — where changing the algorithm isn't feasible and the risk is understood.
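
For the first case in the list, a corruption check might look like the following sketch. It reads the file in chunks so large files don't need to fit in memory, then compares the result with a checksum published alongside the download. The function names, the chunk size, and the sample file are illustrative assumptions. This only detects accidental damage; it offers no protection if an attacker can alter both the file and the published checksum, or craft colliding files.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, expected_md5: str) -> bool:
    """Return True if the file's MD5 matches the published checksum."""
    return file_md5(path) == expected_md5.strip().lower()


# Demo with a throwaway file whose contents are the bytes b"hello".
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    print(is_intact(sample, "5d41402abc4b2a76b9719d911017c592"))  # True
```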

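What you should never use MD5 for: digital signatures, certificate generation, or any other context where an attacker could craft colliding inputs. It is also unsuitable for password storage, though for a different reason: like any fast hash, it lets an attacker test guesses far too quickly.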

What to use instead

The good news: there are excellent, widely-supported alternatives.

Algorithm	Output size	Security status	Best for
`SHA-256`	256 bits	✓ Secure	General hashing, digital signatures
`SHA-512`	512 bits	✓ Secure	High-security applications
`SHA-3`	224–512 bits	✓ Secure	Future-proofing (different internal design from SHA-2)
`bcrypt`	60 chars	✓ Secure	Password storage (slow by design)
`Argon2`	Variable	✓ Secure	Password storage (memory-hard)
`SHA-1`	160 bits	⚠ Deprecated	Legacy compatibility only
`MD5`	128 bits	✗ Broken	Non-security checksums only
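
The general-purpose hashes in the table are all available through Python's `hashlib`, so switching away from MD5 is often a one-word change. This sketch prints the output size and digest for the same input under each algorithm. `sha3_256` is used here as one representative SHA-3 variant, and the helper `digest_bits` is a name invented for this example. bcrypt and Argon2 are omitted because they are not part of `hashlib`.

```python
import hashlib


def digest_bits(name: str) -> int:
    """Return the output size, in bits, of a hashlib algorithm."""
    return hashlib.new(name).digest_size * 8


message = b"hello"

for name in ("md5", "sha1", "sha256", "sha512", "sha3_256"):
    hexdigest = hashlib.new(name, message).hexdigest()
    print(f"{name:<9} {digest_bits(name):>4} bits  {hexdigest}")
```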

For most developers, SHA-256 is the right choice for general hashing. For password storage, you need a slow hashing algorithm like bcrypt or Argon2 — never a fast hash like SHA-256, which can be brute-forced at billions of guesses per second on a GPU.
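
The pattern for password storage is a random per-user salt plus a deliberately expensive derivation. bcrypt and Argon2 require third-party packages in Python, so this sketch substitutes PBKDF2-HMAC-SHA256 from the standard library purely to illustrate the pattern. PBKDF2 is slow but not memory-hard, so treat it as a stand-in rather than an equivalent of Argon2. The function names and the iteration count are assumptions for this example.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # illustrative work factor; tune for your hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a slow, salted hash. Returns (salt, derived_key)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each password
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))  # False
```

Each guess now costs the attacker hundreds of thousands of hash computations instead of one, and the salt means identical passwords do not produce identical stored values.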

Summary

MD5 is a 1991 hash function that was broken for security use in 2004 when practical collision attacks were demonstrated. It has been successfully exploited in the wild — most famously in the Flame malware. Today, MD5 should not be used for digital signatures, certificate generation, or password storage.

For security applications, use SHA-256 or SHA-512. For passwords specifically, use a dedicated slow hash: bcrypt or Argon2. MD5 remains acceptable only for non-adversarial integrity checks where collision attacks aren't a concern.
