What is MD5 and Why Is It Broken?
MD5 was once the gold standard for cryptographic hashing. Today it's considered cryptographically broken for security use — but it's still everywhere. Here's why, and what to use instead.
What is MD5?
MD5 (Message Digest Algorithm 5) is a cryptographic hash function designed by Ronald Rivest in 1991. It takes any input — a word, a file, a database — and produces a fixed 128-bit (32 hexadecimal character) output called a hash or digest.
For example, the MD5 hash of the string "hello" is always:
5d41402abc4b2a76b9719d911017c592
Change even a single character — say, "Hello" — and the output changes completely:
8b1a9953c4611296a827abf8c47804d7
This dramatic change from a tiny input difference is called the avalanche effect, and it's a core property of any good hash function.
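The two digests above can be reproduced with Python's standard `hashlib` module. The sketch below also counts how many of the 128 output bits differ between them; the helper names `md5_hex` and `differing_bits` are made up for this illustration.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 digest of a UTF-8 string as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

lower = md5_hex("hello")
upper = md5_hex("Hello")

print(lower)  # 5d41402abc4b2a76b9719d911017c592
print(upper)  # 8b1a9953c4611296a827abf8c47804d7
print(differing_bits(lower, upper), "of 128 bits differ")
```

For a hash with a good avalanche effect, the count should land somewhere near half of the output bits, although any single pair of inputs can deviate from that.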
How hashing works
Cryptographic hash functions have four essential properties:
- Deterministic — the same input always produces the same output.
- One-way — given the hash, you can't reverse-engineer the original input.
- Fast to compute — hashing should be quick for the legitimate user.
- Collision-resistant — it should be computationally infeasible to find two different inputs that produce the same hash.
MD5 satisfies the first three properties well. It's the fourth one — collision resistance — where it catastrophically fails.
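The easily observable properties can be checked in a few lines. This is a minimal sketch using Python's `hashlib`; it only shows determinism and the fixed 128-bit output size, since one-wayness and collision resistance cannot be demonstrated by running a handful of inputs. The `samples` list is an arbitrary choice for illustration.

```python
import hashlib

# Inputs of very different sizes, from empty to one million bytes.
samples = [b"", b"a", b"hello", b"x" * 1_000_000]

digests = {}
for data in samples:
    first = hashlib.md5(data).hexdigest()
    second = hashlib.md5(data).hexdigest()
    assert first == second   # deterministic: same input, same digest
    assert len(first) == 32  # fixed size: always 32 hex characters
    digests[len(data)] = first
    print(f"{len(data):>9} bytes -> {first}")
```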
Why MD5 is broken
In 2004, Chinese cryptographer Xiaoyun Wang and her colleagues demonstrated a practical collision attack against MD5. They showed that it was possible to find two different inputs that produce the identical MD5 hash, reportedly in about an hour on an IBM p690 cluster. Within roughly two years, refined techniques brought that down to under a minute on an ordinary PC.
In 2008, researchers used MD5 collisions to create a rogue certificate authority certificate that major browsers would have trusted, allowing its holder to issue fake HTTPS certificates for any site. This was no longer a theoretical vulnerability; it was a real-world attack vector.
Collision attacks explained
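A collision occurs when two different inputs hash to the same output. Here's a simplified analogy: imagine a hash function as a filing cabinet with 100 drawers. If you have 101 items to file, at least two must share a drawer. That is the pigeonhole principle, and it means collisions always exist. The birthday paradox makes things worse: if items land in random drawers, you have a better-than-even chance of a shared drawer after only about 13 items, roughly the square root of the number of drawers.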
MD5 produces 128-bit hashes, meaning there are 2¹²⁸ possible outputs. By the birthday bound, you'd need to try around 2⁶⁴ inputs before finding a collision by chance. Wang's 2004 attack needed only around 2³⁹ operations, and later refinements cut that to roughly 2²⁴ (about 17 million), breaking the security assumption by many orders of magnitude.
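The birthday arithmetic can be sketched in Python. The block below computes the approximate number of random inputs needed for a 50% chance of a collision, then brute-forces a real collision on a toy hash made by keeping only the first 3 bytes (24 bits) of MD5. This illustrates the generic birthday search only; it is not the differential technique used in the attacks on full MD5. The function names and the choice of 24 bits are assumptions for the example.

```python
import hashlib
import math
from itertools import count

def birthday_bound(bits: int) -> float:
    """Approximate number of random inputs for a 50% collision chance."""
    return math.sqrt(2 * math.log(2) * 2.0 ** bits)

def truncated_md5(data: bytes, n_bytes: int) -> bytes:
    """First n_bytes of the MD5 digest: a deliberately weakened toy hash."""
    return hashlib.md5(data).digest()[:n_bytes]

def find_collision(n_bytes: int):
    """Brute-force two distinct inputs whose truncated digests match."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        tag = truncated_md5(msg, n_bytes)
        if tag in seen:
            return seen[tag], msg, i + 1
        seen[tag] = msg

print(f"128-bit hash: about 2^{math.log2(birthday_bound(128)):.1f} inputs")
print(f" 24-bit hash: about {birthday_bound(24):.0f} inputs")

a, b, tries = find_collision(3)  # 3 bytes = 24 bits
print(f"24-bit collision after {tries} inputs: {a!r} and {b!r}")
```

The toy search finishes almost instantly because the square root of 2²⁴ is only 4,096. The same search against the full 128 bits would need on the order of 2⁶⁴ inputs, which is why an attack needing 2³⁹ operations or fewer is so damaging.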
⚠ Key point
An attacker who can produce two files with the same MD5 hash can substitute one for the other: for example, getting a harmless contract signed, then swapping in a malicious one that has the same hash and therefore the same valid signature.
Real-world impact
The most famous real-world exploit using MD5 collisions is the Flame malware (2012), a sophisticated piece of cyberespionage software believed to be state-sponsored. Flame exploited an MD5 collision weakness in the certificates issued by Microsoft's Terminal Server Licensing Service to sign itself with a fraudulent Microsoft certificate, making it appear to be a legitimate Windows update.
Flame was a highly targeted operation: published estimates put infections at roughly a thousand machines, concentrated in the Middle East. Its ability to impersonate a Windows update was a direct consequence of MD5's broken collision resistance, still used in production infrastructure long after the vulnerability was publicly known.
When MD5 is still fine to use
Not every use of MD5 is dangerous. Collision resistance only matters when an attacker can choose or manipulate the inputs being hashed. In many contexts, that's not the threat model:
- File integrity checksums for distribution verification — if you're checking that a file wasn't corrupted in transit (not tampered with by an attacker), MD5 is acceptable.
- Non-cryptographic deduplication — identifying duplicate files in a database where no adversary is involved.
- Internal logging and correlation IDs — where uniqueness (not security) is the goal.
- Legacy system compatibility — where changing the algorithm isn't feasible and the risk is understood.
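What you should never use MD5 for: digital signatures, certificate generation, or any context where an attacker could craft colliding inputs. Password storage is also off-limits, though for a different reason: MD5 is fast to compute, so leaked password hashes are cheap to brute-force.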
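For the first acceptable use above, detecting accidental corruption, a streaming checksum is usually enough. The sketch below assumes Python 3.9 or later, where `hashlib.md5` accepts `usedforsecurity=False` to mark the call as a non-security use. The function names are hypothetical.

```python
import hashlib
from pathlib import Path

def md5_file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 in 1 MiB chunks and return the hex digest."""
    # usedforsecurity=False (Python 3.9+) flags this as a non-security use.
    digest = hashlib.md5(usedforsecurity=False)
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def is_intact(path: Path, expected_hex: str) -> bool:
    """Detect accidental corruption only; this does not detect tampering."""
    return md5_file_checksum(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat regardless of file size. If the expected checksum arrives over the same untrusted channel as the file, an attacker can replace both, so this check says nothing about authenticity.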
What to use instead
The good news: there are excellent, widely-supported alternatives.
| Algorithm | Output size | Security status | Best for |
|---|---|---|---|
| SHA-256 | 256 bits | ✓ Secure | General hashing, digital signatures |
| SHA-512 | 512 bits | ✓ Secure | High-security applications |
| SHA-3 | 224–512 bits | ✓ Secure | Future-proofing, an alternative design to SHA-2 |
| bcrypt | 60 chars | ✓ Secure | Password storage (slow by design) |
| Argon2 | Variable | ✓ Secure | Password storage (memory-hard) |
| SHA-1 | 160 bits | ⚠ Deprecated | Legacy compatibility only |
| MD5 | 128 bits | ✗ Broken | Non-security checksums only |
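The output sizes of the fixed-length hashes in the table can be checked against Python's `hashlib`. This sketch uses `sha3_256` as one representative of the SHA-3 family; bcrypt and Argon2 are not in the standard library and are left out.

```python
import hashlib

# Map each algorithm name to its digest size in bits.
sizes = {
    name: hashlib.new(name).digest_size * 8
    for name in ("md5", "sha1", "sha256", "sha512", "sha3_256")
}

for name, bits in sizes.items():
    print(f"{name:<9} {bits:>4} bits")
```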
For most developers, SHA-256 is the right choice for general hashing. For password storage, you need a slow hashing algorithm like bcrypt or Argon2 — never a fast hash like SHA-256, which can be brute-forced at billions of guesses per second on a GPU.
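The idea of a deliberately slow, salted password hash can be sketched with the standard library alone. Note the substitution: bcrypt and Argon2 need third-party packages, so this example uses PBKDF2-HMAC-SHA256 from `hashlib` as a stand-in. PBKDF2 is slow by design but not memory-hard, so treat this as an illustration of the pattern rather than a replacement for the algorithms recommended above. The iteration count and function names are assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune it for your own hardware

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage; the salt is random per password."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected_key)

salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))  # True
print(verify_password("wrong guess", salt, key))                   # False
```

The per-password salt means identical passwords produce different stored values, and the iteration count multiplies the cost of every guess an attacker makes.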
Summary
MD5 is a 1991 hash function that was broken for security use in 2004 when practical collision attacks were demonstrated. It has been successfully exploited in the wild — most famously in the Flame malware. Today, MD5 should not be used for digital signatures, certificate generation, or password storage.
For security applications, use SHA-256 or SHA-512. For passwords specifically, use a dedicated slow hash: bcrypt or Argon2. MD5 remains acceptable only for non-adversarial integrity checks where collision attacks aren't a concern.
Want to see the difference in action? Try our Hash Generator to compute MD5, SHA-1, SHA-256, and SHA-512 hashes side by side.