Reading SpamAssassin rules — and how to defuse the top 20
SpamAssassin's rule corpus is enormous, but a small handful of rules account for most of the points your campaigns lose.
SpamAssassin has thousands of rules. In practice, twenty or so account for the bulk of the points legitimate senders lose. Knowing them by name turns "my email is spammy" into a checklist.
How to read a SpamAssassin output
A report fragment looks like:
2.5 HTML_IMAGE_ONLY_28 BODY: HTML: images with 2400-2800 bytes of words
0.8 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
1.2 MISSING_HEADERS Missing To: header
Three columns: score, rule name, description. Rules in the report panel of WillItInbox come straight from this.
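Because each hit follows the same score / rule name / description layout, a report can be turned into a ranked checklist mechanically. The Python sketch below is one way to do that; `RuleHit`, `parse_report`, and the regular expression are illustrative names and assumptions, not part of SpamAssassin, and the pattern only targets lines shaped like the fragment above.

```python
import re
from typing import NamedTuple


class RuleHit(NamedTuple):
    score: float
    name: str
    description: str


# Assumed line shape: "<score> <RULE_NAME> <free-text description>".
HIT_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]*)\s+(.*)$")


def parse_report(text: str) -> list[RuleHit]:
    """Collect every line that looks like a rule hit; skip anything else."""
    hits = []
    for line in text.splitlines():
        m = HIT_RE.match(line)
        if m:
            hits.append(RuleHit(float(m.group(1)), m.group(2), m.group(3).strip()))
    return hits


REPORT = """\
2.5 HTML_IMAGE_ONLY_28 BODY: HTML: images with 2400-2800 bytes of words
0.8 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
1.2 MISSING_HEADERS Missing To: header
"""

hits = parse_report(REPORT)
# Most expensive rules first: that is the order to fix them in.
for hit in sorted(hits, key=lambda h: h.score, reverse=True):
    print(f"{hit.score:5.1f}  {hit.name}")
total = round(sum(h.score for h in hits), 1)
print(f"total: {total}")
```

Sorting by score puts the rule that costs the most at the top, which is usually the one worth fixing first.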
The top 20 rules legitimate senders trip on
- MIME_HTML_ONLY — HTML part with no plain-text alternative. Add a text/plain part.
- HTML_IMAGE_ONLY_* — Mostly images, very little text. Add real copy.
- HTML_MESSAGE — Informational; it adds a negligible score and is unavoidable for HTML mail.
- HTML_FONT_LOW_CONTRAST — Text color too close to background. Often a hidden-text indicator.
- HTML_SHORT_LINK_IMG_* — Very short HTML whose main content is a linked image.
- MISSING_HEADERS — Missing To: header. Other missing headers have their own rules (MISSING_FROM, MISSING_SUBJECT, MISSING_DATE).
- MISSING_DATE — No Date header at all.
- DATE_IN_FUTURE_* — Date header is hours ahead of the time recorded in the Received header.
- DATE_IN_PAST_* — Date header is hours or days behind the time recorded in the Received header.
- MISSING_MID — No Message-ID header.
- NO_RECEIVED — No Received headers at all. Strange in normal mail flow.
- NO_RELAYS — The message was not relayed via SMTP, so there is no relay chain to inspect.
- TVD_SPACE_RATIO_* — Excessive whitespace, often used to hide content.
- HK_RANDOM_FROM — Random-looking From local part.
- URIBL_* — A URL in the body uses a domain that is listed on a URI blacklist.
- SUBJ_ALL_CAPS — All-caps subject line.
- TVD_SUBJ_FREE — Subject contains "free" prominently.
- HELO_DYNAMIC_IPADDR — HELO hostname looks like a residential dynamic IP.
- RDNS_NONE — No reverse DNS for the sender IP.
- SPF_FAIL / DKIM_INVALID — Authentication failures, scored heavily.
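Several of the header and structure rules in this list can be approximated before a message ever reaches SpamAssassin. The sketch below uses Python's standard `email` package to do a rough pre-flight check. It is not SpamAssassin's own logic: `preflight` is a hypothetical helper, the rule names are only labels for the approximations, and the eight-letter minimum for the all-caps check is an assumption chosen for illustration.

```python
from email import message_from_string


def preflight(raw: str) -> list[str]:
    """Return the names of rules this message would probably trip."""
    msg = message_from_string(raw)
    likely = []
    if msg["To"] is None:
        likely.append("MISSING_HEADERS")
    if msg["Date"] is None:
        likely.append("MISSING_DATE")
    if msg["Message-ID"] is None:
        likely.append("MISSING_MID")

    # Assumption: only flag subjects with at least 8 letters, all uppercase.
    letters = [c for c in (msg["Subject"] or "") if c.isalpha()]
    if len(letters) >= 8 and all(c.isupper() for c in letters):
        likely.append("SUBJ_ALL_CAPS")

    # Collect the content types of all leaf (non-container) MIME parts.
    types = {p.get_content_type() for p in msg.walk() if not p.is_multipart()}
    if "text/html" in types and "text/plain" not in types:
        likely.append("MIME_HTML_ONLY")
    return likely


RAW = (
    "From: news@example.com\n"
    "Subject: HUGE SUMMER SALE\n"
    "Content-Type: text/html; charset=utf-8\n"
    "\n"
    "<p>Hello</p>\n"
)
print(preflight(RAW))
```

A check like this only catches the obvious cases; the real score still has to come from SpamAssassin itself.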
How to fix the most common ones
- MIME_HTML_ONLY: every modern mail library lets you add a text alternative. Even a plain-text version auto-generated from your HTML is enough.
- HTML_IMAGE_ONLY_*: add at least 100 words of real copy. Aim for an image area smaller than 50% of total content area.
- MISSING_* / DATE_*: these usually point to a misconfigured mail library. Check that your sender sets To, Date, and Message-ID on every message.
- URIBL_*: one of your links is on a blacklist. Even shared link shorteners can trigger this. Use your own domain for tracking.
- HELO_DYNAMIC_IPADDR / RDNS_NONE: see the rDNS post.
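The first and third fixes can be covered in one place when the message is built. As a minimal sketch, assuming Python's standard `email` package is the mail library in use, the helper below sets To, Date, and Message-ID explicitly and attaches a plain-text part next to the HTML. `build_message` and the example.com addresses are hypothetical.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def build_message(sender, recipient, subject, text, html, domain):
    """Build a multipart/alternative message with the headers set explicitly."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg["Date"] = formatdate(localtime=True)        # current time, RFC 5322 format
    msg["Message-ID"] = make_msgid(domain=domain)   # unique ID on your own domain
    msg.set_content(text)                           # text/plain part
    msg.add_alternative(html, subtype="html")       # makes it multipart/alternative
    return msg


msg = build_message(
    sender="news@example.com",
    recipient="reader@example.com",
    subject="Your March update",
    text="Hello,\n\nHere is what changed this month.\n",
    html="<p>Hello,</p><p>Here is what changed this month.</p>",
    domain="example.com",
)
print(msg.get_content_type())
print([part.get_content_type() for part in msg.iter_parts()])
```

Putting the plain-text part first and the HTML part second follows the usual multipart/alternative ordering, where the richest version comes last.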
If you only do one thing
Get your SpamAssassin score below 2.0 on a clean message. Once you're there, normal content variation won't push you into spam territory.
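One way to hold that line is to check the score automatically on a test send. The sketch below assumes the receiving SpamAssassin adds its common default `X-Spam-Status` header (`score=... required=... tests=...`); a local configuration can change that format. `parse_status`, `within_budget`, and the sample value are illustrative, not real output.

```python
import re


def parse_status(value: str) -> tuple[float, float, list[str]]:
    """Extract (score, required, tests) from an X-Spam-Status header value."""
    flat = " ".join(value.split())  # undo header folding
    m = re.search(r"score=(-?\d+(?:\.\d+)?) required=(-?\d+(?:\.\d+)?)", flat)
    if m is None:
        raise ValueError("no score= / required= pair found")
    t = re.search(r"tests=((?:\w+,\s?)*\w+)", flat)
    tests = t.group(1).replace(" ", "").split(",") if t else []
    if tests == ["none"]:
        tests = []
    return float(m.group(1)), float(m.group(2)), tests


def within_budget(value: str, budget: float = 2.0) -> bool:
    """True when the score is below the budget (2.0, per the advice above)."""
    score, _required, _tests = parse_status(value)
    return score < budget


# Made-up sample value, folded across two lines as headers often are.
SAMPLE = "No, score=1.3 required=5.0 tests=HTML_MESSAGE,\n\tMIME_HTML_ONLY autolearn=no"
score, required, tests = parse_status(SAMPLE)
print(score, required, tests, within_budget(SAMPLE))
```

Run against a clean test message, a check like this fails as soon as a template change pushes the baseline score to 2.0 or above.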