Form anti-spam without a captcha: 6 filters that catch 99% of the junk
Honeypot, URL regex, phone length, non-Cyrillic ratio, rate-limit, log. Six filters without a captcha catch almost all form spam without user friction.
A captcha is a tax on conversion. By various measurements it costs 3-8% of real submissions, especially on mobile. I decided against a captcha from the start of the project. Instead, spam is killed by six simple PHP filters, each closing its own class of attacks. Combined, that is about 60 lines of code, zero dependencies, and zero requests to third-party services.
The short version
- 6 filters: honeypot, URL regex, phone length, non-Cyrillic ratio, rate-limit, log.
- I do not use a captcha: it costs 3-8% of conversions and ties the form to a third-party service.
- All filters run on the server (no JavaScript, no dependencies), so bots that visit without a JS engine cannot bypass them.
- In my metrics spam drops by 99%, and real leads come through without loss.
Filter 1: Honeypot
Add a hidden field to the form, usually with a name like website or url, the names bots fill in most eagerly. Hide it via CSS rather than type="hidden", because bots often skip fields of that type.
<input type="text" name="website" tabindex="-1" autocomplete="off"
style="position:absolute;left:-9999px;width:0;height:0;visibility:hidden">
Check first on the server:
if (!empty($_POST['website'])) {
    log_spam('honeypot', $_POST);
    exit; // Drop silently, no message
}
Important - drop silently. Do not respond ‘you are a bot’. Bots learn: if your server reacts to a caught honeypot with a redirect or error, they figure out the field is checked. Better to pretend the submission went through: return 200 OK and the normal ‘thank you’ page. The bot thinks it worked and moves on to the next target.
In my logs, the honeypot catches 60-80% of all attacks on forms. It is the simplest and most effective filter.
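The silent drop described above can be sketched as two small helpers. This is an illustration, not the code running on the site: the function names, the default field name and the thank-you markup are assumptions. One detail worth knowing is that empty() treats the string "0" as empty, so the sketch compares against a trimmed string instead.

```php
<?php
// Hypothetical helpers: function names, the default field name and the
// thank-you markup are assumptions made for this sketch.

function honeypot_tripped(array $post, string $field = 'website'): bool
{
    $value = $post[$field] ?? '';
    // Browsers send an empty string for the untouched field. An array or any
    // non-blank value (including "0", which empty() would let through)
    // means something filled the field in.
    return is_array($value) || trim((string) $value) !== '';
}

function fake_success_page(): string
{
    // The same body a real lead would see, so the bot gets no signal.
    return '<h1>Thank you!</h1><p>We will call you back shortly.</p>';
}

// Usage at the top of the form handler:
// if (honeypot_tripped($_POST)) {
//     log_spam('honeypot', $_POST);
//     http_response_code(200);
//     echo fake_success_page();
//     exit;
// }
```

A bare exit also returns 200, but with an empty body. Sending the normal page makes the rejected response indistinguishable from a successful one.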
Filter 2: Regex for URLs in the comment
Most of the remaining spam is posts with links to gambling, replica goods, and escort services. Bots put the URL directly in the comment or message field.
$message = $_POST['message'] ?? '';
if (preg_match('#https?://|www\.|\.com/|\.ru/|\.net/|\.shop/#i', $message)) {
    log_spam('url_in_message', $_POST);
    exit;
}
This blocks ‘normal’ spam with links. Sometimes a legitimate user wants to mention a link, say, ‘here is our site www.example.com’. The solution is not to use the ‘message’ field as a catch-all. My form has a separate optional ‘site’ field that passes through its own URL validator, and URLs in the message field are disallowed by policy.
If your customers often mention sites, loosen the filter to explicit http:// and https:// only, instead of every www. prefix or domain suffix. Then mentioning www.example.com passes, but a real link like https://casino.xyz still gets caught.
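The difference between the strict and the loosened pattern is easy to see side by side. A minimal sketch, assuming a hypothetical helper message_has_link() with a $strict flag (neither is part of the original code); the strict pattern is the one shown above.

```php
<?php
// Hypothetical helper: the function name and the $strict flag are assumptions.
function message_has_link(string $message, bool $strict = true): bool
{
    $pattern = $strict
        ? '#https?://|www\.|\.com/|\.ru/|\.net/|\.shop/#i' // pattern from the article
        : '#https?://#i';                                  // loosened: explicit scheme only
    return preg_match($pattern, $message) === 1;
}

// Usage:
// message_has_link('here is our site www.example.com');        // strict: caught
// message_has_link('here is our site www.example.com', false); // loose: passes
// message_has_link('best odds at https://casino.xyz', false);  // loose: still caught
```

Note that the strict pattern only matches a bare domain when it is followed by a slash or preceded by www., so a plain ‘example.com’ passes in both modes.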
Filter 3: Phone length
This one is common sense. A Russian phone number has 10 digits without the country code or 11 with it. Bots usually enter 4-7 random digits, or 16-20 (imitating an international format padded with junk).
$phone = preg_replace('/\D/', '', $_POST['phone'] ?? '');
$len = strlen($phone);
if ($len < 10 || $len > 11) {
    log_spam('phone_length', $_POST);
    exit;
}
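preg_replace('/\D/', '', ...) strips everything that is not a digit: spaces, dashes, brackets. After that you count the length. Russian and Kazakh numbers (+7) are 10-11 digits; CIS countries with a three-digit country code, such as Belarus (+375), reach 12 digits in full format, so widen the range if you serve them. For international customers, expand the upper bound to 15 (the E.164 maximum).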
Optional: check that an 11-digit number starts with 7 or 8 (this catches bots that randomize the first digit):
if ($len === 11 && !in_array($phone[0], ['7', '8'])) {
    log_spam('phone_country', $_POST);
    exit;
}
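Both phone checks can be folded into one function with an adjustable range, which makes the "expand to 15 for international customers" switch a one-argument change. This is a sketch under assumptions: the function name, the null-means-ok return convention and the default bounds are mine, not part of the original handler.

```php
<?php
// Hypothetical helper: the name, return convention and defaults are assumptions.
// Returns the rejection reason, or null when the phone looks plausible.
function phone_rejection_reason(string $raw, int $min = 10, int $max = 11): ?string
{
    // Keep digits only: "+7 (912) 345-67-89" becomes "79123456789".
    $digits = preg_replace('/\D/', '', $raw) ?? '';
    $len = strlen($digits);

    if ($len < $min || $len > $max) {
        return 'phone_length';
    }
    // The 7/8 prefix rule only applies to the Russian 11-digit form,
    // so it is skipped once the range is widened for other countries.
    if ($max === 11 && $len === 11 && !in_array($digits[0], ['7', '8'], true)) {
        return 'phone_country';
    }
    return null;
}

// Usage:
// $reason = phone_rejection_reason($_POST['phone'] ?? '');
// if ($reason !== null) { log_spam($reason, $_POST); exit; }
```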
Filter 4: Non-Cyrillic ratio in comments
The site is in Russian, and real customers write in Russian. English-language spam bots are filtered by the share of Cyrillic characters in the message.
$message = $_POST['message'] ?? '';
$len = mb_strlen($message);
if ($len > 5) {
    // Count Cyrillic characters
    preg_match_all('/[\p{Cyrillic}]/u', $message, $matches);
    $cyr = count($matches[0]);
    if ($cyr / $len < 0.3) {
        log_spam('not_cyrillic', $_POST);
        exit;
    }
}
30% Cyrillic is a working threshold. A Russian comment mixed with digits and units, like ‘apartment cleaning, area 80 m², 2 bathrooms’ (written in Russian), passes. Pure English spam gets cut. Pure Russian passes naturally.
For English-language sites, invert the filter and check the Latin ratio. For bilingual sites, either disable it or raise the allowed non-Cyrillic share to 80% (that is, lower the Cyrillic threshold from 0.3 to 0.2), but then more spam gets through.
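One edge case of the whole-string ratio: digits, spaces and punctuation count toward the total, so a short Russian message that is mostly a phone number (for example ‘Тел 89123456789’, 3 Cyrillic characters out of 15) falls below 0.3. A possible refinement, offered as a sketch and not as the article's check, is to take the ratio over letters only. The function name and the null return for letterless input are assumptions.

```php
<?php
// Hypothetical refinement, not the article's exact check: the ratio is taken
// over letters only, so digits, spaces and punctuation do not dilute it.
function cyrillic_letter_ratio(string $text): ?float
{
    // \p{L} matches any Unicode letter; the /u flag enables UTF-8 mode.
    $letters = preg_match_all('/\p{L}/u', $text);
    if (!$letters) {
        return null; // no letters at all (or invalid UTF-8): nothing to judge
    }
    // Count characters that are both letters and in the Cyrillic script.
    $cyrillic = preg_match_all('/(?=\p{L})\p{Cyrillic}/u', $text);
    return $cyrillic / $letters;
}

// Usage:
// $ratio = cyrillic_letter_ratio($_POST['message'] ?? '');
// if ($ratio !== null && $ratio < 0.3) { log_spam('not_cyrillic', $_POST); exit; }
```

The 0.3 threshold carries over unchanged. Whether a message with no letters at all should pass is a separate policy decision.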
Filter 5: Rate-limit at 5 requests per hour per IP
One user does not submit ten forms per minute. If an IP makes more than 5 requests per hour - it is an attacker testing filters or pushing bulk spam.
On shared hosting there is no Redis, so the simplest place to keep the counters is a file or MySQL. I keep them in MySQL:
CREATE TABLE rate_limit (
    ip VARCHAR(45) NOT NULL,
    ts INT UNSIGNED NOT NULL,
    KEY idx_ip_ts (ip, ts)
);
Before processing the submission:
$ip = $_SERVER['REMOTE_ADDR'];
$hour_ago = time() - 3600;
$pdo->prepare("DELETE FROM rate_limit WHERE ts < ?")->execute([$hour_ago]);
$stmt = $pdo->prepare("SELECT COUNT(*) FROM rate_limit WHERE ip = ? AND ts > ?");
$stmt->execute([$ip, $hour_ago]);
$count = $stmt->fetchColumn();
if ($count >= 5) {
    log_spam('rate_limit', $_POST);
    exit;
}
$pdo->prepare("INSERT INTO rate_limit (ip, ts) VALUES (?, ?)")->execute([$ip, time()]);
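Table cleanup: the DELETE above already prunes expired rows on every submission. If that write becomes expensive, move it to a daily cron job so the table still does not grow indefinitely.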
The catch - users behind NAT (corporate network, mobile operator). If 10 people in one office fill forms - the office IP triggers the rate-limit. 5 per hour usually leaves room: real users do not submit more than 1-2, plenty of buffer. If you worry - raise to 10-20 per hour. Just do not remove it entirely, or a single-IP mass attack will overload the inbox.
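For the file-based option mentioned above, a minimal sketch could look like this. It is an assumption-heavy illustration: the function name, the JSON layout (IP mapped to a list of timestamps) and the fail-open behaviour are my choices, and a single JSON file only suits low traffic. The window and limit mirror the SQL version: up to 5 accepted submissions per IP in the last 3600 seconds.

```php
<?php
// Hypothetical file-based variant of the rate limit. The function name, the
// JSON layout (ip => list of timestamps) and the defaults are assumptions.
function rate_limit_allow(string $file, string $ip, int $now, int $limit = 5, int $window = 3600): bool
{
    $fh = fopen($file, 'c+'); // create if missing, do not truncate
    if ($fh === false) {
        return true; // fail open: do not lose a real lead over a storage error
    }
    flock($fh, LOCK_EX); // serialize concurrent submissions

    $hits = json_decode(stream_get_contents($fh) ?: '{}', true);
    if (!is_array($hits)) {
        $hits = [];
    }

    // Drop timestamps that fell out of the window, for every IP.
    foreach ($hits as $key => $stamps) {
        $fresh = array_values(array_filter((array) $stamps, fn($t) => $t > $now - $window));
        if ($fresh) {
            $hits[$key] = $fresh;
        } else {
            unset($hits[$key]);
        }
    }

    // Same rule as the SQL version: the 6th submission within the hour is refused.
    $allowed = count($hits[$ip] ?? []) < $limit;
    if ($allowed) {
        $hits[$ip][] = $now;
    }

    // Rewrite the file in place while still holding the lock.
    ftruncate($fh, 0);
    rewind($fh);
    fwrite($fh, json_encode($hits));
    flock($fh, LOCK_UN);
    fclose($fh);

    return $allowed;
}

// Usage:
// if (!rate_limit_allow(__DIR__ . '/../rate_limit.json', $_SERVER['REMOTE_ADDR'], time())) {
//     log_spam('rate_limit', $_POST);
//     exit;
// }
```

The counter file (rate_limit.json in the usage comment is a made-up name) needs the same web-server protection as the spam log, since it contains visitor IPs.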
Filter 6: Log to a protected file
All rejected submissions are written to spam_log.txt. Not the general server log, not the DB (DB is more expensive on writes), but a simple text file:
function log_spam(string $reason, array $data): void {
    $entry = date('c') . ' | ' . $reason . ' | IP ' . $_SERVER['REMOTE_ADDR'] . ' | ' . json_encode($data, JSON_UNESCAPED_UNICODE) . "\n";
    file_put_contents(__DIR__ . '/../spam_log.txt', $entry, FILE_APPEND | LOCK_EX);
}
The log absolutely has to be protected at the web-server level - otherwise anyone can download it and see your spam patterns. For Apache, in the root .htaccess:
<Files "spam_log.txt">
Require all denied
</Files>
For Nginx, inside the server block:
location = /spam_log.txt {
    deny all;
}
What the log gives you. Once a week I open it and look. You see which filters trigger most often: if 90% are honeypot, the rest barely fire because bots do not get past it. You see attack patterns: a sudden flood from one IP range, a stream of Chinese comments with links on the same topic. From those patterns you can tune filters or temporarily ban a range in .htaccess.
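The weekly look at "which filters trigger most often" can be automated with a few lines. A sketch, assuming the ‘date | reason | IP ... | json’ line layout that log_spam() writes; the function name is hypothetical.

```php
<?php
// Hypothetical helper: assumes the "date | reason | IP ... | json" line layout
// produced by log_spam(). Returns reason => count, most frequent first.
function summarize_spam_log(iterable $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        // Limit to 4 parts: the JSON payload may itself contain " | ".
        $parts = explode(' | ', rtrim($line), 4);
        if (count($parts) < 4) {
            continue; // skip blank or malformed lines
        }
        $reason = $parts[1];
        $counts[$reason] = ($counts[$reason] ?? 0) + 1;
    }
    arsort($counts); // highest count first, keys preserved
    return $counts;
}

// Usage:
// print_r(summarize_spam_log(file(__DIR__ . '/../spam_log.txt')));
```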
After 30 days I rotate spam_log.txt → spam_log.txt.bak, and a fresh empty file gets created. I keep the old one for one more period for analysis, then delete it.
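The rotation itself is a rename plus a touch. A minimal sketch, assuming it is run from a cron job roughly every 30 days; the function name is hypothetical, and the article does not say how the rotation is actually triggered.

```php
<?php
// Hypothetical rotation helper, meant to be called from a scheduled job.
// Moves the log to "<log>.bak" (replacing the previous backup) and starts
// a fresh empty file. Returns false when there was nothing to rotate.
function rotate_spam_log(string $log): bool
{
    clearstatcache(true, $log); // make sure filesize() sees the current state
    if (!is_file($log) || filesize($log) === 0) {
        return false;
    }
    // rename() replaces an existing .bak, which is how the older period
    // gets dropped after one more cycle.
    if (!rename($log, $log . '.bak')) {
        return false;
    }
    return touch($log); // fresh empty log file
}

// Usage from a cron script:
// rotate_spam_log(__DIR__ . '/../spam_log.txt');
```

The .bak file sits next to the log, so it needs its own deny rule at the web-server level; a rule that matches only spam_log.txt does not cover it.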
Order of filters
Important point: check in the right order, from cheap to expensive. That way you do not run a rate-limit SQL query when the honeypot has already flagged the submission as garbage.
// 1. Honeypot - cheapest
if (!empty($_POST['website'])) { log_spam('honeypot', $_POST); exit; }
// 2. URL regex - also cheap (in-memory)
if (preg_match('#https?://#i', $_POST['message'] ?? '')) { ... }
// 3. Phone length - cheap
if (strlen(preg_replace('/\D/', '', $_POST['phone'] ?? '')) < 10) { ... }
// 4. Non-Cyrillic - slightly more expensive due to UTF-8 regex
preg_match_all('/[\p{Cyrillic}]/u', $_POST['message'] ?? '', $m);
// ...
// 5. Rate-limit - most expensive, needs SQL
$pdo->prepare("SELECT COUNT(*) FROM rate_limit WHERE ...");
That keeps load minimal - most attacks get cut on the first two filters, before the server touches the DB.
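The ordering can also be expressed as a single function that returns the first failed filter. This is a sketch, not the production handler: the function name and the $isRateLimited callback are assumptions. Passing the rate-limit check as a callback guarantees the storage is only touched once every in-memory check has passed.

```php
<?php
// Hypothetical wrapper: the function name and the $isRateLimited callback are
// assumptions; the checks mirror the filters described in this article.
// Returns the name of the first filter that rejects, or null if all pass.
function first_failed_filter(array $post, callable $isRateLimited): ?string
{
    $message = (string) ($post['message'] ?? '');
    $digits  = preg_replace('/\D/', '', (string) ($post['phone'] ?? '')) ?? '';

    // 1. Honeypot: a single array lookup.
    if (!empty($post['website'])) {
        return 'honeypot';
    }
    // 2. URL regex: one in-memory pattern match.
    if (preg_match('#https?://|www\.|\.com/|\.ru/|\.net/|\.shop/#i', $message)) {
        return 'url_in_message';
    }
    // 3. Phone length: compare the digit count.
    $len = strlen($digits);
    if ($len < 10 || $len > 11) {
        return 'phone_length';
    }
    // 4. Cyrillic ratio: Unicode-aware regex, slightly more expensive.
    $total = preg_match_all('/./us', $message); // character count in UTF-8 mode
    if ($total > 5 && preg_match_all('/\p{Cyrillic}/u', $message) / $total < 0.3) {
        return 'not_cyrillic';
    }
    // 5. Rate limit: the only step that touches storage, so it runs last.
    if ($isRateLimited()) {
        return 'rate_limit';
    }
    return null;
}

// Usage:
// $reason = first_failed_filter($_POST, fn() => $count >= 5);
// if ($reason !== null) { log_spam($reason, $_POST); exit; }
```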
What these filters do not close
Six filters kill automated spam. They do not close:
- Targeted attacks by a human. If someone sits down and fills in your form by hand with competing offers, the filters let it through. But that is a rare and expensive attack: one or two cases a year, each resolved in 5 minutes.
- Lead-generation spam via CRM. Sometimes a ‘lead generation’ contractor mass-submits a client’s contacts with fake data to forms on friendly sites. The honeypot does not fire (a human is filling the form); the rate-limit is the only filter that can catch it.
- DDoS on a POST endpoint. That is a server-level problem, not an application-level one. It is solved by nginx limit_req or a CDN. Beget CDN does this at the edge for free.
Together the filters cover 99% of automated traffic attacking a normal B2B services site. That is enough for a business that has not been singled out for a targeted attack.
The full case on launching a custom site on shared hosting with PHP 8.4 is in the article ‘50 days of SEO in B2B cleaning’. Form anti-spam is part of week-one work.
Related: CLS 0.377 → 0.002 in a day and OPcache on shared hosting - other fast fixes with big impact.