About a month ago this blog started to be hit by comment spam attacks, not at the scale that other more popular
sites may experience, but nevertheless a bother: those comments wasted bandwidth, database space and my own time.
For a while I deleted them manually with the embryonic admin interface that I'll never get around to finishing properly,
or with straight SQL DELETE statements against the database, but being away for a single weekend was enough to come back to more
than 500 comments attached to a single post. I'm obviously not the only one hit by this problem: for example,
two blogs I like have disabled comments for this very reason.
It was time to find a solution that didn't require an ongoing investment of my time, so I immediately ruled out comment moderation. I didn't want to bother potential commenters with registration systems or CAPTCHAs either, so I decided to analyze the spam by looking at the server's logs. Here are the countermeasures I came up with:
- IP blacklist: I created a database table that holds the IP addresses of the most egregious offenders, plus a servlet filter mapped to *.jsp and *.action; when a request comes from one of the addresses in the table, it is immediately rejected with a 500 error code
- Stricter parameter validation: spammers hit the *.action URL directly with a POST request from some spamming tool, without going through the comment form as legitimate human visitors do, so the action now checks for a couple of parameters; when those parameters are absent, the response is another 500 error
- Akismet spam filtering system: the two methods above are already pretty effective against existing spammers, but new spammers may show up and existing ones could learn how to bypass the filters; the final barrier is Akismet by Matt Mullenweg, which now receives every comment posted to the site via the Java API by David A. Czarnecki and responds by marking the comment as either spam or ham
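
To make the order of the three checks concrete, here is a rough sketch in plain Java of how they could be chained. It is not the code running on this blog: the class name `CommentGate`, the `Outcome` enum and the parameter name used in the usage note are all made up for illustration, and the Akismet call is replaced by a generic `Predicate<String>` stand-in, since the real client API is not shown here. In a real deployment the first check would sit in the servlet filter and the second in the action itself.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of the three layers described above; not the blog's actual code.
final class CommentGate {

    // REJECTED corresponds to the 500 response; SPAM and HAM mirror the classifier's answer.
    enum Outcome { REJECTED, SPAM, HAM }

    private final Set<String> blacklistedIps;       // assumed to be loaded from the blacklist table
    private final Set<String> requiredParams;       // assumed names of the parameters the form sends
    private final Predicate<String> spamClassifier; // stand-in for the Akismet check

    CommentGate(Set<String> blacklistedIps,
                Set<String> requiredParams,
                Predicate<String> spamClassifier) {
        this.blacklistedIps = new HashSet<>(blacklistedIps);
        this.requiredParams = new HashSet<>(requiredParams);
        this.spamClassifier = spamClassifier;
    }

    Outcome evaluate(String remoteIp, Map<String, String> params, String commentBody) {
        // Layer 1: known offenders are rejected before anything else happens.
        if (blacklistedIps.contains(remoteIp)) {
            return Outcome.REJECTED;
        }
        // Layer 2: a POST that skipped the comment form lacks the expected parameters.
        for (String name : requiredParams) {
            String value = params.get(name);
            if (value == null || value.trim().isEmpty()) {
                return Outcome.REJECTED;
            }
        }
        // Layer 3: whatever gets this far is handed to the external classifier.
        return spamClassifier.test(commentBody) ? Outcome.SPAM : Outcome.HAM;
    }
}
```

With a gate built as `new CommentGate(Set.of("203.0.113.7"), Set.of("formToken"), body -> body.contains("cheap pills"))`, a request from 203.0.113.7 is rejected outright, a request without `formToken` is rejected as well, and only the remaining comments reach the classifier. The cheap checks come first so that the remote call is made only for requests that look legitimate.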