Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, … Spam!

The Cafes seems to be off and running. There were a few initial glitches, which I have now cleaned up. Today’s project is to make the staging server work enough like the production server that I can use it for testing and debugging without affecting the production site. Yesterday I got stymied by a slight difference in how the PHP engines were configured: the staging server lacked the libtidy support that the site relies on heavily.

At least three people tried to post with fictional or nospam e-mail addresses. Sorry. That won’t work. Anonymous posters are not supported. You must supply a valid e-mail address at least once to post, and it will be verified. Sadly, the objection users have raised most consistently is an unwillingness to provide an e-mail address for fear of spam and worm droppings. While I hate spam as much as the next person, I am loath to break a useful feature like mailto links just to avoid spambots. It’s the wrong solution to the problem. I am a big fan of spam filters, including realtime black hole lists. If you’re not using them, you should be. If your ISP isn’t using them, you should find a new ISP. But in the meantime, I do wonder if there might be a middle ground that confuses spambots, Microsoft worms, and other venomous spiders without putting any noticeable roadblocks in the path of legitimate users.
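For readers who haven’t looked inside one, a realtime black hole list (DNSBL) is usually queried over ordinary DNS: reverse the octets of the sending IPv4 address, append the list’s zone, and look the name up. An answer means the address is listed; no such name means it isn’t. The following is a minimal sketch of that convention, not a description of any particular list. The zone `dnsbl.example.org` is a hypothetical placeholder; real lists publish their own zones and usage policies.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the name a DNSBL lookup queries: reversed IPv4 octets + zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers with an A record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # Any resolution failure is treated as "not listed" here. NXDOMAIN is
        # the usual meaning, but a real filter should distinguish DNS timeouts.
        return False


# "dnsbl.example.org" is a hypothetical zone used only for illustration.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))  # 99.2.0.192.dnsbl.example.org
```

Because the lookup is plain DNS, a mail server can consult the list cheaply at connection time, before it accepts any message content.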

Possibly you could use Paul Tyma’s Mailinator, though I’ve never once gotten that to actually deliver me a message. (OK, I have to update that: I just tried a test post, and it did indeed get through.) But I’d really rather you not. Discussions are more productive and interesting when the posters aren’t anonymous. If Mailinator or similar one-time services are abused, I’ll seriously consider dumping comments from those addresses.

All messages sent from this site are addressed from me personally, elharo@metalab.unc.edu, at least for now. This means that if you use a challenge-response system, you might be able to use it to prevent spam, provided you’ve added me to your whitelist or I’m in a good mood. (When I’m not in a good mood, I tend to delete challenges without responding. Alex, if you haven’t heard from me in a while, guess why.) I think Bruce Eckel managed to get subscribed this way.

I’m considering adding an “I don’t know how to use a spam filter” checkbox to allow people to obscure their e-mail addresses, but I’d prefer not to have to. I don’t like hobbling the user interface just because of a few hundred greedy, scum-sucking leeches. I’ve already blocked a lot of known spambots in a variety of ways. I’m considering whether there are any other ways I can hinder spambots without hindering humans. I could use JavaScript to obscure the e-mail addresses, but I’d rather not depend on JavaScript. Any other thoughts?

I wonder what would happen if I used some advanced features of XHTML for the content. I haven’t seen anyone do this before. Let’s try it out. First, here’s a mailto link that uses numeric character references:

NumericCharacterReference@mailinator.com

I made it a Mailinator link so everyone can log in and see if the mail gets through. My guess is this will block some spambots that use grep, but others are known to use HTML parsers, and those wouldn’t be fooled.
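The guess about grep versus HTML parsers can be checked with a small sketch. It is not the markup this site actually emits. It assumes every character of both the href and the link text is written as a decimal numeric character reference, and it models a "grep" harvester as a hypothetical regular expression run over the raw source. Resolving the references first, as any HTML parser does, is modeled with Python’s `html.unescape`.

```python
import html
import re


def to_numeric_refs(text: str) -> str:
    """Encode every character as a decimal numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def obscured_mailto(address: str) -> str:
    """Build an <a> element whose href and text both use numeric references."""
    encoded = to_numeric_refs(address)
    return f'<a href="{to_numeric_refs("mailto:")}{encoded}">{encoded}</a>'


# Hypothetical grep-style harvester: anything shaped like user@host.tld.
ADDRESS_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

markup = obscured_mailto("NumericCharacterReference@mailinator.com")

# Searching the raw source finds nothing, since even "@" is written as &#64;.
print(ADDRESS_PATTERN.findall(markup))  # []

# Resolving the references first exposes the address (once in href, once in text).
print(ADDRESS_PATTERN.findall(html.unescape(markup)))
```

The second search recovers the address twice, which is exactly why a harvester built on a real HTML parser wouldn’t be fooled.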

Here’s another idea: what if we used XHTML with an unusual namespace prefix? Try this one out:

FunkyNamespacePrefix@mailinator.com

I suspect this is going to break a few browsers though. 🙁 Mozilla can handle it, but only if the document is served as application/xhtml+xml. Safari and Internet Explorer can’t pick it up, so that one’s probably a non-starter.
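The prefix trick works because, in XML terms, `<funky:a>` bound to the XHTML namespace is the same element as `<a>`, while anything that treats the page as plain HTML sees an unknown tag literally named `funky:a`. That is also why it fails in browsers that don’t process the page as XHTML. The sketch below contrasts the two views. The page, the prefix binding, and the `MailtoCollector` harvester are hypothetical illustrations. Note that a plain text search for `mailto:` would still find the href, so this only defeats harvesters that look for `<a>` elements.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical page: the prefix "funky" is bound to the XHTML namespace.
DOCUMENT = f"""<html xmlns="{XHTML_NS}" xmlns:funky="{XHTML_NS}">
  <body>
    <p><funky:a href="mailto:FunkyNamespacePrefix@mailinator.com">e-mail me</funky:a></p>
  </body>
</html>"""


class MailtoCollector(HTMLParser):
    """Collects mailto hrefs from <a> tags, as a tag-soup harvester might."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser knows nothing about namespaces, so it reports the
        # literal tag name "funky:a" rather than "a".
        href = dict(attrs).get("href") or ""
        if tag == "a" and href.startswith("mailto:"):
            self.found.append(href)


collector = MailtoCollector()
collector.feed(DOCUMENT)
print(collector.found)  # []

# A namespace-aware XML parser resolves the prefix and finds the XHTML <a>.
root = ET.fromstring(DOCUMENT)
print([el.get("href") for el in root.iter(f"{{{XHTML_NS}}}a")])
```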

Anybody have any other suggestions? I suppose I should put a few of these up on a honeypot page, point them at a real address, and see what happens.

5 Responses to “Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, Spam, … Spam!”

  1. jasonrbriggs Says:

    encoded email

    What about encoding the email addresses with character entities and hexadecimal coding (e.g. href=“mailto:yu… etc” and href=“mailto:%6a%61%…”)? It’s not going to stop every spambot, but it will stop some.

  2. Kevin Klinemeier Says:

    Why display email addresses?

    What does displaying email addresses to those without accounts gain us, exactly? I don’t expect unregistered users will regularly send legitimate email to my personal address.

    Add an image-recognition bit to the registration setup, display email addresses only to registered users, and I’d be quite satisfied.

    In addition, I found this comment to be a bit much:

    I’m considering adding an “I don’t know how to use a spam filter” checkbox to allow people to obscure their e-mail addresses but I’d prefer not to have to.

    I’m quite familiar with the care and feeding of a spam filter. I’m also quite aware that spam filters are imperfect, even the presumably fine one included with my Gmail account. As a result, one of the ways I keep my email traffic manageable is to avoid getting on lists in the first place.

    It is too bad that we have to take these precautions because of “a few hundred scum sucking leeches”. It’s too bad that it isn’t safe for me to walk in some parts of my town late at night because of a few dozen scum sucking leeches of a different variety. Still, I know my society, and I know how to operate within it.

  3. elharo Says:

    Use of spam filters

    I’m not convinced about the comment:

    > I am a big fan of spam filters including realtime black hole lists. If you’re not using them, you should be. If your ISP isn’t using them, you should find a new ISP.

    I don’t want my ISP to be guessing what mail I do and don’t want. A couple of times I’ve had clients not receive mail from me because their ISP or enterprise firewall blocked any message containing .exe files, in one case inside a .zip. Also, twice in the last few months I’ve received spam that helped track down an infected machine. (Ironically, in the second case the final steps of the tracking were done by the person whose machine was infected in the first.)

    People who understand e-mail, etc., can sometimes usefully look at the spam that is coming through in order to take useful action and learn what is going on, at least in part so they can give well-grounded advice to the less knowledgeable. Hiding the problem won’t make it go away; it’ll just cause the spammers to flood the network with more combinations of message contents in the hope that a few will get past the filters.

    I have my mail system set up to move all expected mail directly to appropriate folders. I just scan what’s left, pick out the few welcome messages, and get rid of the rest. In the 30 days to 2004-11-09 I received spam at the rate of just under 4.5 messages/hour. It takes a few minutes a day, at most, to scan and junk them – plus whatever time I spend on ones which catch my eye – e.g., obvious spam claiming to be from people known to me.

    Right, that’s my spam action time budget for today taken up 🙂

    Posted by edavies@nildram.co.uk on Wednesday, December 1st, 2004 at 9:05 AM

  4. elharo Says:

    I do think ISP-based spam filters should have the option of being disabled on specific user request. As you point out, there are a few legitimate uses for letting the virus droppings and other rubbish through. Some people want to do their own client-side filtering and need the spam to help generate Bayesian rules. Others want to research and possibly cure viruses, and they want to see what the viruses are sending. But such users probably aren’t even 0.1% of the total. Spam, and virus messages in particular, really need to be stopped by default.

    Posted by Elliotte Rusty Harold (elharo@metalab.unc.edu) on Wednesday, December 1st, 2004 at 10:20 AM

  5. elharo Says:

    Re: encoded email

    I’ve now implemented some simple obscuring of e-mail addresses. As Jason said, this won’t stop everything, but it will stop some of it.
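Jason’s suggestion from the first comment, percent-encoding the mailto href and writing the visible address as character references, can be sketched as follows. This is an illustration under stated assumptions, not the obscuring the site actually implemented. The helper names and the address `someone@example.com` are hypothetical. It assumes the reader’s mail client percent-decodes the mailto: URI, as RFC 6068 expects, so encoding every byte, letters included, still yields a working link.

```python
from urllib.parse import unquote


def percent_encode_all(text: str) -> str:
    """Percent-encode every UTF-8 byte, even letters that need no escaping."""
    return "".join(f"%{byte:02x}" for byte in text.encode("utf-8"))


def hex_char_refs(text: str) -> str:
    """Encode every character as a hexadecimal character reference (&#x..;)."""
    return "".join(f"&#x{ord(ch):x};" for ch in text)


address = "someone@example.com"  # hypothetical address
href = "mailto:" + percent_encode_all(address)
link = f'<a href="{href}">{hex_char_refs(address)}</a>'
print(link)

# Decoding the href recovers the original mailto: URI.
assert unquote(href) == "mailto:" + address
```

As Jason says, a harvester that decodes URLs and character references will still get through; the point is only to make the cheapest grep-based bots miss.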