Friday, July 18, 2008

Some tips and tricks to prevent Email Harvesting / Scraping

First, a little background for those of you who don't know. The term Email Harvesting or Email Scraping comes from the trick some spammers use to get valid email addresses.

Overly simplified, harvesting works like this: an automated program crawls web pages and scans the source HTML for text that matches email formats (name@domain.com or mailto:name@domain.com). The software then stores the matches in a database for spamming later.
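To make that concrete, here is a rough sketch of what the matching step might look like. The pattern and the names (EMAIL_PATTERN, harvestEmails) are my own assumptions for illustration, not taken from any real harvester, and the regular expression is deliberately simplified rather than a full address grammar.

```javascript
// Hypothetical harvester sketch: scan page source for email-shaped text.
// EMAIL_PATTERN is a simplified assumption, not a complete address grammar.
const EMAIL_PATTERN = /(?:mailto:)?[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function harvestEmails(html) {
  // Collect every email-shaped substring in the raw source.
  const matches = html.match(EMAIL_PATTERN) || [];
  // Drop the optional "mailto:" prefix, normalize case, and de-duplicate.
  const addresses = matches.map((m) => m.replace(/^mailto:/, "").toLowerCase());
  return [...new Set(addresses)];
}

const page =
  '<p>Contact: <a href="mailto:name@domain.com">name@domain.com</a> ' +
  "or name at domain dot com</p>";

console.log(harvestEmails(page)); // [ 'name@domain.com' ]
```

Notice that the plain address is picked up while "name at domain dot com" slips past this naive pattern, which is the whole idea behind the suggestions that follow.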

<rant>
Web developers, pay attention: nothing irks me more than web sites that post my full email address for all to scrape. I don't think I'm the only one who feels this way.
</rant>

So how do you prevent your email from being harvested?
Here are a couple of suggestions:

  • Be creative with how you present your email address so it doesn't match the standard format.
      • Examples include:
        • name at domain dot com
        • name shift+2 domain period com
        • String.Format("{0}@{1}.com", "name", "domain");
  • Use some type of masking JavaScript to obscure the email address

<script type="text/javascript">eval(unescape('d%6fc%75%6de%6e%74%2e%77%72%69%74e%28%27%3Ca%20%68%72ef%3D%22%26%23109%3Ba%26%23105%3B%6c%26%23116%3B%26%23111%3B%3A%26%23110%3B%26%2397%3B%26%23109%3B%26%23101%3B%26%2364%3B%26%23100%3B%26%23111%3B%26%23109%3B%26%2397%3B%26%23105%3B%26%23110%3B%26%2346%3B%26%2399%3B%26%23111%3B%26%23109%3B%22%3E%6ea%6de%3C%2fa%3E%27%29%3B'));</script>

<a href="&#109;a&#105;l&#116;&#111;:&#110;&#97;&#109;&#101;&#64;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;">name</a>
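If you'd rather not encode a link like that by hand, a small helper can do it for you. This is only a sketch: the function names (toHtmlEntities, obfuscatedMailtoLink) are made up for illustration, and unlike the sample above it encodes every character of the href rather than just some of them. It also assumes the label is plain text that is safe to put in the page as-is.

```javascript
// Encode every character as a decimal HTML entity, e.g. "@" becomes "&#64;".
function toHtmlEntities(text) {
  return Array.from(text, (ch) => `&#${ch.codePointAt(0)};`).join("");
}

// Build an anchor whose href is fully entity-encoded.
// Assumption: `label` is trusted plain text and is inserted unchanged.
function obfuscatedMailtoLink(address, label) {
  return `<a href="${toHtmlEntities("mailto:" + address)}">${label}</a>`;
}

console.log(obfuscatedMailtoLink("name@domain.com", "name"));
```

Browsers decode the entities before following the link, so visitors still get a normal mailto. Keep in mind that a harvester that decodes entities, or parses the page the way a browser does, will see the address too, so this only stops the naive ones.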

  • Use an anti-bot process
    • Example: (Google Groups)
      • Clicking on the ellipsis (...) of the email takes you to a CAPTCHA form

  • Use an email-friendly URL shrinker
    • Example: (tool used: http://is.gd)
      • Entered "mailto:name@domain.com" 
        and got http://is.gd/XBs 
        (try it - you should experience the standard behavior of a "mailto," usually launching an email client)

Of course, this is all a cat-and-mouse game. The harder we make it to harvest our email addresses, the smarter the automated software becomes at figuring out our obfuscation. We've just got to stay a step ahead of that darn cat.

<disclaimer>
I offer my advice with no warranty or guarantee. Feel free to run with anything, but use at your own risk.
</disclaimer>
