Harvesting Protection

Protecting your website's email addresses from spammers

If your email address appears on a campus website, you are at high risk of having that address "harvested" and added to mailing lists used to send "spam" - unsolicited bulk email.

It is not only your personal email address which is at risk of being added to spammers' mailing lists. If your department or service publicizes general contact addresses on its website, such as webmaster@yourhost.calpoly.edu, feedback@yourhost.calpoly.edu, or servicename@yourhost.calpoly.edu, these addresses may also become the target of spammers. Even addresses for your departmental or campus mailing lists can be harvested from the websites where they are publicized, potentially resulting in spam messages being sent to those lists.

Fortunately, there are a number of methods you can use to protect your websites from email address harvesting by spammers. Five of these are discussed below:

  • returning addresses through JavaScript,

  • accepting email via contact forms,

  • obscuring addresses,

  • requiring authentication to view private pages, and

  • redirecting harvesting tools.

To use some of these methods, you simply need to be able to modify the pages on your website on which your email addresses appear. However, to use some others, you either will need to be a web server administrator or obtain that person's help.

You may wish to use these methods universally, so that every occurrence of email addresses on your website will be protected. For instance, if you have a web application which returns contact information for campus people, including their email addresses, you might consider modifying this application to protect these addresses using one or more of the methods described below.

And if you archive mail posted to a departmental or campus mailing list and make these archives available for browsing via the Web, any email addresses appearing within these archives might similarly be protected.

Finally, remember that spammers' harvesting tools can collect email addresses which appear anywhere within the HTML markup of your web pages. Addresses appearing in the <head> section of your pages, inside "mailto" links, in "hidden" form fields, and even in HTML comments are all vulnerable to being harvested and would need to be protected.

Five methods for protecting email addresses on your web pages

1. Return email addresses through JavaScript

To use this method, you will place programming code written in the JavaScript scripting language onto your website's pages. This code, when run by your visitors' JavaScript-enabled web browsers, inserts email addresses "on the fly" onto specified locations on these pages as they are being prepared for display.

This is one of the most effective methods of protecting email addresses, because email harvesting tools used by spammers are widely reported to be incapable of executing JavaScript scripts - or any other types of scripts - on web pages. Essentially, harvesting tools are 'blind' to JavaScript-generated email addresses.

You can find a great many examples of JavaScript code for inserting email addresses available for downloading from the Web. (See Resources, below, for a few starting places.) A common approach used in these code examples is to piece together small chunks of an email address - which hopefully will be overlooked by spammers' harvesting tools - to generate full email addresses that are then inserted onto your web pages. Some programmers have gone further and obscured, scrambled, or encrypted these chunks of email addresses, which are then decoded and reassembled just prior to being inserted.
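The scramble-and-reassemble idea described above can be sketched as follows. This is only a minimal illustration, not code taken from any of the tools listed under Resources: the function names scrambleAddress and unscrambleAddress and the chunk size of 4 are hypothetical, and reversing each chunk is merely a stand-in for whatever obscuring step you prefer.

```javascript
// Hypothetical helper, run once ahead of time: split an address into small
// chunks and reverse each chunk, so the page never contains the address verbatim.
function scrambleAddress(address, chunkSize = 4) {
  const chunks = [];
  for (let i = 0; i < address.length; i += chunkSize) {
    const chunk = address.slice(i, i + chunkSize);
    chunks.push(chunk.split("").reverse().join(""));
  }
  return chunks;
}

// Hypothetical helper, run in the visitor's browser: reverse each chunk
// back and join the chunks to recover the original address.
function unscrambleAddress(chunks) {
  return chunks.map((chunk) => chunk.split("").reverse().join("")).join("");
}

// Only the scrambled chunks would be stored in the page's script.
const scrambledChunks = scrambleAddress("webmaster@yourhost.calpoly.edu");
const recoveredAddress = unscrambleAddress(scrambledChunks);
```

In a real page, the recovered address would then be handed to whatever code writes the "mailto" link. Keep in mind that a harvesting tool able to run scripts would recover the address as well.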

While you could place identical copies of your JavaScript code at every location on your website's pages where addresses are to be inserted, a preferable approach is to create a JavaScript "function" (i.e., a small script that accepts input and returns a value) that can be called from multiple locations on your website's pages, but whose code is stored only in a single place. This JavaScript function can either be stored in the <head> section of each of your website's pages, or better yet, just in a single file on your web server, which is again referenced from the <head> section of each of your pages. By using a JavaScript function, and storing it in as few locations as possible, you can greatly simplify the task of maintaining your code, if you should later need to fix bugs or make other changes to it.

Below is a very simple example of a JavaScript function that inserts a "mailto" link containing an email address. This function accepts the username and hostname portions of an email address and pieces these together, along with the "at sign" and the ".calpoly.edu" domain name. In this example, the function's code is placed directly within the <head> section of your web pages:

<head>
   <title>Your Page Title</title>
   <script language="javascript" type="text/javascript">
     <!--
       function generate_address( username, hostname ) {
         var domain = ".calpoly.edu";
         var atsign = "&#64;";
         var addr = username + atsign + hostname + domain;
         document.write(
           "<" + "a" + " " + "href=" + "mail" + "to:" + addr + ">" +
           addr +
           "<\/a>");
       }
     //-->
   </script>
   <!-- Other head section tags go here ... -->
</head>

To make your code even more maintainable, you can instead move this entire JavaScript function into a separate file and then just reference that file from the <head> section of each of your web pages. This results in simpler code, as shown below:

<head>
   <title>Your Page Title</title>
   <script language="javascript" type="text/javascript"
     src="/scripts/generate_address_script.js">
   </script>
   <!-- Other head section tags go here ... -->
</head>

To make this work, you would then need to place a file named "generate_address_script.js", containing your "generate_address" JavaScript function, above, into the "scripts" directory at the top level of your web server directory.

After writing your JavaScript function, you will need to invoke ("call") this function in every place that you might ordinarily put an email address on your pages. The following code will call the "generate_address" function above to insert a "mailto" link for the address built from the username and hostname you specify within the parentheses:

In HTML:


<script language="javascript" type="text/javascript">
   <!--
     generate_address( "webmaster", "yourhost" );
   //-->
</script>
<noscript>
   <!-- An alternate method of sending mail to this address -->
   <!-- for non-JavaScript enabled browsers can go here -->
</noscript>

This will appear in a JavaScript-enabled web browser as:

webmaster@yourhost.calpoly.edu

Benefits. This method should work with nearly all browsers. To your site's visitors, your email addresses and "mailto" links simply work as expected.

Drawbacks. A few of your visitors may be using browsers in which JavaScript scripting is not implemented. These may include older browsers; text-based browsers; and browsers on handheld devices, such as mobile phones and PDAs. Other visitors may have turned off JavaScript in their browsers in an effort to avoid malicious scripts, ads appearing in "pop-up windows", and the like. In either case, these visitors will not see your email addresses. (To put this into perspective, a recent survey of more than 4,000 page accesses to a UCLink website found that more than 99.7 percent of these visits were made from JavaScript-enabled browsers.)

You can offer HTML contact forms (below) as an alternative for users of "JavaScript impaired" browsers. As a less desirable alternative, you can instead provide non-JavaScript versions of your addresses, protecting these by the obscuring techniques discussed below. In either case, you can place the HTML markup for your contact forms or the obscured versions of your email addresses within <noscript> tags immediately following each place where you invoke your JavaScript function on your website's pages. Alternatively, you can choose to place this markup elsewhere on your pages, such as a separate contacts page linked from your other pages, which can help ensure compatibility with the largest number of browsers.

Both of these methods are discussed in further detail, below, and are subject to all of their respective benefits and drawbacks. For instance, certain techniques for obscuring addresses may still leave these addresses somewhat vulnerable to harvesting tools. These addresses may thus constitute a weak link in your protection, even if you're using stronger, JavaScript-based methods to protect these same addresses elsewhere on your pages.

Finally, a few examples of JavaScript code offered on the Web use such simple and obvious approaches that a very sophisticated harvesting tool with pattern matching (or "regular expressions") capabilities might still be able to extract the parts of your email addresses, from each place that you invoke your JavaScript code, and successfully piece these together.
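To make this risk concrete, here is a rough sketch of how a pattern-matching tool might pull addresses out of pages that call the simple "generate_address" function shown earlier. The function name extractFromCalls is hypothetical, and the sketch assumes the tool's author already knows the calling convention and the domain suffix.

```javascript
// Hypothetical harvester-side routine: find every call of the form
// generate_address( "username", "hostname" ) in a page's source and
// piece the arguments together into a full address.
function extractFromCalls(pageSource, domain) {
  const pattern = /generate_address\(\s*"([^"]+)"\s*,\s*"([^"]+)"\s*\)/g;
  const found = [];
  for (const match of pageSource.matchAll(pattern)) {
    found.push(match[1] + "@" + match[2] + domain);
  }
  return found;
}

// Example page fragment containing one call.
const samplePage = '<script>generate_address( "webmaster", "yourhost" );</script>';
const harvested = extractFromCalls(samplePage, ".calpoly.edu");
```

Scrambling or encoding the arguments, rather than passing them as plain text, makes this kind of extraction harder, though not impossible.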

Resources:

  • Hivelogic Email Address Encoder (http://hivelogic.com/enkoder). An example of a free, web-based tool that can write simple JavaScript code to insert an email address of your choosing.

2. Accept email via contact forms

Instead of publishing email addresses on your website, you can allow your visitors to contact you by submitting their requests or other information via HTML forms on one or more of your site's pages. The data submitted by your visitors can then automatically be mailed to you via a "forms handling" program or script.

This forms handling program or script can be run either on your own web server or on another server you have permission to use for this purpose. Many campus web servers already have such a program or script installed, so check first with your server's webmaster. If you should need to install one, there are a myriad of forms handling programs and scripts freely available for downloading from the Web. (See Resources, below, for a few starting places.)

Benefits. This method should work with almost all browsers. Unlike the JavaScript-based method above, the ability to submit email via HTML contact forms generally isn't dependent on your visitors' browser settings. (Some of your site's visitors may have their browsers' security settings configured to display a warning dialog before submitting data insecurely via HTML forms, however. And some browsers provide advanced options which, if enabled, could block users from submitting information via forms.)

Drawbacks. Among the drawbacks of this method are:

  • Making your site's visitors complete on-screen forms to send you mail is not as convenient as giving them simple "mailto" links they can click. The latter permit your visitors to use their familiar email programs to type, edit, and even spell check their messages prior to submission, as well as to keep copies of outgoing messages.

  • If users mistype their "from" email address in one of your form's fields, you may have no way of knowing who sent the message or of corresponding further with them.

  • Whenever there are any problems with the program or script that handles the data from your contact forms, your site's visitors won't be able to send you mail.

  • If your site's contact forms accept input for more than one email address, each form will need to include information that lets the forms handling program or script know which of these addresses your visitors' data should be mailed to. However, if these addresses are contained in the HTML form itself - even in hidden form fields - they are vulnerable to being harvested by a spammer's harvesting tool.

  • Finally, any programming code that runs behind a web server can potentially lead to security vulnerabilities. As just one example, some forms handling programs or scripts may use unsafe methods of invoking a mail program (such as the Unix "sendmail" program) running on your web server computer in order to send your visitors' data, which can allow intruders to execute arbitrary commands on that computer. For this reason, there are a variety of security considerations you - or the programmer of the forms handling program, if not yourself - must take into account when writing such programs or scripts, or when choosing which one to install on your server.
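Two of the drawbacks above (recipient addresses exposed in the form, and unsafe handling of visitor input) can be reduced on the server side. The following is a minimal sketch under stated assumptions, not the behavior of any particular forms handling script: the lookup table RECIPIENTS, the form field names, and the function names are all hypothetical. The form carries only a short key such as "feedback", and the real address is looked up on the server, so it never appears in the HTML. Values destined for mail headers are rejected if they contain line breaks.

```javascript
// Hypothetical server-side table: the HTML form sends only the key,
// so the real addresses never appear in the page markup.
const RECIPIENTS = {
  webmaster: "webmaster@yourhost.calpoly.edu",
  feedback: "feedback@yourhost.calpoly.edu",
};

// Return the address for a known key, or null for anything else.
function resolveRecipient(key) {
  return Object.prototype.hasOwnProperty.call(RECIPIENTS, key)
    ? RECIPIENTS[key]
    : null;
}

// Reject header values containing line breaks, which could otherwise be
// used to smuggle extra headers or recipients into the outgoing message.
function isSafeHeaderValue(value) {
  return typeof value === "string" && !/[\r\n]/.test(value);
}

// Validate submitted form data and build a plain message object.
// Actually sending the message is left to whatever mail facility you use.
function prepareMessage(form) {
  const to = resolveRecipient(form.recipient);
  if (to === null) {
    return { ok: false, error: "unknown recipient" };
  }
  if (!isSafeHeaderValue(form.from) || !isSafeHeaderValue(form.subject)) {
    return { ok: false, error: "invalid header value" };
  }
  return {
    ok: true,
    message: { to, replyTo: form.from, subject: form.subject, text: String(form.body || "") },
  };
}
```

Passing the resulting message to a mail library as structured data, rather than building a shell command line from visitor input, avoids the kind of command execution problem described above.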

Resources:

  • Matt Wright's FormMail (http://www.scriptarchive.com/formmail.html). A free, flexible script for accepting data from a contact form and sending it via email. If you should decide to use this script, be sure to install version 1.92 or later, which resolves several major security issues, and to carefully read and follow the author's setup instructions.

  • The World Wide Web Consortium's World Wide Web Security FAQ (http://www.w3.org/Security/Faq/wwwsf4.html). Provides information about how to safely write programs or scripts to handle data from HTML forms.

3. Obscure email addresses

"Obscuring" addresses - by rewriting them in various ways - doesn't offer nearly the degree of protection against harvesting tools as the more effective methods discussed above. The alternatives of using JavaScript to insert addresses and offering HTML contact forms for sending email are both far more effective in protecting your addresses against spammers' harvesting tools. On the other hand, obscuring addresses is often much simpler, as this approach typically does not require any programming.

The following are four techniques that you can use to "obscure" email addresses on your web pages:

3.1. Use alternate methods of representing characters

By substituting alternate methods of representing characters in HTML (or specifically within URLs) for some or all of the text in your email addresses, you may prevent simple-minded harvesting tools from spotting the email addresses on your web pages.

The following example randomly substitutes two alternate methods of representing characters in HTML and in URLs, respectively - "HTML decimal numeric character references" and "characters escaped in hexadecimal (%HH) notation" - in the address "webmaster@yourhost.calpoly.edu", while leaving a few characters in this address unmodified for further randomness:

In HTML:

<a href="mailto:%77e%62%6d%61&#115;t&#101;&#114;%40y&#111;u&#114;%68%6Fs&#116;&#046;%63%61l&#112;%6f&#108;%79%2ee%64%75">Webmaster</a>

This will appear in a web browser as:

Webmaster

While you could painstakingly obscure your email addresses in this way by hand, it is much easier to simply use a tool to do this. Several web-based and desktop tools are available for this purpose. (See Resources, below, for some suggestions.)

Benefits. This technique will foil simple harvesting tools, and should work with nearly all browsers. To your site's visitors, your email addresses and "mailto" links simply work as expected.

Drawbacks. While this technique effectively obscures addresses to human eyes, it provides little protection against the most powerful harvesting tools. These tools, which offer pattern matching capabilities, can readily find your encoded characters and decode them, effectively unmasking the obscured email addresses on your web pages. So while this technique is likely to fool some simpler harvesting tools, it won't be effective against all of them.
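Both sides of this technique can be sketched in a few lines of JavaScript. This is an illustrative sketch rather than the algorithm used by any particular tool: the function names obscureAddress and revealAddress and the 40/40/20 split between the three representations are assumptions, and the sketch handles only plain ASCII addresses. The first function mixes decimal character references, %HH escapes, and unmodified characters. The second shows how little work a harvesting tool needs to undo the encoding.

```javascript
// Hypothetical encoder: for each character, randomly choose a decimal
// character reference (&#NNN;), a hexadecimal URL escape (%HH), or the
// unmodified character. Intended for use inside a "mailto" href.
// Assumes a plain ASCII address.
function obscureAddress(address, random = Math.random) {
  let out = "";
  for (const ch of address) {
    const code = ch.charCodeAt(0);
    const roll = random();
    if (roll < 0.4) {
      out += "&#" + code + ";";
    } else if (roll < 0.8) {
      out += "%" + code.toString(16).padStart(2, "0");
    } else {
      out += ch;
    }
  }
  return out;
}

// Hypothetical decoder, as a harvesting tool might write it: expand the
// character references first, then the %HH escapes.
function revealAddress(obscured) {
  const expanded = obscured.replace(/&#(\d+);/g, (whole, digits) =>
    String.fromCharCode(Number(digits))
  );
  return decodeURIComponent(expanded);
}

const obscured = obscureAddress("webmaster@yourhost.calpoly.edu");
const revealed = revealAddress(obscured);
```

Because the random choices differ on every run, the obscured text varies from run to run, but decoding it always yields the original address.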

Resources:

  • Dean Peters's Mean Dean's Anti-spam Obfuscator (http://www.healyourchurchwebsite.com/obfuscator/). A web-based tool that obscures email addresses. This tool randomly substitutes either numeric character references or hexadecimal notation, while leaving a few characters unencoded to try to further confound harvesting tools.
  • The Web Design Group's Entities page (http://www.htmlhelp.com/reference/html40/entities/). Information about using HTML numeric character references on your web pages.
  • The World Wide Web Consortium's implementation notes for the HTML 4.01 specification (http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2.1). Information about hexadecimal escaping of characters in URLs.

3.2. "Munge" email addresses

Address "munging" typically consists of substituting words for symbols in the domain name parts of email addresses - "at" (or variations thereof) for the "at sign" and "dot" or "period" for the periods, and the like - as well as adding whitespace between each part of the address. Sometimes extraneous text is also added to the address, which a human reader would ostensibly know to remove.

Using this technique, webmaster@yourhost.calpoly.edu could be munged as:

webmaster -at- yourhost dot calpoly dot edu

or perhaps:

webmaster -at- NOSPAM yourhost dot calpoly dot edu

Benefits. This technique is also likely to foil simple-minded harvesting tools.

Drawbacks. This technique places burdens on your site's visitors to understand how to "unmunge" your email addresses and to take the time and effort to edit or retype these addresses. This makes it less likely your site's visitors will use these addresses to correspond with you, and opens up the possibility of errors during unmunging. This technique also doesn't offer your site's visitors the convenience of clicking a "mailto" link on your web pages and having the correct address inserted into a new message in their email program.

Finally, sophisticated harvesting programs with pattern matching capabilities may still be able to retrieve a fairly high percentage of the addresses trivially obscured in this manner. If carefully written, common patterns used by these tools would also be capable of detecting common variants, such as cases where " --AT-- " was used in place of " -at- ", "period" in place of "dot", or "REMOVETHIS" in place of "NOSPAM".
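As a rough illustration of how little pattern matching this takes, the following sketch reverses the munged forms shown above, along with the variants just mentioned. The function name unmunge and the exact patterns are hypothetical, and a real harvesting tool would likely cover many more variants.

```javascript
// Hypothetical harvester-side routine: undo common munging conventions.
function unmunge(text) {
  return text
    // Drop well-known filler words that a human is expected to remove.
    .replace(/\b(NOSPAM|REMOVETHIS)\b/gi, "")
    // " -at- ", " --AT-- " and similar become the at sign.
    .replace(/\s*-+\s*at\s*-+\s*/gi, "@")
    // " dot " or " period " between words becomes a period.
    .replace(/\s+(dot|period)\s+/gi, ".")
    .trim();
}

const first = unmunge("webmaster -at- yourhost dot calpoly dot edu");
const second = unmunge("webmaster -at- NOSPAM yourhost dot calpoly dot edu");
// Both calls yield webmaster@yourhost.calpoly.edu
```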


3.3. Substitute a graphic image for all or part of the email address

Using this technique, webmaster@yourhost.calpoly.edu could be obscured by substituting a graphic for the "at sign" or for the entire address.

Benefits. Replacing an email address with a graphic image is almost certain to foil current email address harvesting tools.

Drawbacks. As in the case of address "munging" (above), this technique places burdens on your site's visitors, requiring them to manually type your email address or paste together portions of the address. This makes it less likely your site's visitors will use these addresses to correspond with you, and opens up the possibility of errors during typing or editing. This technique also doesn't offer your site's visitors the convenience of clicking a "mailto" link on your web pages and having the correct address inserted into a new message in their email program. Finally, if a graphic is substituted only for the "at sign", rather than the entire email address, a sophisticated harvesting program could conceivably be able to detect this.

3.4. Add legal but unusual elements to email addresses

According to the Internet specification for email addresses, Internet Message Format (RFC 2822, http://www.faqs.org/rfcs/rfc2822.html), quoted or parenthesized comments, plus signs, and even whitespace are permitted in certain specified places within email addresses.

Using this technique, webmaster@yourhost.calpoly.edu might be obscured in a "mailto" link as:

In HTML:

<a href="mailto:(Webmaster)%20webmaster+(spam%20
whammy)randomdigits12345@%20yourhost.calpoly(not%20.com).edu">
Webmaster</a>

This will appear in a web browser as:

Webmaster

Benefits. This technique, if used with care, should stymie many harvesting tools. It should also work with most browsers. To your site's visitors, your email addresses and "mailto" links will mostly work as expected.

Drawbacks. At least a few browsers, desktop email programs, and mail servers might not correctly handle these legal - but unusual - forms of email addresses.
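To see what a program has to do with such an address before it can be used, consider the simplified sketch below. It is not a full RFC 2822 parser: the function name simplifyAddress is hypothetical, and the sketch assumes comments are not nested and that no quoted strings are present. It decodes the %HH escapes, removes parenthesized comments, and strips whitespace. A harvesting tool could of course apply the same steps.

```javascript
// Hypothetical, simplified reduction of an obscured "mailto" address.
// Assumes no nested comments and no quoted strings.
function simplifyAddress(hrefAddress) {
  // Turn %20 and other %HH escapes back into literal characters.
  const decoded = decodeURIComponent(hrefAddress);
  // Remove parenthesized comments such as "(Webmaster)".
  const withoutComments = decoded.replace(/\([^()]*\)/g, "");
  // Remove any remaining whitespace.
  return withoutComments.replace(/\s+/g, "");
}

const reduced = simplifyAddress(
  "(Webmaster)%20webmaster+(spam%20whammy)randomdigits12345@%20yourhost.calpoly(not%20.com).edu"
);
// reduced is "webmaster+randomdigits12345@yourhost.calpoly.edu"
```

Note that the plus sign and the text following it remain part of the address, so this approach only delivers mail if your mail system accepts such "plus" forms for the mailbox in question.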

4. Require authentication to view private pages

If you have "private" areas of your website, you can require visitors to "authenticate" themselves - typically by presenting a username and password - before they are permitted to view pages in these areas. This will also block spammers' harvesting tools from visiting these areas of your website.

Benefits. This method can globally block harvesting tools from seeing email addresses on the private areas of your website. As long as these areas of your site require authenticated access, you don't need to employ any other means of protecting email addresses within those areas.

Drawbacks. This method only works for private portions of websites, not for public areas. Additional work is required to set up and maintain these authentication methods. Careless use of these methods might inadvertently expose user passwords. (For this reason, consider using different passwords to provide access to your website than you use elsewhere for other, more secure purposes.) If the authentication method is removed or otherwise isn't working properly, portions of your site may be exposed to public access, including access by email address harvesting tools.

Resources:

  • Apache Week's Using User Authentication (http://www.apacheweek.com/features/userauth). A tutorial on this topic.

5. Redirect harvesting tools

Both the Apache and Microsoft IIS web servers have plug-in modules available which can automatically redirect your visitors to different pages on your site or elsewhere. This redirection can occur based on a variety of attributes, including the characteristics of your visitor's browser or their Internet connection.

Because some email address harvesting tools don't bother to mask their identities, you can use this feature to redirect those tools to a "dead-end" page on your site or any other appropriate location, or to block them outright.

Benefits. This method works globally, without requiring modifications to any of your pages.

Drawbacks. This method will only redirect harvesting tools which declare their true identities, rather than masquerading as other web browsers. As such, redirection should be considered to be a supplement to the other address protection methods discussed above, rather than as a sole means of protection.
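The decision logic behind this kind of redirection can be sketched as follows. This is not an Apache or IIS configuration, only the same idea expressed in JavaScript for illustration. The function names, the "/dead-end.html" path, and the short list of tool names are assumptions chosen as examples; a production list would be longer and would need regular maintenance.

```javascript
// Illustrative list of identifying strings that some harvesting tools
// send in their User-Agent header. Treat these names as examples only.
const HARVESTER_PATTERNS = [/EmailSiphon/i, /EmailCollector/i, /ExtractorPro/i];

// True if the visitor's declared identity matches a listed tool.
function isKnownHarvester(userAgent) {
  return (
    typeof userAgent === "string" &&
    HARVESTER_PATTERNS.some((pattern) => pattern.test(userAgent))
  );
}

// Send listed tools to a hypothetical dead-end page; everyone else
// gets the page they asked for.
function chooseLocation(userAgent, requestedPath) {
  return isKnownHarvester(userAgent) ? "/dead-end.html" : requestedPath;
}
```

As the drawback above notes, a tool that sends an ordinary browser's identifying string passes straight through this check.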

Resources:

  • A Close to perfect .htaccess ban list (http://www.webmasterworld.com/forum13/687.htm). Provides an example of a mod_rewrite configuration which blocks named harvesting tools altogether.

In addition, for those using Microsoft's IIS web server, several third-party filters offer functionality similar to Apache's mod_rewrite:

  • Qwerksoft's IISRewrite (http://www.qwerksoft.com/products/iisrewrite/).
  • Helicon's ISAPI URL Rewrite (http://www.isapirewrite.com/).
  • Opcode's OpUrl (http://www.opcode.co.uk/components/rewrite.asp).

Limitations of these protective methods

Even if you employ one or more methods of protecting your websites from email address harvesting, this does not confer immunity from spam. For one thing, spammers may have already harvested your website's email addresses and added them to mailing lists, which then may be repeatedly sold or traded. Also, spammers have many other means of obtaining addresses, in addition to harvesting them from your websites. Some of their tactics are described in Uri Raz's How do spammers harvest email addresses? (http://www.private.org.il/harvest.html).

Nonetheless, by using these methods, you may be able to slow or even reduce the amount of spam messages sent to these addresses over time. Furthermore, any new addresses you add to your websites will - from the start - be better protected from being harvested.
