
Block bad referrers with .htaccess

If you analyze your Apache server logs regularly, you'll notice strange referrer data from time to time: all caps, something not in the form of a URL (just some text or random alphanumeric characters), and so on.

Such requests are usually not signs of hacking attempts; more often they come from bad bots, content scrapers, or blackhat SEO tools. Whatever their origin, you want to keep them out.

Let's see some examples from actual Apache logs. The referrer data is the second double-quoted block.

  • “GET /cdn-cgi/l/www.google.com HTTP/1.1” “www.google.com” – the Google URL does not start with “http” or “https”, so it is a fake referrer and should be blocked.
  • “GET / HTTP/1.1” “toto” – now this is someone trying to be funny. Dude, you’re not allowed here.

Here’s how to block bad referrers with the .htaccess file.

First, whitelist all your custom error pages (if you have any) with REQUEST_URI. This prevents redirect loops and 500 errors.

Second, whitelist blank referrers and the prefixes of all known good referrers. Use the [NC] (no case) flag for referrers starting with "http", because a typical user does not know that a URL scheme is written in lowercase (http://yoursite.com, not HTTP://yoursite.com or Http://yoursite.com). Such uppercase URLs usually trigger a redirect, so the referrer data can contain uppercase letters in the protocol part.

Please note that the HTTP_REFERER variable used in RewriteCond is spelled with only one "R" in the middle – a historical misspelling of "referrer" that stuck in the HTTP specification.

Let’s see an example:

## Block suspicious referrer activity example by winhelp.info
# Add your custom error pages to prevent loops
RewriteCond %{REQUEST_URI} !badreferrer\.html$
# Blank referrer is permitted
RewriteCond %{HTTP_REFERER} !^-?$
# Exclude known good referrers
RewriteCond %{HTTP_REFERER} !^about
RewriteCond %{HTTP_REFERER} !^android-app
RewriteCond %{HTTP_REFERER} !^file
RewriteCond %{HTTP_REFERER} !^http [NC]
RewriteCond %{HTTP_REFERER} !^read
# Block the request
RewriteRule ^.*$ - [F,L]

The rules above basically say: if the requested URL is not the error page, and the referrer is not blank, and the referrer does not start with any of the whitelisted prefixes, then block the request with a 403 error and stop processing further rules.

There are – of course – ways around the previous example, and the most common ones are IP addresses (for example, http://11.22.33.44) and fake google.com referrers (such as https://www.google.com/blank.html).

Here’s how to make your referrer detection a tad more powerful:

## Block bad referrer strings example by winhelp.info
# Add your custom error pages to prevent loops
RewriteCond %{REQUEST_URI} !badreferrer\.html$
# Common fake URLs
RewriteCond %{HTTP_REFERER} /blah/ [NC,OR]
RewriteCond %{HTTP_REFERER} /blank\.html [NC,OR]
# Prevent double .com in referrer string, such as google.comfacebook.com
RewriteCond %{HTTP_REFERER} \.com.*\.com [NC,OR]
# Prevent using just http:// and https:// without anything following as a referrer
RewriteCond %{HTTP_REFERER} ^http(s)?://$ [NC,OR]
# Block referrers with IP-addresses (often trying to look like your own server's public IP)
RewriteCond %{HTTP_REFERER} ^http(s)?://\d\d [NC,OR]
# Other known bad referrer endings (foobar is a placeholder – replace with real patterns)
RewriteCond %{HTTP_REFERER} foobar$ [NC]
# Block the request
RewriteRule ^.*$ - [F,L]

You can also redirect such requests to a simple, informative HTML page instead of blocking them: replace the last line with RewriteRule .* /badreferrer.html? [R,L]. The question mark at the end strips the query string from the URL; hacking attempts often use query strings for code injection, SQL injection, and other bad stuff.
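Putting the redirect variant together, the whole block might look like this. This is just a sketch based on the examples above: /badreferrer.html is the placeholder error page used throughout this article, and the single referrer condition stands in for whichever conditions you actually use.

## Redirect suspicious referrers to an error page (sketch)
# Whitelist the error page itself to prevent a redirect loop
RewriteCond %{REQUEST_URI} !badreferrer\.html$
# Your referrer conditions go here, for example:
RewriteCond %{HTTP_REFERER} \.com.*\.com [NC]
# Redirect; the trailing ? strips the query string
RewriteRule .* /badreferrer.html? [R,L]

You can test such rules from the command line with curl, which lets you fake the referrer header: curl -I -e "toto" https://yoursite.com/ should return a 403 Forbidden against the first example, or a redirect with the rule above.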

Please remember to add the <meta name="robots" content="noindex, nofollow, noarchive"> line to the head section of your error page to prevent Google, Bing, and other search robots from indexing it. You might also want to add analytics code to the page if you use such services.
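For example, a minimal badreferrer.html could look like this (the filename and wording are just placeholders):

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="robots" content="noindex, nofollow, noarchive">
<title>Access denied</title>
</head>
<body>
<p>Your request was blocked because it arrived with a suspicious referrer.</p>
</body>
</html>

Keep the page self-contained and small; since bad bots will be hitting it, there is no point in loading heavy scripts or images.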