Stop fake and bad user agents with htaccess

This tutorial is a follow-up to the "Create browser whitelist with htaccess" guide: even after you filter which bots and browsers to allow through, you still need a blacklist for the ones that might sneak around the whitelist.

Fake user agents are commonly used for attacking, crawling, and scraping your site. A few of them also come from people trying to be funny, or from administrators who do not realize they are breaking the standards by customizing UA strings the wrong way.

The following example rules are based on my servers’ access logs – you actually need to analyze logs daily if you want to be a good site owner/admin.

Examples of fake user agents

Alright, let’s get this started with a few case studies from my logs.

  1. Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.17 Safari/537.11 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko – double entries for Windows NT 6.1, trying to look like Chrome 23 and Internet Explorer 11 at the same time.
  2. Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US) – double Windows, and there is no such thing as Windows NT version 9.0.
  3. Mozilla/9.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/5340.50 (KHTML, like Gecko) Version/12.1 Safari/5340.50 – oh, is Mozilla version 9 out already? Not really, still at 5.0.
  4. Mozilla/5.0 (MSIE 10.0; Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Safari/537.36 Edge/13.10586 – it’s trying to be both Internet Explorer 10 and Microsoft Edge on Windows 10… but you cannot run IE 10 on Windows 10. Please remember: Internet Explorer compatibility mode always identifies itself as MSIE 7.0.
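  5. Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:x.x.x) Gecko/20041107 Firefox/x.x – rv and Firefox versions are given as the letter x instead of numbers (the "Windows; U; Windows NT 5.1" part on its own matches what old Firefox releases really sent). Seems legit… not! 😀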

You can browse common user agent strings at http://www.useragentstring.com/pages/useragentstring.php.

.htaccess example for blocking fake user agents

First, you might want to create an informative page (for example, strangebrowser.html) and redirect visitors with suspect browsers there. After you’ve thoroughly tested your list, you can replace the redirection rule with a deny (error 403) rule. Please remember to add the <meta content="noindex, nofollow, noarchive" name="robots"> line to the head section of your information page to prevent Google, Bing, and other search robots from indexing it. You might also want to add analytics code to the page if you use such services.
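If you would rather not rely on the meta tag alone, the same instruction can also be sent as an HTTP response header. The sketch below assumes that mod_headers is enabled on your server and that the information page is named strangebrowser.html as in this tutorial; treat it as an optional supplement, not a replacement for the meta tag.

```apache
# Optional: send the robots instructions as a response header as well.
# Assumes mod_headers is available; the block is skipped if it is not.
<IfModule mod_headers.c>
<Files "strangebrowser.html">
Header set X-Robots-Tag "noindex, nofollow, noarchive"
</Files>
</IfModule>
```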

This example list has a rule for each blacklisted item for better readability. After testing, you might want to merge this list into one or more longer rules and remove most comments.
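As a rough sketch of what such merging could look like, the block below folds the bare browser name checks from the "non-standard beginnings" and "non-standard endings" groups of the list into two alternation patterns. It is an illustration only: it assumes the same strangebrowser.html information page and leaves out the other exclusions for brevity, so test any merged pattern against your own logs before relying on it.

```apache
# Keep the information page reachable to prevent redirection loops
RewriteCond %{REQUEST_URI} !strangebrowser\.html
# Merged: UA starts with a bare browser name (six separate conditions in the full list)
RewriteCond %{HTTP_USER_AGENT} "^(Chrome|Empty|Firefox|IE|Internet\ Explorer|MSIE)" [NC,OR]
# Merged: UA ends with a bare browser name (two separate conditions in the full list)
RewriteCond %{HTTP_USER_AGENT} "(Chrome|Firefox)$" [NC]
# Redirect during the testing period
RewriteRule .* /strangebrowser.html? [R=307,L]
```

Note that the last condition of a merged group carries no OR flag, exactly as in the full list.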

Create a backup of your .htaccess file before making any modifications!

## Detect abnormal user-agents by winhelp.info
## Version 1.657, 2020-10-03
## Part 1 - basic rules
# Add your information page to prevent redirection loops
RewriteCond %{REQUEST_URI} !ads\.txt
RewriteCond %{REQUEST_URI} !robots\.txt
RewriteCond %{REQUEST_URI} !rules\.abe
RewriteCond %{REQUEST_URI} !strangebrowser\.html
## Exclusions
# Avast
RewriteCond %{REMOTE_ADDR} !^77\.234\.46\.
RewriteCond %{REMOTE_ADDR} !^185\.51\.229\.
# Covenant Eyes parental monitoring
RewriteCond %{REMOTE_ADDR} !^69\.41\.14\.
# Google
RewriteCond %{REMOTE_ADDR} !^66\.249\.90\.
RewriteCond %{REMOTE_ADDR} !^72\.14\.199\.
# VirusTotal Cloud uses MSIE 9.0; Windows NT 9.0 user agent
RewriteCond %{HTTP_USER_AGENT} !virustotalcloud\)$
# Google-SearchByImage uses mismatching version numbers
RewriteCond %{HTTP_USER_AGENT} !\ Google-SearchByImage\)
## UA blacklist
# Known bad bots ignoring or not reading robots.txt
RewriteCond %{HTTP_USER_AGENT} "centurybot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "cognitiveseo\.com" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "DnyzBot/" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "evc-batch/" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Facebot\ Twitterbot/" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Gluten\ Free\ Crawler" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Gowikibot/" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "IndeedBot\ " [OR]
RewriteCond %{HTTP_USER_AGENT} "linkdexbot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MS\ Search\ 6\.0\ Robot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "O/x\.d3v" [OR]
RewriteCond %{HTTP_USER_AGENT} "PaperLiBot/" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "PowerMapper\.com" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "raventools\.com" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "RukiCrawler" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "SemrushBot" [OR]
RewriteCond %{HTTP_USER_AGENT} "SeoBotM6" [OR]
RewriteCond %{HTTP_USER_AGENT} "seocharger" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "SEOkicks-Robot" [OR]
RewriteCond %{HTTP_USER_AGENT} "SMTBot/" [OR]
RewriteCond %{HTTP_USER_AGENT} "spyonweb" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "XoviBot/" [OR]
RewriteCond %{HTTP_USER_AGENT} "ZoomBot" [OR]
# Known bad WordPress login page bot; Firefox does not use full version numbers anymore
RewriteCond %{HTTP_USER_AGENT} "rv:40\.0\)\ Gecko/20100101\ Firefox/40\.1$" [NC,OR]
# Known site analysis bots that try to reveal sensitive data about your server
RewriteCond %{HTTP_USER_AGENT} "Wappalyzer" [OR]
# Non-standard beginnings
RewriteCond %{HTTP_USER_AGENT} ^-?$ [OR]
RewriteCond %{HTTP_USER_AGENT} "^\(" [OR]
RewriteCond %{HTTP_USER_AGENT} ^\' [OR]
RewriteCond %{HTTP_USER_AGENT} "^\ " [OR]
RewriteCond %{HTTP_USER_AGENT} ^\" [OR]
RewriteCond %{HTTP_USER_AGENT} ^- [OR]
RewriteCond %{HTTP_USER_AGENT} ^= [OR]
RewriteCond %{HTTP_USER_AGENT} ^\.$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^\\$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^\d [OR]
RewriteCond %{HTTP_USER_AGENT} "^Chrome" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Empty" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Firefox" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^IE" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Internet\ Explorer" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^MSIE" [NC,OR]
# Non-standard endings
RewriteCond %{HTTP_USER_AGENT} "Chrome$" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Firefox$" [NC,OR]
# Way too short user agent strings
RewriteCond %{HTTP_USER_AGENT} "^Mozilla$" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/\d\.\d$" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Mozilla\ compatible$" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/\d\.\d\ \(compatible\)$" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/\d\.\d\ \(compatible;\)$" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Opera$" [NC,OR]
# Repeating same stuff
RewriteCond %{HTTP_USER_AGENT} "compatible.*compatible" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Gecko.*Gecko" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Mozilla.*Mozilla" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE.*MSIE" [NC,OR]
# Missing space after closing parenthesis
RewriteCond %{HTTP_USER_AGENT} "\)[a-z]" [NC,OR]
# Fake mixtures of browsers
RewriteCond %{HTTP_USER_AGENT} "Firefox.*Netscape" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Firefox.*Opera" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE.*Chrome/" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE.*Firefox" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE.*Edge/" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE.*rv:" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Opera.*Trident/" [NC,OR]
# Letters instead of version numbers
RewriteCond %{HTTP_USER_AGENT} "Chrome/[a-z]\." [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Firefox/[a-z]\." [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Gecko/[a-z]\." [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[a-z] [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE\ [a-z]" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Opera/[a-z]\." [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "\ rv:[a-z]\." [NC,OR]
# Impossible Mozilla versions
RewriteCond %{HTTP_USER_AGENT} "Mozilla/([0-36-9]|[45]\d)" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Mozilla/\d\.([1-9])" [NC,OR]
# Impossible MSIE versions
RewriteCond %{HTTP_USER_AGENT} "MSIE\ \d\.([1-9])" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE\ \d\d\.([1-9])" [NC,OR]
# Impossible MSIE versions on certain Windows versions
RewriteCond %{HTTP_USER_AGENT} "MSIE\ 5\..*\ NT\ 6\." [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE\ 5\..*\ NT\ 10" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE\ 6\..*\ NT\ 6\." [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE\ 6\..*\ NT\ 10" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE\ 8\..*\ NT\ 6\.2" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE\ 8\..*\ NT\ 6\.3" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE\ 8\..*\ NT\ 10" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE\ 9\..*\ NT\ 6\.2" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE\ 9\..*\ NT\ 6\.3" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE\ 9\..*\ NT\ 10" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE\ 10\..*\ NT\ 6\.3" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE\ 10\..*\ NT\ 10" [NC,OR]
# Impossible MSIE and Trident combinations
RewriteCond %{HTTP_USER_AGENT} "MSIE\ 8\..*\ Trident/([0-3]|[5-9])" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE\ 9\..*\ Trident/([0-4]|[6-9])" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE\ 10\..*\ Trident/([0-5]|[7-9])" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ".*\ Trident/([0-6]|[8-9]);\ rv:11\.0" [NC,OR]
# Other impossible browser or engine version numbers
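# Chrome and Firefox reached version 99 and then three-digit versions in 2022,
# so the four conditions below would block current browsers; keep them disabled.
# RewriteCond %{HTTP_USER_AGENT} "Chrome/99" [NC,OR]
# RewriteCond %{HTTP_USER_AGENT} "Chrome/\d\d\d" [NC,OR]
# RewriteCond %{HTTP_USER_AGENT} "Firefox/99" [NC,OR]
# RewriteCond %{HTTP_USER_AGENT} "Firefox/\d\d\d" [NC,OR]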
RewriteCond %{HTTP_USER_AGENT} "MSIE\ \d\d\d" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Opera/9\.99" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Opera/\d\d\d" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Presto/9" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Trident/\d\d" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Trident/([0-3]|[8-9])" [NC,OR]
# Mismatching Firefox and rv versions
RewriteCond %{HTTP_USER_AGENT} ".*rv:1\..*Firefox/(0|[2-9])" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ".*rv:2\..*Firefox/([0-1]|[2-9])" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ".*rv:3\..*Firefox/([0-2]|[4-9])" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ".*rv:4\..*Firefox/([0-3]|[5-9])" [NC,OR]
# Fake user agents used while testing programs and apps
RewriteCond %{HTTP_USER_AGENT} "BUILDDATE" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^MyApp$" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Synapse\)" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Test\ Certificate\ Info" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Unknown" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "WinHTTP" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "WinHttpRequest" [NC,OR]
# Other standards-breaking fake user agents
RewriteCond %{HTTP_USER_AGENT} "\ \(Chrome\)" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "\(Win/$" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "compatible\ ;" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "CAIMEO\ Artificial\ Intelligence" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Gecko/\ " [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Gecko/20([2-9])" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "-IE\d" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "\ IE\d" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Mozilla\ ([0-9])" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Mozilla/\d\.\d\(" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Mozilla/\d\.\d\ \(\ " [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE/" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MSIE\d" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^QuickTime/" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "SuperCleaner" [OR]
RewriteCond %{HTTP_USER_AGENT} "User\ Agent" [NC,OR]
# Fake Windows strings and versions
RewriteCond %{HTTP_USER_AGENT} "Windows/" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Windows\ ([0-2]|[4-8])" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Windows\ NT\)" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Windows\ NT\ 0" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Windows\ NT\ 1\." [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Windows\ NT\ ([2-3]|[7-9])" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Windows\ NT\ 5\.([3-9])" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Windows\ NT\ 6\.([4-9])" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Windows\ NT\ 10\.([1-9])" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Windows\ NT\ (1[1-9]|4[0-9]|5[0-9]|6[0-9])" [NC,OR]
# Suspicious user agents
RewriteCond %{HTTP_USER_AGENT} "TO-Browser/TOB" [NC]
# Redirect during the testing period
RewriteRule .* /strangebrowser.html? [R=307,L]
# Comment out the line above and remove the comment mark below to block fake browsers after the testing period
# RewriteRule ^.*$ - [F,L]

## Part 2 - additional bad browser rules with conditions
# 1. Double Trident is used by MSNBot
# Add your informative page to prevent redirection loops
RewriteCond %{REQUEST_URI} !ads\.txt
RewriteCond %{REQUEST_URI} !robots\.txt
RewriteCond %{REQUEST_URI} !rules\.abe
RewriteCond %{REQUEST_URI} !strangebrowser\.html
RewriteCond %{REMOTE_ADDR} !^131\.253\.25\.
RewriteCond %{HTTP_USER_AGENT} "Trident/.*Trident" [NC]
# Redirect during the testing period
RewriteRule .* /strangebrowser.html? [R=307,L]
# Comment out the line above and remove the comment mark below to block fake browsers after the testing period
# RewriteRule ^.*$ - [F,L]

# 2. Firefox full version numbers are used by Waterfox
# Add your informative page to prevent redirection loops
RewriteCond %{REQUEST_URI} !ads\.txt
RewriteCond %{REQUEST_URI} !robots\.txt
RewriteCond %{REQUEST_URI} !rules\.abe
RewriteCond %{REQUEST_URI} !strangebrowser\.html
RewriteCond %{HTTP_USER_AGENT} !Waterfox/\d\d\.
RewriteCond %{HTTP_USER_AGENT} !Waterfox\)
RewriteCond %{HTTP_USER_AGENT} ".*Firefox/\d\d\.\d\."
# Redirect during the testing period
RewriteRule .* /strangebrowser.html? [R=307,L]
# Comment out the line above and remove the comment mark below to block fake browsers after the testing period
# RewriteRule ^.*$ - [F,L]

# 3. MSIE user string must also have a Trident version
# Add your informative page to prevent redirection loops
RewriteCond %{REQUEST_URI} !ads\.txt
RewriteCond %{REQUEST_URI} !robots\.txt
RewriteCond %{REQUEST_URI} !rules\.abe
RewriteCond %{REQUEST_URI} !strangebrowser\.html
## Exclusions
# VirusTotal Cloud uses MSIE 9.0; Windows NT 9.0 user agent
RewriteCond %{HTTP_USER_AGENT} !virustotalcloud\)$
RewriteCond %{HTTP_USER_AGENT} !Trident/
RewriteCond %{HTTP_USER_AGENT} MSIE\ .*Windows
# Redirect during the testing period
RewriteRule .* /strangebrowser.html? [R=307,L]
# Comment out the line above and remove the comment mark below to block fake browsers after the testing period
# RewriteRule ^.*$ - [F,L]

# 4. PaleMoon has higher Gecko version numbers than other browsers
# Add your informative page to prevent redirection loops
RewriteCond %{REQUEST_URI} !ads\.txt
RewriteCond %{REQUEST_URI} !robots\.txt
RewriteCond %{REQUEST_URI} !rules\.abe
RewriteCond %{REQUEST_URI} !strangebrowser\.html
RewriteCond %{HTTP_USER_AGENT} !PaleMoon/
RewriteCond %{HTTP_USER_AGENT} "Gecko/201([1-9])"
# Redirect during the testing period
RewriteRule .* /strangebrowser.html? [R=307,L]
# Comment out the line above and remove the comment mark below to block fake browsers after the testing period
# RewriteRule ^.*$ - [F,L]

Fake browser blacklist rules should be right after the whitelist rules in the <IfModule mod_rewrite.c> section of your .htaccess file.
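
A minimal skeleton of that layout might look like the following. The two placeholder comments stand for the whitelist rules from the earlier guide and the blacklist rules from this one; RewriteEngine On is shown on the assumption that your file does not already enable it.

```apache
<IfModule mod_rewrite.c>
RewriteEngine On

# 1. Browser whitelist rules (from the whitelist guide) go here

# 2. Fake user agent blacklist rules (Part 1 and Part 2 above) go here

</IfModule>
```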

Keep scanning your access logs daily to find fake UAs you might have missed.