Alan Doherty's Webserver Administration Tips
As well as the other stuff I do, I admin this very webserver {as a live test environment} and many other critical ones for clients.
The lessons learned from this, and more importantly the attempted abuses/attacks observed daily {the reason for running a live-test environment}, have given me many opportunities for insight into the mindset/tactics of the attackers/abusers, and into methods to counter/thwart and otherwise frustrate their attempts.
Some of these tactics are specific to Apache, my webserver of choice, but most are generally applicable and useful in any HTTP server.
So I will use this section of the website to share these with you, the public; but as always, for a more comprehensive check and a secure system, please contact me and arrange an audit to have it done professionally.
General Truths of HTTP
Limit your attack surface
To make any system as secure as possible, load/run only the bare minimum of modules Apache needs to serve your site(s); less features/code means fewer potential undiscovered {as yet} bugs, and even fewer potentially exploitable ones.
In IIS terms disable/uninstall all unused features
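In Apache terms this means commenting out every LoadModule line you can live without; a rough httpd.conf sketch {the module names here are purely illustrative, which ones your sites actually need will differ}:

    # keep only the modules the sites actually use
    LoadModule dir_module         modules/mod_dir.so
    LoadModule mime_module        modules/mod_mime.so
    LoadModule log_config_module  modules/mod_log_config.so
    LoadModule alias_module       modules/mod_alias.so
    LoadModule rewrite_module     modules/mod_rewrite.so
    # everything else stays disabled until something genuinely needs it
    #LoadModule autoindex_module  modules/mod_autoindex.so
    #LoadModule status_module     modules/mod_status.so
    #LoadModule userdir_module    modules/mod_userdir.so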
To further limit your attack surface, rid your server of DUST {different URLs with similar text}
This alone will shield you from discovery by 90% of scan attacks, save bandwidth, improve search ranking, and avoid ranking sabotage
Make use of name-based virtual hosts, i.e. if your site(s) are supposed to be available to users via http/https://www.example.com/ and http/https://example.com/ {running on a server with an IP of 19x.120.12x.7 that also has the hostname of webserver.example.com and possibly others},
you would run a minimum of 3 virtual-hosts; best practice is 4
Scenario 1: normal/minimal {a sample Apache config sketch follows this list}
- name-virtual-host (http OR https) www.example.com, serving your actual content, application or other site
- name-virtual-host(s) (http OR https whichever is NOT used above) www.example.com and (http and https) example.com, hosting a 301 {permanent} redirect to http(s)://www.example.com/
{side note: if possible, running the http-server and redirect for example.com on a machine that is also an MX for example.com ensures old/badly-broken mailservers can always reach your users; conversely, pointing an A record for example.com at a pure-webserver will lose mail from old/badly-broken mailservers {as RFCs mandate that when there are no MX records mail should be delivered to the A record for example.com, but some try the A record first, or fall back to it when the MXs are unreachable}}
- default, name-virtual-host, (http and https) * (all other names), 301 redirect elsewhere, e.g. http://unknown.invalid/ ; this will be responding only to scanners and hackers, so you don't want to let them steal too much processing/bandwidth, and on these you can be as unhelpful in your error as you wish
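The scenario 1 layout, sketched as an Apache config {2.2-style syntax; the names, paths and redirect target are illustrative, and SSL/logging details are omitted}:

    NameVirtualHost *:80

    # default/catch-all vhost: Apache hands requests with unmatched Host headers
    # {IP-address requests, made-up names, scanners} to the FIRST vhost defined
    # for an address, so this block comes first and never mentions the real site's name
    <VirtualHost *:80>
        ServerName default.invalid
        Redirect permanent / http://unknown.invalid/
    </VirtualHost>

    # the real site
    <VirtualHost *:80>
        ServerName www.example.com
        DocumentRoot /var/www/www.example.com
    </VirtualHost>

    # bare domain -> permanent {301} redirect to the canonical name
    <VirtualHost *:80>
        ServerName example.com
        Redirect permanent / http://www.example.com/
    </VirtualHost>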
Scenario 2: unusual, but used by the pedantic like me
Verifying
To avoid scans, ensure people cannot get to your server via http(s)://19x.120.12x.7/ and ensure this URL's 404/301 page doesn't re-direct {even in a link} to your real site's name, as that just ensures scanners find out how to bypass your security
There are many advantages to using name-based virtual-servers properly apart from security; removing DUST also stops PR loss due to duplicate content {side note: the linked article's solutions are aimed at web-developers, not server-admins; good server-admin work saves time, hassle and processor power by doing this for them}
Related point: it also stops the possibility of ranking sabotage, where a competitor finds you haven't implemented content on a single name, so they maliciously link to {and sometimes even register} alternative names that point to your server's IP, to ensure search engines see many many "copies" of your content on many pages, thus diluting the "originality" of the content in any ranking
NB. the above PR-related issues are not Google-specific; they are universal to any/all search engines
Catch/Ban malicious bots/harvesters ASAP
Simple truth: they are sucking your processor cycles/bandwidth attempting to exploit your servers/content/users, so why would you not want to defend yourself?
Many systems for this are available for many architectures; I have written one in PHP for mounting on Apache. Once a malicious bot is detected {in my case when it accesses a URL explicitly denied in the robots.txt for the site and not visibly linked to anywhere on the site}, it is "banned" from the server {all the trap URLs are known vulnerabilities commonly probed for by bots, aliased to my "ban-me" script}. Additionally, to foil address harvesters, webmasters are encouraged to have the first and last link on each page "hidden" and pointed at these URLs {thus no users see or click on them, no search engines follow them either due to the robots.txt, but malicious bots do see and follow them and get self-blacklisted as a response}
{that said, there is a longstanding issue whereby the "Google Wireless Transcoder" reveals them to all its users; this bug has been repeatedly reported, silence is the answer, so most of us say "let their users ban themselves" and maybe Google will hear/listen to their complaints}
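My own script isn't reproduced here, but a bare-bones sketch of the trap side of the idea {the file location, record format and wording are invented for illustration} would look something like:

    <?php
    // ban-me.php - every "trap" URL {the robots.txt-denied, invisibly-linked ones}
    // is aliased to this script; any client arriving here has ignored robots.txt
    $banfile = '/var/db/banned-ips';   // illustrative path
    $ip      = $_SERVER['REMOTE_ADDR'];
    // record "ip timestamp" so the ban can expire later {see the next sections}
    file_put_contents($banfile, $ip . ' ' . time() . "\n", FILE_APPEND | LOCK_EX);
    header('HTTP/1.1 403 Forbidden');
    echo 'Forbidden';
    ?>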
Ensure non-malicious bots are instructed about your defences
Google and every other search engine are your friends ;), so ensure your robots.txt tells them where not to go on your site; if site content is someone else's department, ensure the robots.txt is generated dynamically from server-wide and the site's own preferences {don't let a botched robots.txt re-write result in Google getting unwittingly banned from your server}
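A generated robots.txt for such a setup might end up looking something like this {the disallowed paths are examples only; in practice they would be whatever trap URLs and genuinely private areas the server-wide and per-site preferences specify}:

    User-agent: *
    # server-wide trap URLs {aliases for the ban script}
    Disallow: /phpmyadmin/
    Disallow: /awstats/
    # this site's own additions
    Disallow: /admin/
    Disallow: /calendar/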
Ensure malicious IP's are not permanently banned
Infected PCs' IPs will be re-used by others; permanent bans do no-one any good, and given long enough you will ban the entire internet. Best to either:
- Ban for a period of time {this ensures that when they probe you for 200+ known vulnerabilities they hit your traps on the first or second attempt and the 198+ others never even get through {thus if one of them might have affected the version of web-application/cms/etc you are using, they never find out, as the ban ensures those probes never get past it}}
- Offer {as part of the 403 error returned to banned IPs} a method of self-unbanning {with some anti-automaton protection} [this is part 2 of the Google Wireless Transcoder fiasco: it doesn't display your 403s to its users, so you have to use PHP to generate the 403s, and when serving them to GWT it needs to lie and send a status code 200 instead; a sketch of this follows below]
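A matching sketch of the check side {again illustrative: the ban length, file format, unban URL and the user-agent match are all assumptions} showing both the timed ban and the status-200 lie for the Google Wireless Transcoder:

    <?php
    // ban-check.php - include()d at the top of every page / error handler
    $banfile = '/var/db/banned-ips';   // same illustrative file the trap script writes
    $banlife = 24 * 3600;              // ban for 24 hours, then let the IP back in
    $ip      = $_SERVER['REMOTE_ADDR'];
    $banned  = false;
    if (is_readable($banfile)) {
        foreach (file($banfile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
            list($bip, $when) = explode(' ', $line, 2);
            if ($bip === $ip && (time() - (int)$when) < $banlife) {
                $banned = true;
                break;
            }
        }
    }
    if ($banned) {
        // GWT won't show a 403 page to its users, so lie to it with a 200
        // while every other client gets a genuine 403
        $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
        header(strpos($ua, 'Google Wireless Transcoder') !== false
            ? 'HTTP/1.1 200 OK'
            : 'HTTP/1.1 403 Forbidden');
        // the body explains the ban and points at a self-unban form
        // {protected by some anti-automaton check - captcha or similar - not shown}
        echo 'Access temporarily blocked; visit /unban/ to request removal.';
        exit;
    }
    ?>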
SEO and DUST related optimisations
php {and others} consume more resources
how much more?
It differs from script to script and page to page, but on average php == 10*shtml and shtml == 10*html; thus a single static html page can have 100 times the simultaneous viewers, which means 500 rather than 5 on a poor server, or 500000 instead of 5000 on a more decent one. Either way, no small difference
Use the best tool for the job but don't tie yourself down
Design your sitemap and URL structure to be type-independent, i.e. use http://www.example.com/the-page/ not http://www.example.com/the-page.htm or http://www.example.com/the-page/index.htm
The URL http://www.example.com/the-page/ will work to access http://www.example.com/the-page/index.htm, http://site.domain/the-page/index.php, http://www.example.com/the-page/index.asp, http://www.example.com/the-page/index.pl, http://www.example.com/the-page/index.shtm, or any other type on any other server, if your needs/solutions/hosting change
To achieve this, just ensure all internal links point to the directories that your different index pages reside in; if old links point to http://your.site/index.htm then ensure all your internal links point to http://your.site/ or /, then re-name index.htm to index.html {for php, index.php to index.php5}, and then redirect with status 301 all attempts for /index.htm to http://your.site/ so all old links work but users {and search engines} see the old URL is also wrong and adjust
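In Apache, the two halves of this {serving the directory URL regardless of index type, and the 301 for old /index.htm links} can be done roughly like so {assuming mod_dir and mod_alias are loaded; directive placement and hostname are illustrative}:

    # let /the-page/ serve whichever index file actually exists {mod_dir}
    DirectoryIndex index.html index.php index.shtml

    # 301 old /index.htm links {including /whatever/index.htm} to the directory URL {mod_alias}
    RedirectMatch permanent ^/(.*/)?index\.htm$ http://your.site/$1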
Pick the best names for your page uri's
Stick to lower case, as no one wants to change case mid-typing; regardless of the filesystem {some are case-sensitive, some are not}, mixing case is fairly obnoxious and adds to the difficulty of users relaying URLs non-electronically {via phone/mail/fax/sms}
Use meaningful words and punctuation in URLs. The URL http://www.example.com/red-car/ is more useful to everyone than http://www.example.com/redcar/ and I recommend that you use hyphens (-) instead of underscores (_) in your URLs {as _ is unsupported by some filesystems and unfamiliar to some simpler users}
If any content is dynamic in nature, avoid the possibility of "infinite spaces" by using robots.txt and/or rel=nofollow to stop robots continually hitting next on a calendar to get another empty page far into the future
Similarly, avoid creating multiple URLs for the same content by allowing multiple views via URL parameters or session IDs in the URL {use cookies instead}; consider using robots.txt and/or rel=nofollow to exclude the refined-view URLs and just let search engines see the full/expanded/default view only; additionally, offer users a [link] button or code that gives them the one 'true' URL to the content, for bookmarking and linking
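For example {the paths and parameter names are invented; adapt them to however your calendar and refined views are actually addressed}, the robots.txt additions might be:

    User-agent: *
    # stop robots paging endlessly through empty future months
    Disallow: /calendar/
    # keep the parameter-refined views out of the index {wildcard rules are a
    # search-engine extension, not universally honoured, so rel=nofollow on the
    # view links is the belt-and-braces fallback}
    Disallow: /*?sort=
    Disallow: /*?view=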
Obviously using static html where possible avoids most of these dynamic-content issues
Last updated Dec. 2008 Alan Doherty