6 Simple Ways to Protect Your Website
There are many things you have to do to protect your online assets. I am referring to important things like preventing duplicate content and making sure you don't succumb to a 302 mishap.
I was asked the other day if all this checking and securing was going to make any money or make the site appear higher in the SERPs. I said plainly and clearly 'no, it won't'. I then went on to explain that the first part of SEO is about protecting your assets. I asked him to imagine making his house really nice: lots of new paint, funky furniture and ornaments that he and his wife loved. Then I said "would you go out and leave the door open for a load of crack heads to come round, have a big party and piss all over the floor?". Naturally he said no.
So what can you do to protect yourself from the crack heads?
OK, we are trying to stop crack heads having a big party, so whilst we should be concerned about cat burglars getting in (proper hackers/script kiddies), we are only interested here in general housekeeping; after all, if a cat burglar wants into your gaff, they will probably succeed.
So, in no particular order.
1. Protect against the non-www and www versions of your website both being served, by using a 301 redirect from the non-www to the www version.
So if someone requests http://threadwatch.org they get served a 301 and end up on http://www.threadwatch.org.
This can be achieved by using the htaccess file and adding the following code:
RewriteEngine On
# If the host requested is the bare, non-www domain...
RewriteCond %{HTTP_HOST} ^domain\.co\.uk$ [NC]
# ...301 the request to the same path on the www version
RewriteRule ^(.*)$ http://www.domain.co.uk/$1 [R=301,L]
Put that into your htaccess file and try it out. If you get a 500 error you have a syntax error, which is no big deal as htaccess editing is quick: just edit it and fix the problem. As a final test, go to a server header checker, put in both versions and make sure one shows a 200 code and the other a 301.
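If you would rather script that final check, here is a minimal sketch in PHP using get_headers(); example.co.uk is just a placeholder for your own domain:

<?php
// Minimal sketch: check that the non-www hostname answers with a 301
// and the www hostname with a 200. example.co.uk is a placeholder.
$non_www = get_headers('http://example.co.uk/');
$www     = get_headers('http://www.example.co.uk/');

// The first element of the array is the status line of the first response.
echo 'non-www: ' . $non_www[0] . "\n"; // expect something like "HTTP/1.1 301 Moved Permanently"
echo 'www:     ' . $www[0] . "\n";     // expect "HTTP/1.1 200 OK"
?>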
2. Check for wildcard subdomains by simply going to the website and adding crazy phrases as the subdomain, for instance http://ebaysucks.google.com/, and checking that they don't give a 200 response. I used this example as I know it would not happen. There are plenty out there that do this, and as a result show an entire copy of the site, which can and do get indexed! I am not going to say who they are, that ain't playing fair, but they are out there.
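If you find a wildcard is catching everything and you cannot change the DNS or server config straight away, a rough application-level fallback is to bounce any unexpected hostname onto the canonical one. A sketch in PHP, where www.example.co.uk stands in for your real hostname:

<?php
// Rough sketch: if the request arrived on any hostname other than the
// canonical one (wildcard subdomain, raw IP, etc.), 301 it to the real site.
$canonical = 'www.example.co.uk'; // placeholder for your canonical hostname

if (strcasecmp($_SERVER['HTTP_HOST'], $canonical) !== 0) {
    header('Location: http://' . $canonical . $_SERVER['REQUEST_URI'], true, 301);
    exit;
}
?>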
3. Check for URLs that don't exist but where the 404 page gives a 200 code. Yep, it seems crazy, but some people configure custom 404 error pages to give a 200 code. A 200 code is basically saying that the page exists, so links to that page could produce duplicate content. Servers that give a 200 for everything are a bad, bad idea, as it is very unlikely that a website will always have a page for every URL thrown at it. As a side note, I would suggest you put something useful on your custom 404 error page, just make sure you serve the correct header. This way you give the search engine the right signal and you can help the user get to the part of the site they want. So offer a search box, a link home etc., and apologise, even if it is not really your fault!
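If your custom error page happens to be a PHP script, sending the right status only takes one line before any output. A minimal sketch (the page content below is just placeholder text):

<?php
// Minimal sketch of a custom error page that actually sends a 404 status.
header('HTTP/1.1 404 Not Found');
?>
<html>
<head><title>Page not found</title></head>
<body>
<p>Sorry, we could not find that page.</p>
<p><a href="/">Back to the home page</a>, or try the search box.</p>
</body>
</html>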
4. If you have website logs in a publicly accessible folder, password protect it NOW. It only takes one referrer for these logs to be spidered and indexed. Then people may know your innermost log secrets, and you could find you get log spammed to hell, as some of these links can give link juice. All a bad person has to do is become a top referrer to your website and they get a link back.
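If you are not sure whether anything is exposed, a quick probe of the usual stats locations will tell you. A rough sketch in PHP; the list of paths is only a guess at common defaults, and example.co.uk is a placeholder:

<?php
// Rough sketch: see whether common stats/log folders answer with a 200.
$paths = array('/usage/', '/webalizer/', '/stats/', '/awstats/', '/logs/');

foreach ($paths as $path) {
    $headers = @get_headers('http://www.example.co.uk' . $path);
    if ($headers !== false && strpos($headers[0], '200') !== false) {
        // A 200 here means the folder is wide open - password protect it.
        echo $path . ' is publicly accessible: ' . $headers[0] . "\n";
    }
}
?>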
5. Make sure you protect against people looking for flaws in your mod_rewrite queries. Quite often the query is made up of only part of the URL. For example, take the following URL:
/dvds/horror/chainsaw-massacre.php
Often you will find that only the last element grabbed from the URL is actually put into the query, in this case just:
chainsaw-massacre.php
This means pages can be served with duplicate content by someone frigging with the URL:
/dvds/i-like-beer/chainsaw-massacre.php
/especially-fosters/horror/chainsaw-massacre.php
So a user or a search engine will see the chainsaw massacre page, but under a different URL, so dupe content.
So to protect against this, build the query up using all the elements of the URL, which can be done in many ways. You could get the full URL and check that 'dvds' and 'horror' are present, and if they are not, throw a 404. The best way in this case would be to build up the query to include the categories, with this pseudo code:
$film = 'chainsaw massacre';
$cat = 'horror';
$type = 'dvds';
SELECT * FROM FILMS
WHERE films_name = 'chainsaw massacre'
AND films_category = 'horror'
AND films_type = 'DVD'
This way, only when all the cases have been met do you get the data returned.
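To make that concrete, here is a rough sketch of the same idea as runnable PHP. The table and column names come from the pseudo code above; the database connection details, the URL parsing and the slug-to-name mapping are all assumptions for illustration:

<?php
// Rough sketch: build the query from every part of the URL, not just the last bit.
// Connection details are placeholders; table/column names follow the pseudo code above.
$db = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

// e.g. /dvds/horror/chainsaw-massacre.php -> array('dvds', 'horror', 'chainsaw-massacre.php')
$path  = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$parts = explode('/', trim($path, '/'));
if (count($parts) !== 3) {
    header('HTTP/1.1 404 Not Found');
    exit;
}
list($type, $cat, $film) = $parts;
// Assumption: the slug maps to the stored name by swapping hyphens for spaces.
$film = str_replace('-', ' ', basename($film, '.php'));

// Every URL segment has to match, otherwise no row comes back and we serve a 404.
$stmt = $db->prepare('SELECT * FROM FILMS WHERE films_name = ? AND films_category = ? AND films_type = ?');
$stmt->execute(array($film, $cat, $type));
$row = $stmt->fetch(PDO::FETCH_ASSOC);

if ($row === false) {
    header('HTTP/1.1 404 Not Found');
    exit;
}
// ...render the film page from $row as normal...
?>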
6. Check for 302 redirects and remove them. Google and Yahoo have both had issues with this type of redirect. In some cases people have been able to point a 302 redirect at your site and hijack it. God only knows why this has been allowed to occur, but occur it does. I saw one friend knacker their site for 3 months due to one badly placed 302 redirect.
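If the redirect lives in your own PHP rather than in the htaccess file, one common cause is that header('Location: ...') sends a 302 by default. A minimal sketch of forcing a 301 instead (the URL is a placeholder):

<?php
// Minimal sketch: a bare Location header is a 302 by default, which is the trap.
// Passing 301 as the response code makes the redirect permanent.
header('Location: http://www.example.co.uk/new-page/', true, 301);
exit;
?>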
Well, that's the first set of checks; I will add more things to look for next time.
Don’t nausea up your hard work, get the basics right first!
Hi,
thank you for these 6 simple ways. They are quite useful and I've used them myself.
Best regards,
Filipe.
Admirable attempt, and I thank you for it.
However, if I knew what you were talking about I would have already implemented it, and if, as is the case here, I didn’t know what you were talking about then I cannot use the information.
For example — “you get log spammed to hell as some of these links can give link juice”
If I were familiar with these terms, "log spammed" and "link juice", then I think I would be on top of whatever problems they cause, which you do not specify.
Your discourse seems to be preaching to the converted.
Judge your audience. Who, exactly, are you writing for?
Thanks anyway and watch that link juice.
Mal
Hello Mal
Sorry you did not follow some or all of the post.
Log spamming happens when usage statistics packages (e.g. Webalizer) are left open to be viewed and indexed by a search engine, and spammers take advantage of that.
http://www.google.co.uk/search?hl=en&client=firefox-a&channel=s&rls=org.mozilla%3Aen-GB%3Aofficial&hs=dut&q=inurl%3Ausage_200702+Webalizer+Version+2.01&btnG=Search&meta=
See how this shows lots of detailed usage stats for February. Let's look at the first one:
edtech.teacherhosting.com
Scroll down and look for "referrers". These are the websites that have referred a visitor. Now, what some people do is fake these referrers so that they get a free link. Look at the titles of the referrers: quite adult. So not only do they get a link from your site, they also make lots of requests to your server, in this case all over 1,200. So a lot of wasted resources. This can be prevented for the most part by adding an htaccess password or similar.
Link juice, well that means that the search engines count it as a link. Effectively you are voting for these sites.
Next time I will try and make it clearer and use less personal jargon 🙂
>>Your discourse seems to be preaching to the converted.
OK, point taken. But in my defence, a large proportion of people who are technically experienced miss these gaping holes. I was reviewing two major websites last week as part of client work and found nearly all of these holes and then some. These are BIG BIG names, well, BIG 🙂 and that's what prompted me to write the article.
>>Judge your audience. Who, exactly, are you writing for?
Noted, thanks
>>Thanks anyway and watch that link juice.
Ohh yeah, will do, the great link juice epidemic (some say pandemic) of 2007 🙂
Very minor problems, but they surely affect your website. Keep all these things in mind and make your domain error free.
Thanks for sharing the information. Basic precautions are necessary to ensure one is protected. The www and non-www redirect issue is probably the most common cause of duplicate content. With regards to redirects, I think one should always use a 301, as all the others have their limitations.