So you have created a website or a web page and you don’t want anyone else to be able to access it, right? That’s a bit of a problem once Google, Yahoo, MSN, or some other search engine indexes it!
Once a web page or website is indexed, it can be found by anyone on the planet with an Internet connection. If you want to hide a page or website from search engines, you can do it in several ways.
I’ll try to walk you through the easier method first because it requires less technical knowledge. Basically, you can add a line of code to your HTML page or you can set up your web server to protect a file or directory.
Luckily, just about all search engines follow a web robots standard called the Robots Exclusion Protocol when crawling websites. As a website owner, you can use a robots.txt file to tell search engines what to index and what not to index.
So how does this work? It’s actually super simple! First, you create a text file called robots.txt using Notepad or any other text editor. Now let’s say you want to block your entire website from being indexed by the search engines. You would add these lines to your text file:
User-agent: *
Disallow: /
The User-agent line refers to the robot that is crawling your website, e.g. Google’s or Yahoo’s crawler. The asterisk (*) means all robots. Note that a robot, such as a spam robot, can ignore your file altogether if it feels like it.
Only use a robots.txt file to keep content from being indexed by the major search engines, not to hide information. If someone comes to your website, a robots.txt file will not prevent them from accessing that webpage and viewing it; in fact, anyone can open your robots.txt file and read the list of paths you asked robots to skip. So just make sure you understand what the file does: it asks well-behaved search engine robots to stay away, which keeps your site from showing up in Google search results pages (Yahoo and MSN also).
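If you want to double-check what a file like this means to a robot that honors it, Python’s standard library ships a robots.txt parser, urllib.robotparser. The sketch below is only an illustration: example.com is a placeholder domain, the user-agent names are just labels, and real search engines use their own parsers, which can differ from Python’s on edge cases.

```python
from urllib.robotparser import RobotFileParser

# The "block everything" robots.txt from above, one directive per line.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a robot that honors robots.txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text instead of downloading it
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain used only for illustration.
    for agent in ("Googlebot", "Bingbot", "SomeOtherBot"):
        print(agent, is_allowed(BLOCK_ALL, agent, "https://example.com/index.html"))
```

Every agent comes out as not allowed. Keep in mind that this only models a polite robot; nothing in the file stops a browser or a badly behaved bot from requesting the page.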
You can also block directories or individual pages on your site using a robots.txt file instead of blocking the entire website. To block a directory, you could add the following lines:
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~secret/
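Here is the same kind of check for the directory rules, again assuming Python’s urllib.robotparser as the parser and using example.com with made-up file names as placeholders. Note the User-agent line at the top: Python’s parser, for one, ignores Disallow lines that don’t follow a User-agent line.

```python
from urllib.robotparser import RobotFileParser

# The directory rules from above, preceded by the User-agent line they need.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~secret/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical paths on a placeholder domain, used only for illustration.
BASE = "https://example.com"
PATHS = ("/cgi-bin/form.cgi", "/tmp/report.txt", "/~secret/notes.html", "/about.html")

for path in PATHS:
    allowed = parser.can_fetch("Googlebot", BASE + path)
    print(f"{path:22} {'allowed' if allowed else 'blocked'}")
```

The three files inside the listed directories come out blocked, while /about.html, which no rule mentions, stays allowed. Disallow values are matched as prefixes, so /tmp/ covers everything beneath that directory.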
Note that you only need to add the user-agent line once, unless you want each robot to get a different set of instructions. If you want to block a page, you could use this:
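When you do want different robots to get different instructions, each robot gets its own group, starting with its own User-agent line. In the sketch below, /private-page.html and /tmp/ are hypothetical paths, example.com is a placeholder, and Python’s urllib.robotparser stands in for a real crawler’s parser.

```python
from urllib.robotparser import RobotFileParser

# Two groups: one only for Googlebot, one for every other robot.
# /private-page.html and /tmp/ are hypothetical paths.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private-page.html

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE = "https://example.com"  # placeholder domain
for agent in ("Googlebot", "Bingbot"):
    for path in ("/private-page.html", "/tmp/cache.txt"):
        verdict = "allowed" if parser.can_fetch(agent, BASE + path) else "blocked"
        print(f"{agent:10} {path:20} {verdict}")
```

Googlebot is blocked from /private-page.html but may fetch /tmp/cache.txt, and Bingbot gets the opposite. A robot that finds a group naming it follows that group and skips the * group, so repeat any shared rules in each group. Python’s parser behaves this way, and Google documents the same most-specific-group rule for Googlebot.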
Also, check out the Help section at Google to learn more about how to create a robots.txt file. Once you have finished writing the file, you just need to upload it to the root of your website so that it can be accessed as follows:
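Crawlers only ever request robots.txt from the root of a host, whatever page they are about to fetch. This small sketch derives that location from any page URL with Python’s urllib.parse; the function name robots_url is hypothetical and the example URLs are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    Only the scheme and host (including any port) are kept; the page's
    path, query string and fragment are replaced by /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # example.com is a placeholder domain.
    print(robots_url("https://example.com/blog/2007/my-post.html?ref=home"))
```

That is why a robots.txt file uploaded into a subdirectory is simply never read. Because the host is kept as-is, www.example.com and example.com each need their own file.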
The next time the robot visits your site, it will read the file and follow the instructions. If this seems too complicated, you can also keep a webpage out of the search results using a META tag.
The noindex meta standard is also followed by all of the major search engines. To use it, you have to add a line of code to the HEAD section of the webpage. To prevent all robots from indexing a page on your site, add this line to the HEAD section:
<meta name="robots" content="noindex">
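When Google or any other major search engine sees that line on the page, it will drop the page from its search results, even if other pages link to it. The robot does have to be able to fetch the page to see the tag, so don’t also block that page in your robots.txt file.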
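If you want to confirm that a page actually carries the tag, here is a minimal checker built on Python’s html.parser module. The class and function names are hypothetical, and it only looks for &lt;meta name="robots"&gt; tags in the HTML you hand it; it doesn’t model everything a real search engine reads.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" ...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()  # lower-cased directives such as "noindex"

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # content may hold several comma-separated directives.
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head><body>Hi</body></html>'
    print(has_noindex(page))
```

The name and content values are compared case-insensitively, so &lt;META NAME="ROBOTS" CONTENT="NOINDEX"&gt; also counts, as does a combined value such as "noindex, nofollow".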
So those are the two ways you can hide a page from Google and other search engines. If you are not able to get this to work, post a comment and I will try to help you out.
Also, check out my previous post if you are looking for a way to remove your name from search engines like Google when it appears on other people’s websites. Enjoy!
Copyright © 2007
Online Tech Tips.
Aseem Kishore