What is a robots.txt file? How and where to use it?


What is a robots.txt file?

A robots.txt file tells search engine crawlers which pages or files the crawler can or can't request from your site. It is used mainly to avoid overloading your site with requests.
When a crawler visits a website such as https://www.cloudkicks.com/, it first checks for the robots.txt file at https://www.cloudkicks.com/robots.txt. If the file exists, the crawler reads its rules to work out which pages it is allowed to crawl.
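No matter which page a crawler lands on, the robots.txt location is always derived from the site's root. A quick sketch of that derivation using Python's standard library (the URL is just illustrative):

```python
from urllib.parse import urlsplit

def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep only scheme + host (and port, if any); drop path, query, fragment.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

print(robots_txt_url("https://www.cloudkicks.com/products/shoes?id=1"))
# https://www.cloudkicks.com/robots.txt
```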

How to allow and prevent crawling?

You can allow or prevent crawling for all user agents or for specific ones. Let's understand it with an example.

Consider a robots.txt file that contains two groups of rules. Any text after a # is a comment. In Group 1 we disallow "Googlebot" from crawling the "/nogooglebot/" folder.
In Group 2 we allow all user agents to crawl every page and file on our website.
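The two groups described above could be written like this (a sketch reconstructing the example; the folder name is illustrative):

```
# Group 1: block Googlebot from the /nogooglebot/ folder
User-agent: Googlebot
Disallow: /nogooglebot/

# Group 2: allow every other user agent to crawl the whole site
User-agent: *
Allow: /
```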

How to create a robots.txt file?

A robots.txt file is just a plain text file, so you can create one in any text editor by following the syntax shown above.
If you are on Salesforce B2C Commerce, you can create it via Business Manager instead.

Where to upload it?

For general websites, you can upload it directly to the root folder of your website.
For B2C Commerce, you must upload the file to the cartridge/static/default directory in a custom storefront cartridge on your B2C Commerce server, using your integrated development environment (IDE), such as VS Code or Eclipse.

How to test it?

Testing a robots.txt file is easy: enter your site's root URL and append /robots.txt. If you get a blank page or a 404 error, the file isn't there.
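You can also check the rules programmatically with Python's built-in urllib.robotparser, parsing the example rules from earlier (the URLs and bot names here are illustrative):

```python
from urllib.robotparser import RobotFileParser

# The example rules: Googlebot is blocked from /nogooglebot/,
# everyone else may crawl the whole site.
rules = """\
User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Googlebot is not allowed inside the /nogooglebot/ folder...
print(rp.can_fetch("Googlebot", "https://www.cloudkicks.com/nogooglebot/page.html"))  # False
# ...but any other user agent is.
print(rp.can_fetch("OtherBot", "https://www.cloudkicks.com/nogooglebot/page.html"))   # True
```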

Key Points:

1. The file must be named robots.txt

2. Your site can have only one robots.txt file.

3. A robots.txt file can apply to a subdomain (for example, http://website.example.com/robots.txt) or to a non-standard port (for example, http://example.com:8181/robots.txt).

4. Comments are any content after a # mark.

5. It must always sit at the root of your site. You cannot place it in a subdirectory!

6. It is not a mechanism for keeping a web page out of Google. To keep a web page out of Google, use a noindex directive, or password-protect the page.
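For instance, a noindex directive is a meta tag placed in the page's <head> (a minimal sketch):

```html
<!-- Tells compliant crawlers not to include this page in search results -->
<meta name="robots" content="noindex">
```

Note that a crawler must be able to fetch the page to see this tag, so don't also block the page in robots.txt.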

Check out the complete video below.

If you have any questions, or if you would like to add something to this post, please leave a comment below.
Share this blog with your friends if you find it helpful!

Keep Coding 
