Welcome to the SEO tutorials. Today we will see how to create a robots.txt file. We will try to cover all the important aspects of the robots.txt file, and we will move step by step so that our readers understand the ‘how to’ of creating robots txt.

Before creating a robots.txt file, you should at least know:

  • What is a robots.txt file?
  • Whether you already have a robots.txt file or not.

If you are not aware of what robots.txt does and you create one just because it seems good to have, it will create problems for you; improper knowledge of the robots.txt file can hurt you in many ways. For example, disallowing Googlebot just because you saw others do the same will hit your traffic straight away.
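To see why this matters, here is a quick sketch (shown only as a warning, not something to copy). The two lines below would tell Googlebot to stay out of your entire site, and your Google traffic would dry up with it:

User-agent: Googlebot
Disallow: /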

Better to read about those first, before moving on to ‘how to create robots txt’.


Tools to create robots txt

Webmasters are lucky here; they do not need to spend a penny. You can create a robots.txt file with any simple text editor and just save it as robots.txt. That is all there is to it; just never forget to add the rules you wish the robots (spiders) to follow.

Once you are done setting the rules, just upload the robots.txt file to the root directory of your website. From then on, whenever spiders come to crawl, they will first look for robots.txt and then crawl according to the rules you have set.
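To make that concrete, assume your site is www.example.com (a placeholder domain); the file must sit at the root so that crawlers can fetch it at:

http://www.example.com/robots.txt

A robots.txt placed inside a sub-directory, such as /blog/robots.txt, will simply be ignored by the spiders.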

Create robots txt – Learn the basics

Let’s first see the directives you can use while creating a rule.

  • User-agent – Specifies which search engine spiders the rules that follow apply to.
  • Disallow – Tells search engines that they should not crawl the paths mentioned here.
  • Allow – Gives search engines a pass to crawl the paths mentioned here.
  • Host – Some spiders support the Host directive to specify the preferred mirror of your site.
  • Crawl-delay – Lets you ask a spider to wait between requests, to control how fast it crawls.
  • Sitemap – Tells spiders the location of your sitemap.

User-agent and Disallow are the two main standard directives; the rest are non-standard directives. We have divided the directives into these two parts (standard and non-standard), and now we will try to understand each part with easy-to-understand examples.



Create robots txt – Standard Examples

Let us first try to understand the use of the standard directives.

All Welcome – Example

If you want to allow all search engines to crawl your website, you can use the lines below.

User-agent: *

Disallow:

The above lines will not stop any search engine from crawling your site. Sounds like a cool thing, no? Not really. Why?

Because if you don’t have a robots.txt at all, things will still work the same way as with the code above.

Not Welcome – Example

Suppose your website is not yet complete or is being updated; then you can use the lines below in your robots.txt to stop search engines from crawling your website.

User-agent: *

Disallow: /

Not allowed to a specific URL – Example

In case you want search engines not to crawl a particular page or directory, you can use the lines below:

User-agent: *

Disallow: /about/

Here we assumed that www.xyz.com has an about page that they don’t want to show to search engines. We just used the relative path and stopped search engines from crawling it.
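One thing to keep in mind is that Disallow works on path prefixes, so the exact rule you write matters. A rough sketch, with hypothetical paths:

Disallow: /about/
Disallow: /about

The first rule blocks only URLs under the /about/ directory, such as /about/team.html, while the second blocks every URL whose path starts with /about, including /about.html and /about-us/.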

Specific Search engine – Example

If you want to stop a particular search engine from crawling your website or a part of it, you can use the lines below.

User-agent: Bingbot

Disallow: /about/

Now the above lines will not allow Bing to crawl the specified area. Just remember the name of the user agent of the search engine you want to disallow from crawling.
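For reference, a few widely used user agent names are Googlebot (Google), Bingbot (Bing), Slurp (Yahoo), and DuckDuckBot (DuckDuckGo). You can also give different spiders different rules in the same file; the paths below are only placeholders:

User-agent: Googlebot
Disallow: /private/

User-agent: Bingbot
Disallow: /

Here Google is kept out of /private/ only, while Bing is kept out of the whole site.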

Create robots txt – Non-Standard Examples

Now we will try to understand the use of the non-standard directives with examples. Non-standard directives are not supported by all spiders, but yes, the major search engines do support them.

All Welcome – Example

Earlier we used Disallow for this; let’s see how to do the same thing with the Allow directive.

User-agent: *

Allow: /

What we did here is allow all spiders to crawl the website, but again, it achieves nothing extra, because even without a robots.txt spiders will crawl your website by default.

Let’s see how we can use the Allow directive together with the Disallow directive.

User-agent: *

Disallow: /plugins

Allow: /plugins/search.php

You can clearly see that we have disallowed the plugins directory but still allowed spiders to crawl the search page inside it. You can specify the same thing for a particular spider, and you can have multiple directives for the same spider, as the sketch below shows.
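For example, a group for one spider can mix several directives. The sketch below targets Yandex, one of the spiders that has supported the Host and Crawl-delay directives; the paths and host name are made up for illustration:

User-agent: Yandex
Disallow: /private/
Allow: /private/press/
Crawl-delay: 5
Host: www.example.com

This asks Yandex to stay out of /private/ except the press pages, wait five seconds between requests, and treat www.example.com as the preferred mirror.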

Sitemap directive – Example

If you include your sitemap in robots.txt, it will be easy for search engines to locate it.

Sitemap: http://xyz.com/sitemap.xml

We have just defined the sitemap for xyz.com; now search engines can easily find it when they crawl xyz.com.

Note: You can skip the user-agent line when you are only defining a sitemap; it is not mandatory there.

You can use the sitemap directive to define multiple Sitemaps.
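For instance, a site that splits its sitemap into several files (the file names below are just an assumption) can list them all, one per line:

Sitemap: http://xyz.com/sitemap-posts.xml
Sitemap: http://xyz.com/sitemap-pages.xml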

Stop crawling wp directories – Example

You can create a robots.txt that stops spiders from crawling your wp-content, wp-includes, and wp-admin directories by putting in these lines:

User-agent: *

Disallow: /wp-admin/

Disallow: /wp-content/

Disallow: /wp-includes/

The above code will tell search engines upfront that they do not have permission to crawl the mentioned directories.

Points to remember

  • The Crawl-delay directive is not directly supported by Google; you can set the crawl rate inside Google webmaster tools instead.
  • You can use the asterisk (*) with the main directive (User-agent) to refer to all spiders.
  • The robots.txt file uses relative paths (relative to the root of your site) and not absolute paths (full URLs).
  • Paths are case sensitive, so use them with caution (see the example after this list).
  • Want to block specific user agents? Do it at the start of the robots.txt file.
  • Avoid using the Allow directive just to let spiders crawl your website; they will do that by default. Use Allow for sub-directories or pages whose parent directory is disallowed.
  • You can use the sitemap directive to define multiple Sitemaps.
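To illustrate the case-sensitivity point, the two rules below (hypothetical paths) are not interchangeable:

Disallow: /About/
Disallow: /about/

The first blocks /About/ but leaves /about/ crawlable, and the second does the opposite, so write the path exactly as it appears in your URLs.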

Create robots txt – For your website

Now we will see what you can consider including when creating a robots.txt. If you want to create a robots.txt like this, you are free to use it.

User-agent: Mediapartners-Google*

Disallow:

User-agent:  *

Disallow: /cgi-bin/

Disallow: /wp-admin/

Disallow: /wp-content/

Disallow: /wp-includes/

Disallow: /comments/

Disallow: /trackback/

Disallow: /xmlrpc.php

Disallow: /wp-content/plugins/

Disallow: /feed/

Allow:  /wp-content/uploads/

Sitemap: http://www.yoursite.com/sitemap.xml

Note: it is up to you whether to add more rules or remove some as per your needs.

Conclusion

We have covered how to create robots txt with examples; hopefully your doubts are now cleared. Before applying it, just review the robots.txt information above and check whether you already have a robots.txt or not.


What next? Next we will look at BlackHat SEO.


If you liked our content on how to create robots txt, then do share, like, and tweet it. You can bookmark it as well for future reference.