What is Robots.txt?
Robots.txt is a text file on a website that tells search engine robots which pages they can and cannot crawl. By controlling crawler access, it influences which pages end up indexed for organic search.
Robots.txt in Details
The robots.txt file, also known as the robots exclusion protocol or standard, is a simple text file that follows a specific syntax.
Each rule in a robots.txt file specifies a user agent (the crawler or robot) and one or more directories or files on the website to be included or excluded from crawling.
Here’s how it works in two simple steps:
- When a crawler visits a website, it starts by checking the robots.txt file;
- If it finds any instructions, it follows them.
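For example, when a crawler requests https://example.com/robots.txt (example.com stands in for any domain), it might find a minimal file like this sketch:

    User-agent: *
    Disallow: /private/

Here, User-agent: * means the rule applies to every crawler, and Disallow: /private/ tells them not to crawl anything under the (hypothetical) /private/ directory.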
Robots.txt can be incredibly useful if you want to keep crawlers away from large files or sections of your site that are not relevant to search engine results.
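As a sketch of that use case, a site could keep all crawlers out of a hypothetical archive of large files like this:

    User-agent: *
    # /downloads/ is a placeholder for any large or irrelevant section
    Disallow: /downloads/

Note that robots.txt only controls crawling; a blocked URL can still appear in search results if other pages link to it.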
How to Use Robots.txt
To use robots.txt effectively, you will need to:
- Create your robots.txt file. If it’s not already in the root directory of your website, create a plain text file named robots.txt and put it there;
- Specify allow and disallow directives. For example, Disallow: /private/ tells crawlers not to crawl anything in the /private/ directory;
- Be careful with wildcards. You can use asterisks (*) to match any sequence of characters and dollar signs ($) to mark the end of a URL (see the combined example after this list).
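Putting those directives together, here is an illustrative sketch; the /private/, press-kit.html, and .pdf paths are purely hypothetical:

    User-agent: *
    # Block everything under /private/ ...
    Disallow: /private/
    # ... except one page (Allow is honored by major crawlers such as Googlebot)
    Allow: /private/press-kit.html
    # Block any URL that ends in .pdf
    Disallow: /*.pdf$

When rules conflict, crawlers that follow the robots exclusion standard apply the most specific (longest) matching rule, which is why the Allow line wins for that single page.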
If you’re not sure whether your robots.txt is set up correctly, test it in Google Search Console and make sure you’re not accidentally blocking pages you want indexed.
