
What Is Robots.txt and Its Uses in SEO?


Robots.txt is a text file webmasters create to instruct web robots (typically search engine robots) how to crawl pages on their website.


The robots.txt file is part of the Robots Exclusion Protocol (REP), a group of web standards that regulate how robots crawl the web, access and index content, and serve that content up to users.


The Robots Exclusion Protocol (REP) also includes directives like meta robots, as well as page-, subdirectory-, or site-wide instructions for how search engines should treat links (such as "follow" or "nofollow").
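
For example, a page-level meta robots directive sits in a page's HTML head section. The noindex, nofollow values below are just one common combination, shown here for illustration:

<meta name="robots" content="noindex, nofollow">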


In practice, robots.txt files indicate whether certain user agents (web-crawling software) can or can't crawl parts of a website. These crawl instructions are specified by "disallowing" or "allowing" the behavior of certain (or all) user agents.



Basic Format: 

User-agent: [user-agent name]

Disallow: [URL string not to be crawled]

Here is a simple robots.txt file with two rules, explained below:

# Rule 1

User-agent: Googlebot

Disallow: /nogooglebot/


# Rule 2

User-agent: *

Allow: /

Sitemap: http://www.example.com/sitemap.xml

 

Explanation:

1. The user agent named "Googlebot" should not crawl the folder http://example.com/nogooglebot/ or any of its subdirectories.

2. All other user agents can access the entire site. (This rule could have been omitted and the result would be the same; full access is the default assumption.)

3. The site's sitemap file is located at http://www.example.com/sitemap.xml.

Source: https://support.google.com/webmasters/answer/6062596?hl=en
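
For a quick sanity check of how a crawler would interpret these rules, the file can also be parsed programmatically. Here is a minimal sketch using Python's standard urllib.robotparser module; the example.com URLs are placeholders that mirror the sample file above, and the expected results in the comments assume the site actually serves that sample file.

from urllib.robotparser import RobotFileParser

# Point the parser at the site's robots.txt (placeholder URL).
parser = RobotFileParser()
parser.set_url("http://www.example.com/robots.txt")
parser.read()  # fetch and parse the live file

# Rule 1: Googlebot is disallowed from /nogooglebot/.
print(parser.can_fetch("Googlebot", "http://www.example.com/nogooglebot/page.html"))  # False

# Rule 2: every other user agent may access the entire site.
print(parser.can_fetch("SomeOtherBot", "http://www.example.com/any-page.html"))  # True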

 
