Google has recently updated its guidance on unsupported fields in the robots.txt file. To understand this update, it helps to know what a robots.txt file is. This simple text file tells web crawlers (or “bots”) how to interact with your site or a specific page. It lives in the root directory of your website and contains rules about which pages should and should not be crawled. Keep reading to learn more about the critical functions of robots.txt files and the latest Google update.
Example
User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
Critical Functions of Robots.txt Files
As mentioned above, the primary function of the robots.txt file is to provide web crawlers (bots) with instructions on how to interact with a website or page. Some essential functions include:
- Control Web Crawling: Website owners use the robots.txt file to specify which areas of a site search engine bots may crawl. The “disallow” directive tells search engines not to access certain pages, files, or sections of the website.
- Optimize Crawl Budget: By blocking unnecessary pages (category pages, inactive pages, etc.), the robots.txt file helps search engines focus their crawl budget on your most important pages, improving crawl efficiency.
- Direct Bots to Sitemaps: Robots.txt files also help bots discover and crawl all of your site’s essential pages efficiently. This is done with the “sitemap” directive, which points to an XML sitemap listing the pages you want search engines to discover and index (see the example after this list).
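For illustration, here is a sketch of how these three functions might come together in one robots.txt file. The paths and sitemap URL are placeholders, not recommendations for any particular site.
Example
User-agent: *
# Keep bots out of low-value sections so crawl budget goes to key pages
Disallow: /category/
Disallow: /private/
# Point crawlers to the XML sitemap that lists the pages you want indexed
Sitemap: https://www.example.com/sitemap.xml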
The Latest Google Robots.txt Update
Google has released an update to its robots.txt documentation, emphasizing that its crawlers will ignore any fields it does not support. Google supports only four fields: user-agent, allow, disallow, and sitemap. This adds clarity and ensures that unsupported fields, which previously went unnoticed, will not influence how Google crawls and indexes your site.
Site owners should audit their robots.txt files to ensure they use only supported fields, as anything else will be disregarded. These “unsupported fields” can include custom directives or those used by third-party tools, for example, ‘crawl-delay’ or ‘archive’. These changes reinforce Google’s efforts to streamline how robots.txt files are interpreted, ensuring a consistent approach for all websites.
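As a rough illustration (the crawl-delay value and paths below are made up), a file like the following would still be read by Google, but the unsupported line would simply be ignored rather than acted on:
Example
User-agent: *
Crawl-delay: 10
# The line above is not one of Google's four supported fields, so Google ignores it
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
Removing lines like this keeps your robots.txt limited to directives Google actually honors, which makes the file easier to audit.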
Get Found on Google with Boston Web Marketing
We are still learning about this update, so to ensure your website meets Google’s standards, work with our team at Boston Web Marketing. Google releases various updates throughout the year, and it is important to keep an eye on them.
If you need assistance ensuring your website is found on Google and uses robots.txt files correctly, work with our experts at Boston Web Marketing. We can help you get found quickly on Google. Call us at 857.526.0096.