Amazon has informed publishers that its web crawler, Amazonbot, will begin complying with robots.txt directives starting June 15, 2026. The company emailed site owners on May 14, 2026, explaining that crawl preferences will now be governed solely by industry-standard protocols, replacing the previous manual request process. For sites that do not publish robots.txt rules by the deadline, Amazonbot will default to standard web crawling practices, and the prior manual controls will no longer apply 1.
The notification from Amazon Publisher Support detailed the update, highlighting that site owners can now manage Amazonbot’s access at the page, directory, or site level. The message directed recipients to Amazon’s developer portal for comprehensive instructions on implementing robots.txt directives. This shift follows longstanding criticism of Amazonbot’s prior non-compliance with web scraping standards, which forced site owners to rely on manual interventions to restrict crawler access 1.
Amazon’s move reverses its earlier approach, which required site owners to submit manual requests to limit Amazonbot’s crawling activities. The email warned that if robots.txt directives are not in place by June 15, 2026, Amazonbot will revert to default crawling behavior. This adjustment aligns Amazonbot with other major crawlers, including Googlebot, which have long adhered to robots.txt protocols to control access and respect site owners’ preferences 1.
Included in the email was a link to Amazon’s developer documentation, which explains how to apply robots.txt rules specifically for Amazonbot. The documentation clarifies that directives can restrict access to individual pages, directories, or entire websites, offering granular control over crawler activity. Amazon’s adoption of this protocol is expected to reduce friction for publishers who previously depended on ad-hoc manual requests to manage Amazonbot’s crawling 1.
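In practice, these directives use the standard robots.txt syntax, with rules scoped to Amazonbot’s user-agent token. The paths below are purely illustrative, not taken from Amazon’s documentation:

```
# Keep Amazonbot out of one directory and one page,
# while leaving the rest of the site crawlable
User-agent: Amazonbot
Disallow: /drafts/
Disallow: /internal-pricing.html

# To exclude Amazonbot from the entire site instead:
# User-agent: Amazonbot
# Disallow: /
```

Because the record names Amazonbot specifically, other crawlers such as Googlebot are unaffected unless they are listed in their own User-agent records.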
Xe Iaso, author of a blog post on the announcement, expressed mixed feelings about the change, noting that Amazon’s scraper was a primary reason for creating Anubis, a tool designed to manage unwanted crawler traffic. Iaso intends to update Anubis to incorporate the new robots.txt changes, ensuring it remains effective against Amazonbot’s updated behavior. The post also pointed out the irony that Amazon’s email included Outlook for Mac headers despite being sent from an official Amazon address 1.
The robots.txt standard is widely used to instruct web crawlers on which parts of a website to access or avoid. Major crawlers such as Googlebot, Bingbot, and now Amazonbot rely on these directives to respect site owners’ preferences. Amazon’s adoption of robots.txt is viewed as a move toward standardizing crawler behavior industry-wide, which should reduce the need for manual interventions and improve transparency in crawler management 1.
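The effect of such directives can be checked locally with Python’s standard-library robots.txt parser. This is a minimal sketch; the example.com URLs and the rules themselves are hypothetical:

```python
from urllib import robotparser

# Hypothetical robots.txt record restricting Amazonbot to part of a site
rules = """\
User-agent: Amazonbot
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Amazonbot is barred from /private/ but may fetch everything else
print(rp.can_fetch("Amazonbot", "https://example.com/private/report.html"))  # False
print(rp.can_fetch("Amazonbot", "https://example.com/index.html"))           # True

# A crawler with no matching record (and no "*" default) is allowed by default
print(rp.can_fetch("Googlebot", "https://example.com/private/report.html"))  # True
```

Parsing rules directly with `parse()` avoids fetching a live robots.txt, which makes this useful for testing directives before deploying them.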
Amazonbot’s previous disregard for robots.txt directives had drawn criticism from webmasters and developers, who argued that the crawler ignored established web standards. The shift to robots.txt compliance addresses these concerns, though some remain skeptical about whether Amazon will follow the protocol consistently. The email’s wording, which mentioned default crawling practices if no directives are set, suggests Amazon is framing the change as a convenience for site owners 1.
The blog post also humorously noted that the email’s formatting retained a "sent from my iPhone" signature, despite being sent via Outlook for Mac. Examination of the headers revealed Exchange-specific metadata, highlighting the inconsistency and informal tone of Amazon’s communication. This contrast between the casual email style and the significance of the policy change caught the attention of observers 1.