AI applications require access to vast amounts of fresh, relevant, and trustworthy data, but the web’s original design limits automated discovery and retrieval of that information. In response, a new web data infrastructure layer is emerging, according to technologyreview.com. It can navigate hundreds of millions of web domains and billions of new URLs each week, and it provides real-time data to AI models.

The layer aims to overcome the web’s design constraints by enabling AI models to discover and map a rapidly expanding body of online content. Or Lenchner, CEO of Bright Data, a web data collection platform, described the challenge by comparing data availability to the universe: "It’s out there, but you don’t know what you don’t know." The layer is critical for grounding AI outputs in current, verifiable information.

The need for this infrastructure arises as AI development moves beyond scaling model size and training data to addressing the dynamic, unstructured, and evolving nature of web data. Organizations face a bottleneck in keeping AI models updated with real-time information, making this infrastructure essential for maintaining AI performance and reliability. The shift underscores the importance of data quality and accessibility alongside model architecture.

Bright Data and similar platforms are at the forefront of building this infrastructure. The ability to handle billions of new URLs weekly and to provide trustworthy data will be crucial for AI applications across industries, as detailed in the June 24 report by technologyreview.com.

Editorial standards. Reported and edited at Startupniti's news desk from the sources listed in the right rail. Every fact traces to a citation. If something looks wrong, write to corrections.