Since May 17, 2026, Google has enforced token-based usage limits on its Gemini AI models, affecting both free and paid tiers. The limits scale with prompt complexity, model choice, and chat length. They reset every five hours and are also subject to a weekly maximum. The move responds to a compute shortage that has also hit Google's largest enterprise customers, including Meta, which has faced restrictions since March, according to medianama.com.

The token-based limits emerged as Google struggled to meet growing demand for AI compute. Meta was the first major client to be capped after it requested more capacity than Google could provide, a shortfall that disrupted some of its internal AI projects. CEO Sundar Pichai acknowledged the compute constraint during the company's first-quarter earnings call in April, noting that supply had limited cloud revenue growth. Google's backlog of signed but not yet delivered cloud contracts nearly doubled to more than $460 billion, a sign that demand is outpacing capacity, medianama.com reported.

The shortage highlights the difficulty cloud providers face in scaling AI infrastructure as enterprise interest surges. Google's token-based limits differ from traditional usage caps by measuring computational cost rather than raw usage time. Restrictions on major clients such as Meta underscore how intense the competition for compute has become in the AI sector. Even so, Google's cloud revenue surpassed $20 billion for the first time, a sign of strong demand despite the supply constraints, medianama.com detailed.

Google's token-based limits on Gemini AI models remain in effect as of June 2026, resetting every five hours alongside a separate weekly cap. The compute shortage has reshaped how much free AI access ordinary users receive and continues to affect enterprise customers' AI projects, according to medianama.com.

Editorial standards: Reported and edited at Startupniti's news desk from the sources cited in this article.