A recent cost analysis published on May 17, 2026, reveals that running large language models (LLMs) locally on an Apple Silicon M5 MacBook Pro is approximately three times more expensive per million tokens than using OpenRouter’s cloud-based service. The analysis, conducted by independent researcher William Angel, compares electricity, hardware depreciation, and token output speeds to determine the total cost of local versus cloud-based LLM inference.
The analysis highlights that an M5 MacBook Pro with 64GB RAM, priced at $4,299, draws between 50 and 100 watts under load. At an electricity cost of $0.20 per kWh (rounded up from the U.S. average of $0.1730 per kWh in 2025), the hourly electricity cost for running inference at 100% load ranges from $0.010 to $0.020. Over a 24-hour period, that comes to as much as $0.48 per day for electricity alone at the 100-watt upper bound, excluding hardware depreciation.
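The arithmetic here is simple enough to check directly. A minimal sketch using the figures above (the `electricity_cost_per_hour` helper is purely illustrative):

```python
# Hourly and daily electricity cost at sustained load.
# Figures from the analysis: 50-100 W draw, $0.20 per kWh.

def electricity_cost_per_hour(watts: float, price_per_kwh: float) -> float:
    """Dollars per hour of running at the given power draw."""
    return watts / 1000.0 * price_per_kwh

for watts in (50, 100):
    hourly = electricity_cost_per_hour(watts, 0.20)
    print(f"{watts:>3} W: ${hourly:.3f}/hour, ${hourly * 24:.2f}/day")
# ->  50 W: $0.010/hour, $0.24/day
# -> 100 W: $0.020/hour, $0.48/day
```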
Hardware depreciation significantly impacts the total cost of local inference. Assuming a lifespan of 3, 5, or 10 years for the M5 MacBook Pro, the annual hardware cost ranges from $430 (10-year lifespan) to $1,433 (3-year lifespan). Spread over every hour of those lifespans, the hardware cost works out to $0.16358 per hour (3 years), $0.09815 (5 years), and $0.04908 (10 years). The analysis suggests a 5-year lifespan as a reasonable estimate for normal use, though intensive inference workloads may shorten this to 3 years.
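Those per-hour figures fall out of straight-line depreciation divided across every hour of the lifespan; a quick sketch under that assumption:

```python
# Straight-line depreciation per hour, assuming the machine is
# available 24/7 (8,760 hours per year). Price from the analysis.

PRICE = 4299.0  # M5 MacBook Pro with 64GB RAM

for years in (3, 5, 10):
    annual = PRICE / years
    hourly = annual / 8760
    print(f"{years:>2}-year lifespan: ${annual:,.0f}/year, ${hourly:.5f}/hour")
# ->  3-year lifespan: $1,433/year, $0.16358/hour
# ->  5-year lifespan: $860/year, $0.09815/hour
# -> 10-year lifespan: $430/year, $0.04908/hour
```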
Token output speed is a critical factor in determining cost efficiency. The M5 MacBook Pro achieves 10-40 tokens per second for models like Gemma 4 31B, which is described as nearly comparable to Anthropic's Sonnet in performance. At 10 tokens per second, the device produces 36,000 tokens per hour; at 40 tokens per second, it generates 144,000 tokens per hour. These speeds result in a cost per million tokens ranging from $0.40 to $4.79, depending on token speed, electricity consumption, and hardware lifespan.
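Combining the two cost components with throughput gives cost per million tokens. The sketch below reproduces the article's roughly $0.40 lower bound; the upper bound lands near $5 rather than $4.79, a small gap that comes down to which electricity rate and wattage the original analysis paired with the 3-year lifespan:

```python
# Cost per million tokens = (electricity + depreciation per hour)
# divided by tokens generated per hour. Inputs from the analysis.

def cost_per_million(watts: float, price_per_kwh: float,
                     hardware_per_hour: float, tokens_per_sec: float) -> float:
    hourly_cost = watts / 1000.0 * price_per_kwh + hardware_per_hour
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_cost / tokens_per_hour * 1_000_000

best = cost_per_million(50, 0.20, 0.04908, 40)    # 50 W, 40 tok/s, 10-year lifespan
worst = cost_per_million(100, 0.20, 0.16358, 10)  # 100 W, 10 tok/s, 3-year lifespan
print(f"best ${best:.2f}, worst ${worst:.2f} per million tokens")
# -> best $0.41, worst $5.10 per million tokens
```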
In contrast, OpenRouter’s cloud-based service offers Gemma 4 31B at $0.38-$0.50 per million tokens, with inference speeds reaching 60-70 tokens per second. This makes OpenRouter not only cheaper but also significantly faster than local inference on Apple Silicon. The analysis notes that under optimistic conditions (50 watts, 40 tokens per second, and a 10-year hardware lifespan), the M5 MacBook Pro’s cost per million tokens could match OpenRouter’s. However, under pessimistic conditions (100 watts, 10 tokens per second, and a 3-year lifespan), local inference costs can be up to 10 times higher.
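The "up to 10 times" figure follows from dividing the endpoints of the local range by the corresponding ends of OpenRouter's band; a sketch using the rounded figures above:

```python
# Local cost range vs OpenRouter's quoted band for the same model.
local_low, local_high = 0.40, 4.79   # article's local range, $/M tokens
cloud_low, cloud_high = 0.38, 0.50   # OpenRouter band, $/M tokens

print(f"optimistic:  {local_low / cloud_low:.1f}x cloud pricing")
print(f"pessimistic: {local_high / cloud_high:.1f}x cloud pricing")
# -> optimistic:  1.1x cloud pricing   (effectively a match)
# -> pessimistic: 9.6x cloud pricing   (the article's "up to 10 times")
```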
The analysis emphasizes that speed is the most significant advantage of cloud-based inference. OpenRouter’s providers achieve 3-7 times faster token generation than the M5 MacBook Pro, which typically operates at 10-20 tokens per second. For professional use cases, where a human employee is waiting on the model’s output, even the higher cost of local inference is negligible compared to salary expenses, so speed dominates the decision. The author concludes that, in most scenarios, investing in cloud-based services like Anthropic or OpenRouter is more cost-effective than relying on local hardware.
Despite the cost disadvantages, the analysis acknowledges the impressive capability of consumer-grade Apple Silicon hardware. The M5 MacBook Pro can run models like Gemma 4 31B, which deliver performance close to Anthropic’s Sonnet, a high-end commercial LLM. This demonstrates the rapid advancement of consumer hardware in supporting sophisticated AI workloads, even if it is not yet the most economical option for large-scale inference.
The electricity cost data used in the analysis is based on the U.S. Energy Information Administration’s (EIA) average residential electricity price for 2025, which is $0.1730 per kWh. The author rounded this up to $0.20 per kWh for simplicity. The analysis also references a personal electricity bill from Northern Virginia, which reported a cost of $0.18 per kWh, aligning closely with the EIA’s national average.