I am trying to figure out which way the pendulum for AI LLM inference hosting will swing.

Will we see more usage of the cloud hyperscalers, or will inferencing run on local devices such as laptops or in corporate datacenters?

Here is my thinking.

I believe Moore’s law will remain relevant for a few more years, implying that the supply of affordable local capacity will keep growing.

I also believe that there is a minimum LLM size below which the model is not really functional, in the same way that there is a minimum number of bytes for a decent picture or audible sound. Above a certain size, though, the returns diminish. Depending on the scenario, we are talking about billions to trillions of parameters.

This implies inferencing memory requirements ranging from tens of gigabytes to terabytes. Impressive, but hosting this locally is already affordable for regular users.
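As a rough back-of-envelope sketch of where that range comes from (the model sizes and quantization levels below are illustrative assumptions, not specific products, and activation/KV-cache overhead is ignored):

```
# Rough estimate of inference memory: parameters x bytes per parameter.
def est_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    return params_billions * 1e9 * bytes_per_param / 1e9

# Hypothetical model sizes and quantization levels, for illustration only.
for params_b, bits in [(7, 4), (70, 4), (70, 16), (1000, 8)]:
    gb = est_memory_gb(params_b, bits / 8)
    print(f"{params_b}B params @ {bits}-bit ~ {gb:,.1f} GB")
```

This lands at a few gigabytes for small quantized models and around a terabyte for trillion-parameter models, which is the spread I have in mind.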

On the demand side, cloud-based inferencing may be cheaper than self-hosted inferencing, though there is a tendency to treat hardware investments in local equipment as a sunk cost. It does not appear that cost is a serious discriminator, though capacity may well be.
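To make the comparison concrete, here is a hedged break-even sketch; every price, throughput, and utilization figure below is a placeholder assumption (electricity and maintenance are ignored), not a quote:

```
# Amortized local hardware cost per token vs. an assumed cloud per-token price.
hardware_cost = 3000.0      # one-off purchase, e.g. a GPU workstation (assumed)
lifetime_years = 3
tokens_per_second = 30      # sustained local throughput (assumed)
utilization = 0.05          # fraction of time the machine is actually inferencing

tokens_over_lifetime = tokens_per_second * utilization * 3600 * 24 * 365 * lifetime_years
local_cost_per_mtok = hardware_cost / tokens_over_lifetime * 1e6

cloud_cost_per_mtok = 2.0   # assumed cloud price per million tokens

print(f"local: ${local_cost_per_mtok:.2f} per million tokens")
print(f"cloud: ${cloud_cost_per_mtok:.2f} per million tokens")
```

With these assumptions the cloud comes out cheaper at low utilization, and local only wins once the hardware is kept busy or written off as sunk, which is why I do not see cost as the deciding factor either way.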

Privacy and sovereignty requirements are strongly driving the demand for local inferencing, and I do not expect this to diminish in the next couple of years. In fact, the strategic geopolitical momentum is towards more autonomy, not less.

In summary, I believe that the future of inferencing is going to be local, even though a relevant market for cloud services may remain. Consequently, investments by hyperscalers in AI datacenters might be overhyped from this perspective, while it would be wise for companies and service providers to build up some expertise in local inferencing and the associated ecosystem of open source tools.

What counterarguments or omissions in this conclusion am I overlooking?