These are not random links. Each repository is mapped to where it fits in shipping iOS AI products, what can go wrong, and how to combine it with product constraints.
Core Runtime
llama.cpp
The default local runtime for GGUF inference when latency predictability and deployment control matter more than framework convenience.
Swift Integration
A Swift wrapper around llama.cpp that speeds up iOS integration and reduces glue-code complexity for app teams shipping local models.
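As a sketch of what such a wrapper buys an app team, here is roughly the call-site shape it enables. `LlamaModel` and `generate(prompt:maxTokens:onToken:)` are hypothetical names for illustration, not any specific package's API:

```swift
import Foundation

// Hypothetical wrapper surface -- a typical wrapper hides llama.cpp's
// C context and tokenizer plumbing behind load-once + token streaming.
final class LlamaModel {
    init(contentsOf url: URL) throws {
        // load the GGUF file, create the llama.cpp context
    }
    func generate(prompt: String, maxTokens: Int = 128,
                  onToken: (String) -> Void) throws {
        // tokenize the prompt, run the decode loop,
        // surface tokens as they are produced
    }
}

// The glue code an app team would own shrinks to this:
func runCompletion() throws {
    let model = try LlamaModel(contentsOf: URL(fileURLWithPath: "model.gguf"))
    try model.generate(prompt: "Summarize this release note.", maxTokens: 64) { token in
        print(token, terminator: "")  // stream straight into the UI
    }
}
```

The design point is that model loading and the decode loop stay inside the wrapper, so the app only handles a prompt in and a token stream out.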
Speech Layer
For on-device speech recognition on Apple platforms, WhisperKit is one of the fastest ways to add practical voice input loops in local-first products.
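A sketch of that voice-input loop for a recorded clip; the exact initializer options and the return shape of `transcribe(audioPath:)` vary across WhisperKit versions, so treat the signatures as approximate:

```swift
import WhisperKit

// Approximate shape of a file-transcription call; check the WhisperKit
// README for the exact signatures in the version you pin.
func transcribeClip(at path: String) async throws -> String {
    let pipe = try await WhisperKit()            // loads a default Whisper model
    let results = try await pipe.transcribe(audioPath: path)
    return results.map(\.text).joined(separator: " ")
}
```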
Apple Native AI
On supported devices, Apple's Foundation Models framework works as a native completion layer alongside GGUF runtime paths, especially for rewriting and concise summarization.
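A sketch of that native completion path using the FoundationModels framework (iOS 26 and later); the availability check matters because the system model exists only on Apple Intelligence-capable devices, and exact API details may shift between OS releases:

```swift
import FoundationModels

// Concise-rewrite helper; returns nil so the caller can fall back to a
// GGUF runtime path when the system model is unavailable on this device.
func conciseRewrite(_ draft: String) async throws -> String? {
    switch SystemLanguageModel.default.availability {
    case .available:
        let session = LanguageModelSession(
            instructions: "Rewrite the user's text to be concise."
        )
        let response = try await session.respond(to: draft)
        return response.content
    default:
        return nil   // model absent or not ready: use the local GGUF runtime
    }
}
```

Returning an optional keeps the fallback decision in the caller, which matches the section's framing of Foundation Models as a layer beside, not instead of, the GGUF runtime.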