Writing

Technical writing about on-device AI, architecture, App Review, and product engineering.

The Ultimate GGUF Guide: Why it became the de facto standard for local LLM inference

A practical deep dive into GGUF architecture, quantization patterns, and deployment tradeoffs.

Tags: GGUF, On-Device AI, Quantization

Running Gemma 2B on iOS: Reducing Metal shader startup from 10s to 1s

Runtime startup strategy and practical iOS inference optimization.

Tags: iOS, On-Device AI, Metal

Why I moved SwiftSuDoKu from CoreData to SwiftData in production

Concurrency-focused migration notes and reliability outcomes.

Tags: iOS, SwiftData, CoreData

On-device models vs cloud APIs: Cost and latency from a real iOS app

A practical look at retention, inference cost, and interaction speed.

Tags: On-Device AI, Cloud API, Cost Analysis

AI wrapper architecture: Stop using monolith patterns for streaming products

A resilient stream architecture pattern for AI wrappers.

Tags: Architecture, BFF, AI Wrapper

Prompt engineering for 2B models: Think like a compiler

Constraint-first prompt design for small model stability.

Tags: Prompt Engineering, Small Models, Gemma 2B