Research / Build / Publish
Sri Harsha Gouru
I take systems apart until they explain themselves.
I work on local AI systems, runtime tooling, training experiments, and protocol-heavy software.
Most of what I publish starts with a question I can't let go of. Lately that has meant watching models at inference time, tracing browser AI traffic, and testing what Apple Silicon can actually do when you stop treating it like a black box.
The through-line is simple: understand what is really happening, then build from there.
Writing
all posts →
I ran a full security assessment against LiquidAI's LFM 2.5 350M, a model small enough to run anywhere. A third of the tests broke it, and the worst failures were exactly the ones that matter for agents.
I took apart how six real apps move your identity around — from broker tokens sitting in plaintext IndexedDB to four-layer encrypted envelopes — and the spread is enormous.
We give coding agents and on-device LLMs the run of our machines — so I took four of them apart to find out where the agent loop actually runs and who actually holds the keys.
Selected work
all projects →
Real-time visualization of LLM internals: trace token generation, attention patterns, hidden states, and probability distributions as they happen. Built to understand what's actually going on inside these models.
Experiments for OpenAI's 16MB language model challenge. Focused on compact architectures, training dynamics, evaluation, and practical iteration on tiny models under hard artifact constraints.
Benchmarking and experimentation setup for local LLM inference. Covers KV cache behavior, quantization tradeoffs, batching, FlashAttention, speculative decoding, and serving paths.
Write-up on getting training loops running on Apple's Neural Engine and what it took to make the surrounding CPU pipeline fast enough to matter.
Local tooling for auditing browser-based AI traffic through Chrome DevTools Protocol. Captures requests, classifies telemetry, compares streaming protocols, and turns noisy web app behavior into something queryable.