Security Gating as a Control Problem
· 10 min read · A practical approach to securing LLM agents by treating tool invocation as a control boundary. Why fail-closed gates, monotonic risk checks, and adversarial benchmarks are more …
GPT-OSS-20B Sampling & Prompting for Style Control
· 9 min read · Best practices for sampling parameters and system prompt design to optimize style compliance and throughput with local LLMs like GPT-OSS-20B.
Execution Is Cheaper, Comprehension Is Not
· 7 min read · LLMs have collapsed the cost of execution, but verification, comprehension, and ownership remain the binding constraints in software engineering and adjacent knowledge work.
LLM Fingerprints v1.5: Redistribution with 4chan Data
· 8 min read · The latest nullbench run shows no step-function gains in base model capability. Instead, fine-tuning, abliteration, and long-context extensions primarily redistribute behavior …
Agentic AI Raises The Floor More Than The Ceiling
· 9 min read · AI agents make easy software tasks cheaper but don’t automate hard engineering. This post explains why autonomy claims overreach and where AI actually helps.
2025 Learnings on LLMs and Software Work
· 3 min read · 2025 suggests that the gap between expectations and measured outcomes around LLMs in software engineering is no longer subtle. Large investments and confident narratives imply …
Block-Floating FP4 for Local Inference in llama.cpp
· 7 min read · MXFP4, aka Microscaling Format for 4-bit Floating-Point, is essentially a very small floating-point format that borrows one big trick from signal processing and older “block …
Raspberry Pi Inference: Tiny Quantized Models at the Edge
· 10 min read · This post reports a small benchmark run on a Raspberry Pi 4B (8 GB RAM) device using llama.cpp across five compact GGUF models spanning ~270M to ~1.2B parameters, with aggressive …
Short Model Horizons Revisited
· 13 min read · Since our earlier “short horizons, fragile state, orchestration first” note 1, more data points have been published and the picture is becoming a bit sharper, …
Tiny Quantized Models On Device
· 13 min read · On-device LLMs are compact, optimized large language models that run directly on local hardware like smartphones or edge devices, instead of on a remote cloud server. This allows …