LAUNCH ETA: May 2026
  • 2025 Learnings on LLMs and Software Work

3 min read

    2025 suggests that the gap between expectations and measured outcomes around LLMs in software engineering is no longer subtle. Large investments and confident narratives imply …

  • Block-Floating FP4 for Local Inference in llama.cpp

7 min read

MXFP4, short for Microscaling 4-bit Floating Point, is essentially a very small floating-point format that borrows one big trick from signal processing and older “block …

  • Raspberry Pi Inference: Tiny Quantized Models at the Edge

10 min read

This post reports a small benchmark run on a Raspberry Pi 4B (8 GB RAM) using llama.cpp across five compact GGUF models spanning ~270M to ~1.2B parameters, with aggressive …

  • Short Model Horizons Revisited

13 min read

Since our earlier “short horizons, fragile state, orchestration first” note 1, more data points have been published and the picture is becoming a bit sharper, …

  • Tiny Quantized Models On Device

13 min read

    On-device LLMs are compact, optimized large language models that run directly on local hardware like smartphones or edge devices, instead of on a remote cloud server. This allows …

  • Switching our Inference Backend from Ollama to llama.cpp

10 min read

For pragmatic reasons, Ollama has been the default local backend in our prior benchmark runs. Our recent article on Ollama and Open WebUI practices1 highlighted the need for an …

  • Local AI Capture: Ollama, Open WebUI, and llama.cpp

14 min read

    We have seen examples such as Red Hat placing RHEL sources behind customer portals and contracts, and Canonical combining GPL code with contributor license agreements, trademark …

  • Practical Long-Context LLM Inference with llama.cpp

7 min read

    We can run serious long-context inference on commodity Apple silicon, but long context is hard. In this post we’ll touch on what Grouped-Query Attention (GQA) changes, and …

• LLM Fingerprints v1.4: The Cost of Quality, and Routing Decides Winners

10 min read

    nullbench is our ongoing evaluation run across a rotating set of current large language models1. We score models across practical dimensions—factual depth, reasoning, software …

  • nullbench Update: Iterating the Compliance Judge Panel

4 min read

    The compliance benchmark within nullbench serves as a fine-grained audit of a model’s capacity to follow instructions under constraint. Unlike accuracy tests, it measures the …