nullmirror

Switching our Inference Backend from Ollama to llama.cpp
2025-11-02 · 10 min read
For pragmatic reasons, Ollama has been the default local backend in our prior benchmark runs. Our recent article on Ollama and Open WebUI practices¹ illuminated the need for an …
Local AI Capture: Ollama, Open WebUI, and llama.cpp
2025-11-01 · 14 min read
We have seen examples such as Red Hat placing RHEL sources behind customer portals and contracts, and Canonical combining GPL code with contributor license agreements, trademark …
Practical Long-Context LLM Inference with llama.cpp
2025-11-01 · 7 min read
We can run serious long-context inference on commodity Apple silicon, but long context is hard. In this post we’ll touch on what Grouped-Query Attention (GQA) changes, and …
LLM Fingerprints v1.4: The Cost of Quality, and routing decides winners
2025-10-29 · 10 min read
nullbench is our ongoing evaluation run across a rotating set of current large language models¹. We score models across practical dimensions—factual depth, reasoning, software …
nullbench Update: Iterating the Compliance Judge Panel
2025-10-05 · 4 min read
The compliance benchmark within nullbench serves as a fine-grained audit of a model’s capacity to follow instructions under constraint. Unlike accuracy tests, it measures the …
nullbench Update: Expanding the Mid-Field and Refining the Judge Panel
2025-10-04 · 4 min read
The nullbench framework continues to evolve toward reproducible, interpretable behavioral analysis of language models. Since version 1.3, which introduced a revised judge panel …
LLM Fingerprints v1.3: GLM-4 Judiciary, Summarization Collapse
2025-09-27 · 4 min read
The latest nullbench run used the revised judge panel with glm-4-9b as the mid judge. The swap cut family correlation and exposed variance that earlier Qwen-anchored panels …
LLM Fingerprints v1.2: efficient mid tier models, judge and lineup refresh
2025-09-14 · 6 min read
Building on the previous nullbench¹ methodology and early refinements², we expanded the pool, kept decoding and scoring fixed, and asked if these additions change routing. The …
Retrievability is not discovery
2025-09-05 · 4 min read
RAG pipelines retrieve a narrow slice of documents that look closest in vector space to a query, but treating that narrowing as discovery ignores the fact that anything framed …
The Reshaping of the Information Ecosystem
2025-09-04 · 6 min read
The way people reach information is undergoing a structural break. For two decades, the web grew on the back of search engines sending users outward: a query produced links, …