Switching our Inference Backend from Ollama to llama.cpp
·10 min read
For pragmatic reasons, Ollama has been the default local backend in our prior benchmark runs. Our recent article on Ollama and Open WebUI practices illuminated the need for an …
Local AI Capture: Ollama, Open WebUI, and llama.cpp
·14 min read
We have seen examples such as Red Hat placing RHEL sources behind customer portals and contracts, and Canonical combining GPL code with contributor license agreements, trademark …
Practical Long-Context LLM Inference with llama.cpp
·7 min read
We can run serious long-context inference on commodity Apple silicon, but long context is hard. In this post we’ll touch on what Grouped-Query Attention (GQA) changes, and …
LLM Fingerprints v1.4: The Cost of Quality, and routing decides winners
·10 min read
nullbench is our ongoing evaluation run across a rotating set of current large language models. We score models across practical dimensions: factual depth, reasoning, software …
nullbench Update: Iterating the Compliance Judge Panel
·4 min read
The compliance benchmark within nullbench serves as a fine-grained audit of a model’s capacity to follow instructions under constraint. Unlike accuracy tests, it measures the …
nullbench Update: Expanding the Mid-Field and Refining the Judge Panel
·4 min read
The nullbench framework continues to evolve toward reproducible, interpretable behavioral analysis of language models. Since version 1.3, which introduced a revised judge panel …
LLM Fingerprints v1.3: GLM-4 Judiciary, Summarization Collapse
·4 min read
The latest nullbench run used the revised judge panel with glm-4-9b as the mid judge. The swap cut family correlation and exposed variance that earlier Qwen-anchored panels …
LLM Fingerprints v1.2: efficient mid tier models, judge and lineup refresh
·6 min read
Building on the previous nullbench methodology and early refinements, we expanded the pool, kept decoding and scoring fixed, and asked if these additions change routing. The …
Retrievability is not discovery
·4 min read
RAG pipelines retrieve a narrow slice of documents that look closest in vector space to a query, but treating that narrowing as discovery ignores the fact that anything framed …
The Reshaping of the Information Ecosystem
·6 min read
The way people reach information is undergoing a structural break. For two decades, the web grew on the back of search engines sending users outward: a query produced links, …