Blog — nullmirror

Agent Systems with SLMs, Workflow Engines and Adaptive Routing
2025-09-02 ·5 min read
NVIDIA’s June 2025 paper Small Language Models Are the Future of Agentic AI argues that agent systems run better when built around small language models (SLMs) rather than …
nullbench Fingerprints v1.1: Stability Updates and New Routes
2025-08-31 ·3 min read
We revised nullbench with a stability metric that avoids collapse on single spikes and a category set split into more granular writing style and language comprehension tracks. The …
Trust Boundaries for LLM Agents
2025-08-31 ·5 min read
Large language models capable of acting as agents introduce a new layer of risk. These systems are more than passive text generators: they are often equipped with tools and can …
nullbench Fingerprints: Starting an Operational Playbook
2025-08-30 ·8 min read
Control of execution defines ownership. When a large language model file sits on your machine, it runs the same way until you decide to replace it. No silent updates, no routed …
nullbench: Judge Panel and Methodology
2025-08-24 ·7 min read
Evaluation of large language models (LLMs) is often presented as a leaderboard problem, ranking systems by performance on tasks with clear, objective answers. Prior work showed …
nullbench: Bias Benchmarking for Large Language Models
2025-08-23 ·6 min read
Nullbench is a controlled, reproducible benchmarking framework for large language models (LLMs) designed to isolate inherent response tendencies by evaluating models in a …
The Goldilocks Horizon: Scaling, Reasoning, and the Stopwatch
2025-08-23 ·4 min read
For a number of years now, LLM coding assistants have been marketed as labor multipliers. Investors pitch productivity gains, startups chase valuations, and executives present slides …
Structural Overgeneralization in LLM Summarization
2025-08-10 ·4 min read
It is well understood that large language models overgeneralize, carrying patterns too far and turning small ambiguities into catastrophic failures. The April 2025 study by Peters …
Tools and Filters for Short Horizons, Fragile State
2025-07-06 ·5 min read
Star Trek showed computer systems breaking in ways that once looked cartoonish: the machine takes a command too literally, or it finds a boundary no one anticipated, and suddenly …