Agent Systems with SLMs, Workflow Engines and Adaptive Routing
·5 min read
NVIDIA’s June 2025 paper Small Language Models Are the Future of Agentic AI argues that agent systems run better when built around small language models (SLMs) rather than …
nullbench Fingerprints v1.1: Stability Updates and New Routes
·3 min read
We revised nullbench with a stability metric that avoids collapse on single spikes and a category set split into more granular writing style and language comprehension tracks. The …
Trust Boundaries for LLM Agents
·5 min read
Large language models capable of acting as agents introduce a new layer of risk. These systems are more than passive text generators: they are often equipped with tools and can …
nullbench Fingerprints: Starting an Operational Playbook
·8 min read
Control of execution defines ownership. When a large language model file sits on your machine, it runs the same way until you decide to replace it. No silent updates, no routed …
nullbench: Judge Panel and Methodology
·7 min read
Evaluation of large language models (LLMs) is often presented as a leaderboard problem, ranking systems by performance on tasks with clear, objective answers. Prior work showed …
nullbench: Bias Benchmarking for Large Language Models
·6 min read
Nullbench is a controlled, reproducible benchmarking framework for large language models (LLMs) designed to isolate inherent response tendencies by evaluating models in a …
The Goldilocks Horizon: Scaling, Reasoning, and the Stopwatch
·4 min read
For a number of years now, LLM coding assistants have been marketed as labor multipliers. Investors pitch productivity gains, startups chase valuations, and executives present slides …
Structural Overgeneralization in LLM Summarization
·4 min read
It is well understood that large language models overgeneralize, carrying patterns too far and turning small ambiguities into catastrophic failures. The April 2025 study by Peters …
Tools and Filters for Short Horizons, Fragile State
·5 min read
Star Trek showed computer systems breaking in ways that once looked cartoonish: the machine takes a command too literally, or it finds a boundary no one anticipated, and suddenly …