Let’s talk.

2026 is make-or-break for reasoning-capability claims. Every lab says its model reasons better. I document whether that’s true – and where competitors are actually ahead.

Are you shipping models and wondering what your evals aren’t catching? Let’s talk.