Date: 16 December 2025
Kurt is Salsa’s Chief Technology Officer.

Cognizant’s million-step result in plain language

Cognizant’s AI Lab has demonstrated something that cuts through a lot of AI hype. Using a framework called MAKER, they solved a 20-disk Towers of Hanoi (https://en.wikipedia.org/wiki/Tower_of_Hanoi) variant that required precisely 1,048,575 dependent steps, executed in sequence with zero errors. For context, frontier reasoning models such as Claude 3.7 Thinking and DeepSeek R1 fail completely at around 8 disks, or roughly 255 dependent steps.
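
Those step counts come straight from the classic Hanoi arithmetic: an n-disk puzzle takes a minimum of 2^n − 1 moves, each depending on the one before it. A quick sketch (the helper name is ours):

```python
def min_moves(n_disks: int) -> int:
    """Minimum number of moves to solve Towers of Hanoi with n disks."""
    return 2**n_disks - 1

print(min_moves(20))  # 1,048,575 dependent steps for the 20-disk puzzle
print(min_moves(8))   # 255 steps, roughly where frontier models collapse
```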

Cognizant did not invent a new frontier model; they took an engineering approach to using existing models to solve the problem. With MAKER, they were able to use smaller, cheaper models such as gpt-4.1-mini and an open-source 20B model, which gave the best reliability per dollar at the single-step level.

MAKER approaches the problem using disciplined engineering principles, built around three core mechanisms:

  • Maximal Agentic Decomposition (MAD): The work is decomposed into tiny, single-step tasks. Each microagent receives only the information required for its single step. This minimises confusion and controls context risk - see our blog What is context engineering? for more information.
  • First-to-ahead-by-k voting: Multiple agents attempt each microtask in parallel, and the first answer to lead all rivals by k votes is accepted as the consensus.
  • Red flagging: Outputs showing signs of structural confusion are discarded before they can contaminate the overall process.
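
The voting and red-flagging mechanisms can be sketched in a few lines. This is our own minimal illustration, not Cognizant's code, with a toy agent standing in for a real model call: sampling continues until one answer leads every rival by k votes, and red-flagged outputs are discarded rather than counted.

```python
import random
from collections import Counter

def first_to_ahead_by_k(sample_answer, k=2, max_samples=50):
    """Keep sampling candidate answers until one leads every
    rival by at least k votes, then accept it as the consensus."""
    votes = Counter()
    for _ in range(max_samples):
        answer = sample_answer()
        if answer is None:        # red-flagged output: discard, don't count
            continue
        votes[answer] += 1
        ranked = votes.most_common(2)
        leader, lead_count = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if lead_count - runner_up >= k:
            return leader
    return None  # no consensus within the sampling budget

# Toy stand-in for a microagent: right ~80% of the time,
# structurally confused (red-flagged) ~5% of the time.
def toy_agent():
    r = random.random()
    if r < 0.05:
        return None
    return "move disk 1 to peg C" if r < 0.85 else "move disk 1 to peg B"

random.seed(0)
print(first_to_ahead_by_k(toy_agent, k=3))
```

Because errors are independent across samples, even a modestly reliable agent yields a very reliable consensus once k is tuned to the step's error budget.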

Cognizant did what good project managers already do. We do not search for a single genius to complete a project; we separate analysis, design, implementation, testing and governance across different specialists, then sequence the tasks to deliver the whole. We don’t hand the user interface (UI) design team a detailed rundown of how Kubernetes will be implemented; they receive only the context needed for their portion of the project. MAKER applies the same pattern to AI agents.

The important strategic signal here is not “AI can now reason perfectly”, it’s that engineering discipline matters more than raw model size. The real gains now come from decomposition, error budgets and system design.

The illusion of thinking: when smart models fail in practice

Cognizant explicitly connects MAKER to Apple’s “Illusion of Thinking” study, which shows: “(1) low-complexity tasks where standard models surprisingly outperform Large Reasoning Models (LRMs), (2) medium-complexity tasks where additional thinking in LRMs demonstrates advantage, and (3) high-complexity tasks where both models experience complete collapse.”

The pattern is familiar:

  • Short tasks via chat work well.
  • Long workflows accumulate small mistakes until the whole process falls apart.
  • Bigger context windows do not fix the problem.

Ilya Sutskever notes a similar disconnect in the Dwarkesh podcast: models can score well on evaluations yet still behave inconsistently in real-world scenarios. MAKER responds directly to that failure mode: it assumes models will be fallible and designs the system so that errors are contained and corrected locally.

MAKER in simple terms: smash the task, narrow the view

MAKER takes three core engineering principles and applies them at a massive scale.

1. Smash the task into atomic steps
Each microagent is responsible for exactly one decision or move.

2. Narrow the view for each agent
Instead of feeding a single agent a long global state, MAKER scopes inputs tightly. Each agent is a specialist line worker, given just enough information to do its job.

3. Vote and filter at every step
Multiple agents attempt the same microtask. A first-to-ahead-by-k voting rule selects the stable answer. Structurally confused outputs are red-flagged and discarded.

MAKER’s strength comes from disciplined context control. Each agent only sees what is necessary for its step, structural confusion is flagged early, and a central controller governs memory and sequencing.
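
The controller/microagent split can be sketched in a deterministic toy setting (our illustration, not Cognizant's implementation): a controller owns the full state, decomposition and sequencing, while each micro-step receives only the narrow context it needs.

```python
def micro_move(disk, src, dst):
    """Microagent: sees only its own step's context (one disk,
    two pegs) and emits exactly one move."""
    return f"move disk {disk}: {src} -> {dst}"

def solve_hanoi(n, src="A", aux="B", dst="C"):
    """Controller: decomposes the puzzle into single-move microtasks
    and governs their sequencing; no microagent sees global state."""
    if n == 0:
        return []
    return (solve_hanoi(n - 1, src, dst, aux)
            + [micro_move(n, src, dst)]        # one atomic decision
            + solve_hanoi(n - 1, aux, src, dst))

moves = solve_hanoi(3)
print(len(moves))  # 7 moves == 2**3 - 1
```

In MAKER the move decision is made by a model rather than a formula, but the division of labour is the same: sequencing and memory live in the controller, not in any single agent's context window.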

Practical next steps for agencies and enterprises

1. Start with decomposition
Map workflows into atomic decisions before selecting models. For engineering workflows, consider ways to use frontier models to automate the decomposition stage before entering your agentic workflow.

2. Add voting and red-flagging to critical steps
You do not need a full microagent architecture to gain reliability. Start within your current capabilities and continuously improve.

3. Invest in context engineering
Build taxonomies, retrieval pipelines and isolation patterns.

4. Wrap AI in traditional controls
Log inputs, outputs and decisions. Apply segregation of duties.

5. Prefer smaller, well-constrained models
MAKER demonstrates that reliability per dollar improves when the architecture does the heavy lifting rather than the model.

6. Align AI programs with workforce and policy readiness
AI disruption will hit before regulation catches up. Getting your architecture and governance in place now means you’ll be ready, rather than patching your solutions later.
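
Step 4’s traditional controls are the easiest place to start. A minimal sketch of an audit wrapper (the names are ours, not from MAKER): every AI step is recorded with its input, output and timestamp so each decision can be reviewed later.

```python
import time
import uuid

def logged_step(agent_fn, step_input, log):
    """Wrap an AI step in a traditional audit control: record the
    input, output and timing of every decision."""
    record = {"id": str(uuid.uuid4()), "ts": time.time(), "input": step_input}
    record["output"] = agent_fn(step_input)
    log.append(record)
    return record["output"]

audit_log = []
# A trivial stand-in for a model call, just to show the wrapper.
result = logged_step(lambda s: s.upper(), "approve invoice 42", audit_log)
print(result, len(audit_log))  # prints: APPROVE INVOICE 42 1
```

The same wrapper is a natural place to enforce segregation of duties, for example by requiring a different agent (or a human) to approve what another proposed.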

Closing: a different kind of AI ambition

MAKER is not a party trick. It’s evidence that the future belongs to engineered reliability, not just frontier scale. For government agencies and regulated enterprises, real ambition looks like:

  • Small systems
  • Structured systems
  • Governable systems

MAKER shows that a small model, in a disciplined architecture, can execute a million flawless steps. This is what trustworthy AI looks like in practice: not abstract intelligence, but predictable, governable competence.

Importantly, nothing in this architecture requires speculative breakthroughs. It’s available now.