Autonomous Agents on the Voynich Manuscript

The Voynich Manuscript is a 600-year-old book written in an unknown script that no one has been able to read. It is filled with botanical illustrations, astronomical diagrams, and text that has resisted professional cryptographers and linguists for over a century. It is also text-based and well-digitized, with an enormous body of prior literature. That combination makes it a good target for autonomous agent research.

I'm running a multi-phase agent pipeline against it. This post covers the architecture, what the agents have found so far, and some lessons about running long-horizon research loops.

Architecture

The pipeline has three phases, each running as an autonomous agent with its own methodology document and workspace.

Librarian. Builds a structured knowledge base of prior Voynich research: major hypotheses, key analytical results, and prior decipherment attempts along with why they failed. This runs first and produces the substrate that the other agents work from. The quality of downstream analysis depends heavily on this step.

Botanist. Does computational matching between the manuscript's plant illustrations and medieval herbal manuscripts from across Europe, the Mediterranean, and the Arabic world. The goal is to find anchor points: if we can confidently identify a plant, we may be able to associate an illustration with a text passage and use that as ground truth for cipher testing.
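As a toy illustration of the matching step (every name and vector here is hypothetical; the real pipeline would compare embeddings produced by a vision model, not hand-written numbers):

```python
import math

# Hypothetical precomputed image-embedding vectors for Voynich folios and
# candidate herbal illustrations. Real vectors would come from a vision model.
voynich = {"f2r": [0.9, 0.1, 0.3], "f5v": [0.2, 0.8, 0.5]}
herbals = {"Tractatus_viola": [0.85, 0.15, 0.35], "Sloane_rosa": [0.1, 0.9, 0.4]}

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def best_match(folio_vec, candidates):
    # Return the candidate illustration most similar to the folio's plant.
    return max(candidates, key=lambda name: cosine(folio_vec, candidates[name]))

matches = {f: best_match(v, herbals) for f, v in voynich.items()}
```

The point of the sketch is the shape of the problem: a nearest-neighbor search over illustration embeddings, with high-confidence matches promoted to candidate anchor points.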

Cryptanalyst. Runs a continuous loop of statistical and structural experiments on the manuscript's text. This agent designs its own experiments, executes them, evaluates results, and builds on its findings. It works from the Librarian's knowledge base and targets specific hypotheses to confirm or rule out.
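The loop itself can be sketched as a registry of experiment functions, each taking the transliterated text and returning metrics. The experiment shown here (character-level Shannon entropy) is only a placeholder; in the actual pipeline the agent designs its own:

```python
import math
from collections import Counter

def char_entropy(text):
    # Shannon entropy (bits per character) of the text's character distribution.
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def run_experiments(text, experiments):
    # Execute each registered experiment and collect its metrics by name.
    return {name: fn(text) for name, fn in experiments.items()}

experiments = {"char_entropy": char_entropy}
results = run_experiments("daiin chedy qokeedy", experiments)
```

The agent's real loop adds the parts that matter: generating new experiment functions from the knowledge base, comparing results against prior runs, and deciding what to try next.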

Each agent works autonomously. I wrote the methodology documents, set the research direction, and review outputs, but the agents decide what experiments to run and how to iterate on results.

Findings so far

The Cryptanalyst has been the most productive phase. Some results worth noting:

Morphological structure. The writing system shows regularities in how "words" are constructed that look more like deliberate linguistic design than random or meaningless sequences. Still early, but worth investigating further.
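One well-known example of the kind of regularity involved: certain glyphs strongly prefer word-initial or word-final position (in EVA transliteration, "q" is almost always word-initial and "y" word-final). That preference can be quantified with simple positional counts. The token list below is a tiny illustrative sample, not analysis output:

```python
from collections import Counter

# A handful of common EVA-transliterated Voynich tokens, for illustration only.
tokens = ["daiin", "chedy", "qokeedy", "shedy", "qokaiin", "daiin", "qokedy", "chol"]

initial = Counter(w[0] for w in tokens)   # which glyphs start words
final = Counter(w[-1] for w in tokens)    # which glyphs end words

# In this sample, 'q' occurs only word-initially; 'y' and 'n' only word-finally.
```

Scaled up to the full transliteration and extended to glyph bigrams and affix-like segments, counts like these are what separate "structured word grammar" from "random sequences."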

Folio clustering. The agent identified clusters of folios with anomalous statistical properties. Prior researchers had flagged some of these as interesting but hadn't fully characterized them. The agent's analysis quantifies how these clusters diverge from the manuscript's overall distribution.

Cipher refutation. Several major cipher families have been tested against a range of candidate source languages. Some have been ruled out, which helps narrow the hypothesis space. This is the kind of systematic work that's tedious for humans but well-suited to agents running in a loop.
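One concrete refutation pattern: a monoalphabetic substitution only relabels characters, so it preserves the plaintext's index of coincidence. If the manuscript's IC is far from a candidate language's typical IC, simple substitution from that language can be ruled out. A minimal sketch:

```python
from collections import Counter

def index_of_coincidence(text):
    # Probability that two randomly chosen characters of the text are equal.
    counts = Counter(text)
    n = len(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

# Substitution relabels characters without changing their counts,
# so the index of coincidence is invariant under it.
plain = "attackatdawn"
key = {ch: chr(ord(ch) + 1) for ch in set(plain)}  # toy substitution key
cipher = "".join(key[ch] for ch in plain)
```

This invariance argument is decades old; the agent's contribution is running variations of it systematically across many cipher families and candidate languages, and recording which combinations fail.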

These are early results, not conclusions. The goal at this stage is to build structural observations that constrain the space of plausible hypotheses.

Lessons on autonomous research loops

Running agents on open-ended research problems is a different exercise from running them on well-defined software engineering tasks. A few observations:

Characterization vs. interpretation. Agents are good at running experiments and collecting evidence. They are bad at knowing when to stop characterizing and start interpreting. Left unsupervised, they escalate from "here's a pattern" to "here's what it means" faster than the evidence warrants. Human oversight at interpretation boundaries is essential.

Knowledge base quality matters more than expected. The Librarian phase initially felt like overhead. In practice, the Cryptanalyst's output quality correlates directly with the knowledge base quality. Running research agents without a clean substrate produces speculation, not findings.

Long loops need checkpointing. Agents that run for days accumulate context drift. Periodic human review and methodology restatement keeps the work on track.

What's next

The immediate direction is combining the Botanist's plant identifications with the Cryptanalyst's morphological findings to look for anchor pairs. If we can match an identified plant to a text passage, that opens a path to testing specific cipher hypotheses against real data. Beyond that, there are entire categories of analysis the agents haven't touched yet: positional encoding patterns, cross-section statistical comparisons, and computational approaches to the illustration-text relationship that weren't feasible before LLMs. There's a lot of surface area left to cover, and the pipeline is designed to keep running.