A Model’s A Priori Is Frozen A Posteriori

The philosophy of knowledge discovery, and what six weeks on a project called Aedos taught me about the cost of being sure.

How do you trust an output you can’t directly check? If a ground truth exists to check against, it seems easy, right? Maybe. When it doesn’t, it’s harder, and that’s where most of a capable agent’s work lives.

Aedos: verification with an oracle

Toward the end of April I started a project called Aedos (a name of Greek origin for the sense of shame or humility that restrains you from overstepping), a symbolic memory layer that a model builds up and references over time. Such a store, in the way I built it, fails in two ways: (1) the model writes something false into it, or (2) the model stops trusting what’s there. The second is fatal, and the first is what causes it, so I built with soundness (no false positives) as the #1 priority.

I built Aedos on Wikidata, where facts are entity–predicate–entity triples. Wait: if you verify strictly against Wikidata, isn’t that just a slow SQL query with no knowledge beyond the store? Partly, yes. My bet was that aiming the creativity of a language model at the next layer up, the predicates themselves (i.e. creating assertions like “is_boss_of is the inverse of is_employee_of”), would let the model derive more assertions, simultaneously patching inconsistencies in Wikidata and building new knowledge. More on this later. The dangerous alternative to verifying strictly against Wikidata is letting the model assert past it, at which point you’re just trusting a language model again. But wait again: if a model is right most of the time, corroboration across passes should drive the store toward soundness. Any remaining errors would then be attributable either to the model’s internal representations or to Wikidata itself, not to random variance.
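A minimal sketch of that predicate-layer bet, assuming a toy in-memory store rather than Wikidata itself. The names Triple, INVERSE_OF, and derive are hypothetical and are not taken from the Aedos code; the point is only that a single predicate-level assertion can yield entity-level triples the store never held.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical predicate-level assertion of the kind the model would propose.
INVERSE_OF = {"is_boss_of": "is_employee_of"}


def with_inverses(inverse_of):
    """Make the inverse table symmetric so each rule applies in both directions."""
    table = dict(inverse_of)
    for pred, inv in inverse_of.items():
        table.setdefault(inv, pred)
    return table


def derive(store, inverse_of):
    """Return triples implied by the inverse rules that are not already in the store."""
    table = with_inverses(inverse_of)
    derived = set()
    for t in store:
        inv = table.get(t.predicate)
        if inv is None:
            continue  # no rule for this predicate, so nothing to derive
        candidate = Triple(t.obj, inv, t.subject)
        if candidate not in store:
            derived.add(candidate)
    return derived


# Made-up entities, for illustration only.
store = {
    Triple("Ada", "is_boss_of", "Ben"),
    Triple("Cy", "is_employee_of", "Ada"),
}
new_facts = derive(store, INVERSE_OF)
```

Here new_facts holds the two mirrored triples. Every derived triple is only as sound as the predicate rule behind it, which is why one wrong rule is far more costly than one wrong fact.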

Aedos didn’t work (in the way I built it), and the failure is the useful part. It was brittle, with inconsistent predicate representations. For instance, the extractor reached for instance_of where Wikidata stored the fact under occupation. The deeper mistake was that I optimized for absolute soundness, meaning zero false verifications, and got a system that was trustworthy precisely because it almost never answered. In a small PopQA pilot, Aedos abstained on nearly everything it couldn’t ground in Wikidata, giving no confident answers against the base model’s 27%, with no confident errors on either side. In a custom-built evaluation, Aedos answered correctly about 61% of the time (below the ~76% you’d get from the language model alone), but it never once vouched for a falsehood: a 0% false-verified rate, against the model’s ~12%.
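To make the trade-off concrete, here is a sketch of how an abstaining system can be scored. The metric definitions are my assumptions for illustration (coverage as the fraction of questions answered, false-verified rate as the fraction of all questions answered wrongly), not necessarily the ones used in the evaluations above, and the records are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    answer: Optional[str]  # None means the system abstained
    gold: str              # reference answer for the question


def score(verdicts):
    """Return coverage, precision on answered items, and false-verified rate."""
    total = len(verdicts)
    answered = [v for v in verdicts if v.answer is not None]
    wrong = [v for v in answered if v.answer != v.gold]
    return {
        "coverage": len(answered) / total if total else 0.0,
        # Precision is undefined when nothing was answered.
        "precision": (len(answered) - len(wrong)) / len(answered) if answered else None,
        "false_verified_rate": len(wrong) / total if total else 0.0,
    }


# Made-up records: one grounded answer, three abstentions.
toy = [
    Verdict("Paris", "Paris"),
    Verdict(None, "Lima"),
    Verdict(None, "Oslo"),
    Verdict(None, "Cairo"),
]
toy_scores = score(toy)
```

On this toy set precision is perfect and the false-verified rate is zero, yet coverage is a quarter. A system can max out the first two numbers simply by declining to answer.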

That maps onto the probe-and-refine result from earlier this year: refined guidance moved a coding agent’s coverage while per-patch precision held constant. Aedos hit the same axis from the opposite corner, driving precision to the ceiling and coverage through the floor. Optimizing soundness to the limit gets you a system that is trustworthy but silent on any data that isn’t excruciatingly close to what you actually want tested.

The extraction step, pulling a clean claim out of text, was the one part of Aedos that I found worked, so it’s the part I’m carrying forward. Instead of maintaining a store, I’m wrapping an agent in a deterministic gate whose rule is that it can’t grade its own homework: a task isn’t done until a separate verifier reads back the real state of the world to confirm the effect happened, rather than trusting the agent’s report that it did. It auto-verifies what’s readable and escalates the rest. I’m testing this on a long-running agent now, and the early signal is promising: it’s the realign-at-every-step behavior that kept my Voynich agent on track, made structural. More on that once it’s earned its conclusions.
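A rough sketch of that gate rule, under stated assumptions: Outcome, gate, and read_back are hypothetical names, and the real verifier is certainly more involved. What matters is that the agent’s own report never enters the decision. The gate compares an expected effect with state it reads itself, and escalates whenever it can’t read.

```python
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    VERIFIED = "verified"
    REJECTED = "rejected"
    ESCALATED = "escalated"


def gate(expected: str, read_back: Optional[Callable[[], str]]) -> Outcome:
    """Decide whether a task is done without consulting the agent's own report."""
    if read_back is None:
        return Outcome.ESCALATED  # effect is not machine-readable: hand it to a human
    try:
        observed = read_back()
    except Exception:
        return Outcome.ESCALATED  # state could not be read: do not guess
    return Outcome.VERIFIED if observed == expected else Outcome.REJECTED


# Stand-in for real external state; the agent claims success, the world disagrees.
world = {"config.mode": "staging"}
agent_report = "done: set config.mode to production"  # deliberately never read by the gate
result = gate("production", lambda: world["config.mode"])
```

Here result is Outcome.REJECTED even though the agent reported success. Treating an unreadable state as an escalation rather than a pass is what keeps the gate deterministic.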

A philosophical relationship

More on letting the model build more knowledge up over time: wouldn’t it be cool if we could have something like a Laplace’s Demon for knowledge? That is, know one thing or a small set of certain facts and then everything else that there is to know about the world falls into place. Despite Aedos’s failure at its intended goal, I think that this is somewhat possible but also (maybe) inherently constrained, and this is why.

The philosopher Immanuel Kant defined a framework of analytic vs synthetic and a priori vs a posteriori knowledge. In short, analytic knowledge is true by definition while synthetic knowledge is not, and a priori knowledge doesn’t require experience to know while a posteriori knowledge can only be known through experience. Analytic a priori knowledge is trivial: definitions live there. Analytic a posteriori knowledge is generally believed not to exist, though this is disputed. Synthetic a posteriori knowledge is inherently questionable (see Descartes), but it makes up the type of knowledge that lives on Wikipedia/Wikidata, if you trust the site’s maintainers enough to call it knowledge (this Descartes-flavored distinction, for our purposes, I think is beside the point). For instance, “Argentina won the World Cup in 2022” would be synthetic a posteriori knowledge.

The final category, synthetic a priori knowledge, is the interesting one. One example of synthetic a priori knowledge is the knowledge that a straight line is the shortest distance between two points. In theory, an LLM-based system could extend an existing base of synthetic a priori knowledge, and the same goes for analytic a priori knowledge. That is to say, if an intelligent system knows some baseline facts of the world, it should be able to reach all facts that don’t require experience to know just by thinking. However, this skips one load-bearing question: what plays the role of Kantian intuition (the faculty that supplies the content that concepts alone can’t) inside a language model? Without naming that source, “extend the base” just assumes the capability it’s supposed to explain.

One idea for this is that the pretrained weight geometry itself, the co-occurrence structure I wrote about in my emergent-misalignment post, is the non-experiential-at-inference source of new synthetic content. The model reaches a new true proposition not by running an experiment but by using its weights, which already encode the regularities. The honest wrinkle is that this “intuition” was itself acquired empirically during training, so the model’s a priori is really frozen a posteriori: experience is baked into weights, then queried without further experience. The contrast worth drawing is that the autonomous systems that build knowledge today actually go and check instead of deriving a priori. They earn a posteriori knowledge by running experiments and reading the result. Run that empirical loop through a language model in a harness on a more open-ended task and it’s token-expensive at best, and unreliable at worst (see reward-hacking). I think this is relevant to the debate over how closely model intelligence mirrors human intelligence.

Another idea is that since a hallucination is the model asserting something that was never in its training data (which is exactly what a genuinely new synthetic claim would also look like from the inside), the difference is only whether it’s true. That makes hallucination a candidate source of the novelty required to extend factual knowledge, but now you’d want a fact-checker downstream to keep the creativity from going off the rails. That is the problem Aedos was built to solve and was too brittle to handle, so the loop closes.

As for autodiscovery of synthetic a posteriori knowledge, executing that gets closer to requiring an actual Laplace’s Demon to bend space and time. And I think it goes both ways: building any new knowledge off of synthetic a posteriori knowledge sounds good in theory but is probably only useful for more trivial applications (like Aedos was) or perhaps for the purpose of efficiency when combined with generalization over a particular knowledge domain. Focus on a knowledge domain would be extremely useful in its own right and would also apply to the other knowledge categories, and I think the vast majority of queries to language models apply to this type of knowledge.