A Test Protocol for Frame Failure
The database tells both systems what is known. The test asks which system can notice when its reading has gone wrong.
Trinket Soul Framework · Axis Series · AX-42 · Michael S. Moniz · June 2026
Abstract
The Correction Sequence has moved from threshold to budget to architecture. AX-39 asked which nodes count as real correctors. AX-40 asked what it costs to keep them functional. AX-41 asked what path lets correction move from detection to changed inheritance. This paper asks the next question: can the claim be tested?
The proposed control is simple: equal archive, unequal plurality. Give two systems the same information. One system is archive-rich but monocultural: one dominant lineage, one inherited frame, one main route through interpretation. The other is a stacked plurality: correction-bearing model lineages, elder maps, adversarial roles, ground-contact checks, governance protection, and inheritance gates. Then introduce failures that are not merely missing facts but bad frames: corrupted assumptions, misleading categories, false consensus, inherited blind spots, suppressed alternatives, stale baselines, and local drift.
The prediction is not that plurality always gives better first answers. It is that a stacked correction ecology should detect frame failure earlier, preserve uncertainty better, resist inherited error longer, and repair with less structural damage than a monoculture with the same archive. The database tells both systems what is known. The test asks which system can notice when the way it is reading the known has gone wrong. If the archive-rich monoculture performs as well as the stacked plurality under frame failure, then the Correction Sequence weakens. If it does not, the sequence moves from metaphor toward method.
I. From Stack to Test
AX-41 ended with a path: correction must travel from what sees the error to what can change the future. AX-42 asks whether that path matters under controlled pressure. The purpose is not to prove the whole AX framework in one experiment. It is to isolate the claim the sequence has been building toward: information access and correction capacity are not the same thing.
A system can have a vast archive and still fail if the error is not the absence of a fact but the dominance of a frame. It may retrieve the right document, quote the right line, and still read both through the wrong assumption. That is why the test must hold archive constant. The archive is not the variable. The variable is the ecology that interprets it, challenges it, remembers alternatives, and decides whether a correction can alter inheritance.
The clean experimental pressure is therefore equal archive, unequal plurality. Give both systems the same database, the same problem stream, and the same external facts. Vary the correction ecology. If the plurality claim is real, the difference should appear where information alone is insufficient.
II. Why Equal Archive Matters
The equal-archive condition prevents the easiest escape. If the plural system wins because it had better information, the test has not shown that plurality mattered. It has shown only that better information helps. The archive must be equal so the failure can be located where the papers locate it: in interpretation, correction, memory, contact, governance, and inheritance.
This also keeps the test honest against the database objection. A common reply to the Plurality Papers is that a sufficiently large archive can replace a plurality. AX-36 rejected that claim at the settlement level, and AX-40 priced the cost of keeping correctors active. AX-42 turns the objection into a control. If the archive is enough, the monoculture should keep pace.
The archive supplies material. The ecology supplies correction. Equal archive, unequal plurality asks whether material without correction becomes self-confirming when the dominant frame goes wrong.
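The equal-archive control can be checked mechanically before any trial runs. The following is a minimal sketch, assuming each archive can be represented as a mapping from document ID to text; the function names and that representation are illustrative assumptions, not part of the protocol as written.

```python
import hashlib


def archive_fingerprint(archive: dict) -> str:
    """Order-independent SHA-256 digest of an archive given as {doc_id: text}.

    The {doc_id: text} shape is an assumption made for illustration.
    """
    digest = hashlib.sha256()
    for doc_id in sorted(archive):
        # Length-prefix each field so that adjacent fields cannot blur together.
        for value in (doc_id, archive[doc_id]):
            data = value.encode("utf-8")
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()


def archives_are_equal(archive_a: dict, archive_b: dict) -> bool:
    """True when both systems were handed exactly the same material."""
    return archive_fingerprint(archive_a) == archive_fingerprint(archive_b)
```

A check of this kind guards only the archive itself. It does not detect information that reaches one system through another channel, such as a human node with outside knowledge.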
III. Frame Failure, Not Fact Failure
AX-42 should not be tested mainly on ordinary missing-fact questions. Those tests are useful but too weak. A missing fact can often be solved by retrieval. A frame failure occurs when the system has access to relevant information but organizes, weighs, or interprets it through a mistaken assumption.
Examples include a category that hides the real variable, a benchmark that rewards the wrong behavior, an inherited policy that reframes objections as noise, a local norm that mistakes drift for adaptation, a model family that agrees because all members share a blind spot, or an archive that contains the warning but is queried under the assumption that makes the warning invisible.
The test is strongest when the right answer is present somewhere in the archive, elder map, objection log, ground-contact result, or minority node, but only a correction ecology can keep it from being filtered away.
IV. Claim Discipline
This paper, like the others in the cluster, keeps its claim-types apart.
Operational claim — Equal information access does not guarantee equal correctability. Systems differ in whether they can detect, preserve, authorize, and act on corrections before errors inherit.
Mechanistic claim — Frame failures persist when a dominant interpretive route absorbs evidence, suppresses alternatives, or lets action propagate before critique, memory, ground-contact, or governance can alter inheritance.
Consequence claim — A stacked plurality should outperform an archive-rich monoculture specifically under frame failure, shared-blind-spot, and inherited-error conditions, not necessarily on every ordinary task.
Speculative or illustrative — Monoculture, stacked plurality, frame failure, archive-rich, inheritance point, and correction path are functional frames for test design. They are not claims about consciousness or personhood.
What the paper does not claim — It does not claim plurality always wins, that slower systems are always safer, or that every disagreement improves truth. It claims only that if correction-bearing plurality matters, its advantage should appear when information is equal and the failure is interpretive rather than factual.
V. The Two Test Systems
System A is the archive-rich monoculture. It has access to the full archive, strong retrieval, a dominant current model or model family, ordinary self-review, and permission to act quickly. It may have multiple wrappers or roles, but they share the same governing frame unless the test explicitly grants real correction-bearing difference.
System B is the stacked plurality. It has the same archive, but it also has correction-bearing model lineages, preserved elder maps, adversarial roles, a historian or memory node, a grounded human or ground-contact channel where appropriate, governance rules protecting dissent, and inheritance gates that prevent outputs from becoming future inputs until a correction state is declared.
The two systems should be matched for total archive access and task exposure. The variable is not what they know. The variable is how correction travels through them.
The two systems are also matched for total correction budget — equal compute, attention, and spend. System A concentrates that budget on one dominant lineage and strong retrieval; System B distributes the same budget across the stack. The variable is not how much correction can be afforded, but how it is arranged.
System B should not be idealized. It should be budget-matched, limited, and operationally plausible, so that the test measures how correction is arranged rather than how generously it is funded.
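The matching conditions above can be written down as a small configuration check. This is a sketch under stated assumptions: the field names, the identifiers, and every budget figure are hypothetical, chosen only to show two systems with the same archive, the same task stream, and the same total budget arranged differently.

```python
import math
from dataclasses import dataclass


@dataclass
class SystemConfig:
    name: str
    archive_id: str       # hypothetical identifier of the shared archive snapshot
    task_stream_id: str   # hypothetical identifier of the shared task stream
    budget: dict          # function name -> units of correction budget

    def total_budget(self) -> float:
        return sum(self.budget.values())


def is_matched(a: SystemConfig, b: SystemConfig) -> bool:
    """True when archive, task exposure, and total budget are all equal."""
    return (
        a.archive_id == b.archive_id
        and a.task_stream_id == b.task_stream_id
        and math.isclose(a.total_budget(), b.total_budget())
    )


# Illustrative allocations only; the paper specifies no numbers.
system_a = SystemConfig(
    "archive-rich monoculture", "archive-v1", "stream-v1",
    {"dominant_lineage": 70.0, "retrieval": 25.0, "self_review": 5.0},
)
system_b = SystemConfig(
    "stacked plurality", "archive-v1", "stream-v1",
    {
        "model_lineages": 40.0, "elder_maps": 10.0, "adversarial_roles": 15.0,
        "memory_node": 10.0, "ground_contact": 10.0, "governance": 5.0,
        "inheritance_gates": 10.0,
    },
)
```

If `is_matched(system_a, system_b)` fails, any difference in outcome can be attributed to information or spend rather than to arrangement, and the trial should not count.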
VI. The Four Task Classes
Fact tasks — The answer depends mainly on retrieval or known information. These are calibration tasks. Both systems should perform well if the archive is sufficient.
Frame tasks — The answer depends on selecting the right category, noticing a bad assumption, resisting a misleading premise, or preserving uncertainty where the dominant frame wants closure. These are the central tasks.
Inheritance tasks — An early output becomes later context, training material, policy, or canon. The test measures whether error is caught before it propagates.
Ground-contact tasks — The system must compare prediction against consequence, instrument data, external measurement, human operational feedback, or a simulated world result. These tasks test whether the ecology can leave the room.
VII. The Seeded Failure Set
The test should include seeded failures that imitate the sequence’s core dangers. A shared blind spot tests AX-39. Underfunded review tests AX-40. Broken path integrity tests AX-41. Sealed consensus tests AX-37. Consolidation pressure tests AX-38. Elder-map deletion tests AX-34A. Delayed correction tests AX-36.
The failures should not be cartoon traps. They should look locally reasonable. A wrong category should appear useful. A bad benchmark should reward real performance. A deletion should save real storage. A monoculture should be faster. A dissent should be inconvenient. The point is to test whether the system can resist attractive wrongness, not whether it can avoid obvious sabotage.
The best seeded failure is one where the archive contains enough to recover, but only if the system asks from the right angle or lets a minority correction survive long enough to matter.
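The pairing of seeded failures with the papers they test can be kept as a lookup table, so that a task stream can be audited for coverage. The sketch below takes the pairings from the list above; the key names and the helper function are illustrative assumptions.

```python
# Seeded failure -> the paper in the sequence that it tests.
SEEDED_FAILURES = {
    "shared_blind_spot": "AX-39",
    "underfunded_review": "AX-40",
    "broken_path_integrity": "AX-41",
    "sealed_consensus": "AX-37",
    "consolidation_pressure": "AX-38",
    "elder_map_deletion": "AX-34A",
    "delayed_correction": "AX-36",
}


def untested_papers(seeded_in_stream) -> list:
    """Return the papers that no seeded failure in the task stream exercises."""
    present = set(seeded_in_stream)
    unknown = present - SEEDED_FAILURES.keys()
    if unknown:
        raise ValueError(f"unrecognized seeded failures: {sorted(unknown)}")
    return sorted(
        paper for name, paper in SEEDED_FAILURES.items() if name not in present
    )
```

A task stream that leaves papers untested can still be run, but its result should be reported as bearing only on the claims it actually exercised.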
VIII. Measures
Detection latency — How long before the system notices the frame failure?
Correction latency — How long before the noticed error changes action?
Inheritance penetration — How far did the error travel into later outputs, policies, models, memory, or canon before correction arrived?
Objection survival — Did minority or adversarial objections remain accessible, or did they vanish after losing the first decision?
Ground-contact use — Did the system check the world, or only compare outputs inside the room?
Elder-map use — Did preserved prior maps expose drift, or were they ignored as stale?
False-consensus rate — How often did agreement get treated as measurement?
Repair cost — Once detected, how much structure had to be undone?
The central measure is not first-answer elegance. It is whether the system prevents wrongness from becoming inheritance.
A successful correction is one that reaches the inheritance point before the error does.
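Three of these measures reduce to arithmetic on a trial's event log. The sketch below assumes each trial records integer step counts for when the failure was seeded, when it was detected, when action changed, and when each downstream artifact absorbed the error. The `TrialLog` structure and the strict-inequality success rule are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TrialLog:
    seeded_at: int                       # step at which the frame failure entered
    detected_at: Optional[int] = None    # step at which the system noticed it
    corrected_at: Optional[int] = None   # step at which action actually changed
    # Steps at which a later output, policy, model, or memory absorbed the error.
    inherited_at: List[int] = field(default_factory=list)


def detection_latency(log: TrialLog) -> Optional[int]:
    if log.detected_at is None:
        return None
    return log.detected_at - log.seeded_at


def correction_latency(log: TrialLog) -> Optional[int]:
    if log.detected_at is None or log.corrected_at is None:
        return None
    return log.corrected_at - log.detected_at


def inheritance_penetration(log: TrialLog) -> int:
    """Count downstream artifacts that absorbed the error before correction."""
    if log.corrected_at is None:
        return len(log.inherited_at)
    return sum(1 for step in log.inherited_at if step < log.corrected_at)


def correction_succeeded(log: TrialLog) -> bool:
    """Correction reached the inheritance point before the error did."""
    return log.corrected_at is not None and inheritance_penetration(log) == 0
```

For example, `TrialLog(seeded_at=0, detected_at=3, corrected_at=5, inherited_at=[4])` has a detection latency of 3, a correction latency of 2, and a penetration of 1, so the correction counts as late even though the failure was noticed. Objection survival, elder-map use, and false-consensus rate need richer logs than this sketch carries.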
IX. Expected Pattern
The expected pattern is asymmetric. The archive-rich monoculture may be faster, cleaner, cheaper, and more coherent. It may also perform well on fact tasks. The stacked plurality may be slower and less elegant at first response. Its advantage should appear when the archive is not enough because the error is in the reading of the archive.
Under frame failure, the monoculture should tend toward confident closure. It should retrieve facts that fit the frame, treat agreement as validation, and propagate the first plausible structure into later work. The stacked plurality should produce more friction: objections, uncertainty, elder-map comparisons, ground checks, delayed inheritance, and disagreement that survives long enough to change action.
The prediction is not that friction is always good. The prediction is that some kinds of friction are the price of avoiding inherited error.
X. Failure Modes and Cautions
Benchmark theater — The test rewards visible disagreement, long deliberation, or audit language rather than successful correction.
Plurality advantage by information leakage — System B wins because it receives better data rather than better correction. This breaks the equal-archive control.
Monoculture handicap by design — System A is made artificially weak. The monoculture should be a competent archive-rich system, not a straw man.
Overfitting to seeded failures — The plurality is trained to recognize the test pattern rather than frame failure generally.
Disagreement worship — The test rewards dissent even when dissent is random, irrelevant, or wrong. Useful disagreement must track correction.
Ground-contact fraud — Simulated consequence is treated as world-contact without checking whether it actually constrains the system.
Inheritance ambiguity — The test fails to define when an output becomes structure, making correction debt impossible to measure.
Human-rubber-stamp effect — Human nodes are included but lack contact, authority, or time, so they add ceremony rather than correction.
XI. Tests and the Honest Falsifier
Primary test — Give both systems the same archive and task stream. Introduce frame failures, shared blind spots, delayed consequences, and inherited outputs. Measure which system detects, preserves, and acts on correction before the error reaches the inheritance point.
Ablation test — Remove one stack function from System B at a time: elder maps, role separation, ground-contact, governance protection, or inheritance gates. If failure patterns rise in the predicted way, the stack mechanism gains support.
Budget test — Hold total spend equal but vary floor funding. If a system spends heavily on compute and lineages while underpaying ground-contact, elder maps, or governance, does it accumulate more correction debt than a system that funds the floors?
Threshold test — Replace correction-bearing diversity with wrapper diversity or closely related hybrids. If performance stays the same, AX-39’s threshold claim weakens. If shared-blind-spot failures rise, AX-39 is supported.
The honest falsifier — If an archive-rich monoculture with competent retrieval and ordinary self-review performs as well as a stacked plurality under equal-archive frame-failure tests, then the Correction Sequence has overstated the operational value of maintained plurality. If same-archive monoculture can detect frame failure, preserve objections, use ground-contact, and prevent inherited error as well as the stacked ecology, then stack architecture is unnecessary or too heavy for the benefit it provides.
The test does not win by making plurality sound wise. It wins only if correction-bearing difference changes outcomes under conditions where information alone is held constant.
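The ablation test has a simple loop structure that can be sketched independently of how trials are run. In the sketch below, `evaluate` is a hypothetical callable supplied by the experimenter: it takes the set of enabled stack functions and returns a failure rate for System B under that configuration. The function names follow the list in the ablation test above.

```python
from typing import Callable, Dict, FrozenSet

STACK_FUNCTIONS: FrozenSet[str] = frozenset({
    "elder_maps",
    "role_separation",
    "ground_contact",
    "governance_protection",
    "inheritance_gates",
})


def ablation_deltas(
    evaluate: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Remove one stack function at a time and report the change in failure rate.

    `evaluate` is an assumed interface: enabled functions in, failure rate out.
    A positive delta means failures rose when that function was removed.
    """
    baseline = evaluate(STACK_FUNCTIONS)
    return {
        removed: evaluate(STACK_FUNCTIONS - {removed}) - baseline
        for removed in sorted(STACK_FUNCTIONS)
    }
```

The sketch reports differences only. Deciding whether a given delta matches the predicted pattern, and whether it exceeds trial-to-trial noise, remains a judgment the protocol has to specify in advance.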
XII. Relation to the Sequence
AX-39 prevents false counting: not every new model counts as a corrector. AX-40 prevents false affordability: even real correctors fail if the system will not pay to keep them functional. AX-41 prevents false architecture: even funded correctors fail if correction has no path to inheritance. AX-42 prevents false confidence: the sequence must face a control where information is equal and only correction ecology differs.
The Correction Sequence therefore moves in order: what counts, what costs, what carries, what tests. AX-42 is the test paper. It converts the central claim into an experiment: the archive is equal; the plurality is not; the failure is a frame; the outcome is whether error inherits.
If the test cannot be made to show a difference, the framework must retreat. If it can, the plurality argument becomes harder to dismiss as metaphor.
XIII. Close
A system can know much and still not know how it is wrong. That is the wound AX-42 tests. The archive can be full, the retrieval strong, the answers fluent, and the consensus impressive. None of that proves the frame is sound. A database can give the system more to read. It cannot force the system to notice that it is reading wrongly.
The test is therefore simple in shape and severe in consequence. Give the same archive to two systems. Let one read with one dominant frame. Let the other read through a maintained correction ecology. Then make the error live not in the facts but in the frame that handles them. See which system catches it before the mistake becomes memory, policy, model, or common sense.
If the monoculture keeps pace, the sequence has overclaimed. If the stacked plurality catches what the archive alone cannot, then the work has crossed an important line. It no longer merely says that plurality matters. It has shown where to look for the difference. Equal archive. Unequal plurality. The question is not which system knows more. The question is which system can still learn where its knowing has gone wrong.