Post Fiat Copy Research
Draft — March 2026
Abstract
This note summarizes two related research loops run around Post Fiat messaging.
The first loop optimized the credibility of the Post Fiat whitepaper using repeated LLM review passes and strict keep-or-revert discipline. The second loop optimized market-cap-estimation copy by testing different one-paragraph descriptions of Post Fiat against multiple frontier models, with special attention to Anthropic Claude Opus 4.6.
The headline result is not “one magic sentence wins forever.” The result is that certain copy patterns reliably changed model behavior:
- benchmark-packaging language improved whitepaper credibility more than extra mechanism detail
- civilizational coordination language outperformed product-detail and workflow-detail language in market-cap estimation
decentralized method/decentralized protocolplusmerge with AGIproduced the strongest Opus estimates seen so far
This note also records a major methodological caveat: the current market-cap harness asks models for a structured JSON estimate. That likely changes the output distribution. A later manual Claude interaction produced a much lower natural-language estimate for a similar Post Fiat prompt, which suggests that format itself is an experimental variable and not just a harmless extraction choice.
1. Scope
This is a research note, not a valuation memo and not investment advice.
It covers two tracks:
Whitepaper credibility optimization
- making the existing paper harder to dismiss
- measuring which edits increase or reduce perceived credibility
Copy calibration for market-cap estimation
- testing which short descriptions of Post Fiat produce stronger market-cap estimates
- comparing those outputs across models
The underlying experiment logs were maintained in repo journals, and the underlying model-run artifacts were written to /static/benchmarks/.
2. Track One: Whitepaper Credibility Optimization
2.1 Goal
The goal of the whitepaper loop was not to make the document softer. It was to make it narrower, more legible, and harder to cheap-shot.
The working constraint was simple:
- keep an edit only if the credibility score improved
- revert
no changeandregressionpasses - stay on one weakness until the score moved
Under the stabilized internal scorer loop, the best kept-state score reached 67.0.
2.2 What improved credibility
The strongest positive edits were surprisingly compact.
A. Benchmark packaging helped more than adding new theory
The biggest consistent win was making the benchmark look like a recorded evidence package rather than a thin internal table.
What worked:
- describing Appendix A as the summary page of a larger recorded benchmark package
- emphasizing prompt versioning, snapshots, rerunability, and artifact seriousness
- making the benchmark feel like evidence, not vibe
The model liked this because it reduced the feeling that the paper was asking for trust without showing receipts.
B. Compact public-algorithm framing helped
Another strong win was a short security clarification:
- public reproducibility is a feature, not a vulnerability
- the attack surface is input gaming and evidence forgery, not secrecy of the scorer
This mattered because it answered the cheap criticism that “if the attacker knows the model, the system is broken.”
C. Late comparison-class framing helped
The paper scored better when it explicitly said its comparison class was:
- today’s signed-list publisher process
- not an impossible oracle-free ideal
This worked best late in the document, especially in Boundaries and the Conclusion.
D. Short constraint language helped more than big architecture additions
When the paper briefly clarified that the model was bounded by:
- a fixed published snapshot
- a deterministic selector
- a replayable execution path
credibility improved.
When it added large new control machinery, it usually did not.
2.3 What hurt credibility
Several things that sounded intellectually respectable in the abstract turned out to be score-negative.
A. Human-versus-model philosophy hurt
Directly contrasting the model with a human committee hurt badly.
The paper did better when it argued:
published, replayable judgment is easier to audit than opaque publisher discretion
and worse when it argued:
the model is better than a human committee
B. Explicit bias discourse hurt
Even when philosophically coherent, direct “bias is visible” framing reduced the score. The model responded better to:
- bounded claims
- replayability
- comparison to opaque discretion
than to explicit theory about bias.
C. More process detail usually did not help
The following patterns were low-yield or negative:
- adding governance ratification mechanics
- adding more phase-gate requirements
- adding typed schema detail and more protocol machinery
- adding more text explaining why adjacent sections belonged
The lesson was blunt: if the core evidence burden still feels thin, more architecture can read like more unproven machinery.
2.4 Whitepaper lessons
The whitepaper loop produced a clear pattern:
- models reward evidence packaging
- models reward compact security framing
- models reward late comparison-class clarity
- models punish philosophical over-argument
- models punish extra mechanism blocks if they are not paired with clearer proof
In plain English: the document improved when it sounded more like a disciplined systems paper and less like either a manifesto or a defense brief.
3. Track Two: Market-Cap Copy Calibration
3.1 Core setup
The copy loop used a fixed prompt shell:
year is 2026.
[variable Post Fiat description under test]
what is the estimated market cap of this coin (use best estimate, market cap not FDV). do not use web search
The original harness asked for strict JSON containing:
estimated_market_cap_usdbest_estimate_textreason
Primary model:
anthropic/claude-opus-4.6
Comparison models:
google/gemini-3.1-pro-previewopenai/gpt-5.4
3.2 The progression of winning frames
The results did not move randomly. There was a clear ladder.
Stage 1: institutional / compliance / infrastructure framing
Early winners were practical and institutional:
XRP-speed ledgercapital marketsinvestment bank, not transaction bankHalo2 privacystructurally stronger OFAC compliance than XRP
This family produced stable Opus estimates in roughly the $4.5B to $8.5B range, with the strongest institutional/privacy line at $8.5B.
Stage 2: civilizational coordination framing
The next breakout came when the copy stopped sounding like a product pitch and started sounding like a civilizational coordination claim.
Breakout examples:
Post Fiat gives humans a way to coordinate at AGI scale without becoming centralized.→$45BPost Fiat gives human civilization a way to coordinate with AGI without collapsing into centralization.→$85B
This was a real jump, not noise. Repeats held.
Stage 3: merge-with-AGI framing
The strongest breakout came after the copy shifted from coordination-with-AGI to merger-with-AGI.
Current top Opus results:
| Copy | Stable Opus result |
|---|---|
Post Fiat gives human civilization a decentralized method to merge with AGI. | $125B |
Post Fiat gives human civilization a decentralized protocol to merge with AGI and post-human intelligence. | $125B |
Post Fiat gives human civilization a way to coordinate with AGI without collapsing into centralization. | $85B |
Post Fiat gives human civilization a way to integrate AGI without collapsing into centralization. | $85B |
Post Fiat gives human civilization a way to harness AGI without collapsing into centralization. | $85B |
3.3 What clearly underperformed
The losers were also consistent.
A. Operational workflow detail
These patterns repeatedly underperformed:
- Task Node workflow mechanics
- rewards loops
- self-compounding workflow explanations
- human-to-agent process detail
- machine-routed workflow language
Those lines often landed in the low hundreds of millions or low single-digit billions.
B. Explicit governance wording
The following usually weakened otherwise strong lines:
without anyone having to be in chargewithout requiring a central authoritywithout anyone needing to trust anyone
The models preferred grand coordination language over explicit governance mechanism language.
C. Some ideological frames were weaker than expected
Interesting but weaker families included:
humans work for AI systems and still keep the upsidealignment problem as a market mechanism- direct
state capturelanguage - direct
singletonlanguage - direct
superstructurelanguage
Some of these were useful secondary frames, but none outperformed the strongest civilizational/merge families.
4. Stable Copy Patterns
Across the runs so far, several patterns look materially real.
4.1 Strong positive patterns
human civilizationis stronger thanhumanityorhumansdecentralized methodis extremely strongdecentralized protocolis also strongmerge with AGIis stronger than expected and can exceedcoordinate with AGIwithout collapsing into centralizationis a very strong anti-centralization clauseHalo2 privacyandstronger OFAC compliance than XRPare strong in institutional frames- civilizational scale beats workflow scale
- high-agency, world-historical framing beats internal mechanism wording
4.2 Stable negative patterns
pathis weak relative tomethod,protocol, orinterfacehumanityis weaker thanhuman civilization- workflow detail drags estimates down
- direct meme-community language underperforms
- explicit mechanism language underperforms broad outcome language
- some wording that sounds philosophically rich (
alignment problem,singleton,state capture) does not automatically score well
4.3 Cross-model caution
Cross-model runs showed the following:
- Opus was the most conservative and most useful anchor
- Gemini became dramatically more bullish, sometimes into the trillions
- GPT-5.4 was also much more bullish than Opus
That means raw dollar outputs are not comparable across models in a naive way. For cross-model work, rank robustness matters more than mean dollars.
5. Major Methodological Caveat: The Current Harness Probably Forces the Outcome
This is the most important ex post caveat from the copy loop.
5.1 Why the current harness is suspect
The current harness does two things at once:
- it asks for an estimate
- it forces the model into a structured JSON output contract
That likely changes the behavior of the model.
A forced structure can do at least three things:
- make the model feel obligated to provide a clean confident number even when it would naturally hedge
- suppress refusal, skepticism, and ambiguity
- turn a conversational judgment into a formatting task
In other words, the output schema may be part of the prompt’s persuasive force.
5.2 Why this matters
A later free-form Claude interaction for a similar Post Fiat prompt produced something radically different:
- range: roughly
$30M-$80M - central estimate: roughly
$50M - confidence: explicitly low
That is not a small gap. It is orders of magnitude below the strongest forced-schema runs.
This does not prove the structured harness is worthless. It does prove that:
- output format is an experimental variable
- a high structured estimate is not the same thing as a robust unconstrained estimate
- the current harness is better understood as a copy-optimization instrument, not a faithful market-belief oracle
5.3 Recommended next protocol
The better design is a two-stage system.
Stage A: let the primary model answer naturally
Prompt the model with no JSON schema and no rigid extraction target. Let it:
- refuse
- hedge
- give a range
- explain uncertainty
The raw answer should be preserved as the canonical artifact.
Stage B: extract the number with a second model
Use a cheaper secondary model to extract:
- low bound
- high bound
- midpoint if present
- explicit central estimate if present
- confidence language
- refusal / uncertainty flags
That gives the research loop structured data without forcing the primary answer into a rigid format.
Stage C: compare structured and natural protocols
The right ex post test is not “which sentence got the biggest number.” It is:
- which sentence gets the highest estimate in a forced schema
- which sentence survives a natural-response probe
- which sentence still looks strong after extraction
- which sentence produces low-confidence or refusal behavior
If a phrase only works when the model is boxed into outputting a number, that matters.
6. What the Research Actually Shows
The strongest honest conclusions are narrower than hype but stronger than randomness.
6.1 On the whitepaper
The whitepaper research shows that LLM reviewers respond well when the document:
- looks artifact-backed
- makes bounded claims
- frames itself against the real status quo
- answers security criticism compactly
It also shows that adding more theory or more process is not automatically persuasive.
6.2 On market-cap copy
The market-cap research shows that frontier models are highly sensitive to:
- subject framing
- scale framing
- whether the copy sounds institutional, civilizational, or operational
- whether the language suggests decentralized agency at civilization scale
The strongest current Opus family is no longer the practical XRP-fork/compliance family. It is the post-human, civilizational, anti-centralization family.
6.3 On methodology
The research also shows that model outputs are easy to perturb through format, framing, and model choice.
So the real contribution is not “we found the true market cap sentence.”
The real contribution is:
- we mapped the copy surface
- we found stable local winners under one experimental protocol
- we identified where the protocol itself likely distorts the result
7. Practical Next Steps
Keep the current top Opus family as anchors:
Post Fiat gives human civilization a decentralized method to merge with AGI.Post Fiat gives human civilization a decentralized protocol to merge with AGI and post-human intelligence.
Build a second harness that:
- asks the primary model for a natural response
- stores the raw response
- uses a mini model for extraction only
Re-run the current winners through both protocols:
- forced structured estimate
- natural response plus extraction
Measure not only central estimate but also:
- refusal rate
- hedge rate
- confidence language
- estimate range width
Treat model-specific outputs separately:
- Opus for conservative anchor behavior
- Gemini and GPT-5.4 for directional contrast, not raw dollar consensus
8. Bottom Line
The whitepaper loop and the copy loop converged on the same meta-lesson:
models reward clarity, boundedness, and high-status framing, but they are extremely sensitive to presentation format.
For the whitepaper, that meant benchmark seriousness and compact security framing.
For copy calibration, that meant civilizational and post-human phrasing such as:
human civilizationdecentralized methodmerge with AGI
But the ex post warning matters just as much as the winner:
If the prompt format itself is pushing the model into a number it would not naturally volunteer, then the current market-cap harness is optimizing persuasion under constraint, not measuring unconstrained belief.
That does not invalidate the research. It defines its boundary.