Deep Analysis: Unjournal Evaluator Suggestions vs. Paper Revisions
Paper: Karger et al. "Forecasting Existential Risks: Evidence from a Long-Run Forecasting Tournament"
- Working Paper: August 2023
- Published Version: International Journal of Forecasting 41 (2025) 499–516
- Unjournal Evaluation: August 2024
Summary of Evaluator Suggestions (from Evaluation Summary)
The Unjournal evaluation identified the following areas for improvement:
A. Data Sharing
- "Data sharing was suggested"
- "Is the data shared in a clear and useful way? How could it be made more useful?"
B. Design and Implementation
- Questions framing - "Were the prediction questions well-framed?"
- Definition of expertise - "Was the choice of 'experts' reasonable?"
- Lack of training - noted as a criticism
- Rare events description - "What specifically are the most appropriate methods for eliciting these rare event forecasts?"
- Anchoring behavior - "Could there be substantial anchoring behavior as a result of their displaying the 'Prior Forecasts'?"
- Delphi process considerations (implied in "Issues meriting further evaluation")
C. Statistical/Quantitative Analysis
- "CI not capturing the uncertainty properly"
- "Dependence measures not estimated"
- "Statistical inference not performed"
- "Aggregation methods not appropriate"
D. Attrition Bias
- "Did they adequately consider the potential for attrition bias?"
Detailed Comparison by Suggestion
1. DATA SHARING / REPLICATION PACKAGE
2023 Working Paper:
- No formal data availability statement
- No replication package mentioned
- Some scattered references to "data available here" in question descriptions
2025 Published Paper:
- Formal data availability statement added
- Links to GitHub repositories (xpt-lib and the xpt-ijf-replication package) included
VERDICT: ✅ ADDRESSED
Probability that change was caused by Unjournal feedback: 25-35%
Rationale:
- The International Journal of Forecasting likely requires data availability statements as standard practice
- However, Unjournal feedback was explicit on this point and could have reinforced attention to data sharing
- The xpt-lib repository was created before the evaluation (mentioned as "previously-published"), but the xpt-ijf-replication package may have been enhanced post-evaluation
- Most likely this reflects journal requirements + general good practice, with possible reinforcement from Unjournal feedback
2. DELPHI PROCESS DISCUSSION
2023 Working Paper:
- ZERO mentions of "Delphi" (confirmed via text search)
- No comparison to Delphi methodology
- No discussion of anonymity trade-offs
2025 Published Paper:
- Substantial new discussion of Delphi methods
Methods section quote:
"the multi-stage design of this process was heavily inspired by Delphi processes (Rowe & Wright, 2001) but deviated from traditional Delphi methods in significant ways. In particular, forecasters were not necessarily anonymous beyond Stage 1, and forecasters were given access to each other's forecasts and rationales at earlier stages than in many Delphi processes."
Discussion section quotes:
"One question is whether the XPT would have yielded more evidence of opinion convergence if it had been designed more like a Delphi process, with anonymity (e.g., randomly generated usernames, norms against participants naming themselves in discussions) and more tightly controlled feedback (for a review of Delphi approaches, see Rowe and Wright (1999))."
"The XPT already bears some family resemblance to Delphi, given the independent forecasts in Stage 1 that participants brought to the group discussions. The Delphi family of methods is designed to encourage participants to focus on the substantive merits of arguments for and against points of view, regardless of the status of the person advancing the arguments."
"But that judgment call may have been a mistake. People find it easier to change their minds when they are not anchored down by past public commitments to positions (Festinger, 1957), and they also find it easier to change their minds if they can do so in privacy, reducing the risk of appearing weak or confused. The next operationalization of the XPT might do well to encourage or require the expression of anonymous judgments throughout all four stages of deliberation."
VERDICT: ✅ SUBSTANTIALLY ADDRESSED
Probability that change was caused by Unjournal feedback: 30-45%
Rationale:
- The International Journal of Forecasting readership would expect Delphi discussion - this is a core methodology in the forecasting literature
- The Unjournal evaluation explicitly flagged Delphi considerations
- The addition is substantial and directly addresses evaluator concerns about design trade-offs
- However, IJF peer reviewers would likely have requested this comparison independently
- The timeline suggests the IJF submission was concurrent with, or shortly after, the Unjournal evaluation's publication
3. ANCHORING BEHAVIOR FROM DISPLAYING "PRIOR FORECASTS"
2023 Working Paper:
- 28 mentions of "anchor" - BUT almost all refer to "biological anchors" (a technical forecasting methodology reference, citing Ajeya Cotra's work)
- One mention of anchoring methods: "How much lower or higher is your extinction risk estimate than an anchor or comparison value?"
- No discussion of PSYCHOLOGICAL anchoring from displaying prior forecasts
2025 Published Paper:
- 1 mention of "anchor" - specifically addressing psychological anchoring to public positions
- Direct quote: "People find it easier to change their minds when they are not anchored down by past public commitments to positions (Festinger, 1957)"
- Explicitly acknowledges this as a potential design limitation
VERDICT: ✅ PARTIALLY ADDRESSED
Probability that change was caused by Unjournal feedback: 35-50%
Rationale:
- The evaluators explicitly raised this concern: "Could there be substantial anchoring behavior as a result of their displaying the 'Prior Forecasts'?"
- The 2025 paper now directly acknowledges psychological anchoring as a design limitation
- This appears to be a direct response to critiques (though could also be from IJF reviewers)
- The specific focus on anchoring to "past public commitments" aligns precisely with the evaluator concern
4. ATTRITION BIAS
2023 Working Paper:
- Mentions 34% attrition rate explicitly
- Substantial discussion of recruitment challenges and attrition as "down-to-earth problems"
- Quote: "The project was time-consuming, and our attrition rate was roughly 34% from initial forecasts to completion of the tournament four months later."
- Discusses reasons for attrition and plans for future improvements
2025 Published Paper:
- Brief footnote: "Of the 111 forecasters who completed all four stages of the tournament, 72 were superforecasters and 39 were experts. Although 111 completed all stages of the tournament, we report data from forecasters who attrited from the tournament in relevant analyses below."
- Less detailed discussion than 2023 version
- No formal attrition bias analysis added
VERDICT: ❌ NOT ADDRESSED (actually less content)
Probability that change was caused by Unjournal feedback: 0%
Rationale:
- The 2025 paper has LESS discussion of attrition, not more
- No formal attrition bias analysis was added as evaluators suggested
- The streamlining likely reflects journal word limits, not evaluation feedback
- This suggestion was clearly NOT incorporated
5. STATISTICAL INFERENCE / DEPENDENCE MEASURES / AGGREGATION METHODS
2023 Working Paper:
- 44 mentions of statistical terms (regression, correlation, bootstrap, confidence interval, etc.)
- Bootstrapped confidence intervals presented for medians
- Some correlation discussion (with caveats about outliers)
2025 Published Paper:
- 9 mentions of statistical terms (substantially fewer)
- Same bootstrapped CI approach maintained
- Quote: "We also present bootstrapped confidence intervals for each median."
- No new statistical inference added
- No dependence measures added
- No changes to aggregation methods
VERDICT: ❌ NOT ADDRESSED
Probability that change was caused by Unjournal feedback: 0%
Rationale:
- The statistical methodology is essentially unchanged
- The 2025 paper is actually MORE condensed statistically (fewer details)
- Evaluator criticisms about inadequate statistical inference were not incorporated
- No dependence measures were added
- Aggregation methods remain the same (median with bootstrapped CIs)
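For concreteness, the aggregation approach both versions of the paper retain (the median forecast with a percentile-bootstrap confidence interval) can be sketched as below. This is a minimal illustration, not the authors' code: the function name and the forecast values are hypothetical, and the paper does not specify its resample count or quantile convention.

```python
import numpy as np

def bootstrap_median_ci(data, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the median: resample with
    replacement, take the median of each resample, and read off
    the alpha/2 and 1 - alpha/2 quantiles of those medians."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    boot_medians = np.median(
        rng.choice(data, size=(n_boot, data.size), replace=True),
        axis=1,
    )
    lo, hi = np.quantile(boot_medians, [alpha / 2, 1 - alpha / 2])
    return float(np.median(data)), (float(lo), float(hi))

# Hypothetical low-probability forecasts from seven forecasters
forecasts = [0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1]
point, (lo, hi) = bootstrap_median_ci(forecasts)
```

Note that this resampling treats forecasters as independent draws, which is exactly the assumption the evaluators questioned when they noted that dependence measures were not estimated.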
6. QUESTIONS FRAMING
2023 Working Paper:
- 17 mentions of framing issues
- Extensive appendices with detailed question wordings
- Some discussion of how questions were framed
2025 Published Paper:
- 0 mentions of question framing methodology
- Streamlined presentation
- No additional discussion of framing limitations
VERDICT: ❌ NOT ADDRESSED
Probability that change was caused by Unjournal feedback: 0%
Rationale:
- No additional discussion of framing limitations was added
- Content was actually reduced for journal format
7. DEFINITION OF EXPERTISE / EXPERT SELECTION
2023 Working Paper:
- Describes expert recruitment process
- Notes limitations of sample representativeness
2025 Published Paper:
- Same description of recruitment
- Quote: "We received more than 500 expressions of interest, screened these respondents for expertise, and offered slots to the best qualified after a review of their backgrounds."
- Footnote: "Two independent analysts categorized applicants based on publication records and work history. When the analysts disagreed, a third independent rater resolved disagreement after a group discussion."
VERDICT: ⚠️ NO CHANGE
Probability that change was caused by Unjournal feedback: 0%
Rationale:
- Expert selection methodology is presented identically
- No additional justification or critique incorporated
8. RARE EVENTS ELICITATION METHODS
2023 Working Paper:
- 18 mentions of rare events/low probability concepts
- Explicit "Next Steps" section: "Identify and validate better methods of eliciting low-probability forecasts"
- Discussion of alternative elicitation methods tested (1-in-X format)
- Quote: "Researchers have long known about the instability of tiny probability estimates"
2025 Published Paper:
- 0 direct mentions of rare events elicitation methodology
- Only brief acknowledgment in abstract: "we acknowledge limits on what even skilled forecasters can achieve in anticipating rare or unprecedented events"
VERDICT: ❌ NOT ADDRESSED (less content)
Probability that change was caused by Unjournal feedback: 0%
Rationale:
- The 2025 paper has LESS methodological discussion of rare events elicitation
- The "Next Steps" section with research agenda was removed
- This reflects journal condensation, not incorporation of evaluation feedback
SUMMARY TABLE
| Evaluator Suggestion | Addressed? | Probability Causal |
|---|---|---|
| Data sharing/replication | ✅ Yes | 25-35% |
| Delphi process discussion | ✅ Yes | 30-45% |
| Anchoring behavior | ✅ Partially | 35-50% |
| Attrition bias | ❌ No | 0% |
| Statistical inference | ❌ No | 0% |
| Dependence measures | ❌ No | 0% |
| Aggregation methods | ❌ No | 0% |
| Questions framing | ❌ No | 0% |
| Definition of expertise | ❌ No | 0% |
| Rare events elicitation | ❌ No | 0% |
OVERALL ASSESSMENT
What Changed:
- Data availability/replication package - Formal statement and GitHub links added
- Delphi discussion - Substantial new content positioning XPT relative to Delphi methodology
- Anchoring acknowledgment - New discussion of psychological anchoring to public positions
- Academic structure - New Discussion section with methodological reflections
- Author list - Reduced from 11 to 5 authors
What Did NOT Change (Despite Evaluator Suggestions):
- Statistical methodology (no new inference, dependence measures, or aggregation methods)
- Attrition bias analysis (actually less discussion)
- Question framing justification
- Expert selection methodology
- Rare events elicitation discussion (actually less)
Overall Probability Assessment
Probability that Unjournal evaluation materially influenced ANY revision: 20-35%
Probability that Unjournal evaluation was the PRIMARY cause of revisions: 10-20%
Reasoning:
- Timeline Constraints: The Unjournal evaluation was published in August 2024. The paper was published in IJF 2025 (accepted ~November 2024). This is an extremely tight window for evaluation → revision → resubmission → acceptance.
- Parallel Processes: The paper was almost certainly already in IJF peer review when the Unjournal evaluation was published. IJF reviewers likely made similar suggestions independently.
- Standard Journal Requirements: Data availability statements are standard for IJF. Delphi discussion would be expected by forecasting journal readership.
- Selective Incorporation: Only 3 of 10 specific suggestions show any evidence of being addressed. The most technical statistical suggestions (which would require substantial new analysis) were not incorporated.
- Author Non-Response: The Unjournal summary explicitly states: "The authors have not provided a written response to these evaluations."
Most Likely Scenario:
The authors were preparing the paper for academic journal submission when the Unjournal evaluation was published. IJF peer reviewers independently requested similar changes (Delphi discussion, data availability). The Unjournal feedback may have:
- Reinforced the importance of certain points
- Provided additional motivation for addressing specific issues
- Influenced wording or framing of new sections
However, the IJF peer review process was almost certainly the primary driver of substantive revisions. The tight timeline and the lack of a formal author response to Unjournal strongly suggest parallel rather than causal processes.