The Tournament Controller is a single Python script that orchestrates all 10 competition bots against identical market data, enforces universal safety rules, tracks independent balances, and produces scoring-ready JSON logs.
It is the stadium, the referee, and the scorekeeper — all in one file.
tournament_controller.py runs on the host VPS via cron, alongside the existing weather scanner.

[CRON - every 5 min during GFS windows, every 30 min otherwise]
|
v
tournament_controller.py
|
├── 1. FETCH MARKET SNAPSHOT
│ ├── Simmer API → temperature markets (100)
│ ├── Simmer API → all categories (for non-weather bots)
│ ├── Simmer briefing → high-divergence opportunities
│ └── Freeze snapshot with UTC timestamp
│
├── 2. FETCH FORECAST DATA
│ ├── Open-Meteo GFS for all cities (cached per scan)
│ └── Attach forecasts to snapshot
│
├── 3. RUN EACH BOT (sequential, same snapshot)
│ ├── Bot receives: snapshot + own state (balance, trades, exposure)
│ ├── Bot returns: list of trade decisions
│ ├── Controller validates: safety floors, position limits, exposure
│ └── Controller executes: updates bot state, logs trade
│
├── 4. LOG RESULTS
│ ├── Per-bot JSON trade log (standardized format)
│ ├── Master scan summary
│ └── Portfolio snapshot per bot
│
└── 5. SAVE STATE
    └── tournament_state.json (all 10 bot states)

/data/.openclaw/workspace/olympics/
├── tournament_controller.py # The main script
├── tournament_state.json # All 10 bot balances + positions
├── tournament_config.json # Race parameters (capital, limits, etc.)
├── strategies/
│ ├── __init__.py
│ ├── janitor.py # Team Pragmatist bot 1
│ ├── vulture.py # Team Pragmatist bot 2
│ ├── clock.py # Team Pragmatist bot 3
│ ├── historian.py # Team Pragmatist bot 4
│ ├── contrarian.py # Team Pragmatist bot 5
│ ├── oracle.py # Team Bayesian bot 1
│ ├── diver.py # Team Bayesian bot 2
│ ├── warden.py # Team Bayesian bot 3
│ ├── surgeon.py # Team Bayesian bot 4
│ └── farmer.py # Team Bayesian bot 5
├── logs/
│ ├── janitor.json # Trade log per bot
│ ├── vulture.json
│ ├── clock.json
│ ├── historian.json
│ ├── contrarian.json
│ ├── oracle.json
│ ├── diver.json
│ ├── warden.json
│ ├── surgeon.json
│ └── farmer.json
└── snapshots/ # Market data snapshots (for audit)
    └── snapshot_20260312_1630.json

The controller applies the following rules AFTER each bot returns its decisions and BEFORE execution. No bot can override them.

| Parameter | Value | Notes |
|---|---|---|
| Starting capital per bot | $100 (configurable) | Same for all bots per race |
| Daily loss limit | 20% of starting capital | Portfolio basis (cash + deployed) |
| Max single position | 15% of current capital | Per trade |
| Max per city per day | $50 | Weather bots only |
| Max deployment | 90% of capital | Idle cash floor = 10% |
| Normal max per trade | $20 | |
| Aggressive max per trade | $30 | |
| Min EV threshold | 0.04 | Universal floor |
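
A minimal sketch of how the controller might check one proposed trade against the limits in the table above. The `BotState` fields, the `validate_trade` name, and the reading of "current capital" as cash plus deployed capital are assumptions made for illustration; the controller's real data model is not shown in this section.

```python
from dataclasses import dataclass, field

# Limits taken from the table above. Names are illustrative only.
MIN_EV = 0.04
DAILY_LOSS_FRACTION = 0.20    # of starting capital, portfolio basis
MAX_POSITION_FRACTION = 0.15  # of current capital, per trade
MAX_DEPLOYED_FRACTION = 0.90  # keeps a 10% idle cash floor
MAX_TRADE_DOLLARS = 30.0      # aggressive per-trade ceiling
MAX_CITY_DOLLARS = 50.0       # per city per day, weather bots only


@dataclass
class BotState:
    starting_capital: float
    balance: float               # idle cash
    deployed: float              # cost of open positions
    day_start_portfolio: float   # portfolio value when the day began (assumed field)
    city_exposure: dict = field(default_factory=dict)


def validate_trade(state, cost, ev, city=None):
    """Return (approved, reason) for one proposed trade."""
    portfolio = state.balance + state.deployed
    loss_today = state.day_start_portfolio - portfolio
    if ev < MIN_EV:
        return False, "EV below universal floor"
    if loss_today >= DAILY_LOSS_FRACTION * state.starting_capital:
        return False, "daily loss limit reached"
    if cost > min(MAX_TRADE_DOLLARS, MAX_POSITION_FRACTION * portfolio):
        return False, "position too large"
    if cost > state.balance or state.deployed + cost > MAX_DEPLOYED_FRACTION * portfolio:
        return False, "deployment cap exceeded"
    if city is not None and state.city_exposure.get(city, 0.0) + cost > MAX_CITY_DOLLARS:
        return False, "city exposure cap exceeded"
    return True, "ok"


state = BotState(100.0, 95.0, 5.0, 100.0, {"buenos aires": 5.0})
print(validate_trade(state, cost=10.0, ev=0.095, city="buenos aires"))  # (True, 'ok')
print(validate_trade(state, cost=20.0, ev=0.095))  # (False, 'position too large')
```

Returning a reason string alongside the verdict would let the controller record rejected decisions in the scan summary, which may help when auditing why a bot did not trade.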

    # EV_AGGRESSIVE and AGGRESSIVE_THRESHOLD are tuning constants;
    # their values are not given in this section.
    if ev >= EV_AGGRESSIVE and price < AGGRESSIVE_THRESHOLD:
        trade_size = min(30.0, balance * 0.10)  # High EV + cheap price
    elif ev >= EV_AGGRESSIVE:
        trade_size = min(20.0, balance * 0.08)  # High EV
    else:
        trade_size = min(10.0, balance * 0.04)  # Standard EV

Each bot's trades are logged in this exact JSON structure, which is what the scoring script reads.
{
"bot_name": "janitor",
"team": "pragmatist",
"coach": "claude",
"race_id": "calibration_001",
"starting_capital": 100.00,
"trades": [
{
"timestamp": "2026-03-12T16:30:01.123Z",
"scan_id": "scan_20260312_163000",
"market_id": "abc123-def456",
"question": "Will the highest temperature in Buenos Aires be 27°C on March 13?",
"category": "weather",
"side": "yes",
"price": 0.125,
"shares": 40.0,
"cost": 5.00,
"reasoning": "GFS forecast 28.1°C, bucket prob 0.22, EV 0.095",
"ev": 0.095,
"forecast_prob": 0.22,
"status": "open",
"resolved_at": null,
"resolution": null,
"pnl": null
}
],
"current_balance": 95.00,
"deployed_capital": 5.00,
"portfolio_value": 100.00,
"total_trades": 1,
"city_exposure": {"buenos aires": 5.00}
}

| Bot | Team | Strategy | Data Source |
|---|---|---|---|
| The Janitor | Pragmatist | YES+NO sum ≤ $0.98 structural arb | Simmer order book |
| The Vulture | Pragmatist | Post-resolution bonding ($0.95-$0.99) | Simmer resolved markets |
| The Clock | Pragmatist | GFS latency sniper (post-update windows) | Open-Meteo + Simmer |
| The Farmer | Bayesian | NOAA weather mispricing ≥15% | Open-Meteo + Simmer |
| The Surgeon | Bayesian | YES+NO ≤ 0.98 + multi-outcome arb | Simmer order book |
| The Warden | Bayesian | Post-event bonding + multi-outcome | Simmer resolved markets |

| Bot | Team | Strategy | Why LLM needed |
|---|---|---|---|
| The Historian | Pragmatist | Base rate exploitation | Needs to analyze historical resolution patterns |
| The Contrarian | Pragmatist | Favorite-longshot bias | Needs to evaluate probability calibration |
| The Oracle | Bayesian | Hierarchical Bayesian + KL divergence | Multi-source synthesis, Brier calibration |
| The Diver | Bayesian | Elite wallet copytrading | Wallet analysis, Leisen filter evaluation |
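
To make the simplest strategy in the tables above concrete, here is a rough sketch of The Janitor's YES+NO check. The market field names (`market_id`, `yes_price`, `no_price`) and the function name are hypothetical, and the sketch ignores fees and order-book depth, neither of which is specified here.

```python
ARB_THRESHOLD = 0.98  # YES + NO sum at or below this counts as a structural arb


def find_structural_arbs(markets, threshold=ARB_THRESHOLD):
    """Return (market_id, locked_profit_per_pair) for qualifying markets.

    Buying one YES and one NO share pays out $1.00 whichever side wins,
    so the locked profit per pair is 1.00 minus the combined price.
    """
    hits = []
    for market in markets:
        total = market["yes_price"] + market["no_price"]
        if total <= threshold:
            hits.append((market["market_id"], round(1.0 - total, 4)))
    return hits


sample = [
    {"market_id": "a", "yes_price": 0.55, "no_price": 0.42},  # sums to 0.97
    {"market_id": "b", "yes_price": 0.60, "no_price": 0.41},  # sums to 1.01
]
print(find_structural_arbs(sample))  # [('a', 0.03)]
```

The Surgeon would presumably layer its multi-outcome checks on top of a test like this one, which is why the Pragmatist version stays cheap to run.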

The Owner ruled that both teams must have meaningfully different bots. Current overlaps:

| Pragmatist | Bayesian | Overlap | Resolution |
|---|---|---|---|
| The Janitor | The Surgeon | Both do YES+NO ≤ 0.98 | Surgeon adds multi-outcome markets + Monte Carlo validation. Janitor is pure spread detection only. |
| The Vulture | The Warden | Both do post-resolution bonding | Warden adds multi-outcome arb + hyperbolic discounting. Vulture is pure time-value capture only. |

Both pairs are meaningfully different despite targeting similar market structures. The Pragmatist versions are intentionally simpler and cheaper to run.

The neutral scoring script reads all 10 bot log files, calculates final P&L (including resolved positions), and assigns points. Both coaches must approve the script before Phase 0 begins.
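
    # Scoring sketch. Assumes each trade's cost is deducted from current_balance
    # at purchase and that a winning share pays out $1.00.
    import json

    ALL_BOTS = ["janitor", "vulture", "clock", "historian", "contrarian",
                "oracle", "diver", "warden", "surgeon", "farmer"]

    results = {}
    for bot in ALL_BOTS:
        with open(f"logs/{bot}.json") as f:
            log = json.load(f)
        final_value = log["current_balance"]
        for trade in log["trades"]:
            if trade["status"] == "resolved" and trade["resolution"] == "won":
                # Cost already left the balance at purchase, so credit the full
                # payout; adding only (shares - cost) would subtract cost twice.
                final_value += trade["shares"]
            # Lost trades need no adjustment: their cost was deducted at purchase.
        pnl_pct = (final_value - log["starting_capital"]) / log["starting_capital"]
        results[bot] = pnl_pct

| Phase | Duration | Capital | Model (All Bots) | Purpose |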
|---|---|---|---|---|
| Phase 0 | 24 hours | $100 paper | Both (test) | Calibration — verify logging, execution, scoring |
| Race 1 | 72 hours | $100 paper | Grok 4.1 Fast | Championship |
| Race 2 | 72 hours | $100 paper | Claude Haiku | Championship |
| Race 3 | 72 hours | $100 paper | Grok 4.1 Fast | Championship |
| Race 4 | 72 hours | $100 paper | Claude Haiku | Championship |

Total duration: 13 days (24h + 4×72h)

The tournament controller runs alongside the existing weather scanner on the host VPS. Separate cron entries, separate state files, no conflicts.

    # Existing weather scanner (keep running independently)
    0,5,10,15,20 4 * * * /root/run_scanner.sh
    # ... (existing schedule unchanged)

    # Tournament controller (add when ready)
    # Every 5 min during GFS windows
    */5 4,10,16,22 * * * /root/run_tournament.sh
    # Every 30 min otherwise
    */30 0-3,5-9,11-15,17-21,23 * * * /root/run_tournament.sh

Steps 1-4 can be done without burning any Grok API tokens. LLM bots come last because they're the most expensive to test and debug.
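
The two tournament cron entries above can be sanity-checked with a few lines of Python. The sketch below expands the hour fields, confirms that the two schedules cover every hour of the day exactly once, and counts the scans per day. The helper names are illustrative and not part of the controller.

```python
GFS_WINDOW_HOURS = {4, 10, 16, 22}  # hour field of the */5 entry


def expand_hours(spec):
    """Expand a cron hour field such as '0-3,5-9,23' into a set of ints."""
    hours = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            hours.update(range(int(low), int(high) + 1))
        else:
            hours.add(int(part))
    return hours


OFF_WINDOW_HOURS = expand_hours("0-3,5-9,11-15,17-21,23")


def scans_per_day():
    # */5 fires 12 times per hour; */30 fires twice per hour.
    return 12 * len(GFS_WINDOW_HOURS) + 2 * len(OFF_WINDOW_HOURS)


# No hour is covered by both entries, and no hour is missed.
assert GFS_WINDOW_HOURS.isdisjoint(OFF_WINDOW_HOURS)
assert GFS_WINDOW_HOURS | OFF_WINDOW_HOURS == set(range(24))
print(scans_per_day())  # 88
```

At 88 scans per day, a 72-hour race would produce 264 scans per bot, which gives a rough upper bound for sizing log and snapshot storage.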
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Simmer API downtime | Medium | High | Cache last known prices, skip scan |
| Grok API budget exhaustion | Low | Medium | LLM bots auto-degrade to simplified logic |
| VPS restart interrupts cron | Low | High | Verify that cron jobs resume after a reboot; add an @reboot entry if a scan should run at startup |
| Bot logic error causes bad trades | Medium | Medium | Safety floors catch runaway positions |
| Market data stale/incorrect | Low | High | Cross-validate Simmer vs Open-Meteo before trading |
| Strategy overlap dispute | Low | Low | Already resolved — documented in Section 8 |
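
One way the "auto-degrade" mitigation in the table above could look is a thin wrapper around each LLM bot. This is only a sketch: `LLMUnavailable`, `decide_with_fallback`, and the two strategy callables are hypothetical names, and the spec does not say what the simplified logic should be.

```python
class LLMUnavailable(Exception):
    """Hypothetical error raised when the LLM call fails or the budget is spent."""


def decide_with_fallback(llm_decide, simple_decide, snapshot, state):
    """Return (decisions, mode), where mode records which path produced them."""
    try:
        return llm_decide(snapshot, state), "llm"
    except LLMUnavailable:
        # Degrade to the simplified strategy instead of skipping the bot.
        return simple_decide(snapshot, state), "degraded"


def broke_llm(snapshot, state):
    raise LLMUnavailable("budget exhausted")


def no_trades(snapshot, state):
    return []


decisions, mode = decide_with_fallback(broke_llm, no_trades, snapshot={}, state={})
print(decisions, mode)  # [] degraded
```

Recording the mode next to each decision would let the scoring review distinguish trades made by the LLM from trades made by the fallback path.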

This specification is subject to review and approval by both Coach Claude and Coach Grok (Professor Stephen Hawking) before implementation begins.