Create a tool that transforms complex infectious disease forecast model performance metrics into accessible, visually appealing "baseball card" style visualizations, while simultaneously generating structured YAML documentation compatible with machine learning standards. This bridges the gap between epidemic modelling and ML communities, enabling standardised performance reporting and model discovery.
The modelscorecard package generates both static image files (PNG/PDF) and Hugging Face-compatible YAML documentation from a single scorecard object.
Each visual card displays model identification, five key performance metrics with trends, a performance timeline, and achievement badges.
The YAML output provides machine-readable evaluation results with epidemic-specific extensions.
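As a rough illustration of the machine-readable side, the sketch below assembles a nested list shaped like a Hugging Face model-index entry and serialises it with the yaml package. The helper build_model_index() and its arguments are hypothetical and only illustrate the shape of the data; yaml::as.yaml() is the one real library call, and it is guarded so the list-building part runs even if yaml is not installed.

```r
# Hypothetical helper (not an existing API): build a nested list that mirrors
# the Hugging Face "model-index" layout used for evaluation results.
build_model_index <- function(model_id, task, dataset, metrics) {
  list(
    model_id = model_id,
    `model-index` = list(
      list(
        name = model_id,
        results = list(
          list(task = task, dataset = dataset, metrics = metrics)
        )
      )
    )
  )
}

# Illustrative values taken from the example card in this document.
metrics <- list(
  list(name = "Weighted Interval Score", type = "wis",
       value = 42.3, args = list(scale = "natural")),
  list(name = "Performance Above Replacement (PAR)",
       type = "performance_above_replacement",
       value = 2.3, args = list(scale = "natural"))
)

card <- build_model_index(
  model_id = "EuroCOVIDhub-ensemble",
  task     = list(type = "epidemic-forecasting",
                  name = "COVID-19 Case & Death Forecasting"),
  dataset  = list(type = "covid19-forecast-hub",
                  name = "European COVID-19 Forecast Hub"),
  metrics  = metrics
)

# Serialise only if the yaml package is available.
if (requireNamespace("yaml", quietly = TRUE)) {
  cat(yaml::as.yaml(card))
}
```

Unnamed lists become YAML sequences and named lists become mappings, which is why each entry under results and metrics is wrapped in its own list().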
┌────────────────────────────────────────────────────────────────┐
│ [Logo]  MODEL NAME                 PAR: +2.3%/+1.9% (overall)  │ <- Header (Rows 1-3)
│         Team/Organization          Nat: ▁▃█▂ (1-4w)            │
│                                    Log: ▂▄█▃ (1-4w)            │
├────────────────────────────────────────────────────────────────┤
│ ┌──────────┬──────────┬──────────┬──────────┬────────────────┐ │ <- Metrics (Rows 4-8)
│ │Coverage  │   WIS    │   Rel    │   Bias   │    Ensemble    │ │
│ │ 50%: 48%↓│ Nat:  42↑│  Skill   │ -0.02 ↓  │    Contrib     │ │
│ │ 90%: 87%↓│ Log: 0.38│Nat: 0.95↑│          │  Nat: +3.2%↑   │ │
│ │          │          │Log: 0.87↑│          │  Log: +2.8%↑   │ │
│ └──────────┴──────────┴──────────┴──────────┴────────────────┘ │
├────────────────────────────────────────────────────────────────┤
│         [Performance Timeline Graph - Model vs Others]         │ <- Timeline (Rows 9-11)
├────────────────────────────────────────────────────────────────┤
│ Forecasts: 127 | Since: 2023-01 | Target Coverage: 95%         │ <- Footer (Row 12)
│ Best: 2-week ahead | Most consistent Q3 2024                   │
└────────────────────────────────────────────────────────────────┘
Note on trend arrows: ↑ indicates improvement (or increase for neutral metrics); ↓ indicates deterioration (or decrease) compared to the previous evaluation period (default: 30 days).
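The arrow rule can be sketched as a small helper. The function name trend_arrow(), the higher_is_better flag, and the choice to return an empty string when nothing changed are all assumptions made for illustration; the source only specifies what ↑ and ↓ mean.

```r
# Hypothetical helper: pick a trend arrow by comparing a metric's current
# value with its value in the previous evaluation period.
# higher_is_better: TRUE / FALSE for directional metrics, NA for neutral
# metrics, where the arrow only reports the direction of change.
trend_arrow <- function(current, previous, higher_is_better = NA, tol = 1e-8) {
  change <- current - previous
  # Assumption: show no arrow when the value is missing or unchanged.
  if (is.na(change) || abs(change) <= tol) {
    return("")
  }
  increased <- change > 0
  improved <- if (is.na(higher_is_better)) {
    increased                       # neutral metric: up means "increase"
  } else {
    increased == higher_is_better   # directional metric: up means "better"
  }
  if (improved) "\u2191" else "\u2193"
}

# WIS is a lower-is-better score, so a drop from 45 to 42 is an improvement.
trend_arrow(42, 45, higher_is_better = FALSE)
# A neutral metric that went up simply gets an up arrow.
trend_arrow(5, 3)
```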
Note on PAR sparklines: The mini bar charts (▁▂▃▄█) show relative PAR performance across forecast horizons, with taller bars indicating better performance.
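One way to produce such a sparkline is to rescale the per-horizon values to the five bar glyphs. This is a minimal sketch: the function name sparkline(), the min-max scaling, and the example PAR values are assumptions, not a description of the package's actual rendering.

```r
# The five block glyphs used in the card mock-up, from shortest to tallest.
spark_bars <- c("\u2581", "\u2582", "\u2583", "\u2584", "\u2588")

# Hypothetical helper: map numeric values to bar glyphs by min-max scaling.
sparkline <- function(values, bars = spark_bars) {
  stopifnot(is.numeric(values), length(values) > 0, !anyNA(values))
  rng <- range(values)
  if (rng[1] == rng[2]) {
    # All values equal: use the middle bar for every position.
    idx <- rep(ceiling(length(bars) / 2), length(values))
  } else {
    scaled <- (values - rng[1]) / (rng[2] - rng[1])   # 0 = worst, 1 = best
    idx <- 1 + round(scaled * (length(bars) - 1))
  }
  paste(bars[idx], collapse = "")
}

# Illustrative PAR values for horizons 1 to 4 weeks (made-up numbers).
par_by_horizon <- c(1.2, 2.0, 3.1, 1.6)
sparkline(par_by_horizon)
```

Because the scaling is relative to the minimum and maximum of the supplied values, the bars convey rank across horizons rather than absolute PAR.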
model_id: "EuroCOVIDhub-ensemble"
model_name: "European COVID-19 Forecast Hub Ensemble"
tags: ["epidemic-forecasting", "covid-19", "ensemble-model", "scoringutils"]
license: "cc-by-4.0"
library_name: "scoringutils"
model-index:
  - name: "EuroCOVIDhub-ensemble"
    results:
      - task:
          type: "epidemic-forecasting"
          name: "COVID-19 Case & Death Forecasting"
        dataset:
          type: "covid19-forecast-hub"
          name: "European COVID-19 Forecast Hub"
        metrics:
          - name: "Weighted Interval Score"
            type: "wis"
            value: 42.3
            args: {scale: "natural"}
          - name: "Performance Above Replacement (PAR)"
            type: "performance_above_replacement"
            value: 2.3
            args: {scale: "natural"}
# Extended metadata
scoringutils:
  evaluation_date: "2024-03-15"
  n_forecasts: 127
  achievements: ["Best 2-week ahead", "Most consistent Q1 2024"]
model_operations:
  team_size: 5
  hours_per_week: 20
  automation_level: "fully_automated"
Logo Section (Columns 1-4)
Model Identity (Columns 5-10)
Performance Visualization (Columns 11-16)
Five equal-width metric cards, each containing:
1. Coverage Metric
2. WIS Metric
3. Relative Skill
4. Bias Metric
5. Ensemble Contribution
Performance Over Time Chart
Left Section (Columns 1-8)
Right Section (Columns 9-16)
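The 12-row by 16-column grid described above can be written down as an integer layout matrix, where each cell holds the index of the region that occupies it. This is a sketch under stated assumptions: the column splits for the logo, identity and performance sections are taken to belong to the header rows, the left/right split to the footer row, and the metrics row is treated as one region whose five cards are laid out internally. The region names are illustrative.

```r
n_rows <- 12
n_cols <- 16

# Each region lists the grid rows and columns it spans (assumed mapping).
regions <- list(
  logo         = list(rows = 1:3,  cols = 1:4),
  identity     = list(rows = 1:3,  cols = 5:10),
  performance  = list(rows = 1:3,  cols = 11:16),
  metrics      = list(rows = 4:8,  cols = 1:16),
  timeline     = list(rows = 9:11, cols = 1:16),
  footer_left  = list(rows = 12,   cols = 1:8),
  footer_right = list(rows = 12,   cols = 9:16)
)

# Fill the matrix with the index of the region covering each cell.
layout_matrix <- matrix(NA_integer_, nrow = n_rows, ncol = n_cols)
for (i in seq_along(regions)) {
  layout_matrix[regions[[i]]$rows, regions[[i]]$cols] <- i
}

# Every cell should be covered by exactly one region.
stopifnot(!anyNA(layout_matrix))
print(layout_matrix)
```

A matrix of this form is what gridExtra::arrangeGrob() accepts through its layout_matrix argument, with the region index selecting which grob fills each cell.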
Colour Palette
General Styling
modelscorecard/
├── R/
│   ├── create-scorecard.R      # Main entry point
│   ├── scorecard-class.R       # S3 class definitions
│   ├── scorecard-methods.R     # S3 methods for forecast types
│   ├── layout.R                # Grid layout engine
│   ├── components-header.R     # Header component
│   ├── components-metrics.R    # Metrics component
│   ├── components-timeline.R   # Timeline component
│   ├── components-footer.R     # Footer component
│   ├── themes.R                # Theme definitions
│   ├── metrics-calc.R          # Metric calculations
│   ├── export-yaml.R           # YAML export functions
│   └── utils.R                 # Helper functions
├── inst/
│   ├── templates/              # YAML templates
│   └── assets/                 # Fonts, logos
└── vignettes/
    ├── getting-started.Rmd
    └── customisation.Rmd
Model Card Functions
# Create model metadata card
create_modelcard <- function(
  model_name,
  team_name,
  model_operations = list(),
  model_structure = list(),
  data_requirements = list(),
  ...
)
# Create performance scorecard
create_scorecard <- function(
  scores_data,            # scores object from scoringutils::score()
  target_model,           # character: model to visualise
  model_col = "model",    # column identifying models
  comparison_period = 30, # days for trend comparison
  theme = "default",      # visual theme name
  components = NULL       # list of custom components
)
# Display methods
plot.scorecard <- function(x, ...)
plot.modelcard <- function(x, ...) # Future: could show metadata summary
# Export methods
to_yaml <- function(x, file, format = "huggingface", ...)
save_scorecard <- function(scorecard, file, width = 5, height = 3.5, dpi = 300)
# Combine metadata and scores
combine_cards <- function(modelcard, scorecard)
Component System
function(scores, model_info, theme, ...)
Grid Layout Engine
gridExtra or cowplot over patchwork
Metric Calculations
scoringutils::summarise_scores()
Theme System
ggplot2: Core plotting
gridExtra/cowplot: Grid layout
ggtext: Rich text formatting
ragg: High-quality output
showtext: Custom fonts
scales: Formatting utilities
dplyr: Data manipulation
yaml: YAML export
Class Definition
scorecard
model: Target model name
theme: Applied theme
components: List of component plots
metadata: Processing metadata
yaml_data: Structured data for YAML export
Methods
print.scorecard(): Display summary
plot.scorecard(): Render visualisation
create_scorecard.forecast_quantile(): Quantile implementation
create_scorecard.forecast_sample(): Sample implementation (future)
create_scorecard.forecast_binary(): Binary implementation (future)
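The class and method layout above could be wired together roughly as follows. This is a sketch, not the package's implementation: the constructor name new_scorecard(), the default method, and the toy data frame standing in for a real scoringutils object are all assumptions, and the quantile method only filters rows rather than computing any metrics.

```r
# Hypothetical constructor for the scorecard S3 class with the listed fields.
new_scorecard <- function(model, theme = "default", components = list(),
                          metadata = list(), yaml_data = list()) {
  stopifnot(is.character(model), length(model) == 1)
  structure(
    list(model = model, theme = theme, components = components,
         metadata = metadata, yaml_data = yaml_data),
    class = "scorecard"
  )
}

# Display a short summary of the scorecard.
print.scorecard <- function(x, ...) {
  cat("<scorecard> ", x$model, "\n", sep = "")
  cat("  theme:      ", x$theme, "\n", sep = "")
  cat("  components: ", length(x$components), "\n", sep = "")
  invisible(x)
}

# Generic that dispatches on the class of the input object.
create_scorecard <- function(scores_data, target_model, ...) {
  UseMethod("create_scorecard")
}

# Placeholder quantile method: keeps only the target model's rows.
create_scorecard.forecast_quantile <- function(scores_data, target_model, ...) {
  model_scores <- scores_data[scores_data$model == target_model, , drop = FALSE]
  new_scorecard(model = target_model,
                metadata = list(n_rows = nrow(model_scores)))
}

# Assumed fallback so unsupported inputs fail with a clear message.
create_scorecard.default <- function(scores_data, target_model, ...) {
  stop("No create_scorecard() method for class: ",
       paste(class(scores_data), collapse = "/"), call. = FALSE)
}

# Toy stand-in for a quantile forecast object (made-up values).
toy <- structure(
  data.frame(model = c("A", "A", "B"), wis = c(40, 44, 51)),
  class = c("forecast_quantile", "data.frame")
)
card <- create_scorecard(toy, target_model = "A")
print(card)
```

Dispatching through a generic keeps the sample and binary implementations as drop-in additions later: each only needs its own create_scorecard.<class>() method.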