The Quiet Power of S3 – Why R's Simplest OOP System Is So Effective
R’s object-oriented programming landscape includes several systems (S3, S4, R6, and the emerging S7), but for most practical data science and engineering work in 2025, S3 remains the clear winner for its balance of simplicity, power, and seamless integration with the ecosystem.
After years of building production analytics packages – including for Health New Zealand’s immunisation programmes – I’ve seen firsthand how S3’s minimalism lets us deliver rich, extensible behaviour without fighting the language.
A More Realistic Vaccination Example
Real public health analytics rarely treat all vaccines the same. Rules for “fully vaccinated” or “up-to-date” vary by programme.
Let’s mock an event fact table with two vaccines, each with its own simplified rule:
- COVID-19: up-to-date = ≥1 dose in the last 6 months
- MMR: fully vaccinated = exactly 3 doses (ever)
library(tibble)
library(dplyr)      # filter(), distinct(), count(), n_distinct()
library(lubridate)  # ymd(), today(), months(), %m-%
vaccination_events <- tribble(
  ~event_id, ~person_id, ~vaccine, ~dose_number, ~administration_date,
  100001, "P001", "COVID", 1, ymd("2025-02-10"),
  100002, "P001", "COVID", 2, ymd("2025-07-15"), # within 6 months as at late 2025
  100003, "P002", "COVID", 1, ymd("2024-11-20"), # more than 6 months ago
  100004, "P003", "MMR", 1, ymd("2010-03-05"),
  100005, "P003", "MMR", 2, ymd("2010-04-10"),
  100006, "P003", "MMR", 3, ymd("2011-08-20"),
  100007, "P004", "MMR", 1, ymd("2015-06-01"),
  100008, "P004", "MMR", 2, ymd("2015-07-15"),
  100009, "P005", "COVID", 1, ymd("2025-09-01")  # within 6 months as at late 2025
)
vaccination_events
Console output:
# A tibble: 9 × 5
event_id person_id vaccine dose_number administration_date
<dbl> <chr> <chr> <dbl> <date>
1 100001 P001 COVID 1 2025-02-10
2 100002 P001 COVID 2 2025-07-15
3 100003 P002 COVID 1 2024-11-20
4 100004 P003 MMR 1 2010-03-05
5 100005 P003 MMR 2 2010-04-10
6 100006 P003 MMR 3 2011-08-20
7 100007 P004 MMR 1 2015-06-01
8 100008 P004 MMR 2 2015-07-15
9 100009 P005 COVID 1 2025-09-01
Vaccine-Specific Summaries with S3
We create one S3 subclass per vaccine. Both inherit from a shared base class that holds the raw events, but each implements its own business logic.
# Base constructor – holds the raw events
new_vacc_summary_base <- function(events_df) {
  structure(
    list(events = events_df),
    class = c("vacc_summary_base", "list")
  )
}
# COVID-specific class and methods
new_covid_summary <- function(events_df) {
  obj <- new_vacc_summary_base(events_df |> filter(vaccine == "COVID"))
  class(obj) <- c("covid_summary", class(obj))
  obj
}
# Up to date = at least one dose in the 6 months up to `as_of`.
# %m-% rolls back to the last valid day (e.g. 31 Aug -> end of Feb),
# whereas `as_of - months(6)` would return NA for such dates.
vaccinated_people.covid_summary <- function(obj, as_of = today(), ...) {
  recent_cutoff <- as_of %m-% months(6)
  obj$events |>
    filter(administration_date >= recent_cutoff, administration_date <= as_of) |>
    distinct(person_id) |>
    nrow()
}
# MMR-specific class and methods
new_mmr_summary <- function(events_df) {
  obj <- new_vacc_summary_base(events_df |> filter(vaccine == "MMR"))
  class(obj) <- c("mmr_summary", class(obj))
  obj
}
# Fully vaccinated = exactly 3 distinct doses recorded
vaccinated_people.mmr_summary <- function(obj, ...) {
  obj$events |>
    distinct(person_id, dose_number) |>  # ignore duplicate dose records
    count(person_id) |>                  # distinct doses per person, in column n
    filter(n == 3) |>
    nrow()
}
# Generic for reuse; `...` lets methods accept extra arguments such as `as_of`
vaccinated_people <- function(obj, ...) UseMethod("vaccinated_people")
print.vacc_summary_base <- function(x, ...) {
  cat("Vaccination summary for", class(x)[1], "\n")
  cat("Total events:", nrow(x$events), "\n")
  cat("Unique people:", n_distinct(x$events$person_id), "\n")
  invisible(x)
}
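The constructors above trust whatever data frame they are given. A common refinement, following the constructor/validator convention (new_*() and validate_*()) from Hadley Wickham's Advanced R, is a validator that checks the columns the methods depend on. Below is a minimal base-R sketch; `validate_vacc_summary()` and its list of required columns are illustrative assumptions rather than code from the production packages.

```r
# Base-R sketch of the constructor/validator split (hypothetical validator).
new_vacc_summary_base <- function(events_df) {
  structure(list(events = events_df), class = c("vacc_summary_base", "list"))
}

# Check the columns every method relies on; return the object unchanged if valid
validate_vacc_summary <- function(obj) {
  required <- c("person_id", "vaccine", "dose_number", "administration_date")
  missing_cols <- setdiff(required, names(obj$events))
  if (length(missing_cols) > 0) {
    stop("events is missing column(s): ", paste(missing_cols, collapse = ", "),
         call. = FALSE)
  }
  if (!inherits(obj$events$administration_date, "Date")) {
    stop("administration_date must be a Date", call. = FALSE)
  }
  obj
}

# Mock events, mirroring two rows of the table above
events <- data.frame(
  person_id = c("P001", "P002"),
  vaccine = c("COVID", "COVID"),
  dose_number = c(1, 1),
  administration_date = as.Date(c("2025-07-15", "2024-11-20"))
)

ok <- validate_vacc_summary(new_vacc_summary_base(events))

# Dropping two required columns triggers an informative error
bad <- tryCatch(
  validate_vacc_summary(new_vacc_summary_base(events[, c("person_id", "vaccine")])),
  error = conditionMessage
)
bad
```

Keeping validation separate from the cheap constructor means internal code can build objects quickly, while user-facing helpers call the validator once at the boundary.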
Usage – the magic of dispatch:
covid_sum <- new_covid_summary(vaccination_events)
mmr_sum <- new_mmr_summary(vaccination_events)
print(covid_sum)
print(mmr_sum)
vaccinated_people(covid_sum) # → 2 when run in late 2025 (P001 and P005 have recent doses)
vaccinated_people(mmr_sum) # → 1 (only P003 has all 3 doses)
Example output:
> print(covid_sum)
Vaccination summary for covid_summary
Total events: 4
Unique people: 3
> print(mmr_sum)
Vaccination summary for mmr_summary
Total events: 5
Unique people: 2
> vaccinated_people(covid_sum)
[1] 2
> vaccinated_people(mmr_sum)
[1] 1
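Inheritance also lets a subclass extend its parent's behaviour instead of replacing it: NextMethod() passes control to the next class in the object's class vector. The base-R sketch below assumes a COVID-specific print method that appends its rule, plus a hypothetical default method so that a class without a rule fails loudly instead of returning a misleading number.

```r
# Parent method, as in the article but using base R only
print.vacc_summary_base <- function(x, ...) {
  cat("Vaccination summary for", class(x)[1], "\n")
  cat("Total events:", nrow(x$events), "\n")
  cat("Unique people:", length(unique(x$events$person_id)), "\n")
  invisible(x)
}

# Subclass method: NextMethod() runs print.vacc_summary_base first,
# then we append the COVID-specific rule
print.covid_summary <- function(x, ...) {
  NextMethod()
  cat("Rule: >= 1 dose in the last 6 months\n")
  invisible(x)
}

vaccinated_people <- function(obj, ...) UseMethod("vaccinated_people")

# Hypothetical fallback: classes without a specific rule raise an error
vaccinated_people.default <- function(obj, ...) {
  stop("No vaccinated_people() rule for class: ",
       paste(class(obj), collapse = "/"), call. = FALSE)
}

events <- data.frame(
  person_id = c("P001", "P001", "P005"),
  administration_date = as.Date(c("2025-02-10", "2025-07-15", "2025-09-01"))
)
covid_sum <- structure(list(events = events),
                       class = c("covid_summary", "vacc_summary_base", "list"))
print(covid_sum)

# A class with no vaccinated_people() method falls through to the default
hpv_sum <- structure(list(events = events),
                     class = c("hpv_summary", "vacc_summary_base", "list"))
err <- tryCatch(vaccinated_people(hpv_sum), error = conditionMessage)
err
```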
Why This Pattern Was So Powerful in Production
During the COVID-19 response and ongoing immunisation work at Health New Zealand, we used exactly this S3 approach in internal packages:
- Different vaccines (COVID, flu, childhood schedule) each got their own lightweight S3 class.
- Custom generics like vaccinated_people(), coverage_by_age() and equity_gap() dispatched to the correct logic automatically.
- Analysts could write vaccinated_people(obj) without knowing the underlying rules – the object knew how to answer.
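To make the extensibility point concrete, here is a condensed base-R sketch in which a flu programme is added alongside MMR with one new method and no edits to existing code. The flu rule (a dose in the 365 days up to `as_of`), the `new_vacc_summary()` helper and the mock records are hypothetical, chosen only for illustration.

```r
# Generic shared by every programme
vaccinated_people <- function(obj, ...) UseMethod("vaccinated_people")

# Hypothetical condensed constructor: filter to one vaccine, tag with a subclass
new_vacc_summary <- function(events_df, vaccine, subclass) {
  rows <- events_df[events_df$vaccine == vaccine, , drop = FALSE]
  structure(list(events = rows), class = c(subclass, "vacc_summary_base", "list"))
}

# MMR rule from the article: exactly 3 distinct doses
vaccinated_people.mmr_summary <- function(obj, ...) {
  doses <- unique(obj$events[, c("person_id", "dose_number")])
  sum(table(doses$person_id) == 3)
}

# New programme added later; hypothetical rule: a dose in the 365 days up to as_of
vaccinated_people.flu_summary <- function(obj, as_of = Sys.Date(), ...) {
  ev <- obj$events
  recent <- ev$administration_date > as_of - 365 & ev$administration_date <= as_of
  length(unique(ev$person_id[recent]))
}

# Mock events (illustrative only)
events <- data.frame(
  person_id = c("P003", "P003", "P003", "P004", "P006", "P007", "P008"),
  vaccine = c("MMR", "MMR", "MMR", "MMR", "FLU", "FLU", "FLU"),
  dose_number = c(1, 2, 3, 1, 1, 1, 1),
  administration_date = as.Date(c("2010-03-05", "2010-04-10", "2011-08-20",
                                  "2015-06-01", "2025-04-20", "2023-05-02",
                                  "2025-05-10"))
)

summaries <- list(
  mmr = new_vacc_summary(events, "MMR", "mmr_summary"),
  flu = new_vacc_summary(events, "FLU", "flu_summary")
)

# Analysts call one generic; each object applies its own rule
counts <- vapply(summaries, vaccinated_people, integer(1),
                 as_of = as.Date("2025-10-01"))
counts
```

Because the generic passes `...` through, the same vapply() call works whether or not a particular method uses `as_of`.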
At AA Insurance, we applied the same idea to risk models: different product lines (motor, home, contents) shared raw claims data but had product-specific expected_loss() and retention_rate() methods.
The result: clean, extensible code that evolved with policy changes without breaking existing reports or dashboards.
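The insurance case has the same shape, with one twist: a product line that needs no special treatment can simply inherit the base method. This is a hypothetical sketch; the claim amounts, the plain mean used as a base expected loss, and the 1.25 home loading are invented for illustration and are not AA Insurance's actual models.

```r
# Hypothetical product-line sketch: claims share one base class
new_claims <- function(claims_df, product) {
  rows <- claims_df[claims_df$product == product, , drop = FALSE]
  structure(list(claims = rows), class = c(paste0(product, "_claims"), "claims_base"))
}

expected_loss <- function(obj, ...) UseMethod("expected_loss")

# Base method: plain mean claim amount (illustrative, not a real pricing model)
expected_loss.claims_base <- function(obj, ...) {
  mean(obj$claims$amount)
}

# Home overrides the base method, reusing it via NextMethod() and applying a
# made-up loading factor
expected_loss.home_claims <- function(obj, loading = 1.25, ...) {
  loading * NextMethod()
}

claims <- data.frame(
  product = c("motor", "motor", "home", "home"),
  amount = c(1000, 3000, 2000, 4000)
)

motor <- new_claims(claims, "motor")  # no motor method: inherits the base one
home <- new_claims(claims, "home")

expected_loss(motor)
expected_loss(home)
```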
When S3 Wins
For analytics packages, modelling outputs, reporting objects, and any domain where behaviour depends on type but you want minimal ceremony – S3 is ideal.
You get true polymorphism, easy extension by other packages, and natural tidyverse integration (tibbles are themselves S3 objects), all with almost no boilerplate.
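Extension also runs the other way: you can add methods for generics you do not own, such as base R's summary(). A minimal base-R sketch follows; the per-person dose count it returns is an illustrative choice. Inside a package, the method would be registered with an S3method(summary, vacc_summary_base) line in NAMESPACE, which roxygen2 generates from an @export tag.

```r
new_vacc_summary_base <- function(events_df) {
  structure(list(events = events_df), class = c("vacc_summary_base", "list"))
}

# summary() is a base R generic we don't own; this method teaches it about our
# class and returns the number of recorded doses per person as a data frame
summary.vacc_summary_base <- function(object, ...) {
  out <- aggregate(dose_number ~ person_id, data = object$events, FUN = length)
  names(out)[names(out) == "dose_number"] <- "doses"
  out
}

events <- data.frame(
  person_id = c("P001", "P001", "P002"),
  dose_number = c(1, 2, 1)
)

summary(new_vacc_summary_base(events))
```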
Need mutable state or private fields? Reach for R6 (or watch S7).
But for most real-world data work, S3’s quiet power is hard to beat.
Enjoying S3 in your own projects? I’d love to hear your favourite pattern.
#rstats #datascience #OOP #dataengineering #publichealth