incomparable_get_episodes() now fetches each per-episode page for
episodes that appear in stats.txt but aren't listed on the show's
archive page yet (the archive renders on a slower cadence than
stats.txt updates, so the newest episode is typically missing from
the archive for hours to weeks). This recovers summary for those
episodes from the per-episode og:description meta tag. topic
remains NA for newest-episode gaps unless the individual page
happens to populate .episode-subtitle.
New exported helper incomparable_parse_episode(episode_url, cache)
returns a one-row tibble (summary, topic) for a given episode
URL — exposed for direct use; called automatically by the
orchestrator's gap-fill.
The gap-fill is lazy: zero extra HTTP requests when the archive is current. Worst case scales with the gap size (typically 0–1 episodes per show per scheduled run).
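The lazy behaviour can be illustrated with a minimal base-R sketch. It is not the package's implementation: `find_gap()`, `fill_gap()`, and `fake_fetch()` are hypothetical names, and the page fetch is replaced by a stand-in that counts calls instead of touching the network.

```r
# Hypothetical sketch: episodes present in stats.txt but absent from the archive.
find_gap <- function(stats_numbers, archive_numbers) {
  setdiff(stats_numbers, archive_numbers)
}

fill_gap <- function(stats_numbers, archive_numbers, fetch_page) {
  gap <- find_gap(stats_numbers, archive_numbers)
  if (length(gap) == 0L) {
    # Archive is current: fetch_page is never called.
    return(data.frame(number = character(), summary = character()))
  }
  # One call per missing episode, so the cost scales with the gap size.
  summaries <- vapply(gap, fetch_page, character(1), USE.NAMES = FALSE)
  data.frame(number = gap, summary = summaries)
}

# Stand-in for a real per-episode page fetch (assumption: no network here).
calls <- 0L
fake_fetch <- function(number) {
  calls <<- calls + 1L
  paste("summary for episode", number)
}

current <- fill_gap(c("1", "2", "3"), c("1", "2", "3"), fake_fetch)
behind  <- fill_gap(c("1", "2", "3"), c("1", "2"), fake_fetch)
```

Here `current` has no rows and triggers no fetch, while `behind` triggers exactly one fetch for the single missing episode.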
incomparable_get_episodes() no longer returns NA for year,
month, weekday, and network on episodes that appear in
stats.txt but haven't been added to the archive page yet (the
Incomparable site renders the two surfaces independently and
stats.txt typically leads by hours-to-weeks for new episodes).
These four columns are now derived after the join: year, month, and
weekday from the canonical stats.txt date, and network from a
constant. Any row that has a date therefore also has year, month,
weekday, and network.
category, topic, and summary remain NA for episodes the
archive hasn't listed yet — those fields genuinely have no source
to recover them from. Reported by the podcasts.jemu.name consumer
(2026-05-25).
Same fix also protects against the historical join-key mismatch
case where stats.txt and the archive page disagree on an episode
number (e.g. legacy sub-indexed entries like 123a / 123b):
derived columns are populated from the surviving date regardless
of whether the archive row matched.
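A rough base-R sketch of the idea follows. The data, the column types, and the constant `"The Incomparable"` are assumptions for illustration; the package's real join and column formats may differ.

```r
# Illustrative data: stats.txt has a sub-indexed entry and a newer episode
# that the archive does not know about.
stats <- data.frame(
  number = c("122", "123a", "124"),
  date = as.Date(c("2024-01-01", "2024-01-08", "2024-01-15"))
)
archive <- data.frame(
  number = c("122", "123"),
  summary = c("First summary", "Second summary")
)

# Left join: every stats.txt row survives, whether or not the archive matched.
episodes <- merge(stats, archive, by = "number", all.x = TRUE)

# Derive the date-based columns after the join, from the surviving date.
episodes$year <- as.integer(format(episodes$date, "%Y"))
episodes$month <- as.integer(format(episodes$date, "%m"))
episodes$weekday <- as.POSIXlt(episodes$date)$wday  # 0 = Sunday
episodes$network <- "The Incomparable"              # assumed constant value
```

Rows `123a` and `124` keep `NA` in `summary` because the archive has nothing to offer for them, but their derived columns are filled in.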
HTTP layer migrated from polite to httr2. The cache argument on
atp_get_episodes(), relay_get_episodes(), relay_get_shows(),
incomparable_get_episodes(), incomparable_get_shows(),
incomparable_parse_archive(), incomparable_parse_stats(),
incomparable_get_subcategories(), and relay_parse_feed() now
controls the httr2 HTTP cache only. None of these functions write
RDS/CSV files as a side effect any more.
Callers that need disk artefacts must invoke cache_podcast_data()
explicitly, or use update_cached_data() which bundles fetch + write.
cache_podcast_data(dir = …) defaults to here::here("data_cache")
instead of the literal relative path "data_cache".
New internal request helper centralises user-agent, per-host
throttling (default 1 req / 2 s), transient retries (429/5xx), and
cross-session HTTP caching via tools::R_user_dir("poddr", "cache").
New package options for tuning the request layer:
poddr_user_agent, poddr_throttle_rate, poddr_cache_dir,
poddr_cache_max_age, poddr_cache_max_size.
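As a hedged illustration, the options could be set with base R's `options()`. Only the option names come from this changelog; the value formats and units below are assumptions, and `read_setting()` is a hypothetical helper showing how a setting with a fallback might be read.

```r
# Assumed value formats; check the package documentation for the real ones.
old <- options(
  poddr_user_agent = "my-analysis-script (contact: me@example.com)",
  poddr_throttle_rate = 1 / 2,          # requests per second (assumed unit)
  poddr_cache_dir = file.path(tempdir(), "poddr-cache"),
  poddr_cache_max_age = 60 * 60 * 24,   # seconds (assumed unit)
  poddr_cache_max_size = 100 * 1024^2   # bytes (assumed unit)
)

# Hypothetical reader: the option value, or a fallback when it is unset.
read_setting <- function(name, default) getOption(name, default = default)

rate <- read_setting("poddr_throttle_rate", 1 / 2)

# Restore whatever was set before.
options(old)
```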
robotstxt::paths_allowed() is now checked once per host at
orchestrator entry (i.e. inside *_get_shows() and effectively by
the per-show parsers fired from *_get_episodes()).
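The "once per host" behaviour can be sketched as a small memoised checker. This is an assumption-laden illustration, not the package's code: `make_host_checker()` is a hypothetical name, the URLs are placeholders, and the real robots.txt lookup is replaced by a stub that always allows and counts how often it runs.

```r
# Hypothetical sketch: remember the robots.txt verdict per host.
make_host_checker <- function(check_host) {
  seen <- new.env(parent = emptyenv())
  function(url) {
    # Extract the host part of the URL.
    host <- sub("^https?://([^/]+).*$", "\\1", url)
    if (is.null(seen[[host]])) {
      # First request for this host: run the (expensive) check once.
      seen[[host]] <- check_host(host)
    }
    seen[[host]]
  }
}

# Stub standing in for a real robots.txt lookup (no network involved).
lookups <- 0L
allowed <- make_host_checker(function(host) {
  lookups <<- lookups + 1L
  TRUE
})

r1 <- allowed("https://example.com/show/archive/")
r2 <- allowed("https://example.com/show/episode/1/")
r3 <- allowed("https://example.org/feed")
```

Two URLs on the same host share one lookup; the third URL, on a different host, triggers a second one.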
Added vcr cassettes for the network-touching functions, covering
atp_get_episodes(), relay_get_shows(), relay_parse_feed(),
relay_get_episodes(), incomparable_get_shows(),
incomparable_parse_archive(), incomparable_parse_stats(), and
incomparable_get_episodes(). Cassettes re-record after 30 days.
Parser logic for relay and Incomparable was factored into inner private functions taking already-parsed XML / HTML / text, so the parsing layer is tested offline against synthetic fixtures.
polite, memoise.
httr2, xml2, robotstxt, here.
vcr, withr.
incomparable_parse_stats() now fetches stats.txt through
polite::politely() so that requests respect robots.txt and a
per-host delay, matching the rest of the scrapers.
Migrated from the progress package to cli, dropping
one direct dependency.
Replaced purrr::map_dfr()/pmap_dfr() with
purrr::map()/pmap() + purrr::list_rbind().
relay_get_shows() now caches to relay_shows.rds instead of
overwriting the episode cache.
tidyselect in gather_people().
Added testthat tests for parse_duration(), label_n(),
gather_people(), and atp_parse_page().
Require R (>= 4.1.0) in DESCRIPTION to match use of |> and
\(x).
Show and Network column equal "ATP" for consistency with other podcasts.
500 and is not parseable.
Empty archive page for Doctor Who Flashcast at https://www.theincomparable.com/dwf/archive/
Added a pkgdown site.
Added a NEWS.md file to track changes to the package.