| Title: | Collect Metadata for Selected Podcasts |
|---|---|
| Description: | Collecting all the data, but just for The Incomparable, Relay.fm and ATP. |
| Authors: | Lukas Burk [aut, cre] (ORCID: <https://orcid.org/0000-0001-7528-3795>) |
| Maintainer: | Lukas Burk |
| License: | MIT + file LICENSE |
| Version: | 0.3.2 |
| Built: | 2026-05-27 14:38:24 UTC |
| Source: | https://github.com/jemus42/poddr |
Retrieve ATP episodes
atp_get_episodes(page_limit = NULL, cache = TRUE)
page_limit |
Number of pages to scrape, from newest to oldest episode.
Page 1 contains the 5 most recent episodes, and subsequent pages contain 50
episodes per page. Pass |
cache |
( |
A tibble.
## Not run:
atp_new <- atp_get_episodes(page_limit = 1)
atp_full <- atp_get_episodes()
## End(Not run)
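The page layout described for page_limit implies a simple upper bound on how many episodes a given limit can return. The helper below is a hypothetical illustration, not part of poddr, and it assumes that every page after the first is full:

```r
# Hypothetical helper (not part of poddr): upper bound on the number of
# episodes returned for a given page_limit, assuming page 1 holds 5 episodes
# and every later page holds 50.
atp_max_episodes <- function(page_limit) {
  stopifnot(is.numeric(page_limit), page_limit >= 1)
  5 + 50 * (page_limit - 1)
}

atp_max_episodes(1)  # 5
atp_max_episodes(3)  # 105
```

The oldest page will usually hold fewer than 50 episodes, so the real count can be lower than this bound.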
Parse a single ATP page
atp_parse_page(page)
page |
Scraped page object ( |
A tibble.
## Not run:
html <- poddr:::poddr_get("https://atp.fm", as = "html", query = list(page = 1))
atp_parse_page(html)
## End(Not run)
Writes a tibble to RDS (and optionally CSV) in dir. Default dir is
resolved with here::here() so the path is anchored to the project
root rather than the current working directory.
cache_podcast_data(
  x,
  dir = here::here("data_cache"),
  filename = NULL,
  csv = TRUE
)
x |
Object to cache. |
dir |
Directory to save data to. Default: here::here("data_cache"). |
filename |
Optional filename sans extension; defaults to |
csv |
If |
Invisibly returns the path(s) written, or NULL for empty input.
## Not run:
atp <- atp_get_episodes(page_limit = 1)
cache_podcast_data(atp, csv = FALSE)
## End(Not run)
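The behaviour described above (always write RDS, optionally write CSV, return the written paths invisibly, return NULL for empty input) can be sketched in base R. This is a hypothetical stand-in named cache_sketch(), not the package's implementation. It writes to tempdir() instead of the project root and requires filename, because the package's default filename is not given in this section:

```r
# Hypothetical sketch of the caching behaviour described above; the real
# cache_podcast_data() may differ. `filename` is required here because the
# package's default filename is not documented in this section.
cache_sketch <- function(x, dir = tempdir(), filename, csv = TRUE) {
  # Empty input: write nothing and return NULL invisibly.
  if (is.null(x) || NROW(x) == 0) {
    return(invisible(NULL))
  }
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)

  # The RDS file is always written.
  paths <- file.path(dir, paste0(filename, ".rds"))
  saveRDS(x, paths)

  # The CSV file is written only on request.
  if (csv) {
    csv_path <- file.path(dir, paste0(filename, ".csv"))
    utils::write.csv(x, csv_path, row.names = FALSE)
    paths <- c(paths, csv_path)
  }
  invisible(paths)
}

episodes <- data.frame(number = 1:2, title = c("First", "Second"))
written <- cache_sketch(episodes, filename = "demo")
basename(written)
```

Returning the paths invisibly keeps interactive output quiet while still letting a caller capture where the files ended up.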
A thin wrapper around tidyr::pivot_longer() and tidyr::separate_rows().
gather_people(episodes)
episodes |
A tibble containing |
A tibble with new columns "role" and "person", one row per person.
## Not run:
incomparable <- incomparable_get_episodes(incomparable_get_shows())
incomparable_wide <- gather_people(incomparable)
## End(Not run)
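The reshaping that gather_people() wraps can be illustrated on toy data with the two tidyr functions named above. The input column names (host, guest) and the ";" separator are assumptions made for this sketch; the columns the package actually expects are not listed in this section:

```r
library(tidyr)

# Toy data; the column names `host` and `guest` and the ";" separator are
# assumptions for illustration only.
episodes <- tibble::tibble(
  number = c(1, 2),
  host   = c("Alice", "Bob"),
  guest  = c("Carol;Dave", "Erin")
)

long <- episodes |>
  # Stack the people columns into a "role" / "person" pair.
  pivot_longer(cols = c(host, guest), names_to = "role", values_to = "person") |>
  # Split cells that hold several people into one row per person.
  separate_rows(person, sep = ";")

long
```

The result has one row per person per episode, so an episode with one host and two guests contributes three rows.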
Retrieve all episodes for The Incomparable shows
incomparable_get_episodes(incomparable_shows, cache = TRUE)
incomparable_shows |
Dataset of shows as returned by |
cache |
( |
A tibble.
## Not run:
shows <- incomparable_get_shows()
incomparable_get_episodes(shows)
## End(Not run)
Retrieve all The Incomparable shows
incomparable_get_shows(cache = TRUE)
cache |
( |
A tibble with columns show, stats_url, archive_url, status.
## Not run:
incomparable_get_shows()
## End(Not run)
Extract subcategory index for given show
incomparable_get_subcategories(
  archive_url = "https://www.theincomparable.com/gameshow/archive/",
  cache = TRUE
)
archive_url |
E.g. |
cache |
( |
A tibble with subcategory links and category names.
## Not run:
incomparable_get_subcategories("https://www.theincomparable.com/gameshow/archive/")
## End(Not run)
Parse a show's archive page on The Incomparable website
incomparable_parse_archive(archive_url, cache = TRUE)
archive_url |
E.g. |
cache |
( |
A tibble.
## Not run:
incomparable_parse_archive("https://www.theincomparable.com/gameshow/archive/")
## End(Not run)
Recovers summary (and topic when present) for episodes that
aren't on the archive page yet. The archive page is re-rendered on
a slower cadence than stats.txt updates, so the newest episode of
an active show is typically missing from the archive for hours to
weeks. incomparable_get_episodes() calls this automatically for
any episode in stats.txt that the archive doesn't list.
incomparable_parse_episode(episode_url, cache = TRUE)
episode_url |
The per-episode URL,
e.g. |
cache |
( |
A one-row tibble with columns summary and topic (either
may be NA_character_ if the page doesn't expose them).
## Not run:
incomparable_parse_episode("https://www.theincomparable.com/sophomorelit/190/")
## End(Not run)
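The fallback described above applies to episodes that appear in stats.txt but not on the archive page. Finding those episodes amounts to an anti-join, sketched here in base R on toy data. The `number` key column is an assumption; how the package matches episodes internally is not stated in this section:

```r
# Toy data; the `number` column used as the key is an assumption.
stats   <- data.frame(number = c(188, 189, 190))
archive <- data.frame(
  number  = c(188, 189),
  summary = c("Summary A", "Summary B")
)

# Episodes listed in stats.txt but absent from the archive page.
missing_from_archive <- stats[!stats$number %in% archive$number, , drop = FALSE]
missing_from_archive$number
```

Only the rows left over after this step would need a per-episode request, which keeps the extra fetches limited to the newest episodes.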
Parse The Incomparable stats.txt files
incomparable_parse_stats(stats_url, cache = TRUE)
stats_url |
URL to the |
cache |
( |
A tibble.
## Not run:
incomparable_parse_stats("https://www.theincomparable.com/salvage/stats.txt")
## End(Not run)
Convenience function to display N
label_n(x, brackets = FALSE)
x |
Data or singular value. |
brackets |
Set |
A character of length 1.
label_n(100)
label_n(tibble::tibble(x = 1:10, y = 1:10), brackets = TRUE)
Converting HH:MM:SS or MM:SS to hms
parse_duration(x)
x |
A duration |
A vector of durations of class hms, as created by hms::hms().
Only needed to parse durations in The Incomparable stats.txt files.
parse_duration("32:12")
parse_duration("32:12:04")
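The conversion itself is plain arithmetic on the colon-separated fields. The base-R sketch below uses a hypothetical duration_to_seconds() helper that returns seconds as a number. It is not the package's implementation, which returns hms values:

```r
# Hypothetical base-R sketch (not the package's implementation): convert
# "MM:SS" or "HH:MM:SS" strings to a number of seconds.
duration_to_seconds <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE), function(parts) {
    parts <- as.numeric(parts)
    # Pad "MM:SS" to "HH:MM:SS" by prepending zero hours.
    if (length(parts) == 2) parts <- c(0, parts)
    stopifnot(length(parts) == 3)
    # Weight hours, minutes and seconds and add them up.
    sum(parts * c(3600, 60, 1))
  }, numeric(1))
}

duration_to_seconds(c("32:12", "32:12:04"))  # 1932 115924
```

A two-field input is read as minutes and seconds, so "32:12" is 32 minutes 12 seconds, while "32:12:04" is 32 hours 12 minutes 4 seconds.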
Retrieve all episodes for relay.fm shows
relay_get_episodes(relay_shows, cache = TRUE)
relay_shows |
A tibble of shows, from |
cache |
( |
A tibble.
## Not run:
relay_shows <- relay_get_shows()
relay <- relay_get_episodes(relay_shows)
## End(Not run)
Parses the show overview page and returns a tibble of show names
with corresponding feed URLs, which in turn can then be passed to
relay_parse_feed() individually.
relay_get_shows(cache = TRUE)
cache |
( |
A tibble with one row for each show.
## Not run:
relay_get_shows()
## End(Not run)
Parses a single feed and returns its content as a tibble.
relay_parse_feed(url, cache = TRUE)
url |
A show's feed URL, e.g. |
cache |
( |
A tibble.
## Not run:
relay_parse_feed(url = "https://www.relay.fm/ungeniused/feed")
## End(Not run)
Convenience entry point used by the scheduled GitHub Action. Calls each
fetch orchestrator and writes its output via cache_podcast_data().
Targets users typically don't want this; call the individual
*_get_episodes() functions instead.
update_cached_data(dir = here::here("data_cache"))
dir |
Directory to save data to. Default: here::here("data_cache"). |
Invisibly returns the list of paths written.
## Not run:
update_cached_data()
## End(Not run)