Title: | Collect Metadata for Selected Podcasts |
---|---|
Description: | Collecting all the data, but just for The Incomparable, Relay.fm and ATP. |
Authors: | Lukas Burk [aut, cre] |
Maintainer: | Lukas Burk <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.6 |
Built: | 2025-01-30 06:24:21 UTC |
Source: | https://github.com/jemus42/poddr |
Retrieve ATP episodes
atp_get_episodes(page_limit = NULL, cache = TRUE)
page_limit |
Number of pages to scrape, from newest to oldest episode.
Page 1 contains the 5 most recent episodes, and subsequent pages contain 50
episodes per page. As of December 2020, there are 10 pages total.
Pass |
cache |
( |
A tibble.
## Not run:
# Only the first page with the newest 5 episodes
atp_new <- atp_get_episodes(page_limit = 1)

# The latest and then 50 more
atp_latest <- atp_get_episodes(page_limit = 2)

# Get all episodes (use wisely)
atp_full <- atp_get_episodes()

## End(Not run)
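The page layout described above gives a simple upper bound on how many episodes a given page_limit can return. A minimal sketch of that arithmetic follows. The helper atp_max_episodes() is hypothetical and not part of poddr, and it assumes the layout of 5 episodes on page 1 and up to 50 on each later page still holds:

```r
# Hypothetical helper, not part of poddr: upper bound on the number of
# episodes returned for a given page_limit, assuming page 1 holds 5 episodes
# and every later page holds up to 50.
atp_max_episodes <- function(page_limit) {
  stopifnot(is.numeric(page_limit), length(page_limit) == 1, page_limit >= 1)
  5 + 50 * (page_limit - 1)
}

atp_max_episodes(1)
atp_max_episodes(2)
```

This can help when choosing a page_limit that covers a desired number of recent episodes without scraping the full archive.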
Parse a single ATP page
atp_parse_page(page)
page |
Scraped page object, e.g. from |
A tibble.
## Not run:
session <- polite::bow(url = "https://atp.fm")
page <- polite::scrape(session, query = list(page = 1))
atp_parse_page(page)

## End(Not run)
Cache episode data
cache_podcast_data(x, dir = "data_cache", filename = NULL, csv = TRUE)
x |
Object to cache. |
dir |
|
filename |
Optional filename sans extension, if not specified the name of |
csv |
If |
Nothing
## Not run:
atp_new <- atp_get_episodes(page_limit = 1)
cache_podcast_data(atp_new, csv = FALSE)

## End(Not run)
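To make the roles of dir, filename and csv more concrete, here is a rough base-R sketch of a caching step with the same signature. It is a hypothetical stand-in, not poddr's implementation: the function name cache_sketch(), the use of an .rds file as the primary format, and the fallback to the name of the object passed as x are all assumptions.

```r
# Hypothetical stand-in for the caching step, not poddr's implementation.
# Assumes an .rds file is always written and a .csv copy is optional.
cache_sketch <- function(x, dir = "data_cache", filename = NULL, csv = TRUE) {
  # Fall back to the name of the object passed as `x` (assumed behaviour)
  if (is.null(filename)) filename <- deparse(substitute(x))
  if (!dir.exists(dir)) dir.create(dir, recursive = TRUE)

  saveRDS(x, file.path(dir, paste0(filename, ".rds")))
  if (csv) {
    write.csv(x, file.path(dir, paste0(filename, ".csv")), row.names = FALSE)
  }
  invisible(NULL)
}

# Toy data written to a temporary directory instead of the working directory
episodes <- data.frame(number = c("1", "2"), title = c("First", "Second"))
cache_dir <- file.path(tempdir(), "data_cache")
cache_sketch(episodes, dir = cache_dir, csv = FALSE)
list.files(cache_dir)
```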
A thin wrapper around tidyr::pivot_longer() and tidyr::separate_rows().
gather_people(episodes)
episodes |
A tibble containing |
A tibble with new columns "role" and "person", one row per person.
## Not run:
incomparable <- incomparable_get_episodes(incomparable_get_shows())
incomparable_wide <- gather_people(incomparable)

## End(Not run)
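The reshaping that gather_people() is described as wrapping can be sketched on toy data with the two tidyr functions directly. This is an illustration under assumptions rather than the package's actual code: the host and guest column names are taken from the archive output documented further down, while the ";" separator between names and the example people are made up for this sketch.

```r
library(tidyr)

# Toy data; the "host"/"guest" columns mirror the archive output, while the
# ";" separator between names is an assumption made for this sketch.
episodes <- data.frame(
  number = c("1", "2"),
  host = c("Alice", "Bob"),
  guest = c("Carol;Dave", "Erin"),
  stringsAsFactors = FALSE
)

# Step 1: one row per episode and role
long <- pivot_longer(
  episodes,
  cols = c(host, guest),
  names_to = "role",
  values_to = "person"
)

# Step 2: split multi-person cells so each person gets their own row
people <- separate_rows(long, person, sep = ";")
people
```

The result has one row per episode and person, which makes per-person counts a simple grouped summary.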
This combines incomparable_parse_stats() and incomparable_parse_archive() to retrieve full episode information, including host/guest, durations with seconds, podcast subcategories and topics.
Use sparingly to avoid unnecessarily hammering the poor webserver!
incomparable_get_episodes(incomparable_shows, cache = TRUE)
incomparable_shows |
Dataset of shows with title and URLs as returned by
|
cache |
( |
A tibble with one row per episode.
## Not run:
incomparable_shows <- incomparable_get_shows()
incomparable <- incomparable_get_episodes(incomparable_shows)

## End(Not run)
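Conceptually, combining the two sources amounts to a join of archive data (subcategories, topics) with stats data (precise durations). The following base-R sketch uses toy data to show the idea. The join key number and the duration column are assumptions for illustration, and the actual implementation in poddr may differ.

```r
# Toy stand-ins for the two sources; real columns may differ.
archive <- data.frame(
  number = c("1", "2"),
  title = c("First", "Second"),
  category = c("Old Movie Club", NA),
  stringsAsFactors = FALSE
)
stats <- data.frame(
  number = c("1", "2"),
  duration = c("1:02:03", "45:10"),
  stringsAsFactors = FALSE
)

# Keep every archive row and attach the more precise stats columns
episodes <- merge(archive, stats, by = "number", all.x = TRUE)
episodes
```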
Parses the show overview page and returns a tibble of show names with corresponding URLs, which in turn can then be passed to incomparable_parse_archive() and incomparable_parse_stats() individually.
incomparable_get_shows(cache = TRUE)
cache |
( |
A tibble with the following columns:
Columns: 4
$ show        <chr>
$ stats_url   <glue>
$ archive_url <glue>
$ status      <chr>
## Not run:
incomparable_get_shows()

## End(Not run)
Not actively used in other functions but could come in handy.
incomparable_get_subcategories(
  archive_url = "https://www.theincomparable.com/gameshow/archive/"
)
archive_url |
E.g.
|
A tibble with subcategory links (link) and category names (category).
## Not run:
incomparable_get_subcategories("https://www.theincomparable.com/gameshow/archive/")

## End(Not run)
Retrieves all episodes for one or more shows passed as a tibble.
The archive page does not include full duration information, as it is limited to hours and minutes. Use incomparable_parse_stats() for accurate episode durations.
incomparable_parse_archive(archive_url)
archive_url |
E.g.
|
A tibble with the following format:
#> dplyr::glimpse(incomparable_parse_archive(archive_url))
Columns: 12
$ number   <chr>
$ title    <chr>
$ date     <date>
$ year     <dbl>
$ month    <ord>
$ weekday  <ord>
$ host     <chr>
$ guest    <chr>
$ category <chr>
$ topic    <chr>
$ summary  <chr>
$ network  <chr>
## Not run:
archive_url <- "https://www.theincomparable.com/gameshow/archive/"
incomparable_parse_archive(archive_url)

## End(Not run)
The stats.txt files have a slightly different format. In particular, the host/guest information may differ from what is returned by incomparable_parse_archive(), which implicitly assumes the first person mentioned to be the host of the episode. However, this data source does not include podcast subcategories (e.g. "Old Movie Club") or topic information, which are only available on the archive page.
incomparable_parse_stats(stats_url)
stats_url |
URL to the |
A tibble.
## Not run:
incomparable_parse_stats("https://www.theincomparable.com/salvage/stats.txt")

## End(Not run)
Convenience function to display N
label_n(x, brackets = FALSE)
x |
Data or singular value. |
brackets |
Set |
A character of length 1.
label_n(100)
label_n(tibble::tibble(x = 1:10, y = 1:10), brackets = TRUE)
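For readers who want a feel for what such a helper does, here is a hypothetical re-implementation in base R. The function name label_n_sketch(), the "N = ..." label format, and the rule of counting rows for data frames are assumptions made for illustration. The actual output of label_n() may be formatted differently.

```r
# Hypothetical re-implementation for illustration; the "N = ..." label
# format is an assumption, not poddr's confirmed output.
label_n_sketch <- function(x, brackets = FALSE) {
  # Data frames are counted by rows, a single number is used as-is,
  # anything else by its length.
  n <- if (is.data.frame(x)) {
    nrow(x)
  } else if (is.numeric(x) && length(x) == 1) {
    x
  } else {
    length(x)
  }
  label <- paste("N =", n)
  if (brackets) label <- paste0("(", label, ")")
  label
}

label_n_sketch(100)
label_n_sketch(data.frame(x = 1:10, y = 1:10), brackets = TRUE)
```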
Converting HH:MM:SS or MM:SS to hms
parse_duration(x)
x |
A duration |
A numeric of durations in hms::hms().
Only needed to parse durations in The Incomparable stats.txt files.
parse_duration("32:12")
parse_duration("32:12:04")
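The underlying conversion from "MM:SS" or "HH:MM:SS" strings to a duration can be sketched in base R by computing total seconds. The helper duration_to_seconds() below is hypothetical and not part of poddr. It only illustrates the arithmetic and returns plain seconds instead of an hms object.

```r
# Hypothetical helper, not part of poddr: convert "MM:SS" or "HH:MM:SS"
# strings to total seconds using base R only.
duration_to_seconds <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE), function(parts) {
    parts <- as.numeric(parts)
    # Pad "MM:SS" with zero hours so every input has three fields
    if (length(parts) == 2) parts <- c(0, parts)
    stopifnot(length(parts) == 3)
    sum(parts * c(3600, 60, 1))
  }, numeric(1))
}

duration_to_seconds(c("32:12", "32:12:04"))
```

If the hms package is installed, the resulting seconds could be wrapped with hms::hms(seconds = ...) to obtain a printable duration.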
Retrieves all episodes for one or more shows passed as a tibble.
relay_get_episodes(relay_shows, cache = TRUE)
relay_shows |
A tibble of shows, from |
cache |
( |
A tibble.
## Not run:
relay_shows <- relay_get_shows()
relay <- relay_get_episodes(relay_shows)

## End(Not run)
Parses the show overview page and returns a tibble of show names with corresponding feed URLs, which in turn can then be passed to relay_parse_feed() individually.
relay_get_shows(cache = TRUE)
cache |
( |
A tibble with one row for each show
## Not run:
relay_get_shows()

## End(Not run)
Parses a single feed and returns its content as a tibble.
relay_parse_feed(url)
url |
A show's feed URL, e.g. |
A tibble.
## Not run:
relay_parse_feed(url = "https://www.relay.fm/ungeniused/feed")

## End(Not run)
Update and cache data locally
update_cached_data(dir = "data_cache")
dir |
|
Nothing
## Not run:
update_cached_data()

## End(Not run)