Feel free to try the exercises below at your leisure. Solutions will be posted later in the week! Note: as usual, the answers below are just one way of solving the prompts!

Data Scraping

  1. Using rvest::html_table, scrape the table of City Council members in Washington, D.C. from Wikipedia.
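library(magrittr)  #provides the %>% pipe used throughout

wiki_url <- 'https://en.wikipedia.org/wiki/Council_of_the_District_of_Columbia'
council_outputs <- rvest::read_html(wiki_url) %>%
  rvest::html_table() %>%  #returns a list with one tibble per table on the page
  .[[3]]                   #the council table was the third table when this was run
council_outputs %>% head()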
## # A tibble: 6 × 7
##   Councillor   Position Party Party Committee chaired[21…¹
##   <chr>        <chr>    <lgl> <chr> <chr>                 
## 1 Phil Mendel… Chairman NA    Demo… "The Whole"           
## 2 Anita Bonds  At-large NA    Demo… "Executive Administra…
## 3 Doni Crawfo… At-large NA    Inde… ""                    
## 4 Christina H… At-large NA    Inde… "Health"              
## 5 Robert White At-large NA    Demo… "Housing"             
## 6 Brianne Nad… Ward 1   NA    Demo… "Public Works and Ope…
## # ℹ abbreviated name: ¹​`Committee chaired[21]`
## # ℹ 2 more variables: `Term starts` <int>,
## #   `Term ends` <int>
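
Because the solution above picks the table by position (`.[[3]]`), it can silently grab the wrong table if the Wikipedia page is edited. One possible alternative, sketched here on a small made-up page rather than the real Wikipedia markup, is to select the table by a column name you expect it to contain. The inline HTML and the `has_councillor` helper vector are illustrative assumptions.

```r
library(rvest)

# Hypothetical stand-in for a page with several tables (not the real Wikipedia markup)
page <- minimal_html('
  <table>
    <tr><th>Ward</th><th>Area</th></tr>
    <tr><td>Ward 1</td><td>Example</td></tr>
  </table>
  <table>
    <tr><th>Councillor</th><th>Position</th></tr>
    <tr><td>Member A</td><td>Chairman</td></tr>
    <tr><td>Member B</td><td>At-large</td></tr>
  </table>
')

# html_table() on a whole page returns a list with one tibble per <table>
tables <- html_table(page)

# Flag the tables that have a "Councillor" column, then keep the first match
has_councillor <- vapply(tables, function(tbl) "Councillor" %in% names(tbl), logical(1))
council <- tables[[which(has_councillor)[1]]]
council
```

The same `vapply()` check should work on the list returned for the real page, provided the column is still named `Councillor`.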
  2. Using SelectorGadget or a similar tool (such as your browser's inspector), scrape the news article titles and links from the AP News Climate Change page.
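url <- 'https://apnews.com/hub/climate-change'
item <- 'h3'  #selector for the headline elements

#read the page once and reuse it for both the titles and the links
page <- rvest::read_html(url)

titles <- page %>%
  rvest::html_elements(item) %>%
  rvest::html_text2()

hyperlinks <- page %>%
  rvest::html_elements(item) %>%
  rvest::html_elements('a') %>%
  rvest::html_attr("href")

#note: this assumes every headline contains exactly one link
data.frame(titles, hyperlinks) %>%
  head()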
##                                                                                            titles
## 1       King penguins are the rare species benefiting from a warming world. But that could change
## 2               As Iran war shakes energy system, some see powerful argument for renewable energy
## 3              Heat waves that spark damaging droughts are happening more frequently, study finds
## 4                                 Greenland’s sea ice is melting. Fishermen worry what comes next
## 5 Supreme Court agrees to hear from oil and gas companies trying to block climate change lawsuits
## 6                                 Greenland’s sea ice is melting. Fishermen worry what comes next
##                                                                                                                hyperlinks
## 1               https://apnews.com/article/king-penguins-warming-breeding-climate-change-0d4bac686c7159d0c671ebce7f9f1d79
## 2            https://apnews.com/article/iran-war-warming-climate-change-inflation-prices-767a9aace18b23e7d481cde01f3e0d55
## 3                            https://apnews.com/article/heat-wave-drought-climate-change-9248c65a135dc6ab3665cb8b2127d8e2
## 4                                   https://apnews.com/article/greenland-fishing-climate-e3497fa07647d19725fbff5ef82faed9
## 5                       https://apnews.com/article/supreme-court-climate-change-lawsuits-c8982af07855a7a6379e1313ebb71895
## 6 https://apnews.com/video/greenlands-sea-ice-is-melting-fishermen-worry-what-comes-next-ffc6dbf7877f44a487a4784614ed0fae
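
The solution above only works when every `h3` contains a link; otherwise `titles` and `hyperlinks` have different lengths and `data.frame()` fails. A minimal sketch of one way around that, assuming a made-up page with one unlinked headline, is to use `html_element()` (singular), which returns exactly one result per headline and `NA` where no link exists. The HTML and the `example.com` URLs are placeholders, not AP News content.

```r
library(rvest)

# Hypothetical headlines: the second one has no link
page <- minimal_html('
  <h3><a href="https://example.com/a">First headline</a></h3>
  <h3>Headline without a link</h3>
  <h3><a href="https://example.com/b">Second headline</a></h3>
')

headlines <- html_elements(page, "h3")

# html_element() keeps one entry per headline, so both columns stay aligned
articles <- data.frame(
  title = html_text2(headlines),
  link  = html_attr(html_element(headlines, "a"), "href")
)

# Optionally drop repeated titles, such as an article and a video sharing a headline
articles <- articles[!duplicated(articles$title), ]
articles
```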

Working with APIs

  1. Register for an API key with the U.S. Census Bureau. Once it is received, download any data point of interest from the American Community Survey or Decennial Census. (Documentation here)
#api key not printed here, register at link above
#arbitrarily deciding to get the total number of households (DP02_0001E)
#in Washington DC according to the 2024 American Community Survey (ACS)
url <- stringr::str_c('https://api.census.gov/data/2024/acs/acs1/profile?get=NAME,DP02_0001E&for=state:11&key=', api_key)

api_call <- httr::GET(url)

api_call %>%
  httr::content(as = 'text', encoding = 'UTF-8') %>%  #return the response body as a string
  jsonlite::fromJSON()  #convert the JSON text to a character matrix
##      [,1]                   [,2]         [,3]   
## [1,] "NAME"                 "DP02_0001E" "state"
## [2,] "District of Columbia" "329687"     "11"
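
The API returns the column names as the first row of a character matrix, which is awkward to work with. A minimal sketch of tidying it, using a JSON string that mirrors the response printed above instead of a live request, might look like this. The names `json_text`, `raw`, and `census_df` are illustrative.

```r
library(jsonlite)

# Same shape as the response shown above, hard-coded so no API key is needed
json_text <- '[["NAME","DP02_0001E","state"],["District of Columbia","329687","11"]]'

raw <- fromJSON(json_text)  # character matrix, header in row 1

# Drop the header row, then reuse it as the column names
census_df <- as.data.frame(raw[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(census_df) <- raw[1, ]

# Every value arrives as text, so convert the estimate to a number
census_df$DP02_0001E <- as.numeric(census_df$DP02_0001E)
census_df
```

The `drop = FALSE` matters here: with a single data row, `raw[-1, ]` would otherwise collapse to a plain vector.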
  2. Try to replicate #1 using the tidycensus package, which is an API wrapper.
#replicating the same estimate as above
#2024 ACS, 1 year estimate of total households in DC
tidycensus::get_acs(geography = 'state',
                    variables = 'DP02_0001',
                    year = 2024,
                    key = api_key, 
                    survey = 'acs1',
                    state = 'DC'
                    )
## # A tibble: 1 × 5
##   GEOID NAME                 variable  estimate   moe
##   <chr> <chr>                <chr>        <dbl> <dbl>
## 1 11    District of Columbia DP02_0001   329687  4510
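
Unlike the raw API call, tidycensus also returns a `moe` column. ACS margins of error are published at the 90 percent confidence level by default, so a rough interval is the estimate plus or minus the MOE. The sketch below rebuilds the one-row result by hand (the `acs` data frame is a stand-in for the tibble above) and adds the bounds and an approximate standard error.

```r
# Stand-in for the tidycensus result printed above
acs <- data.frame(
  GEOID    = "11",
  NAME     = "District of Columbia",
  variable = "DP02_0001",
  estimate = 329687,
  moe      = 4510
)

# 90 percent interval: estimate plus or minus the margin of error
acs$lower <- acs$estimate - acs$moe
acs$upper <- acs$estimate + acs$moe

# Approximate standard error, using the 90 percent z-value of 1.645
acs$se <- acs$moe / 1.645
acs
```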