
This session covers the basic structure of webpages (HTML and CSS) and then introduces the rvest package, which provides tools for scraping data from webpages. We will also briefly discuss additional tools for extracting data from the web.
- Introduction to the basic structure of webpages
- Exploring the rvest package, APIs, and RSelenium
- Additional tools for web scraping
By the end of the session, participants will have working code for scraping basic websites and will have scraped a selected website themselves.
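As a rough preview of the kind of code involved, the sketch below parses a small HTML snippet with rvest and pulls out a heading and a list of links using CSS selectors. The page content, class name, and link paths are made up for illustration and are not session material; the HTML is built inline with `minimal_html()` so the example runs without a network connection. It assumes rvest 1.0.0 or later is installed.

```r
library(rvest)

# Hypothetical page, built inline so the sketch needs no network access.
page <- minimal_html('
  <h1 class="title">Course readings</h1>
  <ul>
    <li><a href="/week1.html">Week 1</a></li>
    <li><a href="/week2.html">Week 2</a></li>
  </ul>
')

# CSS selector "h1.title" matches the <h1> element with class "title".
title <- html_text2(html_element(page, "h1.title"))

# "li a" matches every <a> nested inside an <li>.
links <- html_elements(page, "li a")

# Collect the visible link text and the href attribute into a data frame.
readings <- data.frame(
  label = html_text2(links),
  href = html_attr(links, "href")
)

print(title)
print(readings)
```

For a live site, `read_html()` with the page's URL would take the place of `minimal_html()`; the selector and extraction steps stay the same.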
Optional Reading (1 page):
Bernau, J. A. (2018). Text Analysis with JSTOR Archives. Socius, 4.