Wikipedia Data Scraping with R: rvest in Action
Scraping the list of people on banknotes for exploratory data analysis using rvest functions
Introduction
Wikipedia is a free online encyclopedia, created and edited by volunteers around the world and hosted by the Wikimedia Foundation; it currently has more than 5 million articles in English. Today, I will work through a data exercise in Wikipedia data scraping using rvest, “a new package that makes it easy to scrape (or harvest) data from html web pages, inspired by libraries like beautiful soup. It is designed to work with magrittr so that you can express complex operations as elegant pipelines composed of simple, easily understood pieces” (Wickham, 2014). Before you proceed, it is important to have a basic understanding of HTML and XML web structures. I recommend the HTML tutorial on w3schools, which offers a good, simplified environment for learning, testing, and practicing.
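As a quick preview of the pipeline style described above, here is a minimal sketch of how rvest could pull a table from the banknotes article into a data frame. The URL and the `table.wikitable` CSS selector are assumptions for illustration; the exact selector for the article's tables should be confirmed in the browser's inspector.

```r
# Minimal rvest sketch (illustrative; URL and selector are assumptions)
library(rvest)

url <- "https://en.wikipedia.org/wiki/List_of_people_on_banknotes"

page <- read_html(url)                   # fetch and parse the page's HTML

banknotes <- page %>%
  html_element("table.wikitable") %>%    # select the first table of class "wikitable"
  html_table()                           # convert the HTML table to a data frame

head(banknotes)
```

The pipe (`%>%`) comes from magrittr, which rvest is designed to work with: each step (parse, select, convert) is a small, easily understood piece, exactly as the package description promises.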