Wikipedia Data Scraping with R: rvest in Action

Scraping the list of people on banknotes for exploratory data analysis using rvest functions

Korkrid Kyle Akepanidtaworn

--

Introduction

Wikipedia is a free online encyclopedia, created and edited by volunteers around the world and hosted by the Wikimedia Foundation, with more than 5 million articles in English. Today, I will work through a data exercise in Wikipedia data scraping using rvest, “a new package that makes it easy to scrape (or harvest) data from html web pages, inspired by libraries like beautiful soup. It is designed to work with magrittr so that you can express complex operations as elegant pipelines composed of simple, easily understood pieces” (Wickham, 2014). Before you proceed, it is important to have a basic understanding of HTML and XML document structure. I recommend the HTML tutorial at w3schools, which offers a good, simplified resource for learning, testing, and practice.
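To give a flavor of what such a pipeline looks like, here is a minimal sketch of the kind of scrape this exercise builds toward: reading the Wikipedia article on people depicted on banknotes and pulling its tables into data frames. This assumes the rvest package (which re-exports the magrittr pipe) is installed; the article title in the URL and the table index are illustrative and may need adjusting to the page's current layout.

```r
library(rvest)

# Read the raw HTML of the Wikipedia article (URL assumed for illustration)
url  <- "https://en.wikipedia.org/wiki/List_of_people_on_banknotes"
page <- read_html(url)

# Select all wiki-style tables and convert each to a data frame;
# fill = TRUE pads rows that span multiple columns
tables <- page %>%
  html_nodes("table.wikitable") %>%
  html_table(fill = TRUE)

# Inspect the first table (index may vary as the page is edited)
head(tables[[1]])
```

The pipeline reads top to bottom: fetch the page once with `read_html()`, narrow down to the nodes of interest with a CSS selector, then hand the parsed nodes to `html_table()` — exactly the "simple, easily understood pieces" composition the package description promises.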

Image Source: Chandransh Srivastava, via Quora, https://www.quora.com/With-knowledge-in-HTML-CSS-and-a-little-JavaScript-what-kind-of-projects-should-I-start-with-to-strengthen-my-skills-in-web-development

--
