Welcome to the inaugural post of our blog "Automated Data Collection with R" – our new outlet for discussing all aspects of data collection with R. We are a group of four researchers with a background in political science. Coming from a traditionally data-sparse discipline, we have come to realize the opportunities the internet provides for new and original data sources. Therefore, we have joined forces to write a book on web scraping and text mining with R. We hope to provide an accessible introduction for scholars and practitioners who are interested in learning about automatically collecting data. The book will be published with Wiley in January 2015. An e-book version is already available online. If you want to learn more about the book, check out the rest of this site, the table of contents and the introductory chapter.
Building on the themes introduced in the book, this blog will cover a broad range of topics related to data collection with R. Among other things, we intend to track new package releases and write about technical and methodological issues related to web scraping. Some of the posts will build upon materials in the book, but most will treat additional topics, as the web scraping toolbox in R is continuously growing. Future topics include:
- Analyzing Wikipedia page view statistics with the wikipediatrends package
- An introduction to the htmltable package – easy scraping of HTML tables in R
- Programming a Twitter bot – and the rescue from procrastination
- Web inspector tools and how we can use them for more efficient scraping
- An appreciation of the new rvest package
- How to behave nicely on the Web – a scraping netiquette
- A Christmas surprise
We hope that some of the posts will attract your interest and look forward to your feedback!