Automated Data Collection with R
The Book
The rapid growth of the World Wide Web has opened many opportunities for collecting, sharing and publishing data of all kinds. This book shows how to collect and post-process this data with the most popular and easy-to-use statistical programming language, R. It provides a hands-on guide to web scraping and text mining for both beginners and experienced users, featuring examples throughout that explain each of the techniques presented. Fundamental concepts of the main architecture of the Web and databases are discussed along with coverage of HTTP, HTML, XML, JSON, JavaScript and SQL.
- Presents a practical guide to web scraping and text mining for both beginners and experienced users of R.
- Explores basic techniques to query web documents and data sets (XPath and regular expressions) as well as technologies to gather information from dynamic HTML.
- Demonstrates how to connect to web services/web APIs and collect data in a regular manner.
- Provides a practical perspective on the workflow of data scraping and management, from choosing the right method to optimizing code and maintaining scrapers.
- Features case studies throughout along with examples for each technique presented.
- Provides a multitude of exercises to guide the reader through each technique.
- R code and answers to questions posed in the text are featured on this website.
Available from: Wiley, Amazon (in English), Amazon (in Chinese)
The Blog
Visit our blog to read more about all things data collection with R. We post regularly on text manipulation, databases, Web technologies, web scraping, and further topics. In R, of course. We also tweet regularly under RDataCollection on a wide spectrum of topics related to R and data collection.
News
- March 2017: Learn data collection techniques with R from Matt Loftis at the Essex Summer School in Social Science and Data Analysis (24-28 July 2017). Register now!
- July 2016: Automated Data Collection with R is now available in Chinese! 玩得开心!
- October 7, 2015: Register for an introductory course to learn web scraping with R at ECPR Winter School of Methods 2016 in beautiful Bamberg, Germany.
- November 25, 2014: Register for an introductory course to learn web scraping with R at ECPR Winter School of Methods 2015 in beautiful Bamberg, Germany.
- November 18, 2014: The E-Book version is now available from the Wiley online shop.
- October 28, 2014: A guest post on web scraping with R on the predictive heuristics blog.