10/8/2019
General idea
- Thanks to Yu-Han Kao—adapted some of these materials from her!
Data scraping/web scraping
- Pulling data from the internet (web sites, social media, etc.)
- Involves: crawling/searching, extraction, parsing, reformatting
- Often two general approaches:
  - Scraping directly (note: possibly rude, since your program/bot(s) will make requests to their server)
  - Using an API!
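The extraction, parsing, and reformatting steps listed above can be sketched roughly as follows. This is a minimal illustration in Python (these notes focus on R, but Python is also mentioned below), using only the standard library. The page content, the `LinkExtractor` class, and the `extract_links` helper are all made up for the example; a real scraper would first fetch the HTML from a site, ideally after checking that the site permits it.

```python
from html.parser import HTMLParser

# Made-up page content standing in for HTML fetched from a web site.
SAMPLE_HTML = """
<html><body>
  <h1>Clinic listings</h1>
  <a href="/clinics/north">North Clinic</a>
  <a href="/clinics/south">South Clinic</a>
  <p>No link here.</p>
</body></html>
"""


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        # Extraction: remember the target of each link we enter.
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


def extract_links(html):
    """Parse an HTML string and return the links found in it."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


links = extract_links(SAMPLE_HTML)

# Reformatting: turn the parsed pairs into simple comma-separated rows.
rows = [f"{href},{text}" for href, text in links]
print(rows)
```

Parsing the markup with a real HTML parser, rather than searching the raw text, tends to hold up better when the page layout changes slightly.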
What is an API
- Application Programming Interface
- A way for programs/software to communicate
- Client/server - can be for web, operating system, databases, etc.
- Web APIs
  - APIs for either web browser or web server
  - Twitter API, Google API, Facebook API…
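To make the client/server idea concrete, here is a small Python sketch of the two halves of a typical web API call: the client encodes its parameters into a request URL, and the server's structured response (often JSON) is decoded into plain data. The endpoint `api.example.com`, the parameter names, and the response layout are all assumptions invented for this example; they do not describe any real service, and no network request is made.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://api.example.com/v1/search"


def build_request_url(base_url, params):
    """Encode query parameters into the URL a client would request."""
    return f"{base_url}?{urlencode(params)}"


def parse_response(body):
    """Turn a JSON response body into a list of (user, text) tuples."""
    payload = json.loads(body)
    return [(item["user"], item["text"]) for item in payload["results"]]


# A made-up response body standing in for what a server might send back.
SAMPLE_BODY = (
    '{"results": ['
    '{"user": "alice", "text": "hello"}, '
    '{"user": "bob", "text": "hi"}]}'
)

url = build_request_url(BASE_URL, {"q": "flu season", "count": 2})
rows = parse_response(SAMPLE_BODY)
print(url)
print(rows)
```

Real APIs usually also require authentication (keys or tokens) and enforce rate limits, which this sketch leaves out.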
Using APIs with R
- Many R packages to make this easier: rtweet, twitteR, Rfacebook, googleAuthR, googleAnalyticsR, …
- Directly using R
- Python also has very comprehensive libraries (many directly developed by the companies)
Getting data from Twitter