Reddit offers JSON feeds for each subreddit. Here’s how to create a Bash script that downloads and parses a list of posts from any subreddit you like. This is just one thing you can do with Reddit’s JSON feeds.
Installing Curl and JQ
We’re going to use curl to fetch the JSON feed from Reddit and jq to parse the JSON data and extract the fields we want from the results. Install these two dependencies using apt-get on Ubuntu and other Debian-based Linux distributions. On other Linux distributions, use your distribution’s package management tool instead.
sudo apt-get install curl jq
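Before relying on both tools, a script can confirm that they are actually on the PATH. The following is a minimal sketch; the function name check_deps is hypothetical and not part of curl, jq, or Reddit's tooling.

```bash
#!/usr/bin/env bash

# Hypothetical helper: report which of the given commands are missing.
# Returns 0 when every command is found, 1 otherwise.
check_deps() {
  local missing=0 cmd
  for cmd in "$@"; do
    # command -v prints the path of a command if the shell can find it.
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "Missing dependency: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling check_deps curl jq || exit 1 near the top of a script would stop it early with a readable message instead of failing halfway through.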
Fetch Some JSON Data from Reddit
Let’s see what the data feed looks like. Use curl to fetch the latest posts from the MildlyInteresting subreddit:
curl -s -A "reddit scraper example" https://www.reddit.com/r/MildlyInteresting.json
Note the options used before the URL: -s forces curl to run in silent mode, which hides the progress meter and error messages so that the only output is the data from Reddit’s servers. The next option and the parameter that follows, -A "reddit scraper example", set a custom user agent string that helps Reddit identify the service accessing their data. The Reddit API servers apply rate limits based on the user agent string. Setting a custom value will cause Reddit to segment our rate limit away from other callers and reduce the chance that we get an HTTP 429 (Too Many Requests) rate-limit error.
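To make the rate-limit case visible in a script, the HTTP status code can be captured alongside the response body. This is only a sketch: the function name fetch_feed, the USER_AGENT variable, and the return codes are hypothetical choices, while -o and -w '%{http_code}' are standard curl options.

```bash
#!/usr/bin/env bash

# Assumed user agent string, taken from the example command in the article.
USER_AGENT="reddit scraper example"

# Hypothetical helper: download a subreddit feed to a file.
# Returns 0 on HTTP 200, 2 on HTTP 429, and 1 on any other failure.
fetch_feed() {
  local subreddit="$1" outfile="$2" status
  # -o writes the response body to the file; -w prints only the status code.
  status=$(curl -s -A "$USER_AGENT" -o "$outfile" -w '%{http_code}' \
    "https://www.reddit.com/r/${subreddit}.json") || return 1
  if [ "$status" = "429" ]; then
    echo "Rate limited (HTTP 429); try again later." >&2
    return 2
  elif [ "$status" != "200" ]; then
    echo "Unexpected HTTP status: $status" >&2
    return 1
  fi
}
```

For example, fetch_feed MildlyInteresting feed.json would leave the JSON in feed.json on success, and the caller can check the return code to decide whether to wait and retry.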
The output should fill up the terminal window and look something like this:
There are lots of fields in the output data, but all we’re interested in are Title, Permalink, and URL. You can see an exhaustive list of types and their fields on Reddit’s API documentation page: https://github.com/reddit-archive/reddit/wiki/JSON
Extracting Data from the JSON Output
We want to extract Title, Permalink, and URL from the output data and save them to a tab-delimited file. We could use text processing tools like grep, but we have another tool at our disposal that understands JSON data structures, called jq. For our first attempt, let’s use it to pretty-print and color-code the output. We’ll use the same call as before, but this time, pipe the output through jq and instruct it to parse and print the JSON data.
curl -s -A "reddit scraper example" https://www.reddit.com/r/MildlyInteresting.json | jq .
Note the period that follows the command. This expression simply parses the input and prints it as-is. The output looks nicely formatted and color-coded:
Let’s examine the structure of the JSON data we get back from Reddit. The root result is an…
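As a minimal sketch of the extraction goal stated earlier (Title, Permalink, and URL in a tab-delimited file), and assuming the feed follows the Listing layout from the Reddit JSON wiki linked above, with posts under data.children and each post's fields under its own data object, the three fields could be pulled out like this. The function name extract_posts and the output file posts.tsv are hypothetical.

```bash
#!/usr/bin/env bash

# Hypothetical helper: read a Reddit listing on stdin and print one line per
# post with title, permalink, and url separated by tabs.
# Assumes the layout .data.children[].data described in the lead-in.
extract_posts() {
  # -r emits raw strings instead of JSON-quoted ones; @tsv joins an array
  # with tabs and escapes any tabs or newlines inside the values.
  jq -r '.data.children[].data | [.title, .permalink, .url] | @tsv'
}
```

Combined with the earlier call, this might be used as: curl -s -A "reddit scraper example" https://www.reddit.com/r/MildlyInteresting.json | extract_posts > posts.tsv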