Projects

Here are a couple of interesting projects I've worked on! More to come...



Reddit Comment Parser


Over the summer, I wrote a Reddit bot that parses every new comment looking for a certain link. The link works like a Rick Roll (or Darude's "Sandstorm", for something a bit more modern): the joke is to purposely mislead the recipient. The goal of the project was to detect when this link is used to misdirect someone. In short, the bot had to read through all new comments, detect when the link is used, check whether the link text (the human-readable part) is misleading, and reply if so.

I did this in Python because a nice library was available: PRAW. PRAW handles most of the REST calls, allowing us to deal directly with comment objects. By design, the PRAW API returns a maximum of 100 comments at a time. Since the link can appear in any subreddit, we must constantly parse the most recent comments and quickly decide what to do with each one. Luckily, PRAW handles the API calls and will wait if we are making them too frequently. Using regular expressions, we can parse each comment's text relatively quickly.
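As a rough illustration of the regular-expression step, here is a minimal sketch that pulls Markdown-style links out of a comment body and flags the ones whose text hides where they lead. The target URL fragment, the "honest" keyword, and the function name are all made up for this example; the real link and the real rule for "misleading" are not shown here.

```python
import re

# Markdown inline link: [link text](url).
# Assumes no nested brackets or parentheses inside the link.
MARKDOWN_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")

# Hypothetical values for illustration only.
TARGET_URL_PART = "example.com/joke"
HONEST_KEYWORD = "joke"


def misleading_links(body):
    """Return link texts that point at the target URL without hinting at it."""
    found = []
    for text, url in MARKDOWN_LINK.findall(body):
        points_at_target = TARGET_URL_PART in url.lower()
        text_is_honest = HONEST_KEYWORD in text.lower()
        if points_at_target and not text_is_honest:
            found.append(text)
    return found


comment_body = (
    "Check the [official docs](https://example.com/joke) "
    "or this [joke link](https://example.com/joke) "
    "or [something else](https://example.org/page)."
)
print(misleading_links(comment_body))
```

In the bot, the string passed in would be the text of each comment object that PRAW hands back.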

That's all well and good, but we want to minimize the time each loop takes to avoid missing comments. One way I did so is by keeping a list of previously processed comments so they can be skipped. To keep the list from growing too big, I implemented it as a queue that holds more than 100 elements; when the queue is full, the oldest element is deleted. The size matters: if the queue held 100 elements or fewer (100 being the number of comments PRAW can pull in one go) and the list of comments were the same between runs, the queue would already have deleted a repeated comment by the time we reached it, and the program would process that comment again.
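The bounded queue of processed comments could look something like the sketch below. It is not the original implementation: the class name, the helper function, and the capacities are assumptions chosen to show why the queue has to be larger than one batch of comments.

```python
from collections import deque


class SeenComments:
    """Remember the most recent comment IDs, dropping the oldest when full."""

    def __init__(self, capacity):
        # capacity is assumed to be a positive integer
        self._order = deque(maxlen=capacity)  # oldest ID falls off automatically
        self._ids = set()                     # mirror of the deque for fast lookups

    def add(self, comment_id):
        """Record an ID; return False if it was already remembered."""
        if comment_id in self._ids:
            return False
        if len(self._order) == self._order.maxlen:
            # The append below evicts the oldest ID, so forget it in the set too.
            self._ids.discard(self._order[0])
        self._order.append(comment_id)
        self._ids.add(comment_id)
        return True


def count_processed(capacity, batches):
    """Count how many comments get processed across repeated fetches."""
    seen = SeenComments(capacity)
    processed = 0
    for batch in batches:
        for comment_id in batch:
            if seen.add(comment_id):
                processed += 1
    return processed


# The same 100 comments fetched twice in a row (made-up IDs).
batch = [f"c{i}" for i in range(100)]
print(count_processed(200, [batch, batch]))  # roomy queue: each comment handled once
print(count_processed(50, [batch, batch]))   # small queue: repeats slip through
```

With the small queue, every comment from the first fetch has been pushed out before it comes around again, so the whole batch is processed a second time.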

After all the processing, we have to reply to the comments that need it. To not completely spoil the fun, I decided to wait 5 minutes before sending a response. This is done with Python's asynchronous I/O library, asyncio for short. The library can get very complicated, but all I am using it for is a nonblocking wait period. It might be interesting to take a further look at asyncio, since I was previously using threading instead.
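A nonblocking wait with asyncio can be sketched roughly as follows. The `reply_later` and `main` functions, the `send_reply` callback, and the comment IDs are hypothetical stand-ins (a real bot would call the reply method on the comment object instead), and the demo uses a tiny delay so it finishes quickly.

```python
import asyncio

REPLY_DELAY_SECONDS = 5 * 60  # the five-minute wait described above


async def reply_later(comment_id, send_reply, delay=REPLY_DELAY_SECONDS):
    """Sleep without blocking the event loop, then send the reply."""
    await asyncio.sleep(delay)
    send_reply(comment_id)


async def main(delay):
    sent = []
    # Schedule two delayed replies; the waits overlap instead of running back to back.
    tasks = [
        asyncio.create_task(reply_later(comment_id, sent.append, delay))
        for comment_id in ("abc123", "def456")
    ]
    # The event loop is free to do other work while the timers run.
    await asyncio.gather(*tasks)
    return sent


# Short delay for demonstration; the bot would use REPLY_DELAY_SECONDS.
print(asyncio.run(main(delay=0.01)))
```

Because each wait is its own task, many pending replies can count down at once without one thread per reply, which is the main difference from the threading approach.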