Reddit uses Web Risk to protect users against phishing, malware, and social engineering

October 4, 2023

By Google Cloud Blog

Reddit is one of the most popular social media sites in the world, offering users the ability to post and connect on all kinds of topics in community forums, or “subreddits.” Millions of users use it every day to discuss topics ranging from the mundane to the silly to the profound. On Reddit, there’s a home for everyone to dive into their interests, learn something new, connect with people who share their interests, and have a good time.

To keep Reddit a welcoming and real space for users, Trust, Safety, Privacy, and Transparency are pillars of Reddit’s platform. Reddit has a unique, multi-layered approach to content moderation that leverages our community, a team of experts, automated systems, reporting systems, and educational resources to identify, remove, and prevent harmful content.

One type of harmful content Reddit works to protect our users from is malicious links, such as those used for phishing, malware distribution, and social engineering, which can cause real damage to the millions of users on our platform. In addition to our current preventative measures, we’re always looking for the latest tools to detect and remediate these malicious links as quickly as possible.

To do that, we chose the Web Risk API to build a system that cross-references links posted on Reddit against Web Risk’s extensive database of unsafe URLs. Web Risk has more than 15 years of experience with Safe Browsing, scanning billions of web pages every day. Integrating Reddit’s systems with Web Risk was as simple as using the provided Docker image to make API calls with the URLs we see. This gives us a real-time verdict on whether a link is malicious, based on Web Risk’s comprehensive and up-to-date list of malicious URLs.

The flexibility and speed of the Web Risk Update API allow us to identify, in real time, unsafe URLs shared on Reddit in comments (more than 130 comments are posted on the site every second) or new posts (users create more than 20 posts per second), while preserving end-user privacy. We first look up partial hash matches for each URL in a local database, and then send a context-free URL to the Web Risk APIs to confirm the match.
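The lookup-then-confirm flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Reddit’s implementation or the Web Risk client: the function names, the 4-byte prefix length, and the `confirm_remote` callback are hypothetical, and the URL canonicalization that a real integration would need is skipped entirely.

```python
import hashlib
from typing import Callable, Iterable, Set

# Assumption for this sketch: prefixes are the first 4 bytes of a SHA-256 digest.
PREFIX_LEN = 4


def hash_prefix(url: str, length: int = PREFIX_LEN) -> bytes:
    """Return the first `length` bytes of the SHA-256 digest of the URL string."""
    return hashlib.sha256(url.encode("utf-8")).digest()[:length]


def build_local_db(unsafe_urls: Iterable[str]) -> Set[bytes]:
    """Build the local set of hash prefixes (stands in for a synced threat list)."""
    return {hash_prefix(u) for u in unsafe_urls}


def is_unsafe(
    url: str,
    local_db: Set[bytes],
    confirm_remote: Callable[[str], bool],
) -> bool:
    """Check a URL locally first; call the remote service only on a prefix match."""
    if hash_prefix(url) not in local_db:
        # No local match: the URL never leaves the process.
        return False
    # A prefix match may be a false positive, so ask the remote service to confirm.
    # `confirm_remote` is a hypothetical hook; it receives only the URL,
    # with no user or post context attached.
    return confirm_remote(url)
```

Because most URLs miss the local prefix set, only a small share of lookups would reach the remote service in a design like this, which is what keeps it fast and limits what is shared externally.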

To offer additional protection, we use the Evaluate API to get a risk score ranging from SAFE to EXTREMELY_HIGH, which helps us take nuanced action such as sending posts to our community moderators for review. This process runs constantly in the background, scanning millions of URLs (~20 URLs/second) shared on Reddit to keep users safe. It is invisible to our community and an excellent time-saving tool for Reddit moderators.
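As a rough illustration of how a graded risk score can drive nuanced action, the sketch below maps a score to a moderation outcome. Only SAFE and EXTREMELY_HIGH are named in this post; the intermediate level names are taken from our reading of the Web Risk documentation and should be treated as an assumption, and the thresholds and action names are entirely hypothetical rather than Reddit’s actual policy.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Ordered risk levels; intermediate names are assumed, not from this post."""
    SAFE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    HIGHER = 4
    VERY_HIGH = 5
    EXTREMELY_HIGH = 6


def choose_action(level: str) -> str:
    """Map a risk level name to a hypothetical moderation action."""
    risk = Risk[level]  # raises KeyError for an unknown level name
    if risk >= Risk.VERY_HIGH:
        return "remove"
    if risk >= Risk.MEDIUM:
        # Middle of the scale: a human moderator makes the call.
        return "send_to_moderators"
    return "allow"
```

Keeping the levels in an ordered enum means the thresholds can be tuned in one place without touching the rest of the pipeline.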

Since deploying Web Risk, we have been able to take down tens of thousands of posts that contained harmful URLs. Additionally, we have been able to proactively identify tens of thousands of malicious links that would have otherwise been published on our platform, potentially harming our users.

This is one of many examples of the efforts Reddit has put in place to keep our communities healthy, and we are continuously seeking areas where we can improve our protection so that anyone using our platform can do so safely.