A few years ago, when clickbait started to be a thing on YouTube, a lot of people did not like it. Sadly, it seemed to work really well, especially on young people. As someone who spends a lot of time on YouTube, I saw how this allowed worse content to be more successful, and it made me mad. Channels with great content and quality might be forced to create clickbait thumbnails as well or lose a lot of viewers. So I decided to create an Add-On for Firefox and Chrome that would automatically block clickbait videos.
How it worked
When you opened YouTube, the Add-On searched for all videos on the current page. It then gave every video a score based on various properties. If the score was higher than the set limit, the video was removed from the page or greyed out. What properties did I choose? Well, let’s talk about what clickbait is and what it means for YouTube users specifically.
What is Clickbait?
To quote Wikipedia:
Clickbait is a form of false advertisement […] that is designed to attract attention and entice users to follow that link and read, view, or listen to the linked piece of online content, with a defining characteristic of being deceptive, typically sensationalized or misleading.
And that is the main problem. The title gives the audience a false expectation of what the video will contain. Based on my own experience, this is mainly done with shocked faces in thumbnails, questions and emojis in the title, and LOTS OF CAPSLOCK.
Most people would argue that you could identify a clickbait video only by using the thumbnail and title. However, training a neural network to recognize those images, running it in-browser at an acceptable speed and adapting it to the frequent changes and variations of clickbait-thumbnails was not an option.
Instead, YouTube has a feature that is way more accurate: The dislike button. If the expectations of the audience are not met, a certain group of the YouTube community will not hesitate to dislike the video right away. Clickbait still works on YouTube, as you can only see the rating of a video if you already clicked on it.
However, back then, the YouTube API quota per key was quite generous (1,000,000 free credits). On an average day, asking the API for the rating of every video on every page would use far less than that for most users. As it turned out, the rating alone was already a really good start for blocking clickbait. Based on that, I made the following observations:
- In my experience, the rating and title alone are enough to recognize clickbait with more than 95% accuracy
- Clickbait videos generally had more than 10-15% dislikes
- Video titles with lots of caps lock were most likely clickbait if the rating was bad
- Videos with loads of “!” and “?” in a row were most likely clickbait if the rating was bad
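The rating lookup described above can be sketched roughly as follows. This is not the extension's original code: the helper names are made up for illustration, and it assumes the YouTube Data API v3 `videos` endpoint with `part=statistics`, which exposed `dislikeCount` publicly at the time (as far as I know it no longer does for other people's videos).

```javascript
// Hypothetical helpers, not the extension's original code.
const API_URL = "https://www.googleapis.com/youtube/v3/videos";

// Build one request URL for a batch of video IDs (the API accepts up to 50 IDs per call).
function buildStatisticsUrl(videoIds, apiKey) {
  const params = new URLSearchParams({
    part: "statistics",
    id: videoIds.slice(0, 50).join(","),
    key: apiKey,
  });
  return `${API_URL}?${params.toString()}`;
}

// Turn a `statistics` object into a dislike percentage on a 0-100 scale.
// The API returns counts as strings; missing counts are treated as 0.
function dislikePercentage(statistics) {
  const likes = Number(statistics.likeCount || 0);
  const dislikes = Number(statistics.dislikeCount || 0);
  const total = likes + dislikes;
  return total === 0 ? 0 : (dislikes / total) * 100;
}

// Fetch the ratings for one page of videos.
// Resolves to a Map of videoId -> dislike percentage.
async function fetchDislikePercentages(videoIds, apiKey) {
  const response = await fetch(buildStatisticsUrl(videoIds, apiKey));
  if (!response.ok) {
    throw new Error(`YouTube API request failed: ${response.status}`);
  }
  const data = await response.json();
  return new Map(
    (data.items || []).map((item) => [item.id, dislikePercentage(item.statistics || {})])
  );
}
```

Batching all IDs of a page into one request keeps the number of API calls low, which matters once a shared quota is involved.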
The simplified version is this:
var dislikeScore = dislikePercentage * dislikeWeight;
var titleScore = capsPercentage + (percentageQuestionmarks * weightQuestionmark) + (percentageExclamationMarks * weightExclamationmark);
var totalScore = dislikeScore + titleScore + (bonusPercentage * weightBonus);
This is pretty simple. Some explanations for the equation:
capsPercentage is the percentage of how much of the title is written in caps.
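percentageExclamationMarks is calculated like
// match() returns null when nothing is found, hence the fallback to an empty array;
// the weight itself is applied in the formula above
var percentageExclamationMarks = (title.match(/!!/g) || []).length;
so the number of exclamation mark pairs in a row. The same goes for percentageQuestionmarks.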
bonusPercentage is normally 0, but gets increased by roughly 5 points for every instance the title contains the words
If the total score is then larger than totalLimit, the video is considered clickbait.
What I used as initial values for the algorithm:
var weightQuestionmark = 25;
var weightExclamationmark = 25;
var weightBonus = 87;
var dislikeWeight = 8;
var totalLimit = 100;
For reference, I searched for 4 big YouTube channels which mainly contained clickbait and tried to adapt the weights so that their videos got correctly filtered out. As I do not want to shame anyone, I will not mention their names.
The Worst Title
A title with one of the worst (realistic) scores I could imagine would be something like
SHE DID WHAT??????? (PRANK FAIL).
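To see why, here is a minimal sketch that plugs this title into the simplified formula and the initial weights from above. The helper names are illustrative, and two things are assumptions on my part: percentages are on a 0-100 scale, and capsPercentage only looks at letters.

```javascript
// Initial weights from the article; the 0-100 percentage scale is an assumption.
const weights = { questionmark: 25, exclamationmark: 25, bonus: 87, dislike: 8 };
const totalLimit = 100;

// Share of the title's letters that are upper case, in percent (0-100).
function capsPercentage(title) {
  const letters = title.match(/\p{L}/gu) || [];
  if (letters.length === 0) return 0;
  const upper = letters.filter((ch) => ch !== ch.toLowerCase());
  return (upper.length / letters.length) * 100;
}

// Count non-overlapping matches; match() returns null when nothing is found.
function countMatches(title, regex) {
  return (title.match(regex) || []).length;
}

// Simplified score: dislike part + title part + bonus part.
function clickbaitScore(title, dislikePercentage, bonusPercentage = 0) {
  const dislikeScore = dislikePercentage * weights.dislike;
  const titleScore =
    capsPercentage(title) +
    countMatches(title, /\?\?/g) * weights.questionmark +
    countMatches(title, /!!/g) * weights.exclamationmark;
  return dislikeScore + titleScore + bonusPercentage * weights.bonus;
}

function isClickbait(title, dislikePercentage, bonusPercentage = 0) {
  return clickbaitScore(title, dislikePercentage, bonusPercentage) > totalLimit;
}

const worstTitle = "SHE DID WHAT??????? (PRANK FAIL)";
// 100 (all caps) + 3 * 25 (three "??" pairs) = 175, even with zero dislikes
console.log(clickbaitScore(worstTitle, 0));
```

Under these assumptions the title alone is already well over the limit of 100, before a single dislike or bonus word is counted.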
Not everyone shared the same idea of what clickbait is, so some customization was needed. First of all, you could change the score limit with a slider, and the results were visible on the current page in real time. You could also whitelist and blacklist channels, so that you never missed a video from your favorite creators or could black out certain channels directly.
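How these settings could interact is sketched below. The settings object, the channel names and the rule that the whitelist wins over the blacklist are assumptions for illustration, not the extension's actual storage format or logic.

```javascript
// Hypothetical settings; names and shape are illustrative only.
const defaultSettings = {
  totalLimit: 100, // adjustable with the slider
  whitelist: new Set(["FavoriteChannel"]),
  blacklist: new Set(["AnnoyingChannel"]),
};

// Decide whether a video should be hidden.
// Channel lists take priority over the score (whitelist first, by assumption).
function shouldBlock(video, settings) {
  if (settings.whitelist.has(video.channel)) return false;
  if (settings.blacklist.has(video.channel)) return true;
  return video.score > settings.totalLimit;
}

// Re-evaluate every video on the page, e.g. whenever the slider moves.
function blockedVideos(videos, settings) {
  return videos.filter((video) => shouldBlock(video, settings));
}

const videos = [
  { id: "a", channel: "FavoriteChannel", score: 250 },
  { id: "b", channel: "AnnoyingChannel", score: 10 },
  { id: "c", channel: "SomeoneElse", score: 120 },
];

// With the default limit, "b" and "c" are hidden; raising the limit to 150 keeps "c".
console.log(blockedVideos(videos, defaultSettings).map((v) => v.id));
console.log(blockedVideos(videos, { ...defaultSettings, totalLimit: 150 }).map((v) => v.id));
```

Keeping the decision in one pure function makes the live preview cheap: the scores stay cached and only the comparison runs again when a setting changes.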
After I uploaded it to the Firefox Add-On Site and the Chrome Web Store, I installed it for myself, as being able to use it as a regular Add-On on all my devices was the most important part for me. As I did not expect that many people would use it, I put one of my own API keys in the code. This is stupid and you should never do this. However, in this case, once the API credit was used up, the Add-On would just stop working (and it did not cost me any money), so I thought it would probably be fine. And it was. I also added a tutorial on how to add your own API key to the extension.
Until one day, only a few days after launch, it stopped working for me. Confused, I checked the API credit and was blown away. The API limit was used up every day in the morning hours. As it turned out, by this point more than 20,000 people had downloaded the extension (and of course had not bothered to insert their own API key). Split across that many installs, 1,000,000 credits works out to just 50 per install.
I was speechless. Judging from the reviews, lots of people absolutely loved it (but complained that it stopped working in the evening). What now?
Well, only a month later, the extension would stop working altogether.
Why is it not available anymore?
If the HTML source code looks like this (wayyy oversimplified):
<div class="video"><!--Video 1--></div>
<div class="video"><!--Video 2--></div>
<div class="video"><!--Video 3--></div>
you can select all videos with
var elements = document.getElementsByClassName('video');
and elements will contain a collection of div nodes with the class “video”.
However, the new YouTube website looked more like this:
<div class="jxicvw9uigwem0gw83"><!--Video 1--></div>
<div class="jxicvw9uigwem0gw83"><!--Video 2--></div>
<div class="jxicvw9uigwem0gw83"><!--Video 3--></div>
and this class name keeps changing, so it is way more difficult to select items. If there is a simple way today to (for example) select DOM elements by their surrounding structure, let me know; otherwise this could be an interesting project for the future.
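One possible direction, offered as an untested sketch rather than a known fix: ignore class names completely and select video links by an attribute that is less likely to change, such as `href`. The selector, the helper names and the assumption that watch links contain `/watch?v=` are mine and were not verified against the redesigned page.

```javascript
// Extract the video ID from a watch link; returns null for any other link.
function videoIdFromHref(href) {
  const url = new URL(href, "https://www.youtube.com");
  return url.pathname === "/watch" ? url.searchParams.get("v") : null;
}

// Collect unique video IDs from anything that supports querySelectorAll
// (in the browser: `document` or any element).
// Selecting by href avoids depending on generated class names.
function collectVideoIds(root) {
  const ids = new Set();
  for (const link of root.querySelectorAll('a[href*="/watch?v="]')) {
    const id = videoIdFromHref(link.getAttribute("href"));
    if (id) ids.add(id);
  }
  return [...ids];
}
```

In a content script this would be called as `collectVideoIds(document)`. From each matching link, `Element.closest()` could then be used to walk up to the surrounding container that should be hidden or greyed out.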
Also, I did not implement the API requests via OAuth2, which would have been easier. I was in the process of implementing it, but then the redesign came and I was running out of time (the summer break at my university ended). So I decided to stop development, and two months later I deleted the Add-On, as I had no time to adapt it. That was no easy choice, as I had poured a lot of energy into this minor side project.
Even though the success rate was great, it was far from perfect. For one, it flagged controversial videos that were not clickbait way too often, as their like/dislike ratio was different. This also applied to many news channels. Even though you could whitelist channels, it bothered me.
Also, as I learned, users expect your piece of software to work flawlessly, even if it is free. I don’t want to blame them; they don’t know about the technical problems or that it was just me who coded it.
Also, I should have never published my own API-Key.
That said, the project was a success. First of all, I learned how to create and publish an extension. I also learned that publishing your API-Key is never a good idea. And lastly, I learned that sometimes the audience for the things you create is bigger than you thought.