Analyzing Sonic Fan Art with data science

A tutorial on using BeautifulSoup to scrape DeviantArt

Published in Towards Data Science · 5 min read · Sep 6, 2020

The Sonic fandom has achieved a level of notoriety that few fandoms on the Internet enjoy. The art is known for being distorted, disturbing and, in many cases, explicit. In my latest YouTube video, I scraped DeviantArt and analyzed the fan art to determine whether it truly lives up to this reputation. This post walks you through exactly how I did that.

I first wanted to get a sense of how many Sonic artworks there are on DeviantArt, a fan art sharing website. No scraping was required here: I simply searched for “Sonic” on DeviantArt and recorded how many results came up. I did the same for similar characters, like Shrek and Pikachu. Here are those results, visualized:

[Figure: number of DeviantArt search results for Sonic and similar characters]

The number of Sonic artworks dwarfs that of the other characters, coming to around 1.4 million search results. Wow! Now that we’ve seen there’s a thriving Sonic culture on DeviantArt, it’s time to move towards answering our research question: Is Sonic fan art really that disturbing?

To start, I used BeautifulSoup to scrape the posts (the full code for this can be accessed here on GitHub). The following code scrapes the first 2200 links that came up when searching “sonic.”

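import urllib.request
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Assumption: a browser-like User-Agent; the original headers are not shown in this post.
headers = {"User-Agent": "Mozilla/5.0"}

base_url = "https://www.deviantart.com/search/deviations?"
urls = []
for i in range(50):
    # The first results page is requested without a "page" parameter.
    if i == 0:
        url = base_url + "q=sonic"
    else:
        url = base_url + "page=" + str(i) + "&q=sonic"
    request = urllib.request.Request(url, None, headers)
    bs = BeautifulSoup(urlopen(request), "html.parser")
    # Each search result links to its post through a "deviation_link" anchor.
    links = [item.get("href") for item in bs.find_all(attrs={"data-hook": "deviation_link"})]
    urls.extend(links)

len(urls)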
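
The page URLs in the loop above can also be built with the standard library's urlencode, which takes care of escaping the query string. This is a minimal sketch rather than the original scraper: the helper names build_search_urls and dedupe are hypothetical, and it assumes the search endpoint accepts the same q and page parameters used above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.deviantart.com/search/deviations?"


def build_search_urls(query, pages):
    """Return one search URL per results page (hypothetical helper)."""
    urls = []
    for page in range(pages):
        # The first page carries only the query; later pages add "page".
        params = {"q": query} if page == 0 else {"page": page, "q": query}
        urls.append(BASE_URL + urlencode(params))
    return urls


def dedupe(links):
    """Drop repeated links while keeping the order in which they first appeared."""
    return list(dict.fromkeys(links))


search_urls = build_search_urls("sonic", 3)
```

Running dedupe over the collected links before visiting them avoids scraping the same post twice if two result pages overlap.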

I then retrieved several attributes from each of these URLs, including the post’s title, its tags and the number of views, favorites and comments it received. Here is the code for retrieving this data:

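# Lists that collect one value per post.
faves, comments, views, tags, titles = [], [], [], [], []

# deviationurls and num are not defined in this excerpt; presumably they are
# the scraped post links and the number of them to visit.
for i in range(num):
    print(deviationurls[i])
    request = urllib.request.Request(deviationurls[i], None, headers)
    bs = BeautifulSoup(urlopen(request), "html.parser")
    # These class names are generated by the site and may change over time.
    vals = [item.text for item in bs.find_all("span", class_="iiglI")]
    tag = [item.text for item in bs.find_all("span", class_="_3uQxz")]
    if len(vals) == 3:
        faves.append(vals[0])
        comments.append(vals[1])
        views.append(vals[2])
    else:
        # No comment count was shown, so record zero comments.
        faves.append(vals[0])
        comments.append(0)
        views.append(vals[1])
    tags.append(tag)
    titles.append(bs.find_all("h1")[0].text)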

Sometimes, the comments field was empty, hence the if/else condition.
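
The if/else logic described above can be isolated in a small function and tested without touching the network. This is a sketch under assumptions: split_stats and parse_count are hypothetical helpers, and the K/M suffix handling is a guess about how the site abbreviates large numbers, not something shown in the post.

```python
def parse_count(text):
    """Convert a displayed count such as '987', '1,234' or '1.2K' to an int.

    The K/M suffixes are an assumption about how large numbers are abbreviated.
    """
    text = text.strip().replace(",", "")
    multipliers = {"K": 1_000, "M": 1_000_000}
    suffix = text[-1:].upper()
    if suffix in multipliers:
        return int(round(float(text[:-1]) * multipliers[suffix]))
    return int(text)


def split_stats(vals):
    """Map scraped stat strings to a (faves, comments, views) tuple of ints.

    Three values are read as faves, comments, views. Two values are read as
    faves, views with zero comments. Any other length raises ValueError.
    """
    if len(vals) == 3:
        faves, comments, views = vals
    elif len(vals) == 2:
        faves, views = vals
        comments = "0"
    else:
        raise ValueError(f"unexpected number of stats: {len(vals)}")
    return parse_count(faves), parse_count(comments), parse_count(views)
```

Unlike the loop above, this version raises an error when the page yields an unexpected number of values instead of silently misassigning them.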

After doing this, you can construct a dataframe of Sonic fan artworks for analysis! Yours might look different from mine because different artworks might turn up for your query when scraping. But here is the resulting dataset from my scraper.

Here’s the distribution of years in which the artworks in my dataset were published. You can see that artworks from the 2010s represent the majority. Perhaps people started posting more in this time, or perhaps those pieces are simply what came up first when I was scraping.
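
As a rough illustration of that step, the lists filled by the scraper can be combined into a pandas dataframe with one row per post. The values below are made-up placeholders, not data from the real dataset, and the column names are an assumption chosen to match the list names used in the scraping code.

```python
import pandas as pd

# Placeholder values standing in for the lists filled by the scraper.
titles = ["Example title A", "Example title B"]
tags = [["sonic", "hedgehog"], ["sonic", "shadow"]]
views = ["1500", "320"]
faves = ["40", "12"]
comments = ["3", 0]

# One column per scraped attribute, one row per post.
df = pd.DataFrame(
    {
        "title": titles,
        "tags": tags,
        "views": views,
        "faves": faves,
        "comments": comments,
    }
)
```

Note that saving such a dataframe to CSV and loading it again turns each tag list into a string, so the tags need to be parsed back before analysis.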

[Figure: distribution of publication years for the artworks in the dataset]

I can’t include any of the images from the dataset in this Medium post, because I do not have the right to republish them here. I do, however, react to the top 10 most viewed and most favorited pieces of fan art in my video.

In my dataset, the two most viewed artworks were character makers (number one, and number two). This is not surprising, as a huge part of the Sonic fandom, like any fandom, is making original characters, or “OCs,” and writing stories around them. (The Sonic fandom, however, takes this pastime to a new level. Try googling your name + “the hedgehog” sometime to see what I’m talking about.)

Animated shorts, comics and character references were also popular.

My ultimate goal was to analyze the tags that artists on DeviantArt use. Which tags are used with which? To do this, I took the “tags” column in my dataframe to create a correlation matrix. This was a somewhat involved process — the first step was to create a “corpus” of the tags for parsing.

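import numpy as np

# Unique tag strings; the first entry is skipped (presumably the empty tag list).
data = np.unique(df.tags)[1:]
data[0]  # peek at one entry
data_real = []
for strdata in data:
    # Each entry is a stringified list, so strip the brackets and quotes.
    str1 = strdata.replace(']', '').replace('[', '')
    l = str1.replace("'", '').split(",")
    l = [item.strip(' ') for item in l]
    data_real.append(l)
texts = [[word.lower() for word in line] for line in data_real]
# Join each post's tags into one space-separated string.
corpus = []
for text in texts:
    corpus.append(' '.join(text))
corpus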
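
The chain of replace calls above works as long as no tag contains a bracket, quote or comma. A possible alternative, shown here as a sketch and not as the method used in the original analysis, is to parse each stringified list with ast.literal_eval from the standard library. The function name tags_to_document is hypothetical.

```python
import ast


def tags_to_document(raw):
    """Turn a stringified list of tags into one lowercase, space-separated string."""
    tags = ast.literal_eval(raw)  # safely evaluates a Python literal such as a list
    return " ".join(tag.strip().lower() for tag in tags)


corpus_example = [
    tags_to_document(raw)
    for raw in ["['Sonic', 'Hedgehog']", "['Shadow', 'sonic']"]
]
```

Either way, a tag made of several words is split into separate words once the corpus is tokenized on whitespace.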

After creating the corpus, I used scikit-learn to create the correlation matrix.

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(ngram_range=(1,1), stop_words='english')  # You can define your own parameters
X = cv.fit_transform(corpus)
Xc = (X.T @ X)  # This is the matrix manipulation step (a matrix product)
Xc.setdiag(0)  # Ignore each tag's pairing with itself
# These are the entity names (i.e. keywords).
# On scikit-learn versions older than 1.0, use get_feature_names() instead.
names = cv.get_feature_names_out()
df_tags = pd.DataFrame(data=Xc.toarray(), columns=names, index=names)
df_tags

And your resulting dataframe should look something like this (yes, it will be massive). Here, each cell counts how many times the row tag and the column tag were used together, so strictly speaking this is a co-occurrence matrix rather than a correlation matrix.

[Figure: excerpt of the resulting tag dataframe]
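
To see what the matrix product above computes, the same pairwise counts can be produced in plain Python on a toy corpus. This is an illustrative sketch: cooccurrence_counts is a hypothetical helper, it splits on whitespace instead of using CountVectorizer's tokenizer and stop-word list, and it agrees with the matrix only when each word appears at most once per post.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered pair of words, how many documents contain both."""
    counts = Counter()
    for doc in documents:
        words = sorted(set(doc.split()))  # unique words, in a stable order
        counts.update(combinations(words, 2))
    return counts


pairs = cooccurrence_counts(
    ["sonic hedgehog", "sonic shadow", "sonic hedgehog shadow"]
)
```

Here pairs[("hedgehog", "sonic")] is 2 because those two words share two of the three toy posts, which is the value the corresponding off-diagonal cell of the matrix would hold.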

I first used Python libraries like networkx and matplotlib to visualize the network, but the results were indecipherable messes. It was clear I needed more powerful software.

I saved the dataframe to a CSV and imported it into Gephi. In the network, the larger a word is, the more times it’s used. And the closer a word is to another word, the more times those tags are used together. After fine-tuning the network’s features, this is what it ended up looking like:

[Figure: Gephi network of Sonic fan art tags]
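
The post does not say which CSV layout was imported into Gephi. One layout Gephi's spreadsheet importer accepts is an edge table with Source, Target and Weight columns, and the sketch below writes pair counts in that form. The function name write_edge_list and the sample counts are hypothetical.

```python
import csv
import io


def write_edge_list(pair_counts, handle):
    """Write (source, target) -> weight pairs as a Source,Target,Weight CSV."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(pair_counts.items()):
        if weight > 0:  # pairs that never co-occur would only add clutter
            writer.writerow([source, target, weight])


# Write to an in-memory buffer here; pass an open file to save to disk.
buffer = io.StringIO()
write_edge_list({("hedgehog", "sonic"): 2, ("shadow", "sonic"): 0}, buffer)
edge_csv = buffer.getvalue()
```

An edge list of this kind stays much smaller than the full matrix, since most pairs of tags never appear together.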

This network is obviously massive, but it’s interesting to look around. I suggest taking a few minutes to zoom in and around it to see the connections between certain words. Or, you can watch my YouTube video summarizing this analysis and see what I took away from the network.

And that’s the end! To conclude, while there are examples of disturbing or distorted Sonic fan art, I think my analysis shows that that is certainly not what’s most popular in the community, or even the majority of the work. Much of it is, in fact, quite wholesome.

If you think this article leaves much to be desired, please consider watching my video on the subject — I share more of my results, findings and key takeaways there. This post was more meant to explain how I did the analysis.

Author: Kerri Lueilwitz