The Sexiest Interracial Pairings, Based on Porn Consumption

It’s wonderful what you can get away with in the interests of science. (And it’s also true that anything you do is bound to be offensive/distasteful to somebody, so trying to avoid offending anyone is a fool’s game.)

Anything is fair material for examination. And what more interesting thing to try to describe and quantify than human sexuality?

It’s many-faceted, there are patterns, there are overlaps, it’s fairly close to the fundamentals of human behaviour rather than a derived thing in its own right… In other words, a prime sort of thing to apply a cheeky bit of Python to. Of course, you need some vast online repository of sexual material. The good news here is that, when I looked for one, I was surprisingly successful. Turns out there’s this thing called porn. Sorted on the “sufficient data” front, then…

Of course, the subject being rather broad, trying to capture everything would require chapters, so I decided to narrow down to a certain category which seems to titillate some people: the old “interracial” category.

By crawling the “Interracial” section of what I’m told is a well-known porn site1, it’s fairly straightforward to parse 1,000 pages of video titles (encompassing ~45,000 individual videos) and aggregate the views for each, ah, combination…

1Full code provided below…

Most Viewed Pairings

Wow, that “black – white” bar is really big… Let’s smooth the extreme range of this graph by taking log₁₀(views).
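To see why the log transform helps, here’s a minimal sketch (the view counts below are made up for illustration, not the real scraped figures): a category with tens of millions of views and one with a few hundred thousand end up within a couple of units of each other on a log scale, so both bars stay visible on one chart.

```python
import math

# Illustrative view counts spanning several orders of magnitude
views = {"black - white": 48000000, "arab - black": 350000}

# log10 compresses the range: ~137x apart in raw views,
# but only ~2.1 apart on the log scale
logs = {combo: round(math.log10(v), 2) for combo, v in views.items()}

print(logs["black - white"])  # → 7.68
print(logs["arab - black"])   # → 5.54
```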

Well ain’t that interesting? The top 10 most popular combinations based on views alone are:

  1. black – white
  2. ebony – white
  3. asian – black
  4. asian – white
  5. indian – white
  6. black – latin
  7. black – japan
  8. black – indian
  9. arab – white
  10. arab – black

(If you struggle to read the x-values from the chart because it’s cramped, you should be able to click through and get some sweet, sweet tooltips.)

My immediate thought here is that at least some of this effect is going to be down to biases in the production rate of these various pairings, as well as their inherent “consumability”, so we’ll divide each category’s view count by the number of individual videos within that category to get a “views per video” view…

Normalised Views

Opportunities

These two graphs are different. The most-consumable categories on a “per video” basis are not the most produced. You can see that by overlaying the two:

If you put out one video featuring a black – Indian couple, you will get nearly 4 times as many views as if you produce one featuring a black – white couple, and even more if you go for Japan – Thai! If I were a porn producer, and I often wish I were, that’s where I’d be pumping my money…

(We’ll overlook the Asian – Japan combination as I suspect they might be two sides of the same coin and I could have misinterpreted the coincidence of the two words in titles.)
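One cheap way to test that suspicion would be to check how often the two words co-occur in the same title: if “asian” and “japan” almost always appear together, they’re probably describing one performer rather than a pairing. A sketch with invented titles:

```python
# Illustrative titles, not real scraped data
titles = [
    "hot asian japanese amateur",
    "asian girl meets black guy",
    "japanese wife interracial",
]

# Titles mentioning both words likely tag a single performer twice
both = [t for t in titles if "asian" in t and "japan" in t]

print(len(both), "of", len(titles), "titles mention both")
```

If that ratio came out high on the real data, the “asian – japan” combo could safely be dropped from the rankings.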

Bonus Filth

Of course, all of the above just looks for certain combinations of words in the titles of videos, but there’s a whole lot else being said in those titles too, and that’s something that’s just begging to be compared. Here’s how it looks for those top 10; the size of each word represents its relative popularity in titles for that combination…

1. Black – White

2. Ebony – White

3. Asian – Black

4. Asian – White

5. Indian – White

6. Black – Latin

7. Black – Japan

8. Black – Indian

9. Arab – White

10. Arab – Black

I’d attempt a commentary, but I’m blushing.

Code

There are efficiencies which could be made here, but I was bashing it out after a few beers and under the disapproving eye of the gf, so once it worked, that seemed a good time to stop.

from urllib.request import urlopen
from bs4 import BeautifulSoup
from wordcloud import WordCloud

# All lowercase, since titles are lowercased before matching
# (an uppercase "BBC" here would never match)
races = ["black", "latin", "white", "asian", "bbc", "indian", "ebony", "japan", "thai", "mexican", "european", "czech", "arab"]

raceCombos = {}
vidCount = 0

for x in range(1, 1001):
	rawPage = urlopen("https://xhamster.com/channels/new-interracial-"+str(x)+".html").read()
	soup = BeautifulSoup(rawPage, "html.parser")
	vids = soup.find_all("div", class_="video")
	for vid in vids:
		try:
			mentions = []
			title = vid.find("u").contents[0].lower()
			views = int(vid.find("div", class_="views-value").contents[0].replace(",",""))
			for race in races:
				if race in title:
					mentions.append(race)
			if len(mentions) == 2: #Aint considering no group stuff fam
				mentions = sorted(mentions) #So black - white aggregates in line with white - black
				combo = mentions[0]+" - "+mentions[1]
				if combo not in raceCombos:
					raceCombos[combo] = {"views": 0, "count": 0, "text": ""}
				raceCombos[combo]["views"] += views
				raceCombos[combo]["count"] += 1 #Needed for the views-per-video normalisation
				raceCombos[combo]["text"] += " "+title
			vidCount += 1
		except Exception: #Skip any video whose markup doesn't fit the pattern
			continue
	print("Done "+str(x)+" pages, "+str(vidCount)+" vids...")

for combo in raceCombos:
	print(combo, raceCombos[combo]["views"], raceCombos[combo]["views"] / raceCombos[combo]["count"])
	text = raceCombos[combo]["text"]
	for word in combo.split(" - "):
		text = text.replace(word, "") #Drop the category words themselves from the cloud
	wordcloud = WordCloud(width=1200, height=400).generate(text)
	wordcloud.to_file(combo+".png")