I once saw a pyramid depiction of human needs. At the bottom was basic sustenance, and on top of that shelter and safety, and on top of that was art and culture and that. It left me confused, because it omitted something which I think all of us want on a deep and fundamental level: A way to capture the lyrical idiosyncrasies of an artist in a machine learning model, and thereby churn out an arbitrary amount of pure poetry in the style of that artist. Fortunately that’s going to be addressed in this post.
We need three things.
- A means to get a bunch of lyrics from a given artist, in order to learn about their style.
- Some way of capturing said style – the way this artist tends to put lyrics together – in a model.
- Off the back of 1 and 2, something that uses the rules to spit out some pure artistry.
And so, without further ado!
1 – Getting Lyrics to Learn From
The good eggs at lyricsnmusic.com have got this covered for us with their dank API. Sign yourself up for an API key then it’s as simple as this to get some lyrics for, say, Coldplay. I heard that the favourite browser of the guys over there is Mozilla/5.0 so I’ve added that as our “browser signature” as a gesture of good faith… 😉
[code language=”python”]
import json
import urllib2

#apiKey is the key you signed up for above
url = "http://api.lyricsnmusic.com/songs?api_key="+apiKey+"&artist=coldplay"
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')] #lol
response = opener.open(url).read()
feed = json.loads(response)
for song in feed:
    print(song["snippet"])
[/code]
...
He said I'm gonna buy this place and burn it down
I'm gonna put it six feet underground
He said I'm gonna buy this place and watch it fall
Stand he...
Come on, oh my star is fading
And I swerve out of control
If I, if I'd only waited
I'd not be stuck here in this hole
...
Martin, you pretentious beauty! What we have here is a JSON structure of songs, each of which has properties like “title” and “snippet”, which, lamentably, is not the full lyrics. The boys have got us covered again, though, because they also provide a URL to the page containing the full lyrics, so we can just mosey on over there and scrape the shit out of those.
While we’re at it, let’s think ahead, and simply condense all this band’s lyrics into one big string, which we’ll call trainingText. Strange name, you say? No stranger than “Nigella”, I respond.
[code language=”python”]
from bs4 import BeautifulSoup #Assuming BeautifulSoup 4

trainingText = ""
for song in feed:
    snippet = song["snippet"]
    fullUrl = song["url"]
    fullPageHTML = opener.open(fullUrl).read()
    page = BeautifulSoup(fullPageHTML, "html.parser")
    try:
        lyrics = str(page.findAll("pre")[0]).replace("<pre itemprop=\"description\">","").replace("</pre>","")
        #There's probably a better way of doing this, but I'm not, nor have I ever been, the queen.
        trainingText += lyrics
    except:
        #No full lyrics on the page, so fall back to the snippet
        try: trainingText += snippet+"\n"
        except: continue
print(trainingText)
[/code]
What this gives you is essentially one big song of everything available for the artist. AND THAT’S IT. Let’s generalise this to a function which accepts a band and an API key as inputs, helpfully keeps you posted on its song-learning progress, and then gives you back the compiled vocal material of the artist you give it…
[code language=”python”]
import json
import urllib
import urllib2
from bs4 import BeautifulSoup #Assuming BeautifulSoup 4

def getLyrics(band, apiKey):
    encodedBand = urllib.quote_plus(band)
    url = "http://api.lyricsnmusic.com/songs?api_key="+apiKey+"&artist="+encodedBand
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')] #lol
    response = opener.open(url).read()
    feed = json.loads(response)
    trainingText = ""
    songsProcessed = 0
    for song in feed:
        snippet = song["snippet"]
        fullUrl = song["url"]
        fullPageHTML = opener.open(fullUrl).read()
        page = BeautifulSoup(fullPageHTML, "html.parser")
        try:
            lyrics = str(page.findAll("pre")[0]).replace("<pre itemprop=\"description\">","").replace("</pre>","")
            #There's probably a better way of doing this, but I'm not, nor have I ever been, the queen.
            trainingText += lyrics
        except:
            #No full lyrics on the page, so fall back to the snippet
            try: trainingText += snippet+"\n"
            except: continue
        songsProcessed += 1
        print("Learned "+str(songsProcessed)+" songs...")
    return trainingText
[/code]
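If you're on Python 3, where urllib2 and urllib.quote_plus no longer exist, the URL-building step might look something like the sketch below. It only assembles the request URL (no network call is made). The helper name build_songs_url is mine, and the endpoint and parameter names are simply the ones used above.

```python
from urllib.parse import urlencode

# Endpoint as used earlier in this post
API_ROOT = "http://api.lyricsnmusic.com/songs"

def build_songs_url(band, api_key):
    """Build the song-search URL, URL-encoding both query parameters."""
    # A list of tuples keeps the parameter order fixed
    query = urlencode([("api_key", api_key), ("artist", band)])
    return API_ROOT + "?" + query

# "YOUR_API_KEY" is a placeholder, not a real key
print(build_songs_url("Cradle of Filth", "YOUR_API_KEY"))
```

Spaces become "+" and awkward characters such as the slash in AC/DC are percent-encoded, which is the same job urllib.quote_plus does in the function above.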
2 – Capturing the Artist’s Poetic Nuances
Where would such-and-such-an-artist go with a certain theme? How would their lines play out? How long would they be? When would they decide it was time for a new verse?
Based on the back catalogue provided by step 1, we can build a probabilistic model of this, called a Markov Chain. The below is a slight simplification, but captures the essence of how it works.
Supposing we feed into our model the line “the boys are back in town”.
The model takes the first two words – “the boys” – and looks at what follows. It finds “are back”. So at this point, we know that “are back” follows “the boys” in 1 out of 1 occurrences, i.e. 100% of the time. If we’re asked to continue something starting with “the boys”, we will choose “are back” every time, because it’s all we know.
But then supposing we read a little further in the catalogue and see the line “if the boys wanna fight, you better let ’em”. Now we enrich our model with the knowledge that if you see “the boys”, half the time “wanna fight” comes next. Half the time “are back” comes next. Based on that knowledge, if you asked us many times to carry on from “the boys”, on average half the time we would choose “are back”, and half the time we would choose “wanna fight”.
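To make the word-level version concrete, here's a rough sketch that counts which pair of words follows each pair of words in those two lines. The function name word_pair_model and the punctuation stripping are my own simplifications, and like the description above it's a simplified picture, not the model we'll actually build.

```python
from collections import Counter, defaultdict

def word_pair_model(lines):
    """Count which two words follow each pair of words, line by line."""
    model = defaultdict(Counter)
    for line in lines:
        # Lower-case and strip trailing commas/full stops so "fight," matches "fight"
        words = [w.strip(",.") for w in line.lower().split()]
        for i in range(len(words) - 3):
            pair = " ".join(words[i:i + 2])
            following = " ".join(words[i + 2:i + 4])
            model[pair][following] += 1
    return model

lines = [
    "the boys are back in town",
    "if the boys wanna fight, you better let 'em",
]
model = word_pair_model(lines)
counts = model["the boys"]
total = sum(counts.values())
for following, count in sorted(counts.items()):
    print("the boys -> %s: %d of %d" % (following, count, total))
```

Each of the two continuations has been seen once out of two sightings of “the boys”, which is where the half-and-half split comes from.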
Of course, you wouldn’t just look at what follows “the boys”; you’d look at what follows every other combination of words in your “training” text. That’s what we’re going to do with the massive long list of lyrics. But rather than looking ahead one word, we’re going to look ahead one character, which will be helpful in deciding when to put punctuation in, when to break to a new line, and when to break into a new verse. And, similarly, rather than looking at what’s likely to come next after two words, we’re going to look at what’s likely to follow a certain number of characters. We’ll move this “window” all the way through the past lyrics we got in step 1, like so:
[The boys] are back in town
T[he boys ]are back in town
Th[e boys a]re back in town
The[ boys ar]e back in town
The [boys are] back in town
The b[oys are ]back in town
The bo[ys are b]ack in town
The boy[s are ba]ck in town
The boys[ are bac]k in town
The boys [are back] in town
The boys a[re back ]in town
The boys ar[e back i]n town
The boys are[ back in] town
The boys are [back in ]town
The boys are b[ack in t]own
The boys are ba[ck in to]wn
The boys are bac[k in tow]n
The boys are back[ in town]
And every time we move the window, we have a look at what the next character is, and amend our record of how many times we’ve seen that next character for this window.
The width of that window is called the order of the Markov Chain, and is essentially a measure of how long the “memory” of our model is. It has a profound bearing on the outcome of the lyric generation, because declaring that “s” follows “y” is a lot different to declaring that “s” follows “the boy”.
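A quick way to see the difference is to count what follows a short context versus a long one in the same text. This is just an illustrative sketch: successors is a made-up helper, and the two-line “catalogue” is the pair of lines used earlier.

```python
from collections import Counter

def successors(text, context):
    """Count the characters that immediately follow each occurrence of context."""
    counts = Counter()
    start = text.find(context)
    while start != -1:
        next_index = start + len(context)
        if next_index < len(text):
            counts[text[next_index]] += 1
        # Look for the next occurrence, allowing overlaps
        start = text.find(context, start + 1)
    return counts

text = "the boys are back in town\nif the boys wanna fight, you better let 'em"
short_context = successors(text, "y")        # a window of one character
long_context = successors(text, "the boy")   # a window of seven characters
print(dict(short_context))
print(dict(long_context))
```

With the one-character window, “y” is sometimes followed by “s” and sometimes by “o”. With the seven-character window, “the boy” is always followed by “s”. A longer memory means fewer surprises.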
Anyway. Here’s the code that does this with our lyrics:
[code language=”python”]
def generateModel(text, order):
    model = {}
    for i in range(0, len(text)-order):
        fragment = text[i:i+order] #Range is exclusive at upper bound
        nextLetter = text[i+order] #So this is the next letter
        if fragment not in model:
            model[fragment] = {}
        if nextLetter not in model[fragment]:
            model[fragment][nextLetter] = 1
        else:
            model[fragment][nextLetter] += 1
    return model
[/code]
Here’s what the model looks like for the line “sometimes I run sometimes I hide”:
('un somet', {'i': 1})
(' sometim', {'e': 1})
(' run som', {'e': 1})
('ometimes', {' ': 2})
(' I run s', {'o': 1})
('I run so', {'m': 1})
('imes I h', {'i': 1})
('metimes ', {'I': 2})
('imes I r', {'u': 1})
('es I run', {' ': 1})
('mes I ru', {'n': 1})
('times I ', {'h': 1, 'r': 1})
('sometime', {'s': 2})
('etimes I', {' ': 2})
('mes I hi', {'d': 1})
('n someti', {'m': 1})
('run some', {'t': 1})
('es I hid', {'e': 1})
('s I run ', {'s': 1})
Your way forward is pretty well determined, until you hit “times I “, at which point you have a 50/50 chance of the next letter being “r” or “h”. Once you choose one of those, you’re on a one-way track to either running or hiding. This is a trivial example, but running the entire back catalogue of a band through creates an interesting tree of probabilities which will take you in a different direction every time.
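For the curious, here's a small sketch of how you might hunt for those forks in the road: build the same kind of counts, then turn them into probabilities for any fragment with more than one possible next letter. build_model and branch_points are names invented for this example, and build_model is a compact stand-in for generateModel that uses the collections module.

```python
from collections import Counter, defaultdict

def build_model(text, order):
    """Map each fragment of `order` characters to counts of the next character."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model

def branch_points(model):
    """Return {fragment: {char: probability}} for fragments with several successors."""
    result = {}
    for fragment, counts in model.items():
        if len(counts) > 1:
            total = float(sum(counts.values()))
            result[fragment] = {ch: n / total for ch, n in counts.items()}
    return result

model = build_model("sometimes I run sometimes I hide", 8)
branches = branch_points(model)
print(branches)
```

For this one line there is a single fork, at “times I “, with “r” and “h” each at 0.5. Everything else is a one-way street.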
3 – Letting the Car Drive Itself
What we need to do now is create a means of using the model to build up some lyrics. The way this works in Markov Chains is to look at the current window, get the next character probabilistically, then with that, move the window one place along, and repeat.
First let’s make a function which, given a current “fragment” and a model, gives you a next character…
[code language=”python”]
from random import choice

def getNextCharacter(model, fragment):
    letters = []
    for letter in model[fragment].keys():
        #Add each letter once per time it was seen, so choice() is weighted
        for occurrence in range(0, model[fragment][letter]):
            letters.append(letter)
    return choice(letters)
[/code]
This is doing something very simple. It looks up the possible next characters for this fragment in the model, creates a list where each character appears as many times as its weighting, then chooses one at random and returns it. Skill.
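On Python 3.6 or later, the standard library can do the weighted pick directly, without building the expanded list. A sketch, assuming you pass in the inner dictionary of the model for the current fragment; next_character is a hypothetical name, not part of the code above.

```python
import random

def next_character(successor_counts, rng=random):
    """Pick one key from a {character: count} dict, weighted by count."""
    characters = list(successor_counts.keys())
    weights = list(successor_counts.values())
    # random.choices returns a list of k picks; we only want one
    return rng.choices(characters, weights=weights, k=1)[0]

# The fork from the "sometimes I run sometimes I hide" example
counts = {"r": 1, "h": 1}
sample = [next_character(counts) for _ in range(1000)]
print(sample.count("r"), sample.count("h"))
```

Over a thousand picks the two letters should come out roughly even, though the exact split will differ on every run.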
So all that remains now is to use these functions to spool out some lyrics. Our function for that is going to look like this…
[code language=”python”]
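def generateLyrics(trainingText, order, length):
    model = generateModel(trainingText, order)
    currentFragment = trainingText[0:order]
    output = currentFragment #Start from the seed, so the result is up to `length` characters in total
    for i in range(0, length-order):
        if currentFragment not in model: #Dead end: fragment only seen at the very end of the training text
            break
        newCharacter = getNextCharacter(model, currentFragment)
        output += newCharacter
        currentFragment = currentFragment[1:]+newCharacter
    return output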
[/code]
So what we’re saying to this function is: here’s some lyrics to learn from, use these to generate me some lyrics of a certain length, using a window size or order of such and such. The function uses the training lyrics to generate a model, then uses that to form, character-by-character, a meandering continuation of the first few characters of the training text. Once the lyrics are built up to the requested length, the string is returned.
The first thing you’re going to want to do with this is to try it for loads of different artists, and even try it for the same artist multiple times to appreciate how non-deterministic it is, so I’d suggest that the following is a nice way of calling the various functions:
[code language=”python”]
band = raw_input("Enter artist:\n")
lyrics = getLyrics(band, apiKey)
newLyrics = generateLyrics(lyrics, 8, 600)
print(newLyrics)
[/code]
I Can Haz Lyrics?
Yep. Who are some artists with distinctive styles?
Adele
Oh noooo
Let it burn
Oh oh ohhhh
Let it burned while I cried
‘Cause I heard it screaming out your name,
You said I’m stubborn and raised
In a summer haze bound by the surprise
And he will feel like he’s been there for hours
And you’ll walk that mile
Until you kissed my lips and you prefer the floor
God only known each other,
Think of me in the deep (Tears are gonna fall, rolling in the pavements?
Even if, it leads nowhere
And I hear but our eyes, and settle for wrong
Cradle of Filth
Devildom voyeurs
Ascend to smother the spite seething Draconist
And commit this wolf of the graveyard, of the moon
Lowered Her mask to me
Your soul
Live for the reams
Of verses and curses
That heavenly brow
Crippled seraph shalt cower in illustrious courts
Whilst She entranced divined from the wolves are the rustic summers of my excess
Expurse of a whore
Receiving sole communion from fate
By alighting to discredit rebirth
Alone as a stone cold wish
To see the witch scholared Her
In even darker spheres
Delighting in my cold cell, when the priest comes
Kanye West
yeah yeah yeah, I got packs to get the clouds to break me down
The only rapper AND a producer and a glove, but didn’t have a buzz bigger than the souls of men
Louboutin on the street trying to get by
Stack ya money to buy her a few pairs of new Airs
Cause lately’s been a whole life (Ohh)
And I wonder if you used to feel invisible
Now that wet mouth
Uh, I know she find out what he is owed?
And throw away my bus pass any and every class
Lookin’ at every ass
Cheated on every song and
save their whole deal, Their wrist is on chill
They house warmin’
Sittin’ here, grillin’ people say
Bob Dylan
Zanzinger killed for no reason to roam
Don’t forget to flash
We’re all gonna meet
At that million dollar bash
Well, the comic book and me, just us, we caught ’em
And that is not
It doesn’t matter of minutes, on bail was out walking
But you and I, we’ve been deceived by the Fountain Bank
One bird book, and a bottle of bread
Yea, heavy and a bottle of bread
Yea, heavy and a buzzard and his weapon took from him
As they rot
While paupers change possessions
Each one means
At times I think that you do
Make me glad I’m in love with your money, pull up your shawl
Won’t you descend from
Led Zeppelin
Brother, I brought you together baby, I’m sure my shot-gun will.
Gonna go walkin’ through the country lanes,
I’ll be singing a song,
Hear me call your friends coming back home.You know I’m the one you want.
I must be time I’m leavin’,
Baby, dry those silver,
I brought you smiling at me,
That’s alright, I’d be the western shore
So now you’d better lay your mornin’ time is now
To sing my song
I’m going ’round the world
I got to find you remember times like these?
To think of us again?
And I do
Tangerine, Tangerine, Tangerine, Tangerine, Tangerine
(The Tangerine line is interesting – I think what’s happened is that at some stage the fragment “Tangerine, Tangerine” has appeared, so literally the only thing that has a chance of following “Tangerine” is “, Tangerine”, because the word is longer than the order.)
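You can reproduce the effect with a toy example. The training text below is made up (it's not the actual lyrics), and build_model is a stand-in for generateModel. With an order of 8, every fragment has exactly one possible next character, so the generator can only go round in circles.

```python
def build_model(text, order):
    """Map each fragment of `order` characters to counts of the next character."""
    model = {}
    for i in range(len(text) - order):
        fragment = text[i:i + order]
        next_char = text[i + order]
        counts = model.setdefault(fragment, {})
        counts[next_char] = counts.get(next_char, 0) + 1
    return model

# Invented training text: the same word repeated, separated by ", "
text = "Tangerine, " * 4
model = build_model(text, 8)

# True when no fragment offers a choice of next character
forced = all(len(successors) == 1 for successors in model.values())

# Walk the chain from the first fragment; there is never anything to choose
fragment = text[:8]
output = fragment
for _ in range(25):
    next_char = list(model[fragment])[0]
    output += next_char
    fragment = fragment[1:] + next_char
print(forced)
print(output)
```

A real back catalogue would eventually offer a way out of the loop, because the fragment at the end of the repeated run would also have been seen followed by something else.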
Justin Bieber
Rumors spreading ’bout this other guys?
I can see right from wrong
Help me when I got-got your body
Baby no no nobody has got what I need
‘Cause I didn’t believe
When I need is one love, one heart
My one heart
‘Cause I’m in love with you I’m losin’ you
I’m a me tell you one time (girl I love you)
You look so deep
You know you’re standing in front of the camera
She don’t stop until I find) my runaway love
Why can I choose between us no one else,
Want me to,
Baby we can share mine!
I know you care
Just shout whenever
Metallica
Swing the scene
In the city tonight
We are gathered here to maim and kill
‘Cause this is what we have done unto you
But what is truth?I cannot die
Trapped far beyond my fate
I give
You take
This life off from me
Hold my breath as I wish for death
Oh please, God, help me
Hold my breath as I wish for death
Oh please, God, help me
Death in the fast lane is just how it seems to fade away
Drifting further everyday
Getting lost within myself
Nothing can save you
Justice is lost
Justice is raped
Queen
Your mother’s eyes, from your eyes gonna make a big noise
Playin’ in the sand cannot heal me like a jelly fish
I kinda like it
You call me Mister Fahrenheit
I’m trav’ling at the peak of the land
I seen every Wednesday evening
There´s no ending
The Seer he said
Beware the championsFlash a-ah
Savior of the night followed day
And the hub caps all gleam
When I’m cruisin’ in overdrive
Don’t you take me back to you
In rain or shine
You say shark I say bite
You say your folks are telling you
Write my letter feel much better
And use my fancy patter on the multit
You get the idea.
And of course (there’s always an “of course”) the fact that we’re using the Markov Chain approach to generate song lyrics is incidental. This code can be used to generate text that looks like any other text you care to feed it. That could be religious texts, product reviews, sales pitches, anything.
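As a rough illustration, here's a self-contained sketch run over an invented product review instead of lyrics. The review text, the function names and the seed parameter are all my own assumptions, and it needs Python 3.6+ for random.choices. It also stops early if it reaches a fragment that has never been seen followed by anything.

```python
import random
from collections import Counter, defaultdict

def build_model(text, order):
    """Map each fragment of `order` characters to counts of the next character."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model

def generate(text, order, length, seed=None):
    """Generate up to `length` characters in the style of `text`."""
    rng = random.Random(seed)  # pass a seed for repeatable output
    model = build_model(text, order)
    fragment = text[:order]
    output = fragment
    # Stop at the requested length, or at a fragment with no known successor
    while len(output) < length and fragment in model:
        counts = model[fragment]
        next_char = rng.choices(list(counts), weights=list(counts.values()))[0]
        output += next_char
        fragment = fragment[1:] + next_char
    return output

# Made-up review text, purely for demonstration
review = ("Great kettle. Boils fast. Great value. "
          "Great kettle, boils water fast, would buy again. ")
sample = generate(review, 4, 80, seed=1)
print(sample)
```

Every run of five characters in the output also appears somewhere in the review, which is exactly the property that makes the result look like its source.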
There’s a pretty diverse amount of material already out there, so, now we have this, we probably don’t need humans any more.
The complete code is available on GitHub.