Using Machine Learning to Generate Lyrics in the Style of your Favourite Artist

I once saw a pyramid depiction of human needs. At the bottom was basic sustenance, and on top of that shelter and safety, and on top of that was art and culture and that. It left me confused, because it omitted something which I think all of us want on a deep and fundamental level: A way to capture the lyrical idiosyncrasies of an artist in a machine learning model, and thereby churn out an arbitrary amount of pure poetry in the style of that artist. Fortunately that’s going to be addressed in this post.

We need three things.

  1. A means to get a bunch of lyrics from a given artist, in order to learn about their style.
  2. Some way of capturing said style – the way this artist tends to put lyrics together – in a model.
  3. Off the back of 1 and 2, something that uses the rules to spit out some pure artistry.

And so, without further ado!

1 – Getting Lyrics to Learn From

The good eggs at lyricsnmusic.com have got this covered for us with their dank API. Sign yourself up for an API key then it’s as simple as this to get some lyrics for, say, Coldplay. I heard that the favourite browser of the guys over there is Mozilla/5.0 so I’ve added that as our “browser signature” as a gesture of good faith… 😉

import json
import urllib2

apiKey = "Your API key here"
url = "http://api.lyricsnmusic.com/songs?api_key="+apiKey+"&artist=coldplay"

opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')] #lol
response = opener.open(url).read()

feed = json.loads(response)
for song in feed:
    print(song["snippet"])
...
He said I'm gonna buy this place and burn it down
I'm gonna put it six feet underground
He said I'm gonna buy this place and watch it fall
Stand he...
Come on, oh my star is fading
And I swerve out of control
If I, if I'd only waited
I'd not be stuck here in this hole
...

Martin, you pretentious beauty! What we have here is a JSON structure of songs, each of which has properties like “title” and “snippet”, which, lamentably, is not the full lyrics. The boys have got us covered again, though, because they also provide a URL to the page containing the full lyrics, so we can just mosey on over there and scrape the shit out of those.

While we’re at it, let’s think ahead, and simply condense all this band’s lyrics into one big string, which we’ll call trainingText. Strange name, you say? No stranger than “Nigella”, I respond.

from bs4 import BeautifulSoup

trainingText = ""
for song in feed:
    snippet = song["snippet"]
    fullUrl = song["url"]
    fullPageHTML = opener.open(fullUrl).read()
    page = BeautifulSoup(fullPageHTML, "html.parser")

    try:
        lyrics = str(page.findAll("pre")[0]).replace("<pre itemprop=\"description\">","").replace("</pre>","")
        #There's probably a better way of doing this, but I'm not, nor have I ever been, the queen.
        trainingText += lyrics
    except:
        try: trainingText += snippet+"\n"
        except: continue
print(trainingText)

What this gives you is essentially one big song of everything available for the artist. AND THAT’S IT. Let’s generalise this to a function which accepts a band and an API key as inputs, helpfully keeps you posted on its song-learning progress, and then gives you back the compiled vocal material of the artist you give it…

import urllib

def getLyrics(band, apiKey):
    encodedBand = urllib.quote_plus(band)
    url = "http://api.lyricsnmusic.com/songs?api_key="+apiKey+"&artist="+encodedBand

    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')] #lol
    response = opener.open(url).read()
    feed = json.loads(response)
	
    trainingText = ""
    songsProcessed = 0
    for song in feed:
        snippet = song["snippet"]
        fullUrl = song["url"]
        fullPageHTML = opener.open(fullUrl).read()
        page = BeautifulSoup(fullPageHTML, "html.parser")
	
        try:
            lyrics = str(page.findAll("pre")[0]).replace("<pre itemprop=\"description\">","").replace("</pre>","")
            #There's probably a better way of doing this, but I'm not, nor have I ever been, the queen.
            trainingText += lyrics
        except:
            try: trainingText += snippet+"\n"
            except: continue
        songsProcessed += 1
        print("Learned "+str(songsProcessed)+" songs...")

    return(trainingText)
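
On the "probably a better way" front: rather than stripping the tags off with string replaces, you can let a parser hand you the text inside the first "pre" element. Below is a rough sketch using only the standard library, written for Python 3 (the snippets in this post are Python 2). The names PreTextExtractor, extractLyrics and samplePage are made up for illustration, and the sample page is an invented stand-in, not real output from the lyrics site. BeautifulSoup's own get_text() would likely do the same job.

```python
from html.parser import HTMLParser

class PreTextExtractor(HTMLParser):
    """Collects the text inside the first <pre> element of a page."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.insidePre = False
        self.finished = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Only the first <pre> is of interest; ignore any later ones
        if tag == "pre" and not self.finished:
            self.insidePre = True

    def handle_endtag(self, tag):
        if tag == "pre" and self.insidePre:
            self.insidePre = False
            self.finished = True

    def handle_data(self, data):
        if self.insidePre:
            self.chunks.append(data)

def extractLyrics(html):
    parser = PreTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)

# Hypothetical page, shaped like the markup the replace() calls assume
samplePage = ('<html><body><pre itemprop="description">Line one\n'
              'Line two</pre><pre>other</pre></body></html>')
print(extractLyrics(samplePage))
```

The upside is that it no longer cares what attributes the tag carries, so a change from itemprop="description" to something else would not quietly leave markup in your training text.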

2 – Capturing the Artist’s Poetic Nuances

Where would such-and-such-an-artist go with a certain theme? How would their lines play out? How long would they be? When would they decide it was time for a new verse?

Based on the back catalogue provided by step 1, we can build a probabilistic model of this, called a Markov Chain. The below is a slight simplification, but captures the essence of how it works.

Supposing we feed into our model the line “the boys are back in town”.

The model takes the first two words, “the boys”, and looks at what follows. It finds “are back”. So at this point, we know that “are back” follows “the boys” in 1 out of 1 occurrences, i.e. 100% of the time. If we’re asked to continue something starting with “the boys”, we will choose “are back” every time, because it’s all we know.

But then supposing we read a little further in the catalogue and see the line “if the boys wanna fight, you better let ’em”. Now we enrich our model with the knowledge that if you see “the boys”, half the time “wanna fight” comes next. Half the time “are back” comes next. Based on that knowledge, if you asked us many times to carry on from “the boys”, on average half the time we would choose “are back”, and half the time we would choose “wanna fight”.
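
To make the word-pair version concrete, here is a minimal sketch of that counting. The function name followersOf, the two-line catalogue and the crude tokenising are all assumptions for illustration, and it is written for Python 3; the real model in this post works on characters, not words.

```python
from collections import Counter

def followersOf(lines, pair, lookahead=2):
    """Count which word sequences follow `pair` across the given lines."""
    counts = Counter()
    width = len(pair)
    for line in lines:
        # Crude tokenising: lowercase, drop commas, split on whitespace
        words = line.lower().replace(",", "").split()
        for i in range(len(words) - width - lookahead + 1):
            if tuple(words[i:i + width]) == pair:
                following = " ".join(words[i + width:i + width + lookahead])
                counts[following] += 1
    return counts

catalogue = [
    "the boys are back in town",
    "if the boys wanna fight, you better let 'em",
]

counts = followersOf(catalogue, ("the", "boys"))
total = sum(counts.values())
for following, seen in counts.items():
    # Share of the time each continuation follows "the boys"
    print(following, seen / total)
```

With only these two lines in the catalogue, each continuation has been seen once out of two, which is the half-and-half split described above.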

Of course, you wouldn’t just look at what follows “the boys”; you’d look at what follows every other combination of words in your “training” text. That’s what we’re going to do with the massive long list of lyrics. But rather than looking ahead one word, we’re going to look ahead one character, which will be helpful in deciding when to put punctuation in, when to break to a new line, and when to break into a new verse. And, similarly, rather than looking at what’s likely to come next after two words, we’re going to look at what’s likely to follow a certain number of characters. We’ll move this “window” all the way through the past lyrics we got in step 1, like so:

[The boys] are back in town
T[he boys ]are back in town
Th[e boys a]re back in town
The[ boys ar]e back in town
The [boys are] back in town
The b[oys are ]back in town
The bo[ys are b]ack in town
The boy[s are ba]ck in town
The boys[ are bac]k in town
The boys [are back] in town
The boys a[re back ]in town
The boys ar[e back i]n town
The boys are[ back in] town
The boys are [back in ]town
The boys are b[ack in t]own
The boys are ba[ck in to]wn
The boys are bac[k in tow]n
The boys are back[ in town]

(The square brackets mark the current 8-character window.)

And every time we move the window, we have a look at what the next character is, and amend our record of how many times we’ve seen that next character for this window.

The width of that window is called the order of the Markov Chain, and is essentially a measure of how long the “memory” of our model is. It has a profound bearing on the outcome of the lyric generation, because declaring that “s” follows “y” is a lot different to declaring that “s” follows “the boy”.
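
A quick way to see the difference is to count what follows a short fragment versus a long one in the same text. This is only a sketch: the helper name followers and the sample sentence are invented for the purpose, and it is written for Python 3.

```python
def followers(text, fragment):
    """Count the characters that come straight after each occurrence of fragment."""
    seen = {}
    width = len(fragment)
    for i in range(len(text) - width):
        if text[i:i + width] == fragment:
            nextLetter = text[i + width]
            seen[nextLetter] = seen.get(nextLetter, 0) + 1
    return seen

# Made-up sample text with plenty of "y"s in it
sample = "the boys say yes, they stay away, the boy is my buddy"

shortMemory = followers(sample, "y")       # order 1
longMemory = followers(sample, "the boy")  # order 7

print(shortMemory)
print(longMemory)
```

With a one-character memory, "s" is just one of several things that can follow "y", and a space is far more likely. With a seven-character memory, there are only two ways on from "the boy", and "s" is one of them.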

Anyway. Here’s the code that does this with our lyrics:

def generateModel(text, order):
    model = {}
    for i in range(0, len(text)-order):
        fragment = text[i:i+order] #Range is exclusive at upper bound
        nextLetter = text[i+order] #So this is the next letter
        if fragment not in model:
            model[fragment] = {}
        if nextLetter not in model[fragment]:
            model[fragment][nextLetter] = 1
        else:
            model[fragment][nextLetter] += 1
    return(model)

Here’s what the model looks like for the line “sometimes I run sometimes I hide”:

('un somet', {'i': 1})
(' sometim', {'e': 1})
(' run som', {'e': 1})
('ometimes', {' ': 2})
(' I run s', {'o': 1})
('I run so', {'m': 1})
('imes I h', {'i': 1})
('metimes ', {'I': 2})
('imes I r', {'u': 1})
('es I run', {' ': 1})
('mes I ru', {'n': 1})
('times I ', {'h': 1, 'r': 1})
('sometime', {'s': 2})
('etimes I', {' ': 2})
('mes I hi', {'d': 1})
('n someti', {'m': 1})
('run some', {'t': 1})
('es I hid', {'e': 1})
('s I run ', {'s': 1})

Your way forward is pretty well determined, until you hit “times I ”, at which point you have a 50/50 chance of the next letter being “r” or “h”. Once you choose one of those, you’re on a one-way track to either running or hiding. This is a trivial example, but running the entire back catalogue of a band through creates an interesting tree of probabilities which will take you in a different direction every time.
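
If you want to see those chances as numbers rather than counts, a small helper will do it. This is a sketch only; toProbabilities is a made-up name and the model itself never needs this step, since the counts alone are enough to pick from.

```python
def toProbabilities(counts):
    """Turn next-letter counts into the share of the time each letter follows."""
    total = float(sum(counts.values()))
    return {letter: seen / total for letter, seen in counts.items()}

# Two entries lifted from the model above
fork = toProbabilities({'h': 1, 'r': 1})   # after "times I "
oneWay = toProbabilities({'I': 2})         # after "metimes "

print(fork)
print(oneWay)
```

The first is the 50/50 fork, and the second is a certainty even though it was seen twice, because nothing else ever followed that fragment.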

3 – Letting the car drive itself

What we need to do now is create a means of using the model to build up some lyrics. The way this works in Markov Chains is to look at the current window, get the next character probabilistically, then with that, move the window one place along, and repeat.

First let’s make a function which, given a current “fragment” and a model, gives you a next character…

from random import choice

def getNextCharacter(model, fragment):
    letters = []
    for letter in model[fragment].keys():
        for occurences in range(0, model[fragment][letter]):
            letters.append(letter)
    return(choice(letters))

This is doing something very simple. It looks up the possible next characters for this fragment in the model, creates a list where each character appears as many times as its weighting, then chooses one at random and returns it. Skill.
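
For what it's worth, on Python 3.6 and later the standard library can do the weighting for you, so you can skip building the padded-out list. The sketch below is an alternative, not the code used for the results in this post; getNextCharacterWeighted is a made-up name and the tiny model is cut down from the "sometimes I run" example.

```python
import random

def getNextCharacterWeighted(model, fragment):
    """Pick a next character, weighted by how often it followed fragment."""
    letters = list(model[fragment].keys())
    weights = [model[fragment][letter] for letter in letters]
    # random.choices returns a list of k picks; we only want the one
    return random.choices(letters, weights=weights, k=1)[0]

# Cut-down model for illustration
model = {'times I ': {'h': 1, 'r': 1}, 'sometime': {'s': 2}}

print(getNextCharacterWeighted(model, 'sometime'))
print(getNextCharacterWeighted(model, 'times I '))
```

The behaviour should match the list-building version: a character seen twice as often is twice as likely to be picked.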

So all that remains now is to use these functions to spool out some lyrics. Our function for that is going to look like this…

def generateLyrics(trainingText, order, length):
    model = generateModel(trainingText, order)
    currentFragment = trainingText[0:order]
    output = ""
    for i in range(0, length-order):
        newCharacter = getNextCharacter(model, currentFragment)
        output += newCharacter
        currentFragment = currentFragment[1:]+newCharacter
    return(output)

So what we’re saying to this function is: here’s some lyrics to learn from, use these to generate me some lyrics of a certain length, using a window size or order of such and such. The function uses the training lyrics to generate a model, then uses that to form, character-by-character, a meandering continuation of the first few characters of the training text. Once the lyrics are built up to the requested length, the string is returned.
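
One wrinkle worth knowing about: the very last window of the training text has no recorded next character, so if the walk lands on it and that fragment appears nowhere else, the model lookup fails with a KeyError. Below is a sketch of one way round that, written for Python 3. The name generateLyricsSafely and the "jump to a random known fragment" policy are my own assumptions, not something the original code does, and this version returns exactly the requested number of characters.

```python
import random

def generateModel(text, order):
    """Same counting scheme as above: fragment -> {next character: count}."""
    model = {}
    for i in range(len(text) - order):
        fragment = text[i:i + order]
        nextLetter = text[i + order]
        model.setdefault(fragment, {})
        model[fragment][nextLetter] = model[fragment].get(nextLetter, 0) + 1
    return model

def generateLyricsSafely(trainingText, order, length):
    model = generateModel(trainingText, order)
    if not model:
        raise ValueError("training text must be longer than the order")
    currentFragment = trainingText[0:order]
    output = ""
    while len(output) < length:
        if currentFragment not in model:
            # Dead end (assumed policy): jump to a random known fragment
            currentFragment = random.choice(list(model.keys()))
        options = model[currentFragment]
        letters = list(options.keys())
        weights = [options[letter] for letter in letters]
        newCharacter = random.choices(letters, weights=weights, k=1)[0]
        output += newCharacter
        currentFragment = currentFragment[1:] + newCharacter
    return output

trainingLine = "sometimes I run sometimes I hide"
print(generateLyricsSafely(trainingLine, 8, 60))
```

With a whole back catalogue the dead end is rare, but with a single line like this one you hit it as soon as the walk chooses to "hide".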

The first thing you’re going to want to do with this is to try it for loads of different artists, and even try it for the same artist multiple times to appreciate how non-deterministic it is, so I’d suggest that the following is a nice way of calling the various functions:

band = raw_input("Enter artist:\n")

lyrics = getLyrics(band, apiKey)

newLyrics = generateLyrics(lyrics, 8, 600)

print(newLyrics)

I Can Haz Lyrics?

Yep. Who are some artists with distinctive styles?

Adele

Oh noooo
Let it burn
Oh oh ohhhh
Let it burned while I cried
‘Cause I heard it screaming out your name,
You said I’m stubborn and raised
In a summer haze bound by the surprise
And he will feel like he’s been there for hours
And you’ll walk that mile
Until you kissed my lips and you prefer the floor
God only known each other,
Think of me in the deep (Tears are gonna fall, rolling in the pavements?
Even if, it leads nowhere
And I hear but our eyes, and settle for wrong

Cradle of Filth

Devildom voyeurs
Ascend to smother the spite seething Draconist
And commit this wolf of the graveyard, of the moon
Lowered Her mask to me
Your soul
Live for the reams
Of verses and curses
That heavenly brow
Crippled seraph shalt cower in illustrious courts
Whilst She entranced divined from the wolves are the rustic summers of my excess
Expurse of a whore
Receiving sole communion from fate
By alighting to discredit rebirth
Alone as a stone cold wish
To see the witch scholared Her
In even darker spheres
Delighting in my cold cell, when the priest comes

Kanye West

yeah yeah yeah, I got packs to get the clouds to break me down
The only rapper AND a producer and a glove, but didn’t have a buzz bigger than the souls of men
Louboutin on the street trying to get by
Stack ya money to buy her a few pairs of new Airs
Cause lately’s been a whole life (Ohh)
And I wonder if you used to feel invisible
Now that wet mouth
Uh, I know she find out what he is owed?
And throw away my bus pass any and every class
Lookin’ at every ass
Cheated on every song and
save their whole deal, Their wrist is on chill
They house warmin’
Sittin’ here, grillin’ people say

Bob Dylan

Zanzinger killed for no reason to roam
Don’t forget to flash
We’re all gonna meet
At that million dollar bash
Well, the comic book and me, just us, we caught ’em
And that is not
It doesn’t matter of minutes, on bail was out walking
But you and I, we’ve been deceived by the Fountain Bank
One bird book, and a bottle of bread
Yea, heavy and a bottle of bread
Yea, heavy and a buzzard and his weapon took from him
As they rot
While paupers change possessions
Each one means
At times I think that you do
Make me glad I’m in love with your money, pull up your shawl
Won’t you descend from

Led Zeppelin

Brother, I brought you together baby, I’m sure my shot-gun will.
Gonna go walkin’ through the country lanes,
I’ll be singing a song,
Hear me call your friends coming back home.You know I’m the one you want.
I must be time I’m leavin’,
Baby, dry those silver,
I brought you smiling at me,
That’s alright, I’d be the western shore
So now you’d better lay your mornin’ time is now
To sing my song
I’m going ’round the world
I got to find you remember times like these?
To think of us again?
And I do
Tangerine, Tangerine, Tangerine, Tangerine, Tangerine

(The Tangerine line is interesting – I think what’s happened is that at some stage the fragment “Tangerine, Tangerine” has appeared, so literally the only thing that has a chance of following “Tangerine” is “, Tangerine”, because the word is longer than the order.)

Justin Bieber

Rumors spreading ’bout this other guys?
I can see right from wrong
Help me when I got-got your body
Baby no no nobody has got what I need
‘Cause I didn’t believe
When I need is one love, one heart
My one heart
‘Cause I’m in love with you I’m losin’ you
I’m a me tell you one time (girl I love you)
You look so deep
You know you’re standing in front of the camera
She don’t stop until I find) my runaway love
Why can I choose between us no one else,
Want me to,
Baby we can share mine!
I know you care
Just shout whenever

Metallica

Swing the scene
In the city tonight
We are gathered here to maim and kill
‘Cause this is what we have done unto you
But what is truth?I cannot die
Trapped far beyond my fate
I give
You take
This life off from me
Hold my breath as I wish for death
Oh please, God, help me
Hold my breath as I wish for death
Oh please, God, help me
Death in the fast lane is just how it seems to fade away
Drifting further everyday
Getting lost within myself
Nothing can save you
Justice is lost
Justice is raped

Queen

Your mother’s eyes, from your eyes gonna make a big noise
Playin’ in the sand cannot heal me like a jelly fish
I kinda like it
You call me Mister Fahrenheit
I’m trav’ling at the peak of the land
I seen every Wednesday evening
There´s no ending
The Seer he said
Beware the championsFlash a-ah
Savior of the night followed day
And the hub caps all gleam
When I’m cruisin’ in overdrive
Don’t you take me back to you
In rain or shine
You say shark I say bite
You say your folks are telling you
Write my letter feel much better
And use my fancy patter on the multit

You get the idea.

And of course (there’s always an “of course”) the fact that we’re using the Markov Chain approach to generate song lyrics is incidental. This code can be used to generate text that looks like any other text you care to feed it. That could be religious texts, product reviews, sales pitches, anything.

There’s a pretty diverse amount of material already out there, so, now we have this, we probably don’t need humans any more.

The complete code is available on GitHub.

SMS Spoofing with Python for Good and Evil

It all started with the best of intentions. I was an excitable graduate going through the second puberty of discovering that if you propositioned customers in the right way, a small percentage of them would buy your stuff.

Having been reasonably successful with personalized emails, it seemed that SMS was fair game as an extension. Fuck, Dominos and Royal Mail do it, and they’re only partially more time-relevant than a sale on a website you bought a novelty Luis Suarez biting bottle opener from two years ago. Turned out that my at-the-time boss didn’t agree, reacting in a way that can be summarized as “strongly negative” when he looked at a shirt on our site and simultaneously received a text which said something to the tune of “that shirt would go great with the brown size 11 shoes you bought 82 days ago.”

(But bosssssss, it’s a nexus of many technologies! Absolutely no way? Bahhh, okay then. No, yeah, I’ve totally been working on other stuff too.)

I digress, though. Nobody cares about that.

It was the “misusing the tech to banter your friends” stage of that development which, ultimately, led to the discovery of something much creepier.

Experience gained from that same stage of development during my email days led me to start off with volume and persistence, because if there’s one thing that computers are good at (and there is), it’s the old water torture. Clockwork SMS, who will do your bidding at £0.05 per message, seemed as good a bet as any to me, so that’s who I “pip install”d. (And then angrily “sudo pip install”d, of course.)

Lolling delightedly to myself, I set this baby loose…

from clockwork import clockwork
import time

api = clockwork.API("Your API code here")

lyrics = open("Bohemian Rhapsody Lyrics.txt", "r").readlines()

for line in lyrics:
    payload = line.replace("\n", "")
    message = clockwork.SMS(from_name = "F Mercury", to = "447000123456", message = payload)
    response = api.send(message)
    print(response)
    time.sleep(300)

… and, with the warm fuzzy feeling that accompanies the knowledge that a computer is busy doing one’s dirty work, retired to enjoy my evening partaking in my favourite pastime of throwing stones at traffic.

My friend was sent a line from “Bohemian Rhapsody” every 5 minutes until the song was done. Picture it: the numbingly-familiar “ding” of an incoming text and the muscle memory that responds by reaching for your nearby phone, maybe even the rational thought that, in all probability, this was just the latest in the pontifications of one F. Mercury, but the agonizing, irresistible, small-but-not-negligible chance THAT IT’S A LEGIT TEXT. No real hardship just to check. That bastard. Wish I hadn’t checked. Not going to check the next one. Sublime.

Text from Freddie.

Turned out he was really ill and I made a basic arithmetic error which resulted in the texts continuing through the night instead of being spread over a couple of hours, so that was rather less funny than it could have been, but I was hardly to know that at the time. And anyway, the fires of curiosity had been stoked in my mind.

Because of course, annoying though this is, it’s fairly obviously banter. Beyond seeking a restraining order, it’s not really going to influence people. Now, saying that, I should also say that this is where the “in the field” element of my noble research ended: what follows involved the phones of others, but also always their knowledge and consent. The life-wrecking potential of unleashing this shit on people fo’ real should be reserved for elected governments and security agencies. DISCLAIMER: don’t do it etc.

You’ll have noticed that the “from_name” argument in the call is what appears on the recipient’s phone when the SMS comes in. That in mind, the logical extension from “F Mercury”, or any other experimental name, is “Mum”.

message = clockwork.SMS(from_name = "Mum", to = "447000123456", message = "SURPRISE MUTHAFUCKA! Yo punk ass goin down bitch boi!")

Mum checking in.

Tehehehehe. Excellent. What a time to be alive.

There’s one pig in the poke, though: my phone is clever enough to know that this is only from “Mum” in a superficial sense. If I were forced to guess why this was, I’d probably go with the fact that this is from a string, not a number, which is where my real messages from “Mum” come from. So this message isn’t part of any thread. The real messages labeled as “Mum” come, at a presumably more fundamental level, from a phone number, which is saved under an alias in my contacts.

Ahh, but it's not real.

I wonder. I JUST WONDER.

The magic step is absurdly simple. What would happen if I put the from address as a phone number, the phone number stored under “Mum” in my contacts?

message = clockwork.SMS(from_name = "07999654321", to = "447000123456", message = "You don't know anything, and you stink.")

Seems Legit

Long have critics conjectured on the inspiration behind Lionel Richie’s time-immemorial opus “Hello, is it Me You’re Looking For?”. Now we know.

Here you have a trivially simple method of injecting legit-looking messages into text conversations with anyone, providing you know how your target stores the spoofee sender in their contacts. In the UK, that’s largely the difference between starting with 07, and +447.

The utterly exquisite reality is that, if your target then replies, that reply goes to the real sender, who is overwhelmingly likely to reply something like “what the hell are you on about?”, this being the first they’ve heard of this shady business. Does that look like backtracking or denial? I ain’t no Member of Parliament, but it sure sounds like it to me.

And if I may quote Mouse from The Matrix, that makes you wonder about a lot of things. How many friendships and/or relationships have fallen afoul of a simple misunderstanding, a simple miscommunication, one statement taken the wrong way, such as could be easily achieved through the above means?

How many fledgling romances need a seductive kick-start to get them over the stalemate of shyness?

Given that your communication is limited to one-way, what series of messages should you go with for maximum impact, and maximum imperviousness to whatever the replies might be?

Could you even construct some confusing pseudo-two-way thing, if you knew the numbers of both parties? Like two colleagues maybe?

Doesn’t bear thinking about, really. Much.