Can data predict the next top pop song?
I hypothesize that language processing and text analysis of lyric data could surface enough information to craft new music.
RESEARCH:
Billboard is an American entertainment media brand founded in 1894, originally as an advertising company. The brand began focusing on music in the 1920s and has famously been tracking the top 100 hit songs every year since 1940. The ranking is decided by a combination of sales, radio play, and streaming popularity.
Pop music, or popular music, is exactly what it sounds like: music that is popular for its time. These songs become ubiquitous because of their chart-appearing status. Pop music has ebbed and flowed over time and can draw influence from many different musical genres and styles.
But although music is an art form, are there ways in which pop music becomes formulaic?
PARSING PROCESS:
How could we begin to make sense of what makes a hit pop song?
I began by looking at an existing dataset I found here. Built with a scraping program, this dataset included the Billboard Top 100 songs from 1965 to 2015. This felt quite overwhelming, so I decided to focus on the most recent year first and create parsing programs that would look at 100 songs instead of roughly 5,000.
To start, my goal was to find the most common words, the most common phrases, and lyrics that rhyme within the dataset. I used a combination of Python and JavaScript to create parsers with different functions. My Python code returns the most common words and n-grams (sequences of words). The JavaScript code uses RiTa.js to find keywords in context as well as rhyming words in their context.
I noticed that the original dataset had quite a few flaws (for example, words strung together without spaces, and non-lyrical information mixed into the lyrics), and these greatly affected the outcomes I was getting. I decided to go back and change my dataset: I selected the lyrics from the number one hit song for each year in the 2010s. With a new dataset I could ensure there were no issues like words being strung together or extraneous information like the song credits appearing as part of the lyric data. However, this is now a very small dataset.
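As a rough illustration of the word and n-gram counting described above, here is a minimal Python sketch. It is not the original parser: the function names (`tokenize`, `ngrams`, `most_common_ngrams`) and the sample text are assumptions made up for this example, and the tokenizer only handles plain ASCII letters and straight apostrophes.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words, keeping straight apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def most_common_ngrams(text, n, top=5):
    """Count the n-grams in one text and return the most frequent ones."""
    return Counter(ngrams(tokenize(text), n)).most_common(top)

# Placeholder text, not taken from the dataset.
sample = "la la la sing it loud, la la la sing it proud"
print(most_common_ngrams(sample, 1, top=2))  # single words
print(most_common_ngrams(sample, 3, top=2))  # three-word phrases
```

Setting `n=1` gives plain word counts, so the same helper covers both the "most common words" and the "most common phrases" goals.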
The results of the n-gram parser helped me identify the words and phrases that occurred most among the lyrics. Since songs are often very repetitive, I didn’t want repetition within a single song to skew the results, so I made sure a word was only added to the count if it appeared again in a different song. I kept common words because they are often important to song lyrics and would increase the chance of finding grammatically correct phrases.
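One way to count a phrase only once per song is to collect each song's n-grams into a set before counting. The sketch below assumes a hypothetical `songs` dictionary mapping a title to its lyrics; the names and the placeholder lyrics are invented for illustration and are not the project's data.

```python
import re
from collections import Counter

def song_ngrams(lyrics, n):
    """Return the set of distinct n-grams in one song's lyrics."""
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def shared_ngrams(songs, n, min_songs=2):
    """Count how many songs contain each n-gram; keep those found in min_songs or more."""
    song_counts = Counter()
    for lyrics in songs.values():
        # A set means a phrase repeated within one song still counts once.
        song_counts.update(song_ngrams(lyrics, n))
    return {gram: count for gram, count in song_counts.items() if count >= min_songs}

# Placeholder lyrics, invented for this example.
songs = {
    "song_a": "hey hey hey tell me what you want",
    "song_b": "tell me what you need tonight",
    "song_c": "hey there, nothing else to say",
}
print(shared_ngrams(songs, 1))  # words that appear in at least two songs
print(shared_ngrams(songs, 4))  # four-word phrases shared across songs
```

Here "hey" appears three times in `song_a` but contributes a count of one for that song, which is the behavior the paragraph above describes.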
These are the results of the n-gram parsing:
Interestingly, the longest shared sequence was four words: the phrase “admit that I was”. This appears in “Love Yourself” by Justin Bieber and “Somebody That I Used to Know” by Gotye featuring Kimbra.
Next, I used my other program to search for how the word “admit” appeared in the context of the whole song and for what words rhymed with “admit”.
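The keyword-in-context search can be approximated in a few lines of plain Python. This is only a sketch of the idea and does not reproduce the RiTa.js calls used in the project; the function name `keyword_in_context`, the `width` parameter, and the sample line are assumptions. Rhyme lookup is left out because it needs a pronunciation dictionary.

```python
import re

def keyword_in_context(lyrics, keyword, width=3):
    """Return each occurrence of keyword with up to `width` words on either side."""
    tokens = re.findall(r"[A-Za-z']+", lyrics)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append(" ".join(left + [token] + right))
    return hits

# Placeholder line written for this example, not an actual lyric.
line = "I will admit that I was wrong, but you never admit a thing"
for hit in keyword_in_context(line, "admit", width=2):
    print(hit)
```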
Rhyming appears in the same way:
For the final product, the goal would be to answer the question: can data predict the next hit song? However, I realized that this is a really challenging endeavor. I felt like I got back interesting information, but didn’t quite know how to make sense of it. Below is how I began to link phrases across songs to each other.
key:
CAPITAL LETTERS = N-GRAM phrase
bold letters = keyword in context
colored = rhyming words
I added a chord progression column to my dataset as a way to guide the song-making process. A really cool find is that, although none of the songs in the dataset follow the resulting chord progression themselves, the parser still identified C, F, G, and Am as the most common chords in the dataset, and this progression is one of the most common in pop music generally.
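Counting chords across a progression column could look something like the sketch below. It assumes each song's progression is stored as a space-separated string, which may not match the actual column format, and the progressions listed are made up for illustration only.

```python
from collections import Counter

def most_common_chords(progressions, top=4):
    """Count every chord across all songs and return the `top` most frequent."""
    counts = Counter()
    for progression in progressions:
        counts.update(progression.split())
    return [chord for chord, _ in counts.most_common(top)]

# Made-up progressions, not the ones in the dataset.
progressions = [
    "C G Am F",
    "Am F C G",
    "C F G C",
    "Dm G C Am",
]
print(most_common_chords(progressions))
```

Because the counts are pooled over all songs, the resulting set of chords describes the dataset as a whole and need not match the progression of any single song.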
NEXT STEPS:
While I was hoping that data could make the songwriting process easier, I felt that it was just as hard. I feel like I was given puzzle pieces that don’t quite fit together. In a more realized version, there are different ways this could go:
1) With a larger dataset, such as thousands of songs, I suspect there would be more interesting n-gram results.
2) If lyrics could be parsed by phoneme or sentence structure, and an algorithm could produce all the lyrics that match in structure, we could possibly obtain more words and phrases to shape the rhythm and melody of a new song. Could an algorithm even identify these structures for each part of a song, like verse, chorus, and bridge?
3) While I was determined that the song lyrics should be produced only from existing data, there is also the option to use generative models. Perhaps the computer could create new lyrics based on what it learns about the dataset.
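To give a feel for the third option, here is a very small generative sketch: a first-order Markov chain that picks each next word from the words that followed the current one in the training lines. This is a deliberately simple stand-in, not a proposal for the final model, and the function names and training lines are invented for the example.

```python
import random
from collections import defaultdict

def build_chain(lines):
    """Map each word to the list of words that follow it in the training lines."""
    chain = defaultdict(list)
    for line in lines:
        words = line.lower().split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain

def generate(chain, start, length=8, seed=None):
    """Walk the chain from `start`, picking a random follower at each step."""
    rng = random.Random(seed)
    words = [start]
    # Stop early if the current word was never followed by anything.
    while len(words) < length and chain.get(words[-1]):
        words.append(rng.choice(chain[words[-1]]))
    return " ".join(words)

# Placeholder training lines, written for this example.
lines = [
    "i know you want to dance tonight",
    "you know i want to stay",
    "i want you to know tonight",
]
chain = build_chain(lines)
print(generate(chain, "i", seed=1))
```

Every adjacent word pair in the output already exists somewhere in the training lines, so the result recombines existing material rather than inventing new vocabulary.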
PROPOSAL:
Ultimately, the final data art project would be an actual song that follows a musical and lyrical structure it learned from a dataset of past top hits. It would also be interesting to visualize how the song came to be. I imagine maps that show links between the existing songs in the database, or perhaps visuals that take a more literal approach of representing equations or formulas.