Can data predict the next top pop song?
By exploring language processing and text analysis of lyric data, I hypothesize that there could be enough information to craft new music.
RESEARCH:
Billboard is an American entertainment media brand founded in 1894, originally as an advertising company. The brand began focusing on music in the 1920s and has famously been tracking the top 100 hit songs every year since 1940. Chart position is decided by a combination of sales, radio play, and streaming popularity.
Pop music, or popular music, is exactly what it sounds like. It is music that is popular for its time: songs that become ubiquitous because of their presence on the charts. Pop music has ebbed and flowed over time and can draw influence from many different musical genres and styles.
But although music is an art form, are there ways in which pop music becomes formulaic?
PARSING PROCESS:
How could we begin to make sense of what makes a hit pop song?
I began by looking at an existing dataset I found here. Compiled with a scraping program, this dataset included the Billboard Top 100 songs from 1965 to 2015. This felt quite overwhelming, so I decided to focus on the most recent year first and write parsing programs that would look at 100 songs instead of roughly 5,000.
To start, my goal was to find the most common words, the most common phrases, and lyrics that rhyme within the dataset. I used a combination of Python and JavaScript to create parsers with different functions. My Python code returns the most common words and n-grams (sequences of words). The JavaScript code uses RiTa.js to find keywords in context as well as rhyming words in their context.
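To make the n-gram idea concrete, here is a minimal Python sketch of counting the most common words and phrases. It is not my original parser: the function names `tokenize`, `ngrams`, and `most_common_ngrams` and the made-up `sample` lyric are assumptions for illustration, and it only uses the standard library.

```python
from collections import Counter
import re

def tokenize(lyrics):
    """Lowercase the text and split it into words, keeping contractions like "don't"."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", lyrics.lower())

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def most_common_ngrams(lyrics, n, top=10):
    """Count all n-grams in the lyrics and return the `top` most frequent ones."""
    return Counter(ngrams(tokenize(lyrics), n)).most_common(top)

# Made-up lyric, not taken from the dataset.
sample = "la la la hey la la la ho"

top_words = most_common_ngrams(sample, 1, top=1)    # single words (1-grams)
top_phrases = most_common_ngrams(sample, 2, top=1)  # two-word phrases (2-grams)
```

Setting `n` to 1 gives plain word counts, while larger values of `n` surface repeated phrases.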
I noticed that the original dataset had quite a few flaws (for example, words strung together without spaces and non-lyrical information mixed in), and they greatly affected the outcomes I was getting. I decided to go back and change my dataset: I selected the lyrics from the number one hit song for each year in the 2010s. With the new dataset I could ensure there were no issues like words being strung together or extraneous information like the song credits appearing as part of the lyric data. However, this is now a very small dataset.
The results of the n-gram parser helped me identify the words that occurred most often across the lyrics. Since songs are often very repetitive, I didn’t want repetition within a single song to skew the results. I made sure a word was counted at most once per song, so its count only went up when it appeared again in a different song. I kept common words because they are often important to song lyrics and would increase the chance of finding grammatically correct phrases.
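The once-per-song rule can be sketched in Python as follows. This is an illustration under assumptions, not my original code: `words_in`, `song_counts`, and the three made-up `songs` are hypothetical, and each song is reduced to a set of words before counting so that repeats within one song are ignored.

```python
from collections import Counter
import re

def words_in(lyrics):
    """Return the set of distinct words in one song (repeats collapse to one)."""
    return set(re.findall(r"[a-z]+(?:'[a-z]+)?", lyrics.lower()))

def song_counts(songs):
    """Count, for each word, how many different songs it appears in."""
    counts = Counter()
    for lyrics in songs:
        counts.update(words_in(lyrics))  # each word adds at most 1 per song
    return counts

# Made-up lyrics, not taken from the dataset.
songs = ["love love love me", "love you", "you and me tonight"]
counts = song_counts(songs)
```

With this approach, a word repeated throughout one chorus scores the same as a word sung once, so the ranking reflects how widely a word is shared across songs.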
These are the results of the n-gram parsing: