Investigation - The Science of Singing Along

I have always been interested in the "physics of pop culture" and in how data science can help us analyze the culture we love to consume every day. To explore this curiosity further, I did a deep dive into Alisun Pawley and Daniel Mullensiefen's research on The Science of Singing Along for a quick Metis investigation presentation. The presentation deck and an overview of key concepts are included below, and you can check out the full article here!


The study investigates the contextual and musical factors that prompt audiences in music venues to sing along to pop songs. Research was carried out over nine months, from November 2006 through July 2007, at five different entertainment venues in Northern England (described in slide 3), with approximately 20 hours of fieldwork in each venue. In total, 1050 "song events" were collected for 636 different pop songs; of these, 332 song events covering 115 different songs were selected using stratified sampling and analyzed.

Stratified sampling divides the population into homogeneous subgroups (strata) when subpopulations vary, then samples within each stratum to build the overall sample. This is advantageous when the subgroups differ greatly in size, because it ensures that each subgroup is represented well enough in the sample to provide meaningful results.
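To make the idea concrete, here is a minimal sketch of stratified sampling with equal allocation (the same number of draws from every stratum), which is one common choice. The study's actual strata and allocation rule are not described here, so the `venue` stratum, the `stratified_sample` helper and the toy data are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, n_per_stratum, seed=0):
    """Draw up to n_per_stratum records from each stratum (equal allocation).

    records: list of dicts (hypothetical "song events").
    stratum_of: function mapping a record to its stratum label.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)
    sample = []
    for label in sorted(strata):
        group = strata[label]
        k = min(n_per_stratum, len(group))  # a small stratum contributes what it has
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical, deliberately unbalanced data: venue "A" has far more events.
events = [{"venue": "A", "id": i} for i in range(90)] + \
         [{"venue": "B", "id": i} for i in range(10)]
picked = stratified_sample(events, lambda e: e["venue"], n_per_stratum=8)
```

A simple random sample of 16 events from this data would contain, on average, fewer than two events from venue B; the stratified draw guarantees eight from each venue.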

Research Design

The dependent variable of interest is the relative number of audience members singing along to a song. Both contextual and musical variables (slides 4 and 5) were considered as features in the model. Contextual variables relate to aspects of the specific show, such as venue, age range of the audience, "liveness" and the relative position of the song in the set. Keep in mind that all of the data was collected manually by the researchers and their assistants, so many data points were estimates and could be subject to observer bias. Musical variables were broken into three subgroups (Melodic/Harmonic, Vocal Style and Lyrics), each with 9-12 variables.
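Two of the quantities above are ratios rather than raw counts. The sketch below shows one plausible way to derive them; the exact definitions used in the study are not given here, so both helpers (`share_singing`, `relative_position`) and the 0-to-1 scaling of set position are assumptions for illustration.

```python
def share_singing(n_singing, audience_size):
    """Fraction of the audience singing along (0.0 to 1.0)."""
    if audience_size <= 0:
        raise ValueError("audience_size must be positive")
    return n_singing / audience_size

def relative_position(index, set_length):
    """Zero-based position of a song in the set, scaled to 0.0 (first) .. 1.0 (last)."""
    if set_length < 2:
        return 0.0  # a one-song set has no meaningful position
    return index / (set_length - 1)

print(share_singing(30, 120))   # 0.25
print(relative_position(4, 9))  # 0.5
```

Scaling position to a fraction lets sets of different lengths be compared on one axis, which is what a per-venue scatter against "relative position" needs.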

Exploratory Analysis

First, exploratory data analysis (EDA) was performed on the full data set. Slide 6 shows a histogram of the percentage of people singing along. The distribution is skewed right, meaning that most songs prompt only a relatively low percentage of the audience to sing along. Slide 7 shows a scatter plot of the percentage of people singing along against the relative position of the song in the set, with a trendline for each venue. I found these graphs particularly interesting: they give a sense of the build, or arc, of a concert and how it differs at each venue. For instance, venue A has a relatively flat build, which makes sense for a small indie bar. By contrast, venues B and C are nightclubs, and their scatter plots suggest a steeper build as the show progresses deeper into the set.
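Both EDA observations can be checked numerically: a positive skewness confirms a right-skewed distribution, and the slope of a least-squares trendline quantifies how steep each venue's build is. This is a minimal sketch in plain Python; the per-venue numbers are made up for illustration and are not the study's data.

```python
from statistics import fmean

def sample_skewness(xs):
    """Moment-based skewness; > 0 means a long right tail."""
    m = fmean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def trendline(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical (relative_position, pct_singing) pairs per venue.
venues = {
    "A": ([0.0, 0.25, 0.5, 0.75, 1.0], [5, 6, 5, 7, 6]),     # flat build
    "B": ([0.0, 0.25, 0.5, 0.75, 1.0], [2, 8, 15, 25, 40]),  # steep build
}
slopes = {v: trendline(x, y)[0] for v, (x, y) in venues.items()}
```

With these toy numbers, venue A's slope is 1.2 percentage points per unit of set position and venue B's is 37.2, mirroring the flat versus steep builds described above.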

Contextual Model Design

The first model was fit using only the contextual variables in a regression tree. A regression tree is a decision-tree algorithm for a continuous outcome: it recursively partitions the observations into homogeneous subgroups in which the dependent variable takes similar values, so that the variability of the dependent variable is reduced within each subgroup. A branch ends when it reaches a size limit (set at the discretion of the data scientist) or when no further split increases homogeneity. Regression trees are useful when the relationship between the predictors and the dependent variable is nonlinear, and they are particularly handy with a large feature set because they have built-in feature selection: the predictors that most increase homogeneity are the ones chosen for splits.
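The partitioning procedure described above can be sketched in a few dozen lines. This is an illustrative implementation, not the one used in the study: it assumes numeric features (categorical ones such as venue would need to be encoded first), measures homogeneity with the within-group sum of squared errors, and uses hypothetical `min_size` and `max_depth` stopping rules.

```python
from statistics import fmean

def sse(ys):
    """Sum of squared deviations from the mean: the impurity a split reduces."""
    if not ys:
        return 0.0
    m = fmean(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(X, y, min_size):
    """Best (feature, threshold, impurity) over all features, or None."""
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            if len(left) < min_size or len(right) < min_size:
                continue  # enforce the leaf-size limit
            score = sse(left) + sse(right)
            if best is None or score < best[2]:
                best = (j, t, score)
    return best

def build_tree(X, y, min_size=5, max_depth=3):
    """Recursively partition (X, y); each leaf predicts the mean of its y."""
    leaf = {"value": fmean(y)}
    if max_depth == 0 or len(y) < 2 * min_size:
        return leaf
    split = best_split(X, y, min_size)
    if split is None or split[2] >= sse(y):
        return leaf  # no split makes the subgroups more homogeneous
    j, t, _ = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], min_size, max_depth - 1),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], min_size, max_depth - 1),
    }

def predict(tree, row):
    """Follow the splits from the root down to a leaf."""
    while "value" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["value"]

# Toy data: one feature, with the outcome jumping from 0 to 10 at x = 10.
X = [[x] for x in range(20)]
y = [0.0] * 10 + [10.0] * 10
tree = build_tree(X, y, min_size=2)
```

On the toy data the root splits at `x <= 9`, and both children stop immediately because they are already perfectly homogeneous. That is the "split can't increase homogeneity" stopping rule in action.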

The resulting tree model is displayed on slide 10. To interpret it, follow a branch down to its final (leaf) node, which gives the predicted percentage of people singing along. For example, if you are at a small venue, the song being played has been in the UK Chart for 1-3 weeks and it is a weekend, the model predicts that 11% of the audience will be singing along.
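Reading a prediction off the tree is just a walk from the root to a leaf. The sketch below encodes only the single path from the example; the split order and the labels (`venue_size`, `weeks_in_uk_chart`, `is_weekend`) are hypothetical, and branches not described in the text are deliberately left out rather than guessed.

```python
# Hypothetical encoding of the one path described in the text. The split
# order follows the sentence and may not match the real tree on slide 10.
PATH_FROM_TEXT = {
    "split_on": "venue_size",
    "children": {
        "small": {
            "split_on": "weeks_in_uk_chart",
            "children": {
                "1-3": {
                    "split_on": "is_weekend",
                    "children": {True: {"prediction": 0.11}},
                },
            },
        },
    },
}

def follow_branch(node, observation):
    """Walk from the root to a leaf, taking the child that matches each split."""
    while "prediction" not in node:
        value = observation[node["split_on"]]
        if value not in node["children"]:
            raise KeyError(f"branch {node['split_on']}={value!r} is not encoded here")
        node = node["children"][value]
    return node["prediction"]

gig = {"venue_size": "small", "weeks_in_uk_chart": "1-3", "is_weekend": True}
print(follow_branch(PATH_FROM_TEXT, gig))  # 0.11
```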

This model explained about 40% of the variability when used to predict against a holdout test set. More importantly, it identifies the key contextual factors: venue size, age of the crowd, day of the week, relative position of the song in the set and length of time in the UK Charts.
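"Variability explained on a holdout set" is usually reported as the coefficient of determination, R² = 1 − SS_res / SS_tot, computed on data the model never saw during fitting. The text does not say which measure the paper used, so treat this as the standard definition rather than the study's exact metric. The numbers below are illustrative only.

```python
def r_squared(y_true, y_pred):
    """Share of variance in y_true explained by y_pred: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot

# Hypothetical holdout observations and model predictions.
print(round(r_squared([10, 20, 30, 40], [12, 18, 33, 37]), 3))  # 0.948
```

Predicting the holdout mean for every song scores exactly 0, and a model that does worse than that scores below 0. An R² of about 0.40 therefore means the tree removes roughly 40% of the squared error of that naive baseline.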

Musical Model Design

A similar approach was attempted for the musical factors; however, it explained only about 5% of the variability in "sing-alongability" for the test set. This wasn't very surprising: a single regression tree implies that there is one set of rules describing the songs people sing along to, while in reality many different combinations of song features can compel a listener to sing along. To deal with this, a Random Forest model was selected to examine the musical features. A random forest uses bootstrap aggregation, or bagging, of many regression trees to create a more robust model: each of n trees is fit to a bootstrap sample of the data (observations drawn with replacement), and in the standard formulation each split considers only a random subset of k features. The predicted value for each data point is then the average of the individual trees' predictions. Random forests allowed the researchers to capture the impact that various combinations of features have on "sing-alongability". The model was run using 32 musical features plus 1 contextual feature that encompassed all of the key contextual variables identified above, and it explained 65% of the variability of the holdout test set.
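The bagging-plus-random-features recipe can be sketched compactly. This is a teaching version under stated assumptions (numeric features, shallow trees, hypothetical `n_trees`, `k` and depth settings), not the software or settings the researchers used.

```python
import random
from statistics import fmean

def sse(ys):
    m = fmean(ys)
    return sum((y - m) ** 2 for y in ys)

def fit_tree(X, y, k, rng, min_size=3, depth=3):
    """Regression tree that considers only k randomly chosen features at each split."""
    if depth == 0 or len(y) < 2 * min_size:
        return fmean(y)  # leaf: mean outcome
    best = None
    for j in rng.sample(range(len(X[0])), k):
        for t in sorted({r[j] for r in X})[:-1]:
            left = [v for r, v in zip(X, y) if r[j] <= t]
            right = [v for r, v in zip(X, y) if r[j] > t]
            if len(left) < min_size or len(right) < min_size:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None or best[0] >= sse(y):
        return fmean(y)
    _, j, t = best
    li = [i for i, r in enumerate(X) if r[j] <= t]
    ri = [i for i, r in enumerate(X) if r[j] > t]
    return (j, t,
            fit_tree([X[i] for i in li], [y[i] for i in li], k, rng, min_size, depth - 1),
            fit_tree([X[i] for i in ri], [y[i] for i in ri], k, rng, min_size, depth - 1))

def tree_predict(node, row):
    while isinstance(node, tuple):  # internal nodes are (feature, threshold, left, right)
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node

def fit_forest(X, y, n_trees=50, k=1, seed=0):
    """Bagging: each tree is fit to a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        forest.append(fit_tree([X[i] for i in idx], [y[i] for i in idx], k, rng))
    return forest

def forest_predict(forest, row):
    """The forest's prediction is the average of the individual trees' predictions."""
    return fmean(tree_predict(t, row) for t in forest)

# Toy data: feature 0 drives the outcome, feature 1 is noise.
X = [[i, (i * 7) % 5] for i in range(40)]
y = [0.0 if i < 20 else 10.0 for i in range(40)]
forest = fit_forest(X, y, n_trees=50, k=1, seed=0)
```

Because each tree sees a different resample and a different random choice of features, the trees disagree in useful ways, and averaging them smooths over the quirks of any single tree.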

One drawback of random forests is that they are not easily visualized, and interpretation can be complicated. Feature importance is commonly measured as the mean decrease in prediction accuracy when a feature's values are randomly permuted, which breaks that feature's relationship with the outcome without refitting the forest. Slide 13 shows a summary of the relative importance of each musical factor. The absolute importance values are not meaningful on their own; the differences between features are. We can see that the key musical features are high chest voice, vocal effort/energy, gender of vocalist, vocal embellishments and clarity of consonants.
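Permutation importance works with any fitted model, so the sketch below takes a generic `predict` function. It measures the increase in mean squared error when one column is shuffled, which is the same as a decrease in accuracy. Random forest implementations typically compute this on out-of-bag samples; this simplified version uses a single data set, and the toy "model" is hypothetical.

```python
import random

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled.

    predict: any function mapping a row (list of numbers) to a prediction.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break feature j's link to the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Toy check: the "model" uses only feature 0, so shuffling feature 1 costs nothing.
X = [[i, i % 3] for i in range(30)]
y = [2.0 * i for i in range(30)]
imp = permutation_importance(lambda row: 2.0 * row[0], X, y)
```

Here `imp[1]` is exactly 0 and `imp[0]` is large. The scale of the numbers depends on the outcome's units, which is why only the relative ordering of importances is worth reading.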

Another drawback of the random forest model is that it doesn't show directly whether a key feature pushes the percentage of people singing along up or down. To find the direction of each effect, the researchers ran a separate random forest model for each individual variable and compared the distributions of outcomes across that variable's values. For example, slide 14 shows that the mean percentage of people singing along is much higher when there is any presence of high chest voice.
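A simplified way to read off direction is to group outcomes by a feature's levels and compare the group means. The outcomes could be observed values or a model's predictions. This grouped comparison stands in for the per-variable forests described above and is not the researchers' exact procedure. The 0/1 coding of high chest voice and the numbers are hypothetical.

```python
from statistics import fmean

def mean_by_level(values, outcomes):
    """Mean outcome for each distinct level of a feature."""
    groups = {}
    for v, out in zip(values, outcomes):
        groups.setdefault(v, []).append(out)
    return {level: fmean(outs) for level, outs in sorted(groups.items())}

# Hypothetical data: 0 = no high chest voice, 1 = some high chest voice.
high_chest_voice = [0, 0, 0, 1, 1, 1]
pct_singing = [4.0, 6.0, 5.0, 20.0, 25.0, 30.0]
means = mean_by_level(high_chest_voice, pct_singing)
direction = "positive" if means[1] > means[0] else "negative or none"
print(means, direction)  # {0: 5.0, 1: 25.0} positive
```

Comparing whole distributions (for example with box plots per level) rather than just the means would also show whether the groups overlap heavily.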

Takeaways and Discussion

In summary, the key contextual factors in modeling "sing-alongability" are venue size, age of the crowd, day of the week, relative position of the song in the set and length of time in the UK Charts, while the key musical factors are high chest voice, vocal effort/energy, gender of vocalist, vocal embellishments and clarity of consonants. The positive contextual factors (larger venues, a younger audience, a weekend night, songs played later in the set and more popular songs) are all associated with more intense revelry. One might also hypothesize that young people's greater engagement with music in their everyday lives explains their higher levels of singing along. Finally, singers using a high-energy, well-projected, high chest voice with minimal embellishment are successful in motivating revelers to sing along, perhaps because their anthemic singing becomes a “call to party”: an invitation to join the party through singing. The paper goes on to draw parallels to neo-tribalism, comparing this "call to party" to a modern-day war cry.

Written on November 3, 2015