Data visualization, Python, storytelling
Project completed for Digital Narrative and Interactive Design under the guidance of Jessica Fitzpatrick and Dmitriy Babichenko.
No, the Film Industry Isn’t Dying: How Hit Movies Have Changed Over Time
Introduction
You might've heard the sentiment that the film industry is dying, or even that it's over completely. Everyone has a different take: maybe it's being brought on by the advent of streaming, or maybe it’s because of the 'formulaic' superhero genre taking the world by storm. Even Martin Scorsese, the iconic and generally well-respected director famous for movies like Taxi Driver and The Wolf of Wall Street, was quoted in 2017 as saying, “Cinema is gone... The cinema I grew up with and that I’m making, it’s gone.”
But is it truly gone? Or just changed? Transformed? What can we learn from looking at trends over the past several decades of successful movies? What has evolved and what has stayed the same? Ultimately, how have the movies we love changed over time?
My Dataset
To explore these questions, I’m using a dataset from Kaggle (1). It was scraped from IMDb using Python and contains the most popular movies per year from 1980 to 2020.
I’m a huge lover of movies, so I set out to tell a story about something I genuinely enjoyed writing about and researching. I started with some other datasets on the difference in ratings for popular movies across different critic sites, but that just wasn’t a compelling enough story for me. I ultimately landed on this set because I find the way things have changed over time to be really fascinating and very visually appealing. It also included a lot of data about each movie that I thought would be interesting and not impossibly hard to compare between years. There are lots of columns I'm not using, like the names of movies, stars, writers, and directors. They're fun to look at, but not very useful for this project.
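As a rough illustration of narrowing the data down to the columns this project uses, here is a minimal sketch built on Python's standard csv module. The column names and the two sample rows are assumptions made up for the example, not values taken from the Kaggle file, and the actual notebook may load the data differently.

```python
import csv
import io

# Columns this sketch keeps; the names are assumed, not confirmed by the source.
USED_COLUMNS = ("year", "runtime", "genre", "rating", "gross", "country")

# Made-up stand-in for the real CSV file (hypothetical rows).
SAMPLE_CSV = """name,rating,genre,year,country,gross,runtime
Example Movie A,PG,Comedy,1988,United States,1000000,95
Example Movie B,R,Action,2018,South Korea,,132
"""

def load_rows(handle):
    """Read CSV rows, keep only the used columns, and convert numeric fields."""
    rows = []
    for record in csv.DictReader(handle):
        row = {col: record[col] for col in USED_COLUMNS}
        row["year"] = int(row["year"])
        for col in ("runtime", "gross"):
            # Empty cells become None so later averages can skip them.
            row[col] = float(row[col]) if row[col] else None
        rows.append(row)
    return rows

rows = load_rows(io.StringIO(SAMPLE_CSV))
print(rows[0])
```

With a real file, the same function could be called on an opened file handle instead of the in-memory sample.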
What I’m Looking At
There are five key factors I’m looking at for this project: runtime, genre, ratings, lifetime gross, and country of origin. I’m looking at how each of these factors has changed over time. I was really excited to find some clear trends and interesting patterns over the years. Originally, I was trying to compare every single data point from every single year from 1985-2019, but that was way too much data and made for overwhelming visualizations. I then tried to compare multiple individual year visualizations against each other, which also wasn’t a great way to see everything.
I finally settled on comparing about four years spread across the four decades. I varied which years I sampled from chart to chart in order to get a solid mix. I think this was ultimately the most effective way to showcase change over time without completely overwhelming the viewer.
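One way to pick evenly spaced sample years is to take the same final digit in each decade. The helper below is a hypothetical sketch of that idea (the function name and defaults are my own), and it happens to reproduce the year sets used in the figure captions.

```python
def sample_years(last_digit, first_decade=1980, decades=4):
    """Return one year per decade, all sharing the same final digit."""
    if not 0 <= last_digit <= 9:
        raise ValueError("last_digit must be between 0 and 9")
    return [first_decade + 10 * d + last_digit for d in range(decades)]

genre_years = sample_years(8)   # years used in the genre chart
rating_years = sample_years(6)  # years used in the ratings chart
# Two digits per decade, as in the lifetime gross boxplot.
gross_years = sorted(sample_years(5) + sample_years(7))

print(genre_years)
print(gross_years)
```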
Limitations
I am aware that there are many limitations to my dataset, particularly that I’m only looking at the 200 most popular movies per year on IMDb, which is determined by rating on that particular site. I took a look at the demographics (2) of IMDb to find out what its users look like. Users skew male, about 2/3, and younger, with over half of users under 34. About 1/3 of the users are from the United States. Therefore, the “most popular” movies are by no means representative of the entire population’s opinion. Also, “popular” in this dataset is determined by the ratings of these users, but “popular” in different contexts could refer to a movie’s notoriety or income. It's also important to consider that IMDb was not created until 1990, and did not gain wider popularity until after it was acquired by Amazon in 1998. Sometimes the movies we look back on fondly were not well-received at the time of release, or vice versa.
I'm attempting to minimize harm by acknowledging here that this is by no means a comprehensive or exhaustive account of all hit movies. I also attempt to summarize and explain the information in my visualizations in a fair way that acknowledges imperfections and obstacles. Every person is still entitled to their own personal beliefs about whether or not the film industry is dying. Despite the shortcomings of the source it’s pulled from, I think it’s still valuable to look at the data as a whole, while acknowledging its faults and bias.
Jumping In
Runtime
Figure 1. Averaged runtimes of the top-rated movies from the years 1981 to 2019.
As you can see, the average runtime has increased by more than 11 minutes since its low point in 1986. I've heard the criticism that people's attention spans have shortened, making movies less popular than quicker forms of media, like YouTube videos or TikTok. In that case, I would have expected runtimes of the most popular movies to shrink, when in fact the opposite is true.
I only have data for the past forty years, but movies from 1930-1960 were actually quite long. As televisions were introduced into homes, sweeping epics were what drove people to the theater. So then why are we seeing this dip from 1981-1986, and a rise from there? Although it's hard to say definitively, this could very likely be because of the rise of VHS tapes and the desire to fit movies onto a standard-sized one. Movies "lost an average of 10 minutes from 1970 to 1985" (3). Since then, popular movies have crept back up toward their original longer runtimes. Superhero movies, a vastly popular genre that accounts for many of the most popular movies in the more recent years of the dataset, are typically 2+ hours long.
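The yearly averages behind a chart like Figure 1 can be computed by grouping runtimes by release year. This is a minimal sketch using only the standard library; the rows are made-up toy values rather than figures from the dataset, and the field names are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Toy rows standing in for the dataset; every value here is invented.
rows = [
    {"year": 1986, "runtime": 100},
    {"year": 1986, "runtime": 104},
    {"year": 2019, "runtime": 118},
    {"year": 2019, "runtime": 110},
    {"year": 2019, "runtime": None},  # missing runtimes are skipped
]

def average_runtime_by_year(rows):
    """Return {year: mean runtime in minutes}, ignoring missing runtimes."""
    by_year = defaultdict(list)
    for row in rows:
        if row["runtime"] is not None:
            by_year[row["year"]].append(row["runtime"])
    return {year: mean(values) for year, values in sorted(by_year.items())}

averages = average_runtime_by_year(rows)
change = averages[2019] - averages[1986]
print(averages, change)
```

Skipping missing values rather than treating them as zero keeps a few incomplete rows from dragging a year's average down.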
Genres
Figure 2. Number of top movies from each genre in the years 1988, 1998, 2008, and 2018.
I will say that this graph varies a little depending on which years you look at, but comedy has consistently and sharply dropped in numbers, while action has gone up. It's important to remember that this doesn't mean there are fewer comedies being made; it just means they are not as large a percentage of the 200 most popular movies on IMDb. Most superhero movies are classified as "action," which could also be why we see that rise in action.
Also, if you feel like there's been an influx in biopics, you're not crazy. Biography shoots up in the more modern years.
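Counts like the ones in Figure 2 come from tallying genres within each sampled year. Here is a small sketch of that tally with collections.Counter; the rows are invented for illustration and the field names are assumptions.

```python
from collections import Counter

SAMPLE_YEARS = (1988, 1998, 2008, 2018)

# Invented (year, genre) pairs; not real rows from the dataset.
toy_movies = [
    (1988, "Comedy"), (1988, "Comedy"), (1988, "Action"),
    (2018, "Action"), (2018, "Action"), (2018, "Biography"), (2018, "Comedy"),
    (2005, "Drama"),  # not a sampled year, so it is ignored
]

def genre_counts(movies, years):
    """Return {year: Counter of genres} for the sampled years only."""
    counts = {year: Counter() for year in years}
    for year, genre in movies:
        if year in counts:
            counts[year][genre] += 1
    return counts

counts = genre_counts(toy_movies, SAMPLE_YEARS)
print(counts[1988].most_common())
print(counts[2018].most_common())
```

Because each year has the same number of top movies, raw counts can be compared directly across years without converting to percentages.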
I’m so excited to be getting to my visualizations. You can see screenshots of them here, but if you’d like to see my code, interact with some of the elements, and get additional information, you can access the Python notebook here.
Ratings
Figure 3. Number of top movies with each rating in the years 1986, 1996, 2006, and 2016.
The more niche categories like TV-PG, Unrated, X, and NC-17 are mostly negligible, since they have very few movies. However, from 1986 to 2016 there is a discernible decline in top movies rated PG and an increase in those rated PG-13. At first glance, this might look like movies are getting more mature. However, the PG-13 rating was not created until July 1, 1984, largely in response to Gremlins and Indiana Jones and the Temple of Doom, movies that would likely fall under PG-13 if released today (5). So it is quite possible that the industry was still adjusting to the new rating system, and that movies that would qualify as PG-13 today were rated PG at the time.
Regardless, R-rated movies seem to reign supreme among the top movies throughout the years.
Lifetime Gross
Figure 4. Boxplot displaying the lifetime gross of the top movies in the years 1985, 1987, 1995, 1997, 2005, 2007, 2015, and 2017.
It might be surprising that the IQR and median of the boxplots are actually not that far off from each other year by year. What really increases is the number and spread of the outliers.
Aside: Can you guess what that 1997 outlier is? Think James Cameron.
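To make the boxplot terms concrete, the sketch below computes the median, the IQR, and the outliers for a single year, using the common 1.5 x IQR rule for outliers. The gross figures are made up for the example, and the plotting library used in the notebook may compute quartiles slightly differently.

```python
from statistics import quantiles

# Invented lifetime gross values for one year, in millions of dollars.
gross_millions = [10, 20, 30, 40, 50, 60, 70, 80, 600]

def box_stats(values):
    """Return the median, IQR, and 1.5*IQR outliers for a list of numbers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"median": median, "iqr": iqr, "outliers": outliers}

stats = box_stats(gross_millions)
print(stats)
```

In this toy data a single blockbuster sits far above the upper fence while the median and IQR stay modest, which is the same pattern the real chart shows.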
Country of Origin
Figure 5. Interactive map of where the top 200 movies in 1989 were made.
Figure 6. Interactive map of where the top 200 movies in 2019 were made.
In 2019, the most recent data in the transformed dataset, 112/200 movies were made outside of the US, over 50 more than 30 years before. There's also an increase in movies made in Asia, particularly India (17) and China (13).
Figure 7. Scatterplot of the percentage of top movies from years 1981-2019 that were not made in the US. Line of best fit is displayed in the scatterplot.
The percentage of top movies made outside the United States seems to be trending up, especially in more recent years. *Parasite* (2019) became the first non-English-language movie to win Best Picture at the Oscars. Director Bong Joon Ho said in his Golden Globes acceptance speech: “Once you overcome the one-inch-tall barrier of subtitles, you will be introduced to so many more amazing films” (4).
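The trend in Figure 7 rests on two steps: computing the share of non-US movies per year, then fitting a straight line through those points. Below is a minimal sketch of both using ordinary least squares written out by hand. The rows are invented, and the "United States" label is an assumption about how the country column is spelled.

```python
from collections import Counter

# Invented (year, country) pairs; four toy movies per year.
toy_movies = (
    [(1989, "United States")] * 3 + [(1989, "France")]
    + [(1999, "United States")] * 3 + [(1999, "Japan")]
    + [(2009, "United States")] * 2 + [(2009, "India"), (2009, "China")]
    + [(2019, "United States")] * 2 + [(2019, "South Korea"), (2019, "India")]
)

def non_us_share(movies, home="United States"):
    """Return {year: percentage of movies not made in the home country}."""
    totals, foreign = Counter(), Counter()
    for year, country in movies:
        totals[year] += 1
        if country != home:
            foreign[year] += 1
    return {year: 100 * foreign[year] / totals[year] for year in sorted(totals)}

def best_fit(xs, ys):
    """Ordinary least squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

shares = non_us_share(toy_movies)
slope, intercept = best_fit(list(shares), list(shares.values()))
print(shares, slope)
```

A positive slope means the non-US share rises over time; its value is the change in percentage points per year.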
Conclusion
So no, the film industry isn't dying, but it certainly has changed. Over the past ~40 years, the most popular movies have gotten longer and have come from a wider variety of countries. Fewer of these popular movies are "comedy" movies or rated "PG", but more are "action" and "biography" movies or rated "PG-13". But some things have stayed the same! R-rated movies seem to be the biggest, top-rated hits, and the majority of films are still from the US (though keep the IMDb audience in mind with that statement). The culture of the world changes, and our movie tastes change with it. And that doesn't have to be scary.
Sources
(1) Grijalva, Daniel. (2020). Movie Industry. Retrieved February 2023 from https://www.kaggle.com/datasets/danielgrijalvas/movies
(2) SimilarWeb. (n.d.). IMDb.com Traffic and Demographic Statistics by SimilarWeb. Retrieved February 16, 2023, from https://www.similarweb.com/website/imdb.com/#demographics
(3) CEC. (2022, February 6). Why movie runtimes keep getting longer. CNN. https://www.cnn.com/2022/02/06/entertainment/movie-runtimes-longer-mcu-batman-oscar-bait-cec/index.html
(4) Garcia, Sandra. (2020, February 12). After ‘Parasite,’ Are Subtitles Still a One-Inch Barrier for Americans? The New York Times. https://www.nytimes.com/2020/02/12/movies/movies-subtitles-parasite.html
(5) Mancuso, V. (2021, August 7). The Classic Movies That Led to the Creation of the PG-13 Rating. SlashFilm. https://www.slashfilm.com/858440/the-classic-movies-that-led-to-the-creation-of-the-pg-13-rating/