Quantitative approaches for evaluating the influence of films using the IMDb database

Photo gallery of The Wizard of Oz on IMDb, the data source for this study.

Why do certain films remain influential throughout film history? The purpose of this paper is to attempt to answer this question. To do so, we adopt some quantitative approaches that facilitate an objective interpretation of the data. The data source we have chosen for this study is the Internet Movie Database (IMDb), and in particular one of its sections, called Connections, which lists the references made to a film in subsequent movies and the references the film itself makes to earlier ones.

The extraction and analysis of these networks of citations allows us to draw some conclusions about the most influential movies in film history, identifying their distinguishing features, and considering how their popularity has evolved over time.

Keywords: influential films, film history, IMDb, connections, database, quantitative approaches

Source: Canet, F.; Valero, M.A.; Codina, L. 2016.

What are the most influential films in film history?

What features define these films? Why do certain films remain popular over time? In other words, why are some films kept alive in the collective memory while others are consigned to oblivion? The main purpose of this paper is to attempt to answer these questions. To do so, we adopt some quantitative approaches that facilitate an objective interpretation of the data.

The data source we have chosen for this study is the Internet Movie Database (IMDb), for two reasons: first, it is the largest and best-known movie database on the Internet, and it has an extensive range of descriptive categories that allow for a multilayered analysis; second, a significant number of previous studies have used this dataset, which supports its value as a resource for scientific research.

One of these previous studies is by Max Wasserman, Xiao Han T. Zeng and Luís A. Nunes Amaral, who, in their paper titled «Cross-evaluation of metrics to estimate the significance of creative works», define the concept of significance as «the lasting importance of a creative work».

According to these authors, a film’s significance can be measured by taking three factors into account: quality, impact and influence. While the concept of quality is self-explanatory, the other two terms require some clarification.

Impact is defined by the authors as «the overall effect of a creative work on an individual, industry, or society at large», which «can be measured as sales, downloads, media mentions, or other possible means». Influence, on the other hand, is defined «as the extent to which a creative work is a source of inspiration for later works» (2014b: 1). In this article, we will be focusing mainly on this last factor, as we consider «influence» to be a central concept for explaining the significance of a movie over the course of film history.

Filmmakers are thus the ones who determine the influence of films over time, when they look back on their own film heritage and choose one movie or another as a source of inspiration for their own works. According to Marijke de Valck and Malte Hagener (2005: 15), this tendency to use film history “as a limitless warehouse that can be plundered for tropes, objects, expressions, styles, and images from former works” is a well-known phenomenon.

In any academic milieu, the most influential scholars are the ones whose works are cited most by their colleagues in subsequent articles. Hence the current popularity of the «h-index», which attempts to measure a researcher’s impact through the citations that his or her works receive. It would seem logical that if we can define «influential articles» in this way, we can similarly speak of «influential movies» as those which are cited most often in subsequent films.
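As a brief illustration of the h-index mentioned above, the following Python sketch computes it from a list of citation counts. It uses the standard definition (the largest h such that h works have at least h citations each); the function name and the sample counts are hypothetical and are not taken from the study.

```python
def h_index(citation_counts):
    """Return the largest h such that at least h works have >= h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` works all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for five works by one author (illustrative only).
example_counts = [10, 8, 5, 4, 3]
print(h_index(example_counts))
```

Applied to films, the same kind of count would treat each later movie that references a film as one «citation» of it.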

Source: Canet, F.; Valero, M.A.; Codina, L. 2016

However, while scholars explicitly state their sources in their texts, filmmakers do not, which poses the problem of how to identify citations made by filmmakers and who is responsible for identifying them. The obvious answer is that this responsibility falls upon the audience. Thus, in cinema, a citation is only effective if someone recognizes it, as filmmakers invite spectators to play a game of identifying what is familiar. Sigmund Freud describes this discovery of the familiar as a source of human pleasure, while Umberto Eco (2005: 108) suggests that it activates the spectator’s «encyclopedic competence». In this respect, it is important to note that the recognition of a cinematic citation depends on the spectator’s personal competence, a fact also pointed out by film scholars such as Vera Dika (2003: 103) and Pam Cook (2005: 168).

In recent years, the potential for encyclopedic competence has been greatly enhanced by the Internet and the birth of social networks that invite users to participate. IMDb, for example, is not only a database but also a social network whose users share their opinions and knowledge, contributing content to the largest digital collection of data not only on films but also on television programs and video games. Citations recognized by spectators are collected on IMDb under its category of «Connections». Thus, as Wasserman et al. point out, «by analyzing this citation network obtained from user-edited data, we can investigate the suitability of metrics to estimate film significance based on the spread of influence in the world of motion pictures» (2014b: 2).
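To make the idea of a citation network concrete, here is a minimal Python sketch that ranks films by the number of distinct later films referencing them. It assumes the connections have already been extracted as (citing film, cited film) pairs; that input format, the function name and the placeholder titles are assumptions made for illustration, and do not represent IMDb's data format or the authors' actual method.

```python
from collections import Counter


def rank_by_incoming_references(connections):
    """Rank films by how many distinct films reference them.

    `connections` is an iterable of (citing_film, cited_film) pairs.
    Duplicate pairs are counted once. Ties are broken alphabetically.
    """
    unique_pairs = set(connections)
    counts = Counter(cited for _citing, cited in unique_pairs)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Placeholder titles, not real IMDb data.
sample_connections = [
    ("Film C", "Film A"),
    ("Film D", "Film A"),
    ("Film D", "Film B"),
    ("Film D", "Film B"),  # duplicate entry, counted once
]
for title, references in rank_by_incoming_references(sample_connections):
    print(title, references)
```

Counting incoming references is only the simplest possible measure of influence; it ignores, for example, how long ago a film was released and how influential the citing films are themselves.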

This is precisely our main objective in this paper: through the extraction and analysis of these data, to draw some conclusions about the most influential films in cinema history, identifying their distinguishing features, and considering how their popularity has evolved over time. These quantitative results will be interpreted from a film studies perspective, which will facilitate a better understanding within the context of film history.



Useful links


Canet, Fernando; Valero, Miguel Angel; Codina, Lluís. (2016). «Quantitative approaches for evaluating the influence of films using the IMDb database». Communication & Society 29(2), 151-172. https://doi.org/10.15581/