Show Me the (Big) Data!

While I’ve never been involved in the movie business professionally, I’ve had the opportunity to play a few small roles in film.

I appeared as myself in the 2010 documentary film Inside Job, about the financial crisis of 2007-2008. Inside Job premiered at the Cannes Film Festival four years ago, and it won an Academy Award for Best Documentary that same year. I had another small cameo role, also playing myself, in Oliver Stone’s film Wall Street 2: Money Never Sleeps.

Since appearing in those films, I’ve attended both the Cannes Film Festival and the Oscars every year. (If you’re a film fan like me, I highly recommend you attend a film festival.) While I enjoy the downtime and the opportunity to socialize, economics is never far from my mind.

Big Data, Economics, and Hollywood

Lately, I’ve spent some time thinking about big data, economics, finance, and the film industry. While I haven’t written formally on the subject, I’d like to share a few of my informal thoughts with you and discuss some of the benefits big data can provide.

It is well known that movies are big business. Last year, new film releases grossed $10 billion at the box office in the United States alone. While that number is substantial, it represents less than one-third of the global box office gross, which reached an impressive $36 billion last year. A single blockbuster can gross upward of $1 billion worldwide.

The film industry sits on the dividing line between art and commerce. Hollywood films have an audience of literally billions, now that films have true global distribution. The films most suitable for global release are also, typically, among the most expensive to produce. When executives at major film studios commit over $100 million to a single project, they want as much insight as possible into that project’s potential for profitability. Increasingly, the movie business is turning to big data to find that edge.

Can Statistics Predict Blockbusters?

A fair amount has already been written about how statistical models and predictive analysis are used to forecast elections. (The blog run by statistician Nate Silver is probably the best-known example in the political arena.) Similar tools have been used to predict Oscar wins, based largely on the historical pattern of Academy votes. This year, economist David Rothschild of Microsoft Research accurately predicted Oscar outcomes in 24 categories, while the Columbus, Ohio-based Farsite Group correctly predicted the outcome of all six major Oscars.

But such forecasting of Oscar votes is retrospective: The films have already been produced, and their initial marketing campaigns are largely completed prior to their nomination. Hollywood studios are already using forward-looking statistical analysis to fine-tune the social media campaigns of their films. Online marketing can be populated, virtually in real time, with data driven by the demographic profiles and sentiment analysis collected at test screenings and previews.

The holy grail of the film industry is the ability to know whether a film will be a hit or a flop before it gets produced. The phrase “Strategically developed content,” which is now beginning to appear in the film world, refers to the digital search for this grail.

The promise of strategic content is that a movie’s themes, script elements, cast, run time, and every other conceivable variable can be used to model whether or not a film will be profitable. Right now, that process is still in its infancy.
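To make the idea concrete, here is a minimal sketch of what such a model might look like: a tiny logistic regression that turns a few film variables into a probability of profitability. Everything in it is an assumption for illustration. The films, the three features, and the labels are invented toy data, and the real models, as noted below, are trade secrets that surely look very different.

```python
import math

# Hypothetical toy data, invented purely for illustration (not real films).
# Features: (budget in $100 millions, sequel flag 0/1, run time in hours).
# Label: 1 if the film was profitable, 0 if it was not.
FILMS = [
    ((1.5, 1, 2.0), 1),
    ((2.0, 1, 2.3), 1),
    ((1.0, 1, 1.8), 1),
    ((0.8, 0, 1.7), 0),
    ((1.8, 0, 2.4), 0),
    ((0.3, 0, 1.5), 0),
]


def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Estimated probability that a film with these features is profitable."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(rows, epochs=2000, learning_rate=0.1):
    """Fit a logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in rows:
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


weights, bias = train(FILMS)

# Score two hypothetical projects that differ only in the sequel flag.
sequel_score = predict(weights, bias, (1.0, 1, 2.0))
original_score = predict(weights, bias, (1.0, 0, 2.0))
print(f"sequel: {sequel_score:.2f}, original: {original_score:.2f}")
```

With six invented films the output means nothing about real movies. The point is only the shape of the exercise: variables go in, a probability comes out, and the quality of that probability depends entirely on the data behind it.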

Given the colossal size of the movie business, entrepreneurial individuals are already creating and marketing strategic content research to movie studios. The exact data elements and methods by which the models are built, not surprisingly, are closely guarded as trade secrets by the firms who create them.

Data Mining with IMDb

It’s probably too soon to say for certain just how successful this generation of prospective movie forecasting models has been. Certainly, over-budget box office flops have yet to disappear from the Hollywood landscape. Even so, the techniques that have been used in other domains, and in retrospective forecasting, have provided valuable lessons in the search for better prospective forecasting techniques in film. The richer and more comprehensive a data set is, the better the results it tends to produce.

You begin to get a sense of the potential, and of how incredibly rich and complex the data sets can get, by visiting the website IMDb (Internet Movie Database), an online repository of film data. IMDb has data from over 2 million movies, and over 4 million cast and crew members, amounting to a grand total of about 130 million data elements.

Amazon, which is consistently ranked as one of the most sophisticated companies in the world for its use of data, has owned IMDb since 1998. (The acquisition dates back to the days when IMDb was used to support Amazon’s videocassette sales.) Amazon has mastered not just the online marketing component of data but also the computationally intensive fields of logistics and supply chain analytics.

In recent years, Amazon has begun to compete more aggressively with Netflix in the rapidly expanding market for online video distribution. It will be interesting to see what moves Amazon makes next: As a leader in cloud services that can credibly compete with Google, and a major player in the distribution and analytics side of the entertainment industry, Amazon has much to offer at the nexus of big data and movies.

In filmmaking, the impact of predictive analytics is just beginning to make itself felt. Analytic techniques are improving, data sets are getting bigger and better, and demand for those services is increasing. Certainly, big data has great promise in the world of film—promise that has already been realized in the field of economics.

Are Crises Predictable?

During 2006-2007, I predicted the global financial meltdown that occurred in 2008-2009. Because of that prediction, I was given the nickname Dr. Doom, but I prefer to think of myself as a realist. I had spent 20 years thinking about currency, banking, and financial crises, as well as sovereign, corporate, and household debt.

This kind of analysis provided me with valuable insights into the fragilities that were then gathering in the global economy—and is the same kind of analysis I use for all my global economic research. When the warning signals began to materialize, I was prepared to compare those signals against the research data that my colleagues were generating.

In my 2011 book, Crisis Economics, I compared economic crises to hurricanes: You may not know precisely when a hurricane is going to strike, but you can foresee and understand the general pattern of its behavior. I called the first chapter of that book “White Swan,” rather than “Black Swan,” for an important reason: Financial crises are not random and unpredictable outcomes. Rather, every financial crisis is the outcome of a buildup of macro-financial policy, inherent vulnerability within economies, and mistakes that at some point lead to a critical tipping point.

At Roubini Global Economics, we make extensive use of quantitative analysis in our macroeconomic coverage. Our statistical models of national economic competitiveness use upward of 100 data elements as inputs for forecasts on 174 different countries. (In addition to analyzing traditional economic data, RGE is currently exploring the next generation of big data tools, which parse the rich data trails that stream off political and economic events as they unfold.)

But despite all the complexity of the quantitative analysis, it’s the qualitative work of the analyst that provides the final judgment.

The key to success in economic analysis and forecasting is having not just technical sophistication, but highly skilled human input, which can overrule the model when the signal the data is generating appears wrong or misleading. Human nature, and by extension economics, will always retain an element of unpredictability (what Keynes called animal spirits), which is notoriously difficult to model.

The Lessons of Big Data

Perhaps the best introduction to the theme of big data is the Hollywood movie Moneyball, based on the nonfiction book by author Michael Lewis. The movie is about baseball, and about winning and losing. Brad Pitt plays Billy Beane, the Oakland A’s general manager.

The setup for the film is simple: The Oakland A’s are nearly broke—and their team is about to lose three crucial star players they can’t afford to replace. It’s Brad Pitt’s job to find new players, and to do it with what little money they have.

There’s a scene early in the movie that shows what it’s like to try to make decisions when you don’t have access to accurate data. Brad Pitt’s character summons his baseball lieutenants and scouts to talk about replacing the recently lost players. The conversation quickly descends into a hopeless muddle of clichés, the kind of silly things that people talk about when they don’t have the data to focus on what really matters. The scouts aren’t even discussing the real attributes of the players they are considering. Instead, they remark on irrelevant things, such as whether a player is “clean-cut” or “good-looking” or “has a good jaw.”

Brad Pitt tells them in disgust, “You’re not even looking at the problem!”

Pitt’s character ultimately hires a young statistician and baseball wonk named Peter Brand. Brand gives a statistical framework to what Brad Pitt’s character already intuitively senses—that baseball recruiting is foolishly and fundamentally biased. Good players are routinely dismissed because they don’t look right or because they swing the bat in a peculiar way—even if they get on base more often than guys who look the part.
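The statistic at the heart of that argument is on-base percentage, and it is simple enough to compute in a few lines. The sketch below uses the standard formula, times on base (hits, walks, and hit-by-pitches) divided by at bats plus walks, hit-by-pitches, and sacrifice flies. The two players and their season totals are hypothetical, invented only to show how a hitter who looks the part can still reach base less often than one who does not.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """Standard on-base percentage: times on base per qualifying plate appearance."""
    opportunities = at_bats + walks + hit_by_pitch + sacrifice_flies
    if opportunities == 0:
        return 0.0  # no plate appearances yet, so nothing to measure
    return (hits + walks + hit_by_pitch) / opportunities


# Hypothetical season totals, invented for illustration (not real players).
looks_the_part = on_base_percentage(
    hits=150, walks=30, hit_by_pitch=2, at_bats=550, sacrifice_flies=5
)
odd_swing = on_base_percentage(
    hits=130, walks=90, hit_by_pitch=5, at_bats=500, sacrifice_flies=5
)

print(f"looks the part: {looks_the_part:.3f}")
print(f"odd swing:      {odd_swing:.3f}")
```

In this made-up comparison the player with the peculiar swing has fewer hits but draws far more walks, so he reaches base more often. That is exactly the kind of value a scout judging by appearance would overlook.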

Brand tells Pitt’s character, “Baseball thinking is medieval. They’re asking all the wrong questions,” and Pitt’s character is suitably impressed.

It’s an excellent movie, and I won’t spoil the ending for you. The film is about more than just baseball—it’s about seeing old problems in new ways—and exploring more elegant solutions to the problems you didn’t know you had.

It’s what good movies do at their best: take you into a different world, and, in this case, show you the power of data, even if you’re not a baseball wonk or a number cruncher.

Brad Pitt’s character nicely sums up the movie’s rebel spirit of taking on the system with data: “We’re card counters at the blackjack table—and we’re going to turn the odds on the casino.”

Early in the film, before Brad Pitt’s character hires Peter Brand, Pitt asks Brand to join him in a parking garage for a private talk. Brand is just a kid, barely 25 years old, and this isn’t just his first job in baseball, it’s his first job ever. Pitt’s character has one final question for the young wonk, who, as it turns out, has just graduated from Yale.

“What did you study?”

Peter Brand hesitates—then reluctantly confesses his secret.

“Economics. I studied economics.”