It was a rough night for number crunchers. And for the faith that people in every field — business, politics, sports and academia — have increasingly placed in the power of data.
Donald J. Trump’s victory ran counter to almost every major forecast, undercutting the belief that analyzing reams of data can accurately predict events. Voters demonstrated that predictive analytics, and election forecasting in particular, remains a young science: Some people may have been misled into thinking Hillary Clinton’s win was assured because some of the forecasts lacked context explaining their potentially wide margins of error.
“It’s the overselling of precision,” said Dr. Pradeep Mutalik, a research scientist at the Yale Center for Medical Informatics, who had calculated that some of the vote models could be off by 15 to 20 percent.
Virtually all the major vote forecasters, including Nate Silver’s FiveThirtyEight site, The New York Times Upshot and the Princeton Election Consortium, put Mrs. Clinton’s chances of winning in the 70 to 99 percent range.
The election prediction business is one small aspect of a far-reaching change across industries that have increasingly become obsessed with data, the value of it and the potential to mine it for cost-saving and profit-making insights. It is a behind-the-scenes technology that quietly drives everything from the ads that people see online to billion-dollar acquisition deals.
Examples stretch from Silicon Valley to the industrial heartland. Microsoft, for example, is paying $26 billion for LinkedIn largely for its database of personal profiles and business connections on more than 400 million people. General Electric, the nation’s largest manufacturer, is betting big that data-generating sensors and software can increase the efficiency and profitability of its jet engines and other machinery.
But data science is a technological advance with trade-offs. It can see things as never before, but it can also be a blunt instrument, missing context and nuance. Companies and institutions of all kinds quietly use data to make predictions about human behavior. But only occasionally, as with Tuesday’s election results, do consumers get a glimpse of how these formulas work and the extent to which they can go wrong.
Google Flu Trends, for instance, looked to be a triumph of big-data prescience, tracking flu outbreaks based on trends in flu-related search terms. But in the 2012-13 flu season it greatly overstated the number of cases.
This year, Facebook’s algorithm took down the image, posted by a Norwegian author, of a naked 9-year-old girl fleeing napalm bombs. The software code saw a violation of the social network’s policy prohibiting child pornography, not an iconic photo of the Vietnam War and human suffering.
And a Microsoft chat bot, intended to learn “conversational understanding” by mining online text, was quickly retired this year after its machine-learning algorithm began generating racist comments.
Even well-meaning attempts to harness data analysis for the greater good can backfire. Two years ago, the Samaritans, a suicide-prevention group in Britain, developed a free app to notify people whenever someone they followed on Twitter posted potentially suicidal phrases like “hate myself” or “tired of being alone.” The group quickly removed the app after complaints from people who warned that it could be misused to harass users at their most vulnerable moments.
This week’s failed election predictions suggest that the rush to exploit data may have outstripped the ability to recognize its limits.
...
The danger, data experts say, lies in trusting the data analysis too much without grasping its limitations and the potentially flawed assumptions of the people who build predictive models.
The technology can be, and is, enormously useful. “But the key thing to understand is that data science is a tool that is not necessarily going to give you answers, but probabilities,” said Erik Brynjolfsson, a professor at the Sloan School of Management at the Massachusetts Institute of Technology.
Mr. Brynjolfsson said that people often do not understand that if the chance that something will happen is 70 percent, that means there is a 30 percent chance it will not occur. The election performance, he said, is “not really a shock to data science and statistics. It’s how it works.”
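To make Mr. Brynjolfsson’s point concrete, here is a minimal Python sketch. It is not drawn from any forecaster’s model: it simply treats each contest as an independent draw in which the favorite wins with a fixed probability. The function name `simulate_upsets` and the probabilities tried are hypothetical illustrations.

```python
import random


def simulate_upsets(p_win: float, trials: int, seed: int = 0) -> float:
    """Return the fraction of simulated contests in which the favorite loses.

    Hypothetical toy model: each contest is an independent draw that the
    favorite wins with probability p_win.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    # rng.random() is uniform on [0, 1); the favorite loses when it lands at or above p_win.
    losses = sum(1 for _ in range(trials) if rng.random() >= p_win)
    return losses / trials


if __name__ == "__main__":
    for p in (0.70, 0.85, 0.99):
        frac = simulate_upsets(p, trials=100_000)
        print(f"favorite at {p:.0%}: loses in {frac:.1%} of simulated contests")
```

Under this simplified assumption, a 70 percent favorite loses roughly three times in every ten simulated contests, and even a 99 percent favorite loses about once in a hundred.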
So, what happened with the election data and algorithms? The answer, it seems, is a combination of shortcomings in polling, analysis and interpretation, perhaps in both how the numbers were presented and how the public understood them.
...
...unlike weather prediction, current election models tend to take into account only several decades’ worth of data. And changing the parameters of that data set can also significantly affect calculations.
The FiveThirtyEight model, for instance, is calibrated based on general elections since 1972, a year when state polling began to increase. On Oct. 24, that model put Mrs. Clinton’s chances of winning at 85 percent. But when the site experimentally recalibrated the model based on more recent polls, dating back just to 2000, Mrs. Clinton’s chances rose to 95 percent, Mr. Silver wrote on his blog.
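Why would a shorter calibration window raise Mrs. Clinton’s odds? One simplified way to see it is a toy model, assumed here for illustration only, in which the actual vote margin is normally distributed around the polling lead, with a spread estimated from past polling misses. If a shorter window happens to contain smaller misses, the estimated spread shrinks and the same lead yields a higher win probability. This is not FiveThirtyEight’s actual model. The function `win_probability`, the 3-point lead and the two error values are hypothetical numbers chosen only to land near the 85 and 95 percent figures.

```python
from statistics import NormalDist


def win_probability(poll_lead: float, error_sd: float) -> float:
    """Chance that the polling leader wins, under a hypothetical toy model.

    Assumes the actual margin (in percentage points) is normally distributed,
    centered on the polling lead, with standard deviation error_sd estimated
    from past polling misses.
    """
    actual_margin = NormalDist(mu=poll_lead, sigma=error_sd)
    # The leader wins when the actual margin ends up above zero.
    return 1.0 - actual_margin.cdf(0.0)


if __name__ == "__main__":
    lead = 3.0  # hypothetical polling lead, in points
    # Hypothetical error spreads: a longer history with larger polling misses
    # versus a shorter one with smaller misses.
    for label, sd in (("longer window", 2.9), ("shorter window", 1.8)):
        print(f"{label}: sd={sd} -> win probability {win_probability(lead, sd):.0%}")
```

With the lead held fixed, narrowing the assumed error from 2.9 to 1.8 points moves this toy estimate from about 85 percent to about 95 percent. That mirrors the direction of the shift Mr. Silver described, though not the actual mechanics of his model.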
...
“If we could go back to the world of reporting being about the candidates and the parties and the issues at stake instead of the incessant coverage of every little blip in the polls, we would all be better off,” said Thomas E. Mann, an election expert at the Brookings Institution. “They are addictive, and it takes the eye off the ball.”
http://www.nytimes.com/2016/11/10/t...book&nl_art=0&nlid=65763264&ref=headline&te=1
Blame the algorithms, if old.