May 18, 2015 | Kris Hammond
The Democratization of Data
While attending the Aspen Ideas Festival, where I was part of a panel on the Future of Story Telling, I attended a session on how big data can be used to help cities make decisions about transportation, distribution of services, crime and availability of educational opportunities. The informative and engaging discussion focused on the problems of data collection and how quality data can help to inform strategic decision-making and tactical planning. It was definitely refreshing – a thoughtful group of people having a thoughtful conversation.
During the panel, someone asked how to provide people with more access to data. The entire panel came back with a similar response: we need more open data and open APIs. This was followed by a comment that to truly democratize data we need to make sure that people become more “data literate,” meaning that everyone can understand data as it is presented to them via open data initiatives and freely accessible APIs.
On the surface, this sounds compelling. People becoming more data literate is a good thing, as long as being data literate is not a precondition to getting access to data. Unfortunately, the reality is that most people can only use data after learning how to process and interpret it.
Imagine this scenario – your physician hands over your test results and then points to a stack of books you need to read in order to understand what they mean. Why would we force someone to do this? Unfortunately, this is what it means when people say “in order to cope with this new age everyone needs to become data literate.”
Truly democratizing data requires some work in order to make it more understandable to everyone. Ideally, we would not have to run more queries. We would not have to interpret charts or graphs. We would not have to perform time series analysis, cohort comparisons or table joins in SQL. We would not have to understand the analysis that underlies the results.
Data provides us a window into the world, and we should not deny people access to that world just because they don’t know how to use certain tools – this is not democratization.
Democratization requires that we provide people with an easy way to understand the data. It requires sharing information in a form that everyone can read and understand. It requires timely communication about what is happening in a relevant and personal way. It means giving people the stories that are trapped in the data so they can do something with the information.
This is particularly evident in the world of education. Think for a moment about standardized tests. Educators, students and parents are now confronted with masses of data about student performance that are incomprehensible to all but the most skilled data analysts. Staring at spreadsheets, charts and graphs does not help people get to the story behind the numbers. In order to understand the assessment and advice that could improve student and classroom performance, someone has to dive into the data and interpret it. Training every teacher, administrator and parent to do this is simply absurd. It is even more absurd when you realize that the analysis that needs to be done is always the same. All we need to do is apply the same analysis over and over again to get to the information and insight that students need so they can get better.
Of course, if the analysis is the same in most cases, it means that the work of looking at the data, extracting facts from it, and then transforming those facts into a narrative can be done by an intelligent system. Rather than force everyone in the world to be a data scientist and communications expert, why not just teach the machine to do it so that the work can be done at scale?
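To make the idea concrete, here is a minimal sketch in Python of the three steps described above: look at the data, extract facts, and turn those facts into a sentence. It is only an illustration, not a description of any real system. The function names `extract_facts` and `narrate`, the student name and the scores are all made up for the example.

```python
def extract_facts(scores):
    """Reduce a chronological list of test scores to a few simple facts."""
    return {
        "average": sum(scores) / len(scores),
        "change": scores[-1] - scores[0],  # last score minus first score
        "latest": scores[-1],
    }


def narrate(name, subject, facts):
    """Turn the extracted facts into one plain-English sentence."""
    if facts["change"] > 0:
        trend = f"improved by {facts['change']} points"
    elif facts["change"] < 0:
        trend = f"dropped by {-facts['change']} points"
    else:
        trend = "held steady"
    return (
        f"{name}'s {subject} scores {trend} over the period, "
        f"ending at {facts['latest']} with an average of {facts['average']:.1f}."
    )


# Hypothetical data: one student's three most recent math scores.
print(narrate("Alex", "math", extract_facts([62, 70, 75])))
```

The same two functions can be run over every student in a class without change, which is the point: when the analysis is the same each time, the reader only ever needs to see the sentence, not the spreadsheet.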
Democratizing data does not mean dropping a huge spreadsheet on everyone’s desk and saying, “good luck.” We can now configure a machine to look at that data and automatically explain it, so people can understand and act on it.
Ironically, it’s a machine itself that is going to free us from the limitations of data.
By Kris Hammond. Kris Hammond is the Chief Scientist of Narrative Science.