The recent emphasis on data and analytics might make it seem like data is a relatively new thing; however, data has driven human decisions since the dawn of time. Ever since we have been able to think and plan, we have been using data to inform our decisions. The difference between then and now lies in the methods we use to gather data and our ability to process it. For most of human history, we gathered data through our five senses. We made decisions based on what we saw, smelled, tasted, heard, and felt. Our brain was our only processing tool, and we relied upon past experiences to inform us of what might happen in the future.
Advances in mathematics gave rise to new tools that allowed us to quantify our experiences, capture new types of data, and analyze them in new ways. Numbers gave us the ability to count, mathematical operations the ability to perform analysis, and probability the ability to determine the likelihoods of events. People who work in the mathematical field of statistics study these tools, and further advance and apply them to new situations. They developed methods for gathering accurate data through sampling, testing for events through hypothesis tests, finding trends through regressions and time series, and determining truths through experimentation. However, by today's standards, data gathering was slow, and the data itself was sparse and focused on only a few subjects.
Our ability to gather data underwent a dramatic change once the digital age came upon us. With the internet and the number of people connected to it, we have suddenly gone from having difficulty gathering enough data about people to having enormous quantities of it. Our best method for gathering data before the internet was the survey, which was time-consuming and specific to certain topics, so the data we had back then was quite sparse. With internet usage data, the problem suddenly shifted to having enormous amounts of data on nearly every person, spanning almost all the activities they do!
The problem of data privacy emerged from the abundance and richness of the data we have on people nowadays. Before, when our best measures came from surveys that covered tiny fractions of a population, the chances of an individual being identified were quite slim. They would have had to voluntarily give their data to a trusted source. The trusted source would then take measures to protect the data, as any leak would jeopardize trust in the source, and then nobody would give them data. That all changed with the internet, as people had to give up some information about themselves if they were to use certain internet services. What data these services were gathering, how they were using it, and for what purposes were often unclear to users, either buried in lengthy legal agreements or not stated in any clear fashion at all. These services also do not necessarily need to maintain trust to keep obtaining data, as many of them have become integral to people’s lives. This has given rise to problems with data privacy, as people realize what data has been gathered from them without their explicit consent, and at the same time how relatively helpless they are to prevent it from being gathered.
With the news of data breaches, of different parties analyzing data in order to manipulate people, and of advances in technology and analytics that allow individuals to be identified with pinpoint accuracy, the question of privacy has been pushed to the forefront of society. What does it mean to be private when all our services demand our data? What can we do as individuals to protect our own privacy? What can the organizations that gather all this data do to create trust? And what can be done to ensure that the data is protected and used for the greater good rather than maliciously? I will be going over these topics in future articles.