Eticas Foundation, an organization specializing in the social impact of technology, has warned about the consequences that incorporating erroneous, corrupt and incomplete data into Big Data is having worldwide.
Eticas has analyzed cases around the world in which Big Data becomes “Bad Data”, causing dozens of problems in fields as fundamental as policing, health and immigration. It has also collected specific examples of people who have suffered the consequences. Cases as seemingly minor as sharing a name with someone else have landed people on a list of defaulters, with all the implications that entails.
Big Data is an opportunity to channel the immense amount of data generated every second around the world; it is the only way to aggregate and analyze that data and take advantage of its full potential. But the tools used must offer every guarantee that they work correctly, since the consequences of failure can be very serious.
“When talking about security and data, it seems that the only concern is protecting it, but companies that develop Big Data solutions are not aware that all their processes must be properly validated so that the results of applying them are not biased or corrupted. Avoiding these scenarios is fundamental, and as we have seen, it is not currently being done with sufficient guarantees,” says Gemma Galdon, President of Eticas Foundation.
Bad Data cases in fields such as security, public health or immigration
Bad Data finds its way into practically every field that feeds Big Data: the cases collected by Eticas affect sectors as sensitive as health, security and immigration. Among them:
- Racial bias in predictive policing software in Oakland, California: the software indicated that Black neighborhoods were committing twice as many drug-related crimes as other areas, when the actual incidence was similar across the city. It was also biased against low-income communities, to which it attributed disproportionately higher rates than to high-income communities. In reality, drug crime in Oakland is uniform across neighborhoods, regardless of race or income.
- Health alert over lead levels in Washington’s water: changing the disinfectant used in Washington’s water increased corrosion in the city’s lead pipes, raising the amount of lead in the water. The US Centers for Disease Control and Prevention (CDC) had to conduct studies to assess how this consequence of poor data collection could affect residents’ health, and those studies reassured the public.
- The failed Google Flu Trends forecast: Google’s flu forecasting project monitored user searches to identify places where many people were looking up various flu symptoms. In those places, the program alerted public health officials that more people were about to get the flu. But a change in Google’s search algorithm caused certain searches to be wrongly linked to the flu, leading public health officials to forecast twice as many cases. The error was even documented in the journal Science.
- Deportations caused by voice recognition failures: up to 7,000 foreign students were expelled from the UK because of errors in voice recognition. Educational Testing Service, a Princeton-based company, relied on an automated system to identify fraudulent test results, and that system turned out to be faulty.
These are some examples of the problems that can arise from incorporating erroneous, corrupt or incomplete data into Big Data. This does not mean giving up its advantages, “but it does mean that we have to be more careful and pay closer attention to how it is run in order to avoid Bad Data,” concludes Gemma Galdon.