What are algorithms?

Technically, an algorithm is a set of instructions designed to perform a specific task or, in other words, a list of rules to follow in order to solve a problem or produce a desired outcome. A cooking recipe is an algorithm, for example. However, when we talk about algorithms in the context of computer applications, data and the internet, normally we are referring to a computer programme including an algorithmic system designed for a particular purpose.

Because algorithms can deal with any kind of data or content, and because they are much faster than humans at processing large amounts of information, more and more algorithms are being developed and implemented these days by all kinds of entities and organisations. Spotify and Netflix use algorithms to automatically recommend what music to listen to or which TV show or film to watch. Amazon does the same to tell you what you may want to buy next, Google to try to predict what you want to search for, and Facebook to decide what to show you in your feed. Public institutions also increasingly use algorithms to decide what taxes you must pay, who has access to financial aid and social services, and which school a child will attend, among many other examples. Security forces and agencies also rely on algorithms to try to predict crime and to identify people through face recognition or by following their online activities.

Machine Learning Algorithms

Traditional algorithms are sequences of steps programmed to achieve a particular goal. If you want to improve the algorithm’s performance, you need to modify those steps yourself until you get the desired outcome. However, for many tasks that traditional approach has been displaced, and many of the algorithms now being developed and implemented are machine-learning algorithms, in which the programme updates itself as it tries to improve its performance automatically and without human intervention: it’s as if the machine were trying to learn.

Machine-learning algorithms often consist of statistical models that, based on particular datasets, try to find the most relevant information to enhance the algorithm’s performance as more data enters the system. In other words, machine-learning algorithms are statistical models designed to predict something or to classify information into particular categories, and they try to improve the accuracy of their predictions or classifications without human intervention by iteratively updating those models as they process new data.
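
To make the idea of "iteratively updating a statistical model" concrete, here is a minimal sketch in Python. It uses about the simplest model imaginable, a running average that predicts the next value from everything seen so far. The class name and the observations are invented for illustration; real machine-learning systems use far richer models, but the update loop has the same shape.

```python
class RunningAverageModel:
    """Toy model: predicts the next value as the average of everything seen so far."""

    def __init__(self) -> None:
        self.count = 0
        self.estimate = 0.0

    def predict(self) -> float:
        return self.estimate

    def update(self, observed: float) -> None:
        # Move the estimate towards the new observation.
        # The size of the step shrinks as more data arrives.
        self.count += 1
        self.estimate += (observed - self.estimate) / self.count


model = RunningAverageModel()
for value in [10.0, 20.0, 30.0]:  # made-up observations
    model.update(value)

print(model.predict())  # 20.0
```

Nobody edits the rule by hand here: the prediction changes only because new data was processed.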

The fact that machine-learning algorithms are able to modify or even develop new rules and steps for their programme on their own, without human intervention, means that we need to have procedures in place to make sure we can understand how these algorithms work: we need to be able to audit them.

Machine Learning in practice

Imagine a group of scientists who want to develop an algorithm (i.e. a set of rules) that accurately identifies images of cats and saves them in the “Cats” folder, and does the same with images of dogs and the “Dogs” folder. If they decide to develop a traditional algorithm, they need to specify a huge set of rules themselves, like these:

– If the animal is bigger than “X”, then probably it’s a dog.

– If the animal is smaller than “X”, then probably it’s a cat.

– If the animal’s snout is longer than “Y”, then probably it’s a dog.

– If the animal’s snout is shorter than “Y”, then probably it’s a cat.

The problem is that there are dogs that are much smaller than cats, and there are dogs that have shorter snouts than cats; so the scientists would need to produce and constantly revise a very long list of rules to increase the probability that the programme will correctly identify the images.
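
The four rules above can be sketched in Python as follows. The threshold values standing in for “X” and “Y”, the measurements and the function name are all assumptions made for illustration, and the sketch works on two measurements rather than on actual images.

```python
# Hypothetical thresholds standing in for "X" and "Y" in the rules above.
SIZE_THRESHOLD_CM = 40.0   # assumed value for "X" (height of the animal)
SNOUT_THRESHOLD_CM = 5.0   # assumed value for "Y" (length of the snout)


def classify_by_rules(height_cm: float, snout_cm: float) -> str:
    """Apply the hand-written size and snout rules and combine their verdicts."""
    size_says_dog = height_cm > SIZE_THRESHOLD_CM
    snout_says_dog = snout_cm > SNOUT_THRESHOLD_CM
    if size_says_dog and snout_says_dog:
        return "dog"
    if not size_says_dog and not snout_says_dog:
        return "cat"
    # The two rules disagree, so yet another hand-written rule would be needed.
    return "unsure"


print(classify_by_rules(60.0, 10.0))  # dog
print(classify_by_rules(25.0, 2.0))   # cat
print(classify_by_rules(20.0, 3.0))   # cat, even if the animal is a very small dog
print(classify_by_rules(30.0, 6.0))   # unsure
```

The last two calls show where the rule list starts to grow: every exception needs another rule written and maintained by hand.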

Instead, if they decide to develop a machine-learning algorithm, the scientists can train the algorithm’s statistical model on many pre-identified images of cats and dogs, and then leave the algorithm to develop on its own a set of rules that will allow it to accurately identify new images as cats or dogs. In principle, the more pre-identified data the algorithm is trained on, and the more new data it is then fed, the more accurate its predictions or classifications should become. This video offers a good illustration of how machine learning works.
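
As a rough sketch of what learning from pre-identified examples can look like, the Python snippet below trains a very simple classifier (a nearest-centroid model) on labelled measurements instead of images. This is a stand-in chosen for brevity, not the method used by real image classifiers, and the function names and numbers are invented for illustration.

```python
from math import dist


def train(examples):
    """Compute the average measurements (the centroid) for each label.

    `examples` is a list of ((height_cm, snout_cm), label) pairs.
    """
    sums = {}
    counts = {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(value / counts[label] for value in total)
        for label, total in sums.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the new measurements."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Made-up, pre-identified training examples: (height in cm, snout in cm).
labelled = [
    ((24.0, 2.0), "cat"), ((26.0, 2.5), "cat"), ((23.0, 1.8), "cat"),
    ((60.0, 10.0), "dog"), ((55.0, 9.0), "dog"), ((20.0, 3.5), "dog"),
]
centroids = train(labelled)

print(predict(centroids, (58.0, 9.5)))  # dog
print(predict(centroids, (25.0, 2.2)))  # cat
```

No thresholds are written by hand: the decision boundary comes entirely from the labelled examples, so changing the training data changes the rule.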


However, the way machine learning works also has limitations and carries its own set of challenges. If the data used to train the system are biased, if for example the data don’t contain enough images of a particular type of cat or dog (or, as in real-life cases, if they don’t contain enough images of women or people of colour or any other minority or disadvantaged group), then the algorithm may develop a wrong set of rules that will lead it to make systematic mistakes. In addition, the fact that a machine-learning algorithm can write its own sets of rules without human intervention or even knowledge means that it may be very hard to understand how the algorithm works and to hold it accountable.
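
The effect of unrepresentative training data can be illustrated with a small Python sketch. It learns a single height threshold from made-up data that contains no small dogs, then measures accuracy separately for large and small dogs. The function names and all numbers are assumptions chosen for illustration.

```python
from statistics import mean


def learn_height_threshold(cat_heights, dog_heights):
    """Learn one rule from data: the midpoint between the average cat and dog height."""
    return (mean(cat_heights) + mean(dog_heights)) / 2


def accuracy_on_dogs(threshold, dog_heights):
    """Fraction of dogs labelled correctly by 'taller than threshold means dog'."""
    correct = sum(1 for height in dog_heights if height > threshold)
    return correct / len(dog_heights)


# Made-up heights in cm. The training set contains no small dogs at all.
train_cats = [23, 24, 25, 26]
train_dogs = [55, 58, 60, 63]
threshold = learn_height_threshold(train_cats, train_dogs)

large_dogs = [54, 57, 61]
small_dogs = [18, 20, 22]

print(accuracy_on_dogs(threshold, large_dogs))  # 1.0
print(accuracy_on_dogs(threshold, small_dogs))  # 0.0
```

The learned rule looks perfect on the kind of dog it was trained on and fails on every dog from the group missing in the training data. Checking accuracy per group, as done here, is one simple way an audit can surface that kind of systematic mistake.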

 

Interested in our work?

You can collaborate with the project by sharing with us algorithms that are being implemented around you or by using the information in this directory to foster changes in your community.