Recently I said farewell to my academic career and switched back to industry. While I haven’t regretted that move for one moment, I started having some concerns about whether I will be able to keep reflecting on the ethical aspects of my work now that my work has become more pragmatic and my autonomy has decreased. In general I think we as a data science community are not doing enough to guarantee our work does not harm or disadvantage those that interact with whatever we create. In this piece I have tried to consolidate my observations and ideas on how to do better than we are currently doing, by adopting a more human-centric and value-sensitive approach.
Data Science and AI
After his contribution to developing the atom bomb, Robert Oppenheimer famously quoted the Bhagavad Gita: “Now I am become Death, the destroyer of worlds”. While data scientists arguably have not contributed to anything with an impact as heinous as the atom bomb, they do create models and systems that affect the everyday life of millions or billions of people. This makes me wonder: how can I ensure that I will not have to draw a similar conclusion 5 to 10 years from now? What further fuels this concern is my observation that a significant number of people in the data science and AI community either do not value critical reflection on the ethical impact of their efforts, or do not possess the skills or background to perform it.
This ‘manifesto’ is a consolidation of thoughts that I have been forming since I started working on Data Science. It contains observations on how data science research is done and how this results in blind spots in terms of impact. It describes how my research leading up to and during my PhD tried to avoid keeping blind spots blind, by taking into consideration how users experience AI applications. It contains a reflection on why even that approach was not enough, as not all AI applications are systems that users consciously use (people browsing the internet may not be aware that the content they see is personalized based on algorithmic predictions, or citizens may not realize police patrols are sent to their neighborhood based on historical data). And it concludes with some thoughts on how we should move forward as a community, even though I have only a faint idea of how to do that, to ensure that in a decade we will not have to look back in embarrassment, shame or guilt on what we have built.
The AI applications we contribute to as a data science community are becoming increasingly pervasive. We’re at the point that the majority of information any person is exposed to is to some extent the result of AI. This is huge. And at the same time, we are not at all aware of how the technology that we build is affecting the people exposed to it, or society as a whole. There are plenty of examples of AI not improving, but actually decreasing the quality of life for people. ‘Weapons of Math Destruction’ by Cathy O’Neil describes many cases where algorithms harm or disadvantage people through self-reinforcing feedback loops, ranging from filter bubbles that pigeonhole users of multimedia platforms into conspiracy theories or echo chambers, to predictive policing algorithms that keep on allocating police resources to patrol disadvantaged areas where petty crimes are committed, taking resources away from preventing bigger crimes.
My own background is in recommender systems, so throughout this piece I will be using a recommender system as an example. Recommender systems are pieces of software used by websites that have huge collections of content, such as movies on a streaming platform or products on an e-commerce website. The aim of recommender systems is to help users of these websites find relevant items. They do this by analyzing and comparing the interaction behavior of all visitors, looking for patterns that can be used to predict what a visitor is most likely to purchase or consume, based on their historic interaction behavior.
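To make the idea of comparing interaction behavior among visitors concrete, here is a minimal sketch of one simple way to do it. It is an illustration under stated assumptions, not a description of any production recommender: the visitor names, titles, the `jaccard` and `recommend` helpers and the choice of Jaccard overlap as the similarity measure are all hypothetical.

```python
from collections import Counter

# Hypothetical viewing histories: visitor -> set of items they watched.
histories = {
    "ann": {"alien", "blade_runner", "dune"},
    "bob": {"alien", "blade_runner", "matrix"},
    "cat": {"notting_hill", "amelie"},
    "dan": {"alien", "matrix"},
}


def jaccard(a, b):
    """Overlap between two sets of watched items, between 0 and 1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(visitor, histories, top_n=2):
    """Rank items the visitor has not seen by how similar their watchers are."""
    seen = histories[visitor]
    scores = Counter()
    for other, items in histories.items():
        if other == visitor:
            continue
        similarity = jaccard(seen, items)
        # Every unseen item inherits the similarity of the visitor who watched it.
        for item in items - seen:
            scores[item] += similarity
    # Drop items that were only watched by visitors with no overlap at all.
    return [item for item, score in scores.most_common(top_n) if score > 0]


print(recommend("dan", histories))
```

Note that nothing in this sketch knows or cares what the items are, or whether watching them is good for the visitor. It only sees overlapping behavior.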
One of the root causes of AI applications missing the goal of improving people’s lives is that the design and development of these applications is often compartmentalized. Already in terms of design, these systems live on different levels of abstraction. Goals are formulated on a strategic level, from which objectively measurable key performance indicators are formulated and data for the solutions are selected. For our multimedia streaming website, we want to increase visitor engagement, which we intend to measure by checking the number of videos people watch.
This gives us enough information to create an algorithm that will produce a predictive model to serve as the basis for our intelligent system, and over time the performance in terms of the original metric improves. Great! So the viewers of our multimedia streaming platform are spending less time searching for movies (good for them), and they watch more movies (good for them and for us).
One tricky part is that as you go to the more granular level of the numbers and the data, you move further away from the context, the strategic goals and thus the people that are exposed to it. The algorithms do not care if we are developing a recommender system or a fraud detection system; the same numbers, rules and parameters apply. So the people building the algorithms and training and implementing models do not even have to understand what their models’ output is used for, as long as they can formulate the problem as a machine learning or data science problem. And by ignoring the context in which your model will be used, things can become weird. Sure, we can make the assumption that a system that makes people watch more movies is performing well and we can train our models to optimize that metric. But when taken to the extreme, this could lead to the undesirable end result of systems making people watch 24 hours a day. In theory that is the perfect model. But in practice that is not at all a desirable, sustainable solution. In other words, it is essential to broaden the scope from the predictions to the original context of the application and how the predictions will affect systems, users, people and the world.
I have a degree in Human-Technology Interaction, which allowed or forced me to adopt a user-centric approach throughout my PhD. I have been taught to not be satisfied when an AI solution solely improves an objective, behavioral metric, such as duration or number of clicks, but to always include the subjective user experience. Are people actually satisfied with the recommender system even if it keeps them watching video for 24 hours on end? As this is the end goal of our system, it does not suffice to use some proxy of satisfaction, such as the number of clicks, together with the assumption that people who click more are more satisfied. You have to ask them! And to complicate this further, people are quirky. Can they actually be satisfied when they watch the items recommended to them by an algorithm, or would they maybe be more satisfied when doing a manual search, or when they can rely on programming decisions made by a board in a TV station, or when watching something recommended by their friends? It could even be that if all four approaches have the same output, people have different levels of satisfaction.
And to make matters worse, not all users are the same. When looking at systems that learn how to automate tasks users often perform, such as Google’s Nest smart thermostat, we can use AI to develop a system that over time learns to reduce the user effort required to complete a task to a minimum, or even 0 by completely automating it. But what is the effect on the person that used to do this task? What about the people that maybe even like the feeling of control when fiddling with their thermostats?
All of this can be addressed through some form of user-centric evaluation. Looking beyond how an AI application affects the objective user interaction and putting the user experience more central to the evaluation of your AI application already reduces the probability that you create something that harms them. But there are things beyond the user experience. What is the effect of our systems on the humans behind the users? How do recommender systems help people in developing their tastes and preferences and even character? Is having to search for movies to watch or music to listen to, and forming an opinion on them, not part of our personal development? And are recommender systems not maybe depriving us of an opportunity to develop ourselves?
Why is User Centrality not Enough?
In my time as faculty at Maastricht University, interacting with colleagues from other faculties, I slowly came to the conclusion that this user-centric perspective is not sufficient, for two reasons. The first reason is that users are not necessarily the best judge of what is good for them. Every smoker will say they are satisfied with their cigarettes, yet we can quite objectively say that smoking is harmful for them. Similarly, users may indicate that they are satisfied with the output of an AI application, while in the long run this system might actually be harming them. The second reason is that not everybody that is exposed to the actions of an AI application is an actual, conscious user. In some cases people do not have the power to decide whether or not they will be exposed to the application, such as when AI applications are used by government. In other cases they might be interacting with something they do not understand, and forcing them to understand it in order to evaluate their user experience is unrealistic and undesirable.
No (or not all) users can judge what is good for them
When looking at situations where people’s satisfaction might not be in line with what is good for them, we can look at filter bubbles in recommender systems ¹. A filter bubble is what can happen when a recommender system makes an inference about a user’s interests. The system infers that someone might be interested in a certain category of content and starts providing more of that content. As a result, it becomes more difficult for the user to view any other content and so to provide evidence to the system that its inferences are wrong. The end result is that the user can get pigeon-holed into for example cycling videos (fairly innocent) or conspiracy theories (less innocent). Can we consider it ‘good’ (in a moral sense) if a person gets pigeon-holed as long as they are happy/satisfied with this?
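The feedback loop behind a filter bubble can be illustrated with a toy simulation. This is a deliberately simplified sketch, not a model of any real platform: the categories, the starting click counts, the slate size, and the assumption that the user clicks on everything shown (so is equally interested in all categories) are all hypothetical.

```python
def simulate_filter_bubble(rounds=5, slate_size=10, explore_slots=2):
    """Return the share of all clicks that went to 'conspiracy' after each round."""
    # Hypothetical starting point: a single extra click on one category.
    clicks = {"cycling": 1, "cooking": 1, "conspiracy": 2}
    shares = []
    for _ in range(rounds):
        # The system infers the user's main interest from past clicks...
        top = max(clicks, key=clicks.get)
        others = [c for c in clicks if c != top]
        # ...and fills most of the slate with it, leaving a few slots for the rest.
        slate = {c: 0 for c in clicks}
        slate[top] = slate_size - explore_slots
        for i in range(explore_slots):
            slate[others[i % len(others)]] += 1
        # Assumption: the user clicks everything shown, whatever the category.
        for category, shown in slate.items():
            clicks[category] += shown
        shares.append(clicks["conspiracy"] / sum(clicks.values()))
    return shares


for round_number, share in enumerate(simulate_filter_bubble(), start=1):
    print(f"round {round_number}: {share:.0%} of clicks on conspiracy content")
```

Even though this simulated user has no real preference, one early extra click is enough for the inferred interest to dominate the click history, because the user is mostly given the chance to click on what the system already believes they like.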
I think that AI should not only consider the needs/desires of the people interacting with it, but also the values we strive for as a society. Not everything we do is the way we would like it. In some cases, we set some values as a society, such as “everybody needs to have a basic understanding of math”. We need to acknowledge that we have values and that these values are embedded in our solutions. These values may have been consciously or unconsciously embedded and it is time to ensure we reflect on the values we embed.
Not all users are consciously or voluntarily using AI
Apart from the fact that not all people can judge what is good for them, a user-centric perspective is also insufficient because not all users are actually ‘users’. People exposed to for example targeted advertisements might not be aware of the fact that these advertisements are shown to them based on algorithmic predictions. Targeting is what happens when an ad network such as Google’s AdWords uses people’s web browsing behavior to make inferences about their interests and uses these inferences to provide them with ads that are most likely relevant to them. Now some people are users of those systems. I’m definitely a user: I understand how personalization works, I control when I share what information and I block personalized ads when I want. My mother is not so much a user: she’s exposed to a system and its output. She might even incorrectly think that the posts she sees on her Facebook feed are selected by Mark Zuckerberg himself as being worthy of her attention instead of them being based on algorithmic predictions (which brings a whole new problem).
Similarly, when it comes to other types of decisions, people might not be aware that they are using a system and they might not have the option to not be affected by algorithmic output. Examples of this are school admission departments or HR departments that employ AI to help in recruitment, or mortgage providers that employ AI to weigh in on or decide whether to accept or reject an application.
The Future: Human-Centric and Value-Sensitive
How can we ensure that in 10 years we do not look back on the efforts of the past decade as a mistake? There are currently efforts under way to ensure the work we do will stand up to some ethical scrutiny, both from the data science/AI community itself and of a regulatory nature. And both of these efforts still have their shortcomings.
To be fair, efforts are being made to formulate universal values. Fairness, Accountability and Transparency (or FAccT) are becoming values that the machine learning community now strives for. Any machine learning application should result in decisions/predictions/output that are fair and transparent, and that someone can take accountability for. At the same time, I personally am not convinced these specific ones should be universal. Sure, accountability and fairness make sense. Nobody should be the subject of decisions that they cannot contest and we also do not want AI that systematically favors one group over another. But is transparency a value we have to strive for? Transparency itself I think is in line with the value of people being able to understand what they are interacting with, but at the same time it may be at odds with the value of ease of use. Having additional information available on why a decision is made the way it is forces people to invest time and energy in reading (or at the least deciding whether to read) this additional information.
The main drawback of these community values is that again these values are not user- or human-centric. It is hard to argue that transparency is something that the people exposed to the AI system output will value. It is time to investigate what the people that interact with our applications value, and in all honesty I hardly expect “transparency” to be a value that will come out of this investigation. One value that I do think will come out, and that is important and should be considered when developing AI, is ‘autonomy’. If I ever restart an academic career, this would be one of my research directions.
Another effort to ensure that we are doing no harm is that of adapting the Hippocratic Oath to a Hippocratic Oath for Data Scientists ². While in principle this is exactly the type of rules of engagement and principles that we need to strive for, sadly it has not been widely recognized by the AI/Data Science community.
Why Being Compliant is Not Enough
To some extent regulations aim to serve as legal boundaries to ensure AI does not harm or disadvantage people. And as AI is becoming more pervasive, more boundaries are put in place. But there are some shortcomings to current regulation and possibly all regulation.
Since developments in AI are happening so fast, regulation quite often takes place after the fact. Something goes wrong (e.g. Cambridge Analytica), and laws are put in place to prevent it from happening again (e.g. GDPR). So the legal boundaries define what we can and cannot do, based on what we have in the past judged as being wrong.
This goes against one of the main design principles of “do no harm”. In this case, harm can and should be taken in the broadest sense of the word: one should not design systems that actively harm people, but also no systems that cost companies, users and the planet more resources than strictly necessary. Having to educate yourself on what cookies do to make a conscious decision of accepting or rejecting them means the European Union is imposing a burden on its citizens ³. A burden that they relatively easily could take on themselves, by stipulating under what conditions visitors can and cannot be tracked.
The best effort you can make to ensure that your actions are not the mistakes that lead to laws in the future is to operate not just within legal boundaries, but within a set of ethical boundaries. These are not set in stone anywhere, except in certain industries where codes of conduct are agreed upon, such as journalism and banking. Even then they can be vague and open to interpretation. My set of boundaries in the past has been ensuring that whatever I create is done from a user-centric approach, putting the user and their experience first. I can quite confidently say that I have not created anything that I did not evaluate or intend to evaluate through user studies.
Towards the Future: Human-Centric AI
Responsible AI and ethical AI are two names for seemingly similar things. In my eyes, the only way to ensure that we will not be harming the people that are affected by what we do as data scientists is to adopt a human-centric approach. We cannot only consider how the actions taken by our AI applications affect objective behavior; we have to consider how these applications affect the experience and human lives of those interacting with our systems. This way we can again reduce the probability that we are harming them and even ensure that we are actually improving lives through our work. Going a step beyond, we can align the values embedded in our systems with the values we have as those who release AI applications into the world, the values we have as users of AI applications, or even the values that we have as a society. But it does require us to be aware of those values and take them into consideration.
It can no longer be acceptable for anyone working on AI to say “I just make algorithms, what they are used for is someone else’s responsibility”. Anyone in the chain, from product owner to data scientist to data engineer, needs to share the responsibility to ensure that what we develop improves the world in the sense of a Pareto improvement: not causing harm or disadvantaging anyone. And doing that requires more than asking people exposed to the systems if they are alright with those systems. In some cases it requires us to actually think and decide on their behalf. And while that is scary to some, there is nothing wrong with that. There are ways to do this, and rather than seeing them as a distraction from the work, data scientists should embrace them and use them to empower their own work.
Plenty of literature exists on Innovation Studies, or reflections on how innovations come about. Key aspects of innovation studies are how to make and measure innovation, technological innovation systems (or ways to model innovations), innovation and politics, and most relevant for this piece: the effects of innovation on the economy, society and the environment. In this approach researchers try to explain the effects of an innovation on its environment by studying the relationship between the objective properties of the new technology and the environment.
Another important aspect is that of value-sensitive design. Since AI applications are created by people and are based on data collected from people, there are, as mentioned before, always values embedded in the applications we create. They are embedded in the data we use, in the metrics we select to optimize, or in the actions we choose to take based on our predictions. One way to ensure applications are built with values in mind is by following an approach called value-sensitive design. This approach allows for designing technology that accounts for human values, by taking stakeholders and their values into account. Among others, ethnographic methods are used to get an idea of what values are relevant and how they are accounted for by the technology to be designed.
How I Will Remain Human-Centric
I am currently in the process of writing a data science roadmap for Obvion. In creating and implementing this roadmap I will try to ensure that resources are allocated to the ethical part of our work. As we are a financial institution, there is a lot of regulation in place regarding how we do our work, but being compliant should be the bare minimum. My aim is to go above and beyond that to ensure we treat the people we interact with fairly and improve their and our lives. As a mortgage provider, our actions and decisions affect what is arguably one of the biggest purchases in anyone’s life. My main goal is to ensure that through our actions and decisions we do not harm, disadvantage or treat anyone unfairly.
This will firstly be done in developing the team, where my aim is to create a team that will not focus only on the numbers. Each of us shall have knowledge and understanding of the processes and of the clients, advisors and colleagues that we are helping with our solutions. We will challenge ourselves, ask to be challenged and challenge those asking us for solutions. Secondly, we will develop and roll out AI applications concentrically along two axes. The first axis is that of providing decision support versus automating decisions. We start out with applications that will provide decision support and not effectuate decisions themselves. As we learn more we will slowly move towards applications that can automatically effectuate decisions. The second axis is that of internal versus external focus: our solutions start off by looking at decisions and actions that affect us as an organization internally, and slowly move towards decisions and actions that affect our customers and advisors.
There are no guarantees, but this is our attempt to minimize the probability of negatively affecting those we interact with.
Thanks to David van Kerkhof for the wicked good feedback!
Last year I was assistant professor in Data Science at Maastricht University’s School of Business and Economics. This year I’m Lead Data Intelligence at Obvion, a mortgage provider in the Netherlands. I realize that aside from a list of publications and some still on-going projects, I have nowhere consolidated my experiences, thoughts and opinions from my time in academia, despite the fact that I was hired by Obvion partly for those experiences, thoughts and opinions. Hence my attempt to write down what I did in what I’ve dubbed my ‘Human-Centric AI Manifesto’.
¹ Filter bubbles are still debated in the scientific literature, with evidence both supporting and disputing their existence. This indicates that there are external factors that influence whether or not filter bubbles affect user consumption, and thus they cannot be dismissed. If anything, it indicates that we should be mindful of the fact that technology can influence our thinking and behavior.
³ Another reason why I think Transparency should not be a universal value to strive for.
Originally published at https://aigents.co.