On February 27th, Dr. Rediet Abebe gave a prescient talk at Georgia Tech. During her talk, entitled “Designing Algorithms for Social Good”, she gave highlights from a research career that has already produced remarkable results at the intersection of algorithms, optimization, and social computing.
The surreal portion of the talk came at the beginning, where she detailed algorithms for the distribution of social benefits that could be optimized to improve the chances that people on the margins survive and thrive in the face of shocks. Shocks like wars, recessions, or viral pandemics.
The slide above, “A Model of Welfare”, is taken from her paper Subsidy Allocations in the Presence of Income Shocks and speaks directly to the plight of the millions of gig economy workers, restaurant staff, undocumented, underemployed and others whose very ability to find adequate food and shelter is endangered as the impact of the COVID-19 pandemic compounds. In short, a humane and just welfare system has to account for the events that are likely to drive people on the margins (and millions who may consider themselves to be far from the margins) into poverty. With the increase in income inequality and the looming impact of climate change and other threats, the reactive and piecemeal approaches to social welfare are clearly not up to the challenge.
The words of Bryan Stevenson, founder of the Equal Justice Initiative, resonate: “The opposite of poverty is justice”. It would be an amazing world where the tools of computing were bent toward justice and equity. I hope that you’ll read Dr Abebe’s work, and ask your elected officials for justice. I hope that you contribute whatever you can to ease the burden of those around you.
Here’s a recent talk of Dr Abebe’s for inspiration.
Shoshana Zuboff’s 2019 book, The Age of Surveillance Capitalism, is a powerful critique of the interplay between capitalism and Big Data. In about 700 pages, she makes a convincing argument that the very existence of human autonomy is under threat. In short, Dr. Zuboff is concerned that large companies have perfected a profit loop in which they collect immense amounts of data on consumer behavior, devise models that accurately predict behavior using that data, and then use those models to manipulate consumer behavior to drive profit. It is an intriguing and important read.
It defines several ways of thinking about the data collection and machine learning technologies at the core of this new surveillance capitalism. I think the book falls short in articulating a path forward, especially for people who work in the data industry.
If you’re a data scientist, analyst, or work in any of the myriad jobs behind this phenomenon, you might have concerns about how your 9-to-5 contributes to this mess. If so, you’ve probably asked yourself what to do – how to respond as an ethical, responsible, perhaps moral human being. I’ve put down some thoughts below, chime in if you’d like!
We need to name these things!
My partner always emphasizes that to address an issue, we need to be able to name it as precisely as possible. The devil hides in the details. Those of us with academic training, particularly, have the obligation to identify these names and make them accessible to our people – to make it plain, as my grandmothers would say.
In that spirit, there are a few terms that I pulled from The Age of Surveillance Capitalism that helped me view commercial Big Data in a new light.
Rendition is the activity that we (or our ETL flows) engage in when we reduce the features and signals about some human activity or human to some set of observable features and use those features to completely describe the person or activity. We render their engagement with our landing page to the act of clicking, the number of milliseconds from rendering of the page to first clicks, the time spent scrolling, etc.
We render emotion using the straw of a signal that we get from our browser API. She talks about rendition of emotions, rendition of personal experience, rendition of social relations, and rendition of the self: all operations that reduce very rich and complex human activities to numbers.
Behavioral surplus refers to data collected which exceeds the amount required for the application or platform to complete the task at hand. You have an app that provides secure communication. A user happens to talk about a certain brand of sneakers a lot when they’re using your app. Capturing information about their shoe habits doesn’t have much to do with secure instant messaging, but you capture these interactions and share them with a data aggregator who in turn shares them with a shoe retailer. The behavioral surplus is the “sneaker interest”. Your app user who was counting on privacy and security knows that their trust was an illusion when ads for the exact shoes show up everywhere they look online. Maybe that user is undocumented. Maybe ICE shows up at the Foot Locker they shop at.
Instrumentarianism refers to a form of behavioral control, based on Behaviorist models of activity modification, enabled by large cross-platform logging of online behavior, combined with predictive machine learning, combined with the capacity to conduct online behavioral experiments. The Instrumentarian regime knows enough about a given population, so that with some certainty C, when an action A is presented to this group, they will perform some behavior B. The regime can thus control this group with a given certainty. Because of the economies of scale, the Instrumentarian need only guarantee a small percentage of response to these stimuli to reap enormous profit.
The right to sanctuary is the human right to some place (real or virtual) that can be walled off from inspection. Surveillance capitalism, through always on devices, potentially removes all sense of sanctuary. Every conceivable space is monitored through “smart” bathroom scales, “smart” televisions, “smart” smoke alarms, robot vacuum cleaners. In this theory, any device is loaded with surplus sensing that reports on and records personal and social interaction far beyond the purpose of the device. That is, your Echo records you even though you might have just purchased it to play your favorite music. For the surveillance capitalist, the device more than pays for itself by providing precious behavioral insights that might have otherwise been hidden in “sanctuary”.
Holes in Zuboff’s analysis
As I was reading, I noticed three glaring oversights in Zuboff’s analysis.
Instrumentarianism is explicitly violent
Zuboff makes the claim that Instrumentarianism does not require violence for control. But I still leave her book with a concern about the potential for violence that Instrumentarianism and Surveillance Capitalism empower. To quote from the announcement for the upcoming Data for Black Lives Conference:
Our work is grounded in a clear call to action – to Abolish Big Data means to dismantle the structures that concentrate the power of data into the hands of a few. These structures are political, financial, and they are increasingly violent.
Increasingly violent. We just have to reflect on the way in which digital manipulation and capitalist interest have coalesced in state-supported violence in support of India’s Citizenship Amendment Act. The 2019 El Paso shooting and the complex interplay of anti-Black policing with surveillance technologies and corporate interests in tools like Amazon Rekognition illustrate the complex ways in which violence has played out, and could play out, in surveillance capitalism.
Impact upon the marginalized is profound
Zuboff does not discuss the ways in which surveillance capitalism is especially exploitative of marginalized populations. Absent was a discussion of how companies like Palantir and private prisons exploit the incarcerated, including asylum seekers. Absent was the discussion of how Instrumentarianism is particularly pernicious and violent with respect to marginalized populations including the homeless, the undocumented, communities of color, the poor, and LGBTQ communities. There are important voices that are startlingly absent in Zuboff’s analysis.
Data sharing can be empowering
Zuboff doesn’t like the idea of the “hive mind”, the mass collection of patterns of activity. But when people of a group commit to a collective model, it can be a source of empowerment. Many of the policing abuses were identified because everyday people were able to pool knowledge and data – shared spreadsheets documenting the high incidence of stop and frisk, shared cell phone videos of police shootings of innocent citizens. Class action lawsuits against workplace discrimination and the willingness of people to participate in studies to identify housing discrimination are two activities that come to mind in which a “hive mind” (the group discovery of adverse behavior on the grand scale) is able to benefit the individual and the community. I think that she is arguing for transparency – what are the power relationships that the technology is reinforcing – as opposed to arguing against the technology itself.
A plan for action
Articulating the answer to “where do we go from here” is where the book falls short. Zuboff claims (convincingly) that nothing less than human agency is at stake. As far as I could tell, Zuboff’s only suggestion is to trust that the EU’s General Data Protection Regulation (GDPR) would come to our rescue. I think that’s too precious a right to trust just the GDPR.
So what to do?
Here are a few thoughts. Got more? Please chime in!
Unite and act
Ruha Benjamin in a recent talk cited an example of how students in a high school refused to have their education mediated by a Facebook instructional platform. They demanded face-to-face instruction with human teachers and they refused to have all of their learning activity be rendered into the Facebook cloud. They led a successful campaign for real instruction from real teachers and won. According to Benjamin, the most successful campaigns against surveillance capitalism require collective effort – the technology is too ubiquitous for the single individual to have much impact. A protest staged today at a dozen colleges against the use of facial recognition on campus emphasizes the power of collective action.
So you’re never too old or young or too math or technology challenged to act and make a difference. I think the most powerful “next step” is to think through what to do with your family, friends, neighbors, classmates, community. Start locally.
There are a host of recent books, articles, posts, and classes that provide a more complete view and history of how to deal with surveillance capitalism. A sampling includes:
The Detroit Community Technology Project along with other community organizations launched the Our Data Bodies Project which created a fascinating report Reclaiming Our Data that lays out concrete steps to reclaim personal and private data.
Computing is becoming more common in primary school, but in-depth education on ethics and computing is still rare. It should be a central component of computing instruction. I’ve come across just a handful of classes on ethical design of data platforms. The topics usually appear in advanced classes or as the final lecture in a course on machine learning or data science. Given the immense impact that even the simplest social app can have, we should advocate for ethics having a central role in computer science and related fields. Talk to your friends in academia, the high schools and grade schools in your neighborhood. Or just…
A lot of the technology at the core of surveillance capitalism is available to everyday people in spreadsheet packages and cloud environments. The same tools and algorithms at the core of the surveillance regime can be “flipped” to identify and counter manipulative pricing and discriminatory and racist patterns. The groundbreaking work of Rediet Abebe demonstrates the potential of the good that can happen when the tools of data science are used by everyday people to improve their lives.
Advocate for transparency and ethical deployment of software systems
Even inside Google, employees were able to advocate for model cards that explain to cloud service users how machine learning models are trained and how they might be biased. Google employees also raised questions about many of the company’s instrumentarianist practices. Certainly, advocating for transparency is one move that can ensure that users are provided basic protections. There is still little in the way of openness and ethical training that is provided to users of online experimentation platforms like Optimizely.
Build an inclusive workplace
People continue to argue that if we include the perspectives of the marginalized in the development of these platforms, we may be less prone to rush to deploy them in profoundly abusive ways. The presence of marginalized voices in the large surveillance companies remains at unrepresentative levels. Beyond hiring practices, we’ve yet to see wide-scale development of community governance structures. As a data professional, advocate for equity and inclusion.
Understand the humanity of your customers
It can be easy sometimes for data professionals to think of “customers” as a probability distribution, a score, or a click. To use Zuboff’s terminology, our view of our social relationship with the people that use our systems has been subject to a kind of rendition as well. The data that you’re looking at makes it hard to associate a human being with those numbers, much less connect with one. Further, even if you are directly connected with a front end product (for example, making suggestions of videos to watch), you’re more often than not playing a game of aggregates – you’re looking at an abstraction of the actions of millions of people.
Surveillance Capitalism is a worthwhile read despite its flaws. We live in a critical time for data science and ultimately it will be up to all of us to determine what direction it takes.
If your company runs A/B tests involving its user community, this talk is a must see. Christo Wilson at Northeastern University discusses an analysis his lab ran on how companies use the Optimizely platform to conduct online experiments. Although these experiments tend to mostly be innocuous, there’s a tremendous need for transparency and mechanisms for accountability. How is your company addressing this?
Online behavioral experiments (OBEs) are studies (aka A/B/n tests) that people conduct on websites to gain insight into their users’ preferences. Users typically aren’t asked for consent and these studies are typically benign. Typically an OBE will explore questions such as whether changing the background color influences how the user interacts with the site or whether the user is more likely to read an article if the font is slightly larger.
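For readers who haven’t run one, here is a minimal sketch of the arithmetic behind a two-arm OBE: a two-proportion z-test on click-through counts. The visitor and click counts are made up for illustration, and the function name is my own; this is not how Optimizely or any particular platform computes its results.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p) for the difference in conversion rates, B minus A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms behave the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal, via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: arm A keeps the old background color, arm B gets the new one.
z, p = two_proportion_z_test(conv_a=1000, n_a=10000, conv_b=1100, n_b=10000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Notice that nothing in the calculation involves the users’ consent or awareness, which is part of why transparency has to come from somewhere outside the math.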
Avi Wigderson has a nice talk on how Kurt Gödel’s incompleteness theorems bound what can and cannot be computed (or proved) by simple programs.
In this recent post I talked about how Gödel’s theorem was used to show that for certain kinds of learning algorithms, we can’t know definitively whether the algorithm learns the right thing or not. This is kind of equivalent to saying that we can’t definitively know whether there will be a gap in the program’s learning.
The flip side of this, as Wigderson points out, is that it is probably a good thing that there are certain things that are too hard for a program to figure out. This hardness is the key to privacy — the harder it is to decipher an encrypted message, the more you can have confidence in keeping the message content private. This principle is at the core of what allows e-commerce.
Perhaps there is a way to structure our online communications or transactions so that learning our behavior, in pernicious ways, becomes impossibly hard. This might defuse a lot of the emerging fears surrounding AI.
Wigderson makes his point — what he calls the “Unreasonable usefulness of hard problems” — about 30 minutes into the talk.
Kurt Gödel was one of the great logicians of the 20th century. Although he passed away in 1978, his work is now impacting what we can know about today’s latest A.I. algorithms.
Gödel’s most significant contribution was probably his two Incompleteness Theorems. In essence they state that the standard machinery of mathematical reasoning is incapable of proving all of the true mathematical statements that could be formulated. A mathematician would say that the consistency (the guarantee that no two contradictory statements can both be proved) of standard set theory (a collection of axioms known as Zermelo–Fraenkel set theory with the axiom of choice, or ZFC) cannot be proved within ZFC itself. That is, there are some true things which you just can’t prove with math.
In a sense, this is like the recent U.S. Supreme Court decision on political gerrymandering. The court ruled “that partisan gerrymandering claims present political questions beyond the reach of the federal courts”. Yeah, the court stuck their heads in the sand, but ZFC just has no way to tell truth from falsity in certain cases. Gödel gives mathematical formal systems a pass.
It now looks like Gödel has rendered his ruling on machine learning.
A lot of the deep learning algorithms that enable Google translate and self driving cars work amazingly well, but there’s not a lot of theory that explains why they work so well — a lot of the advances over the past ten years amount to neural network hacking. Computer scientists are actively looking at ways of figuring out what machines can learn, and whether there are efficient algorithms for doing so. There was a recent ICML workshop devoted to the theory of deep learning and the Simons Institute is running an institute on the theoretical foundations of deep learning this summer.
However, in a recent paper entitled Learnability can be undecidable, Shai Ben-David, Amir Yehudayoff, Shay Moran and colleagues showed that there is at least one generalized learning formulation which is undecidable. That is, although the particular algorithm might learn to predict effectively, you can’t prove that it will.
They looked at a particular kind of learning in which the algorithm tries to learn a function that maximizes the expected value of some metric. The authors chose as a motivating example the task of picking the ads to run on a website, given that the audience can be segmented into a finite set of user types. Using what amounts to server logs, the learning function has to output a scoring function that says which ad to show given some information on the user. The scoring function learned has to maximize the number of ad views by looking at the results of previous views. This kind of problem obviously comes up a lot in the real world, so much so that there is a whole class of algorithms, Expectation Maximization, that have been developed around this framework.
One of the successes of theoretical machine learning is realizing that you can speak about a learning function in terms of a single number called the VC dimension which is roughly equivalent to the number of classes the items that you wish to classify can be broken into. They also cleverly use the fact that machine learning is equivalent to compression.
Think of it this way. If you magically could store all of the possible entries in the server log, you could just look up what previous users had done and base your decision (which ad to show) on that. But chances are that since many of the users who are cyclists liked bicycle ads, you don’t need to store all of the responses for users who are cyclists to guess accurately which ad to show someone who is a cyclist. Compression amounts to successively reducing the information you store (training data or features) as long as your algorithm performs acceptably.
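To make the intuition concrete, here is a toy sketch of the cyclist example. The log records, user types, and ad names are all invented, and this is only meant to illustrate the “store less, predict the same” idea; it is not the compression scheme defined in the paper.

```python
from collections import Counter, defaultdict

# Hypothetical server log: (user_type, ad_shown, clicked) records.
log = [
    ("cyclist", "bike_ad", True),
    ("cyclist", "bike_ad", True),
    ("cyclist", "car_ad", False),
    ("cyclist", "bike_ad", False),
    ("driver", "car_ad", True),
    ("driver", "bike_ad", False),
    ("driver", "car_ad", True),
]

def lookup_predict(log, user_type):
    """The 'store everything' approach: scan the full log on every request."""
    counts = Counter(ad for t, ad, clicked in log if t == user_type and clicked)
    return counts.most_common(1)[0][0]

def compress(log):
    """Keep a single entry per user type: the ad that drew the most clicks."""
    clicks = defaultdict(Counter)
    for user_type, ad, clicked in log:
        if clicked:
            clicks[user_type][ad] += 1
    return {t: counts.most_common(1)[0][0] for t, counts in clicks.items()}

table = compress(log)
for user_type in table:
    # The small table gives the same answer as the full log.
    print(user_type, table[user_type], lookup_predict(log, user_type))
```

Seven log rows shrink to two table entries here without changing a single prediction, which is the sense in which learning and compression go together.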
The authors defined a compression scheme (the equivalent of a learning function) and were then able to link the compression scheme to incompleteness. They were able to show that the scheme works if and only if a particular kind of undecidable hypothesis called the continuum hypothesis is true. Since Gödel proved (well, actually developed the machinery to prove) that we can’t decide whether the continuum hypothesis is true or false, we can’t really say whether things can be learned using this method. That is, we may be able to learn an ad placer in practice, but we can’t use this particular machinery to prove that it will always find the best answer. Machine learning and A.I. are by definition intractable problems, where we mostly rely on simple algorithms to give results that are good enough — but having certainty is always good.
Although the authors caution that it is a restricted case and other formulations might lead to better results, there are two other significant consequences I can see. First, the compression scheme they develop is precisely the same structure that is used in Generative Adversarial Networks (GANs). The GAN neural network is commonly used to generate fake faces and is used in photo apps like Pikazo http://www.pikazoapp.com/. The implication of this research is that we don’t have a good way to prove that a GAN will eventually learn something useful. The second implication is that there may be no provable way of guaranteeing that popular algorithms like Expectation Maximization will avoid optimization traps. The work continues.
It may be no coincidence that the Gödel Institute is in the same complex of buildings as the Vienna University AI institute.
Avi Wigderson has a nice talk about the connection between Gödel’s theorems and computation. If we can’t even prove that a program will be bug free, then we shouldn’t be too surprised that we can’t prove that a program learns the right thing.
I recently made an inquiry with the City of Atlanta’s Mayor’s office as to the use of facial recognition software. I received the following reply on the Mayor’s behalf from the Atlanta Police Department:
The Atlanta Police Department does not currently use nor the capability to perform facial recognition. As we do not have the capability nor sought the use of, we not have specific legislation design for or around facial recognition technology.
Delta Air Lines, a company based in Atlanta, continues to promote the use of facial recognition software, and according to this Wired article makes it difficult for citizens to opt out of its use.
There are several concerns with use of facial recognition technology, succinctly laid out by the Electronic Frontier Foundation:
Face recognition is a method of identifying or verifying the identity of an individual using their face. Face recognition systems can be used to identify people in photos, video, or in real-time. Law enforcement may also use mobile devices to identify people during police stops.
But face recognition data can be prone to error, which can implicate people for crimes they haven’t committed. Facial recognition software is particularly bad at recognizing African Americans and other ethnic minorities, women, and young people, often misidentifying or failing to identify them, disparately impacting certain groups.
So in other words, the technology has the potential for free assembly and privacy abuses and because the algorithms used are typically less accurate for people of color (POC), the potential abuses are multiplied.
There are ongoing dialogs (here is the U.S. House discussion on the impact on civil liberties) on when/how/if to deploy this technology.
Do me a favor? If you happen to fly Delta, or are a member of their frequent flyer programs, could you kindly ask for non-facial recognition check in? Then asking for more transparency on the use and audit of the software used would be an important step forward.
The argument against the technology is twofold: first, the technology is highly invasive in public spaces and may constitute a direct threat to basic (US) constitutional rights of freedom of assembly; secondly the feature extraction and training set construction methodologies (for newer deep learning based models) have been shown to have racial and gender biases “baked in”. For example, the systems analyzed in Buolamwini’s work are less accurate for Black people and women — either because the data sets used for training include mostly white male faces, or the image processing algorithms focus on image components and make assumptions more common to European faces.
Do you know if such a system is deployed in your city? If so, are there measures to control its use, or make audits available to your community? If not, have you considered contacting your elected representatives to support or discuss appropriate safeguards?
A few days ago I attended the talk “Sparsity, oracles and inference in high-dimensional statistics” by Sara van der Geer who is visiting Georgia Tech. The talk is described here.
But I didn’t record the talk! I had a working iPhone! I only have an afterthought photo of the whiteboard that remained after the lecture.
Just focus on lambda!
Phones are ubiquitous and there’s nothing like a short clip that can distill some of the essence of an idea, a lecture. Maybe it’s all those “No recording devices, please!” announcements at concerts, or that my videography skills are in need of serious help.
PSA: If you think that someone is bringing across some important knowledge, record it! Give them their attribution, don’t steal their stuff, but remember that you are sharing knowledge with the world!
So what was the talk about? If you do machine learning, the idea of regularization is probably familiar. L1 regularization, a.k.a. the Least Absolute Shrinkage and Selection Operator, a.k.a. lasso, assigns a penalty on the absolute value of the predictor weights. It’s a technique that reduces the tendency to overfit to the training data. There’s a whole book on it called Statistical Learning with Sparsity that you can download for free!
The amazing thing about lasso is that it also drives the weights of extraneous parameters to zero, or close to it: it can reduce the number of parameters you need in your model, giving a model that is more sparse (that is, you can just remove the close-to-zero parameters from the model). This can make the model faster to compute.
The main things I picked up were that there are some bounds on the error for lasso regularization that can be expressed in terms of the number of parameters and the number of observations you have in your training set. The error should be within a constant of an expression in p, n, and a third quantity that I believe is your guess about the smallest non-sparse weight. You also get a similar expression for a good starting value for the penalty λ. Here p is the number of parameters in your model, and n the number of observations you are training with. Scikit-learn or your favorite machine learning library probably comes with the lasso, but it doesn’t look like the bound results are baked in.
She introduced something called the compatibility constant that’s discussed further in a couple of papers [Belloni et al. 2014, Dalalyan 2017]. She also discussed how lasso behaves when you assume that you have noisy observations. The final lecture is September 6th at Georgia Tech on applications to inference.
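Since the bounds aren’t baked into the libraries, here is a rough sketch of what trying one out could look like. It fits a lasso by plain coordinate descent on synthetic data and sets the penalty to σ·sqrt(2·log(p)/n), a scaling that shows up in many textbook treatments of lasso theory. To be clear about the assumptions: that formula is my stand-in and not necessarily the expression from the talk, the noise level σ is treated as known, and the data and function names are made up. The objective matches the one scikit-learn documents for its Lasso, (1/(2n))·||y − Xw||² + λ·||w||₁.

```python
import math
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; anything inside the band becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual computed without w[j].
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w

# Synthetic data: only the first two of p = 20 weights are truly non-zero.
random.seed(0)
n, p, sigma = 100, 20, 0.5
true_w = [3.0, -2.0] + [0.0] * (p - 2)
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(x * w for x, w in zip(row, true_w)) + random.gauss(0, sigma) for row in X]

lam = sigma * math.sqrt(2 * math.log(p) / n)  # assumed textbook scaling, see above
w_hat = lasso_coordinate_descent(X, y, lam)
support = [j for j, w_j in enumerate(w_hat) if w_j != 0.0]
print("lambda =", round(lam, 3), "non-zero weights at indices:", support)
```

The thing to watch is how the penalty moves: it grows slowly with the number of parameters (through log p) and shrinks as you collect more observations, so “just focus on lambda” turns into a concrete starting value rather than a blind grid search.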
Wouldn’t it have been better if I’d just recorded it though??
The Foundations of Data Science Boot Camp given last week (August 27 – 31) at the Simons Institute in Berkeley explored how pure mathematics and theoretical computer science are providing actionable insights that the working data scientist can use — or at least ponder.
I found the talk below by Ravi Kannan useful in pointing out how dimensionality reduction techniques like SVD can be used to set clustering up for success. When dealing with immense data sets, this can be the difference between useful or garbage clusters.
I also thought that David P. Woodruff’s lecture on a dimensionality reduction technique called sketching was impressive for its clarity. As a data scientist or analyst, you’re often in a dilemma when your Impala cluster runs out of memory for that critical model build: you may just have to sample from that terabyte pile of web pages. It is good to know that you have some math magic behind you when the time comes.
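Here is a small sketch of that math magic, under my own assumptions about what a simple demo should look like (this is not code from the lecture). It compresses 20,000 synthetic (x, y) rows down to 200 using a CountSketch-style map, where every row gets a random sign and is added into one random bucket, and then fits a line through the origin on both the full and the sketched data. The data, sizes, and function names are all hypothetical.

```python
import random

def count_sketch(rows, k, rng):
    """Compress a list of equal-length rows into k rows.

    Each input row is multiplied by a random sign and added to one
    randomly chosen output row (a CountSketch-style linear map).
    """
    width = len(rows[0])
    sketch = [[0.0] * width for _ in range(k)]
    for row in rows:
        bucket = rng.randrange(k)
        sign = rng.choice((-1.0, 1.0))
        for j, value in enumerate(row):
            sketch[bucket][j] += sign * value
    return sketch

def least_squares_slope(rows):
    """Slope of the best fit y = slope * x through the origin; rows are (x, y) pairs."""
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, y in rows)
    return sxy / sxx

# Synthetic data: y is roughly 2x plus a little noise.
rng = random.Random(0)
n, k = 20000, 200
data = []
for _ in range(n):
    x = rng.gauss(0, 1)
    data.append((x, 2.0 * x + rng.gauss(0, 0.1)))

full_slope = least_squares_slope(data)
sketched_slope = least_squares_slope(count_sketch(data, k, rng))
print("full:", round(full_slope, 3), "sketched:", round(sketched_slope, 3))
```

Unlike plain sampling, every original row contributes to the sketch, and the sketch is built in a single pass over the data, which is what makes it attractive when the full table won’t fit in memory.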
Santosh Vempala thinks the seminar was a better value than Netflix. I’m not sure about that, but those were some good lectures.
A.I. and Big Data Could Power a New War on Poverty is the title of an op-ed in today’s New York Times by Elisabeth Mason. I fear that AI and Big Data are more likely to fuel a new War on the Poor unless a radical rethinking occurs. In fact this algorithmic War on the Poor seems to have been going on for quite some time and the Poor are not winning.
Mason posits that AI and Big Data provide three paths forward from the trap of inequality: 1. The ability to match people to available jobs; 2. the ability to deliver customized training that enables people to perform those jobs; and 3. the ability to algorithmically deliver social welfare programs in a more efficient manner.
The first objective seems within the realm of Indeed.com and LinkedIn’s recommendation algorithms and the second, personalized training, has a long history in AI systems development. The problem is access: how do you get one of the “good middle-class jobs” in San Francisco when you live in Atlanta and attend a high school that lacks the coursework to prepare you for Stanford? How do you get access to an immersive 3D training environment when your family can’t afford to put down $100 a month for high speed internet and your school lacks the equipment as well?
The third part of Mason’s strategy is the most problematic. We’ve seen AI (meaning machine learning and decision making algorithms) used to enforce biased sentencing practices; seen how skewed training data can lead to racial bias in facial recognition; and the use of data-driven methods in predatory lending has also been documented. These examples constitute the tip of a deep and still largely unaddressed problem in AI. In short, if the algorithms on which our hopes for transformation are pinned learn from data that reifies the structural racism at the root of social inequity, then we’re simply finding a more optimal route to oppression.
Before we hand over the lives and futures of the most vulnerable members of society to algorithms that we are still trying to fathom, we should strive first for accountability and transparency in algorithms. The efforts underway in New York City to ensure algorithmic ethical accountability are one start.
But if machine learning and AI are the new tools of our age, we should empower all people to put the computational tools and conceptual frameworks of data science to work for them. Black Lives Matter activists took the social networking tools to organize protests and share video that has changed and empowered. What could a coming generation do with additional visualization and analytical tools?
It was the prospect of using AI to empower education that first attracted me to the field. I think that the emerging technology has some good to do. But the process must necessarily be participatory. When artists, educators, poets, activists, grocery store owners, gardeners — everyone — can be given access to the tools then I’ll bet on the human capacity to find new paths to expression and opportunity.