# Category: Machine Learning

## Gödel, Incompletness, and AI

Kurt Gödel was one of the great logicians of the 20th century. Although he passed away in 1978, his work is now impacting what we can know about today’s latest A.I. algorithms.

Gödel’s most significant contribution was probably his two Incompleteness Theorems. In essence they state that the standard machinery of mathematical reasoning are incapable of proving all of the true mathematical statements that could be formulated. A mathematician would say that that the consistency (or ability to determine which of any two contradictory statements is true) of standard set theory (a collection of axioms know as Zermelo–Fraenkel set theory) is independent of ZFC. That is, there some true things which you just can’t prove with math.

In a sense, this is like the recent U.S. Supreme Court decision on political gerrymandering. The court ruled “that partisan gerrymandering claims present political questions beyond the reach of the federal courts”. Yeah, the court stuck their heads in the sand, but ZFC just has no way to tell truth from falsity in certain cases. Gödel gives mathematical formal systems a pass.

It now looks like Gödel has rendered his ruling on machine learning.

A lot of the deep learning algorithms that enable Google translate and self driving cars work amazingly well, but there’s not a lot of theory that explains why they work so well — a lot of the advances over the past ten years amount to neural network hacking. Computer scientists are actively looking at ways of figuring out what machines can learn, and whether there are efficient algorithms for doing so. There was a recent ICML workshop devoted to the theory of deep learning and the Simons Institute is running an institute on the theoretical foundations of deep learning this summer.

However, in a recent paper entitled Learnability can be undecidable Shai Ben-David, Amir Yehudayoff, Shay Moran and colleagues showed that there is at least one generalized learning formulation which is undecidable. That is, although the particular algorithm might learn to predict effectively, you can’t prove that it will.

They looked at a particular kind of learning that in which the algorithm tries to learn a function that maximizes the expected value of some metric. The authors chose as a motivating example the task picking the ads to run on a website, given that the audience can be segmented into a finite set
of user types. Using what amounts to server logs, the learning function has to output a scoring function that says which ad to show given some information on the user. The scoring function learned has to maximize the number of ad views by looking at the results of previous views. This kind of problem obviously comes up a lot in the real world — so much so that there is a whole class of algorithms Expectation Maximization that have been developed around this framework.

One of the successes of theoretical machine learning is realizing that you can speak about a learning function in terms of a single number called the VC dimension which is roughly equivalent to the number of classes the items that you wish to classify can be broken into. They also cleverly use the fact that machine learning is equivalent to compression.

Think of it this way. If you magically could store all of the possible entries in the server log, you could just look up what previous users had done and base your decision (which ad to show) based on what the previous user had done. But chances are that since many of the users who are cyclists liked bicycle ads, you don’t need to store all of the responses for users who are cyclist to guess accurately which ad to show someone who is a cyclist. Compression amounts to successively reducing information you store (training data or features) as long as your algorithm performs acceptably.

The authors defined a compression scheme (the equivalent of a learning function) and were then able to link the compression scheme to incompleteness. They were able to show that the scheme works if and only if a particular kind of undecidable hypothesis called the continuum hypothesis is true. Since Gödel proved (well, actually developed the machinery to prove) that we can’t decide whether the continuum hypothesis is true or false, we can’t really say whether things can be learned using this method. That is, we may be able to learn an ad placer in practice, but we can’t use this particular machinery to prove that it will always find the best answer. Machine learning and A.I. are by definition intractable problems, where we mostly rely on simple algorithms to give results that are good enough — but having certainty is always good.

Although the authors caution that it is a restricted case and other formulations might lead to better results, there are some two other significant consequences I can see. First, the compression scheme they develop is precisely the same structure that are used in Generative Adversarial Networks (GANs). The GAN neural network is commonly used to generate fake faces and used in photo apps like Pikazo http://www.pikazoapp.com/. The implication of this research is that we don’t have a good way to prove that a GAN will eventually learn something useful. The second implication is that there may be no provable way from guaranteeing that popular algorithms like Expectation Maximization will avoid optimization traps. The work continues

It may be no coincidence that the Gödel Institute is in the same complex of buildings as the Vienna University AI institute.

Avi Wigderson has a nice talk about the connection between Gödel’s theorems and computation. If we can’t event prove that a program will be bug free, then we shouldn’t be too surprised that we can’t prove that a program learns the right thing.

## Building Thousands of Reproducible ML Models with pipe, the Automattic Machine Learning Pipeline

My colleague Demet Dagdelen explains the process of building machine learning models at scale for customer insights at Automattic

## I know you’ll be back: interpretable new user clustering and churn prediction on a mobile social application

I know you’ll be back: interpretable new user clustering and churn prediction on a mobile social application

Interesting paper on mobile user churn prediction at Snapchat

## You should record technical talks!

A few days ago I attended the talk “Sparsity, oracles and inference in high-dimensional statistics” by Sara van der Geer who is visiting Georgia Tech. The talk is described here.

But I didn’t record the talk! I had a working iPhone! I only have an after thought photo of the white board that remained after the lecture

Just focus on lambda!

Phones are ubiquitous and there’s nothing like a short clip that can distill some of the essence of an idea, a lecture. Maybe it’s all those “No recording devices, please!” announcements at concerts, or that my videography skills are in need of serious help.

PSA: If you think that someone is bring across some important knowledge, record it — give them their attribution, don’t steal their stuff — but you are sharing knowledge with the world!

So what was the talk about? If you do machine learning, the idea of regularization is probably familiar. L1 regularization a.k.a Least Absolute Shrinkage and Selection Operator ak.a. lasso in particular assigns a penalty on the absolute value of the predictor weights. It’s an technique that reduces the tendency to overfit to the training data. There’s a whole book on it called Statistical Learning with Sparsity that you can download for free!

The amazing thing about lasso is that it also drives the less extraneous parameters close to zero: it can reduce the number of parameters you need in your model, or it results in a model that is more sparse (that is, just remove the close-to-zero parameters from the model). This can make the model faster to compute.

The main things I picked up were that there are some bounds on the error for lasso regularization that can be expressed in terms of the number of parameters and the number of observations you have in your training set. The error should be within a constant of $\sqrt{s_{o} log(p)/n}$ , where I believe that $s_{0}$ is your guess about the smallest non-sparse weight. You also get a similar expression for a good starting value for the penalty $\lambda >> \sqrt{ log(p)/n}$. The p is the number of parameters in your model, and n the number of observations you are training with. Scikit-learn or your favorite machine learning library probably comes with the lasso, but it doesn’t look like the bound results are baked in.

She introduced something called the compatibility constant that’s discussed further in a couple of papers [Belloni, et. a. 2014, Dalalyan 2017]. She also discussed how lasso behaves when you assume that you have noisy observations. The final lecture is September 6th at Georgia Tech on applications to inference.

Wouldn’t it have been better if I’d just recorded it though??

## AI and the War on Poverty

A.I. and Big Data Could Power a New War on Poverty is the title of on op-ed in today’s New York Times by Elisabeth Mason. I fear that AI and Big Data is more likely to fuel a new War on the Poor unless a radical rethinking occurs. In fact this algorithmic War on the Poor seems to have been going on for quite some time and the Poor are not winning.

Mason posits that AI and Big Data provide three paths forward from the trap of inequality: 1. The ability to match people to available jobs; 2. the ability to deliver customized training that enables people to perform those jobs; and 3. the ability to algorithmically deliver social welfare programs in a more efficient manner.

The first objective seems within the realm of Indeed.com and LinkedIn’s recommendation algorithms and second — personalized training — has a long history in AI systems development. The problem is access: how do you get one of the “good middle-class jobs” in San Francisco when you live in Atlanta and attend a high school that lacks the coursework to prepare you for Stanford? How do you get access to an immersive 3D training environment when your family can’t afford to put down 100 a month for high speed internet and your school lacks the equipment also?

The third part of Mason’s strategy is the most problematic. We’ve seen AI (meaning machine learning and decision making algorithms) used to enforce biased sentencing practices; seen how skewed training data can lead to racial bias in facial recognition; and the use of data-driven methods in predatory lending has also been documented. These examples constitute the tip of a deep problem and still largely un-addressed problem in AI. In short, if the algorithms on which our hopes for transformation are pinned learn from data that reifies the structural racism at the root of social inequity, then we’re simply finding a more optimal route to oppression.

Before we hand over the lives and futures of the most vulnerable members of society to algorithms that we are still trying to fathom, we should strive first for accountability and transparency in algorithms. The efforts underway in New York City to insure algorithmic ethical accountability is one start.

But if machine learning and AI are the new tools of our age, we should empower all people to put the computational tools and conceptual frameworks of data science to work for them. Black Lives Matter activists took the social networking tools to organize protests and share video that has changed and empowered. What could a coming generation do with additional visualization and analytical tools?

It was the prospect of using AI to empower education that first attracted me to the field. I think that the emerging technology has some good to do. But the process must necessarily be participatory. When artists, educators, poets, activists, grocery store owners, gardeners — everyone — can be given access to the tools then I’ll bet on the human capacity to find new paths to expression and opportunity.

## Artificial Intelligence at Historically Black Colleges (HBCUs)

If you would like to add to it, just send me an email.

Why am I doing this? Last month in AI and the Souls of Black Folk, I tried to make a case that people from all walks of life — particularly those from historically oppressed groups — have a part to play in shaping how the technology evolves. I think that HBCUs can be a catalyst for making AI an inclusive and responsive undertaking.

The list is a starting point.

## AI and the Souls of Black Folk

The impact of AI on communities of color — particularly through job displacement and policing — is now undeniable. Given that HBCUs have historically been on the forefront of technology education for the Black community, I am proposing to build a list of current activities (courses, research, seminars, clubs, etc.) at the HBCUs relevant to AI and its wider implications. If you’d like to contribute to the list, I’ll eagerly accept your input! To understand some of my motivations, keep reading.

We’ve now reached the point where the impact of Artificial Intelligence (AI) upon everyday life is undeniable. Everyone takes Siri for granted, your local Walmart can hook you up with a drone that does object recognition, and the introduction of self-driving cars now seems inevitable.

The title of my post is inspired both by W.E.B. Du Bois’s classic The Souls of Black Folk — a collection of essays on the state of African Americans at the start of the 20th century — and by Tracy Kidder’s The Soul of a New Machine which chronicles the development of a computer architecture at the end of the 20th century. I think that at the start of the 21st century, a critical look at how African Americans are impacted by immense technological change is needed. The title tries to capture my central question:

What is the impact of AI and related technologies on the lives of Black folk, and how can we organically shape a future for these technologies that enhances opportunity rather than reifies oppression?

To be honest, I am deeply concerned about the potential AI has for disruptive and devastating impact on communities of color. The Obama administration released a sobering assessment of the  economic impact of AI — it forecasts that changes in the transportation sector alone (trucking and delivery) will mean the elimination of occupations which Black and Brown folk have relied upon for entry into the middle class. Those findings are likely to generalize to other occupations. The prevalence of predictive policing and algorithmic sentencing raise serious concerns about equality and self-determination — especially when mass incarceration and other racial disparities in criminal justice are taken into account.

In theory, a modern democracy should allow impacted communities to raise concerns about a technology and then foster the deliberative processes necessary to fairly address those concerns. In theory, the open source movement provides a model through which communities can identify and develop technologies that serve their particular needs.

You might respond that “technology is colorblind, science is colorblind, it shouldn’t matter whether there are any Black folk involved at all in the development of and policy making around AI technology“. I think in this case particularly, it matters a great deal. AI, looking back over its history, is itself an endeavor that grapples with the question of what it means to be human — it is an endeavor that demands broad societal input.

Aside from President Obama’s initiative, I see very little presence of the disenfranchised in discussions on the future course of AI. For example, OpenAI is a research institute of sorts formed with the express purpose of “discovering and enacting the path to safe artificial general intelligence“. Despite lofty claims OpenAI seems to have the traditional Silicon Valley underrepresentation.

So all that said, what is the simplest concrete contribution I can make?

I have spent most of my career in AI. I grew up in Atlanta, attended Morehouse College and Georgia Tech through the Atlanta University Center’s Dual Degree Program, and went on later to complete a doctorate in computer science at the University of Chicago focusing on robot planning and learning. Along the way I studied and worked with other Black people doing advanced computing, witnessed Black people found successful technology startups and saw Black women and men lead successful academic careers in these fields. On the one hand, the diversity (exclusion?) figures we see from Facebook and Google seem at odds with  that experience. On the other hand, it jibes with the experience of being “the one and only” in many places I’ve worked or studied at. I wanted to begin to quantify and understand the dimensions and particulars of exclusion — things just don’t seem to add up, so perhaps we are looking in the wrong places and asking the wrong questions when we conclude there are no Black folk doing AI?

The Historically Black Colleges and Universities (HBCUs) provided the fertile intellectual soil in which Du Bois’s ideas sprouted and grew. So I thought my first concrete step would be to take inspiration from Tracy Chou’s Women in Software Engineering and put together a set of Google spreadsheets that document how HBCUs are looking at AI.

I created the spreadsheet AI at HBCUs. Please give it a look. Right now it is an aspirational document in that it tries to gather up any kind of activity at the HBCUs related to AI, Machine Learning, or Data Science. Hopefully it can be the basis of other kinds of summary statistics, update posts,  or active development efforts.

I’ve split the document into a number of sheets:

 Sheet Name Description HBCUS Name and address information for US HBCUs based on information obtained from IPEDS Course Information on AI related courses taught at the institution. Any department. Grants Information on grants received by the HBCU for AI related work Publications Publications on AI related topics. Clubs Student related clubs. For example a robotics club, drone club, a group formed for Kaggle competitions, etc. Workshops and Seminars Has the institution hosted any seminars or workshops? Links to videos would be great. Outreach Any Saturday events for grade schoolers? Teach-ins for community organizers? Graduate Placements Any numbers on the graduates who’ve gone on to careers, graduate school or internships in AI related fields.

Here’s how I think this could work. If you are a  faculty or a student at a HBCU, you can for the time being send an email to me charles.cearl@gmail.com with information on courses, seminars, research, clubs, outreach programs or other related activity at your institution. I’ll manually post your information to the relevant sheet. If there’s enough interest, I can just set this up to allow direct update (through pull requests or direct editing of the relevant sheet). I’m open to suggestions on formatting, information gathering, and overall focus.

Let’s get the discussion started!