You should record technical talks!

A few days ago I attended the talk “Sparsity, oracles and inference in high-dimensional statistics” by Sara van de Geer, who is visiting Georgia Tech. The talk is described here.

But I didn’t record the talk! I had a working iPhone! I only have an afterthought photo of what remained on the whiteboard after the lecture.

Just focus on lambda!

Phones are ubiquitous and there’s nothing like a short clip that can distill some of the essence of an idea, a lecture. Maybe it’s all those “No recording devices, please!” announcements at concerts, or that my videography skills are in need of serious help.

PSA: If you think that someone is getting across some important knowledge, record it. Give them their attribution and don’t steal their stuff, but remember that you are sharing knowledge with the world!

So what was the talk about? If you do machine learning, the idea of regularization is probably familiar. L1 regularization, a.k.a. the Least Absolute Shrinkage and Selection Operator, a.k.a. the lasso, assigns a penalty on the absolute value of the predictor weights. It’s a technique that reduces the tendency to overfit to the training data. There’s a whole book on it called Statistical Learning with Sparsity that you can download for free!

The amazing thing about the lasso is that it also drives the less important parameters to zero or close to it: it can reduce the number of parameters you need in your model, giving you a model that is more sparse (that is, you can just remove the zero and close-to-zero parameters from the model). This can make the model faster to compute.
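To make the "drives parameters to zero" idea concrete, here is a minimal sketch in plain Python. It assumes the special case of an orthonormal design, where the lasso solution is known to be the soft-thresholded least-squares weights. The function names and the example weights are made up for illustration and are not from the talk.

```python
def soft_threshold(weight: float, lam: float) -> float:
    """Shrink a weight toward zero by lam; anything within lam of zero becomes 0."""
    if weight > lam:
        return weight - lam
    if weight < -lam:
        return weight + lam
    return 0.0


def lasso_orthonormal(ols_weights, lam):
    """Lasso weights for an orthonormal design (assumption): soft-threshold each OLS weight."""
    return [soft_threshold(w, lam) for w in ols_weights]


# Hypothetical least-squares weights: two strong predictors, three weak ones.
ols_weights = [3.0, -0.25, 0.5, -2.0, 0.125]
lam = 0.5  # the lasso penalty

sparse_weights = lasso_orthonormal(ols_weights, lam)
print(sparse_weights)  # [2.5, 0.0, 0.0, -1.5, 0.0]
```

The weak weights land exactly on zero, while the strong ones survive but are shrunk by the penalty. With a general (non-orthonormal) design the solution has no closed form like this, so a library solver is the way to go.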

The main thing I picked up was that there are bounds on the error for lasso regularization that can be expressed in terms of the number of parameters and the number of observations in your training set. The error should be within a constant of $\sqrt{s_{0} \log(p)/n}$, where I believe that $s_{0}$ is the number of nonzero (non-sparse) weights in the underlying model. You also get a similar expression for a good starting value for the penalty: $\lambda \gg \sqrt{\log(p)/n}$. Here p is the number of parameters in your model, and n is the number of observations you are training with. Scikit-learn or your favorite machine learning library probably comes with the lasso, but it doesn’t look like the bound results are baked in.
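Since the bounds are just arithmetic on p, n, and $s_{0}$, here is a small sketch that evaluates them. It treats $s_{0}$ as a sparsity level (the number of nonzero weights, which is my assumption), and it leaves out the unknown constants from the theory, so the numbers are only scales, not guarantees. The function names and the example values of p, n, and s0 are hypothetical.

```python
import math


def lambda_scale(p: int, n: int) -> float:
    """The sqrt(log(p)/n) scale that the penalty lambda is compared against."""
    return math.sqrt(math.log(p) / n)


def lasso_error_rate(s0: int, p: int, n: int) -> float:
    """The sqrt(s0 * log(p) / n) error rate, up to an unknown constant."""
    return math.sqrt(s0 * math.log(p) / n)


# Hypothetical problem: 1000 parameters, 200 observations, 10 nonzero weights.
p, n, s0 = 1000, 200, 10

lam = lambda_scale(p, n)
rate = lasso_error_rate(s0, p, n)
print(f"lambda scale: {lam:.3f}, error rate: {rate:.3f}")
```

Notice that the error rate is just $\sqrt{s_{0}}$ times the penalty scale, and that p only enters through a logarithm, which is why the lasso can cope with many more parameters than observations. If you use scikit-learn, a multiple of this scale could serve as a first guess for the `alpha` argument of `sklearn.linear_model.Lasso`, though its objective has its own normalization, so cross-validating around that guess is probably still wise.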

She introduced something called the compatibility constant that’s discussed further in a couple of papers [Belloni et al. 2014, Dalalyan 2017]. She also discussed how the lasso behaves when you assume that you have noisy observations. The final lecture is September 6th at Georgia Tech on applications to inference.

Wouldn’t it have been better if I’d just recorded it though??

The science of data science

The Foundations of Data Science Boot Camp given last week (August 27–31) at the Simons Institute in Berkeley explored how pure mathematics and theoretical computer science are providing actionable insights that the working data scientist can use, or at least ponder.

I found the talk below by Ravi Kannan useful in pointing out how dimensionality reduction techniques like SVD can be used to set clustering up for success. When dealing with immense data sets, this can be the difference between useful and garbage clusters.

I also thought that David P. Woodruff’s lecture on a dimensionality reduction technique called sketching was impressive for its clarity. As a data scientist or analyst, you’re often in a dilemma when your Impala cluster runs out of memory for that critical model build: you may just have to sample from that terabyte pile of web pages. It is good to know that you have some math magic behind you when the time comes.

Santosh Vempala thinks the seminar was a better value than Netflix. I’m not sure about that, but those were some good lectures.

My colleague Boris Gorelik continues to deliver the insights of a data science sage. You should carefully study his recent EuroSciPy 2018 talk “Data visualization: from default to efficient”.

Then check out his notes on how to continuously improve a talk.

Gayatri interviews Yuyi Morales at Decatur Book Festival

My partner Dr Gayatri Sethi is interviewing celebrated writer and illustrator Yuyi Morales at the Decatur Book Festival today, Saturday, September 1, from 1:00 to 1:30. They’ll be discussing Morales’s new book Dreamers.

Isn’t it time you talked to your children about what is going on at the border? Dreamers talks about Morales’s own 1994 journey from Xalapa, Mexico to the U.S. with her child. It is such a beautifully illustrated book, and a profound story with so many layers.

AI at HBCUs Fall 2018

If you teach/study computer science at a Historically Black College or University, or know of someone who does, please check out and pass along the site https://charlescearl.github.io/ai-hbcu/.

I put it up a year or so ago to document the work being done at those institutions to increase the impact and participation of the African Diaspora in shaping the way that AI technologies are developed and used. As this recent report by the ACLU points out, yes, algorithmic racism is still a thing.

If you know of important initiatives, interesting classes, or discussions going on at HBCUs around this issue, please feel empowered to check out the repository https://github.com/charlescearl/ai-hbcu and send a pull request. Or drop a comment below.

I was encouraged to see that the Neural Information Processing Systems conference (one of the most attended AI conferences) is taking steps towards inclusion. They keep promising to change their name.

Havana: whimsical architecture

Cuba has left us with a lot to think about. Still coming to terms with its lessons on race, identity, the bounty of being out of one’s place of comfort, and most importantly those on human dignity and kindness.

While I make sense of those lessons, I’ll share some photos from Vedado, Trinidad, and points in between that testify to the whimsical, surprising beauty of this country and its people.

Investing in empowerment

Dr King spent his last precious hours advocating for the economic rights of African American sanitation workers in Memphis. In his broader vision, this was one arm of a struggle for justice for the poor and powerless that spanned divides of gender and race.

I recently met Ryan Harrison at the Conference on Fairness, Accountability, and Transparency, and he shared his amazing post on socially responsible or, as he puts it, solidarity-based investing.

The post, at least for me, presents a new way of thinking about “return on investment”. In other words, the “return” is the uplift and empowerment of our communities in ways that seek to build equity for all instead of maximal profits for a few.

In our brief conversation, Ryan schooled me on bail bond funds as one example. Since many people can’t afford the bond for minor traffic violations and misdemeanors, they end up having to do jail time, miss work, lose jobs, and thus end up in a downward poverty spiral. Since it’s not supposed to be a crime to be Black, Brown and Poor, non-profit funds such as the Bronx Freedom Fund were set up to provide a route out of this particular trap. An investment in the bail bond fund is a direct investment in the economic viability of a given community, like the South Bronx.

As Ryan points out, the move away from the traditional 401k/IRA can be gradual — say 10% of your investment funds allocated to solidarity investments. It is the start of the journey that matters.

The options for where to put your solidarity dollars range from grant-based investing (like bail bond funds or local food cooperatives like the one in the featured image by Steven) to direct lending programs (like Canopy Coop in Boston) to more traditional equity investments like the Shared Capital Cooperative.

Gayatri Sethi (my life partner) is working on an education platform called Alt-College that’s based on this solidarity model.

Do you have any suggestions on efforts to invest in? Strategies that you have put into place for socially conscious investing? Please share!