The Foundations of Data Science Boot Camp given last week (August 27 – 31) at the Simons Institute in Berkeley explored how pure mathematics and theoretical computer science are providing actionable insights that the working data scientist can use — or at least ponder.
I found the talk below by Ravi Kannan useful in pointing out how dimensionality reduction techniques like SVD can be used to set clustering up for success. When dealing with immense data sets, this can be the difference between useful or garbage clusters.
I also thought that David P. Woodruff‘s lecture on a dimensionality reduction technique called sketching was impressive for its clarity. As a data scientist or analysis, you’re often in a dilemma when your Impala cluster runs out of memory for that critical model build — you may just have to sample from that terabyte pile of web pages. It is good to know that you have some math magic behind you when the time comes.
Santosh Vempala thinks the seminar was a better value than Netflix. I’m not sure about that, but those were some good lectures.