The science of data science

The Foundations of Data Science Boot Camp given last week (August 27 –  31) at the Simons Institute in Berkeley explored how pure mathematics and theoretical computer science are providing actionable insights that the working data scientist can use — or at least ponder.

I found the talk below by Ravi Kannan useful in pointing out how dimensionality reduction techniques like SVD can be used to set clustering up for success. When dealing with immense data sets, this can be the difference between useful or garbage clusters.

I also thought that David P. Woodruff‘s lecture on a dimensionality reduction technique called sketching was impressive for its clarity. As a data scientist or analysis, you’re often in a dilemma when your Impala cluster runs out of memory for that critical model build — you may just have to sample from that terabyte pile of web pages. It is good to know that you have some math magic behind you when the time comes.

Santosh Vempala thinks the seminar was a better value than Netflix. I’m not sure about that, but those were some good lectures.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.