My colleague Demet Dagdelen explains the process of building machine learning models at scale for customer insights at Automattic.
I am happy to announce that The Little Shop of Stories bookstore in Decatur, GA is awesome for data science! A few blocks away from us, it is such a regional treasure for children’s books and events. Diane has brought game-changing books, authors, and programs to Atlanta and environs.
But last week I was ecstatic when I came across a treasure of data visualization on the shelves.
The book I am referring to is W. E. B. Du Bois’s Data Portraits: Visualizing Black America. If you live in the Atlanta area, please get it at Little Shop; Amazon can make it without your dollars.
You may be aware of Dr. W. E. B. Du Bois’s work in championing and defining civil rights for peoples of the African diaspora during the first half of the twentieth century. You might be aware of his book The Souls of Black Folk, his leadership of the NAACP, and his intellectual nurturing of African independence efforts. But his work at the Atlanta University Center (now Clark Atlanta University) stands the test of time as a model of how to do good data visualization.
Visualizing Black America pulls together the amazing visualizations that he and his AUC students developed for the 1900 Paris Exposition. They are beautiful, innovative, and meticulous, and they tell the story of Black America at the beginning of the 20th century.
We are so lucky in the Atlanta area to have a bookstore with the vision to stock this treasure. Stop through if you are in the ATL.
The Foundations of Data Science Boot Camp given last week (August 27 – 31) at the Simons Institute in Berkeley explored how pure mathematics and theoretical computer science are providing actionable insights that the working data scientist can use — or at least ponder.
I found the talk below by Ravi Kannan useful in pointing out how dimensionality reduction techniques like SVD can be used to set clustering up for success. When dealing with immense data sets, this can be the difference between useful and garbage clusters.
I also thought that David P. Woodruff’s lecture on a dimensionality reduction technique called sketching was impressive for its clarity. As a data scientist or analyst, you’re often in a dilemma when your Impala cluster runs out of memory for that critical model build: you may just have to sample from that terabyte pile of web pages. It is good to know that you have some math magic behind you when the time comes.
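To make the idea of sketching a little more concrete, here is a minimal toy example of one simple member of that family, a Gaussian random projection. It is my own illustration and not necessarily a construction covered in the lecture; the function names, the dimensions (1,000 down to 200), and the random test vectors are all assumptions chosen for the demo.

```python
import math
import random


def gaussian_sketch(vectors, k, rng):
    """Project n-dimensional vectors down to k dimensions with one shared random matrix."""
    n = len(vectors[0])
    scale = 1.0 / math.sqrt(k)  # keeps squared lengths unbiased in expectation
    sketch_matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(k)]
    return [[sum(s * x for s, x in zip(row, v)) for row in sketch_matrix] for v in vectors]


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


rng = random.Random(42)
n, k = 1000, 200  # hypothetical sizes, picked only for illustration
x = [rng.random() for _ in range(n)]
y = [rng.random() for _ in range(n)]

sx, sy = gaussian_sketch([x, y], k, rng)
original = distance(x, y)
sketched = distance(sx, sy)
print(f"original distance: {original:.3f}, sketched distance: {sketched:.3f}")
```

The two printed distances should land close to each other even though the sketched vectors are a fifth of the size, which is the property that lets you work on the small version when the big one will not fit in memory.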
Santosh Vempala thinks the seminar was a better value than Netflix. I’m not sure about that, but those were some good lectures.
The recent election of Doug Jones to the U.S. Senate in Alabama, thanks largely to African American turnout, got me thinking: What if the Black populations of Southern cities were to experience a dramatic increase? How many other elections would be impacted?
Does that seem far-fetched? Over a tenth of the Black population of the U.S. left the South during the first half of the last century.
They moved from the rural South to the North and West, hoping to escape race-based terrorism and find economic opportunity. The featured image, from the U.S. Library of Congress, is an infographic made in 1950 by the Census Bureau about the migration. My grandparents were part of this movement: they left oppression in small-town Georgia and Alabama hoping to find a (slightly) better situation in Atlanta.
As the U.S. Census infographic below indicates, this migration, with one wave in 1910 – 1940 and another in 1940 – 1970, was epic. Isabel Wilkerson’s book The Warmth of Other Suns is a gripping history of this Great Migration.
A trend towards a reverse migration back to the South has been noted recently. In a 2011 story, the New York Times reported that in 2009, of the 44,000 people who left New York City, over half moved to the South. A more recent report by the Times, provocatively entitled Racism Is Everywhere, So Why Not Move South?, explores some of the rationale behind this movement. The sentiments echo the recent paper Individual Social Capital and Migration by Julie L. Hotchkiss and Anil Rupasingha. Improved social capital (the sense that you are a somebody in the place where you live, that your life matters or could matter someplace) is a powerful catalyst for movement.
The LinkedIn Workforce Report for January confirms that Southern cities are gaining workers at the expense of Northern cities, and this Redfin analysis reports that there has been some North-to-South migration. According to the LinkedIn report, Southern cities are still among the top ten in terms of job migration (at least amongst LinkedIn members). Thriving African American communities in cities like Atlanta and Jacksonville, lower costs of living, and the rise of these cities as technology centers are powerful draws.
To look at the potential political impact of a new reverse migration, I ran a few simulations. I assumed a similar reverse migration rate of 2% per year over ten years. I assumed that the main states from which African Americans migrate are New York, Illinois, Michigan, New Jersey, Indiana, Pennsylvania, Maryland, Ohio, and California, the main destinations of the Great Migration. I assumed that the main destinations of the new migrants are among the states that people left during the initial Great Migration: Alabama, Florida, Georgia, Mississippi, and North Carolina. I could have arguably added Tennessee to this mix. I used a Dirichlet distribution to model the allocation of migrants to various destination states.
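For readers who want to see the mechanics, here is a minimal sketch of one allocation draw. It is not my actual simulation code: the origin population of 1,000,000 is a placeholder rather than a census figure, the flat concentration parameters are an assumption, and I read "2% per year" as 2% of the remaining population leaving each year, which is only one possible interpretation. The Dirichlet sample is built by normalizing independent Gamma draws, a standard construction that needs nothing beyond the Python standard library.

```python
import random

DESTINATIONS = ["Alabama", "Florida", "Georgia", "Mississippi", "North Carolina"]


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def total_migrants(origin_population, annual_rate=0.02, years=10):
    """People who leave if `annual_rate` of the remaining population moves each year."""
    remaining = origin_population * (1 - annual_rate) ** years
    return origin_population - remaining


def allocate(migrants, alphas, rng):
    """Split the migrants across destination states using one Dirichlet draw."""
    shares = sample_dirichlet(alphas, rng)
    return {state: migrants * share for state, share in zip(DESTINATIONS, shares)}


rng = random.Random(0)
migrants = total_migrants(1_000_000)  # placeholder origin population, not census data
allocation = allocate(migrants, [1.0] * len(DESTINATIONS), rng)  # flat alphas: an assumption

for state, count in allocation.items():
    print(f"{state}: {count:,.0f}")
```

Raising a state's concentration parameter relative to the others tilts the draws toward that state, which is one way to encode a belief that, say, Georgia is a stronger draw than Mississippi.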
Let’s first revisit the 2016 election map.
Below are a couple of illustrative outcomes from my simulations. In most of the outcomes, Florida, Georgia, and North Carolina are the states in which the political effects of the migration are felt most.
Again, I let 10,000 simulations play out, sampling the allocation of migrants to destination states from a Dirichlet distribution.
To make the point a bit further, below is a bar chart showing the number of outcomes for each state over the 10,000 simulations in which Black voters had a decisive impact upon the presidential election (i.e. allocation of electoral college votes) for that state.
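The tally behind a chart like this can be sketched as follows. Again, this is an illustration under stated assumptions and not my original code: the states are labeled A, B, and C, and the margins, the number of new voters, and the net vote share are made-up placeholders rather than real election or census data. I also assume a simple definition of "decisive": the net votes added by new arrivals exceed the state's existing margin.

```python
import random
from collections import Counter


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def count_decisive(margins, total_new_voters, net_vote_share, n_sims, alphas, rng):
    """Count, per state, the simulations in which new voters overturn the margin."""
    states = list(margins)
    decisive = Counter()
    for _ in range(n_sims):
        shares = sample_dirichlet(alphas, rng)
        for state, share in zip(states, shares):
            # Net votes gained = arrivals in this state times their net partisan lean.
            net_gain = total_new_voters * share * net_vote_share
            if net_gain > margins[state]:
                decisive[state] += 1
    return decisive


# Hypothetical inputs for illustration only.
margins = {"A": 50_000, "B": 150_000, "C": 1_000_000}
counts = count_decisive(
    margins,
    total_new_voters=300_000,
    net_vote_share=0.8,
    n_sims=10_000,
    alphas=[1.0, 1.0, 1.0],
    rng=random.Random(1),
)

for state in margins:
    print(f"State {state}: decisive in {counts[state]} of 10,000 simulations")
```

With these toy inputs, State C can never flip because even the full pool of new voters falls short of its margin, while A flips often and B only occasionally. That is the same pattern of "close states matter most" that shows up in the bar chart.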
The point, though, is not really predicting the dominance of one political party or the other; it is understanding the implications of Black voter empowerment: how Black people are empowered to participate in decisions regarding the health, education, policing, and economic viability of their communities. Further, beyond just Black and White, it speaks to me as an opening to think about participatory multi-racial democracy. After all, there was a flash of time between the Civil War and the enactment of Jim Crow racialist laws in which Citizens of Color of the South were actively involved in governance.
Although these are speculative simulations, for me they contain the seeds of a certain kind of hope. Perhaps the future is the past, but maybe we can mold the future in ways that are universally empowering.
If you’re in sales, it pays to call (and email, and chat) early and often. This intuitive insight comes from a recent study, “Research on 200 Million Sales Interactions Cracks the Code on Cadences,” published by Atlanta startup SalesLoft. This data was shared with me by Butler Raines, SalesLoft’s Head of Product, who is a dear friend, beautiful human being, and a new-school bitter southerner.
I found the piece illuminating, not only for the nicely presented graphs of customer/sales interactions, but also for the exposition on sales terminology (I learned what a cadence is).
Does SalesLoft have other insights they’d like to share? Many data scientists would like to know!