Anna Mehrotra

January 3, 2013

Our interview subject is Anna Mehrotra, from Streamline Dataworks, who would love to talk to you about how she can help you with your data.

Kush: You recently founded Streamline Dataworks.  What led you to do so?  I’m guessing that it wasn’t because data scientist has been called the sexiest job of the 21st century, or was it?

Anna Mehrotra: Having a sexy job is a bonus (I think?), but, really, the decision resulted from a two-part process.  First, I realized that I needed to start my own company so that I could have more autonomy over my work life, thereby allowing more room for my family life.  Second, I concluded that I should go into data visualization because, well, I have always been doing it: regardless of my job title (engineer, research scientist, graduate student), I have always enjoyed playing with data and charts.  Further, I have always cared a lot more than most people about identifying the optimal colors for multiseries scatter plots and avoiding the use of default Excel charts.

Lav: Your consultancy seems to be concerned in large part with the insightful visual display of quantitative data. A few years ago I read through all of Edward Tufte’s books and picked up several good ideas (which I believe you also like), but had trouble implementing them with the time/tools I had. Which technological developments from the last few years do you feel are enabling better consumable visual analytics?

AM: One of the best tools to use out-of-the-box is Tableau, including the free Tableau Public. Tableau makes it easy to explore and display your data in an effective, and aesthetically pleasing, way. I like the d3 JavaScript library, because you are only limited by your imagination in terms of what you can create. But you can be very effective with R or even Excel or Illustrator, provided you follow some basic design principles. As Alberto Cairo explains (paraphrasing here): it is your brain (not a particular tool) that generates an effective visualization.

L: You talk about how the brain (of the creator) generates an effective visualization, but how do you think the brain of the visualization consumer should play into it? For example, should psychophysical laws play a role in visualization design? What about visual weight as a depiction of uncertainty, as in watercolor regression?

AM: Good question. The designer of a visualization needs to consider the brain of the consumer in two respects. First, what type of brain is it? Is it the brain of a scientist? Or of someone who is not as familiar with concepts of uncertainty, probability, etc.? Is it the brain of someone who already cares about the topic being presented, or of someone who couldn’t care less? It may be impossible to know the answers to these questions a priori and, indeed, the audience may be varied. However, understanding the audience for your visualization will be helpful for determining what types of graphs to use, how much explanation to provide, and what sort of affordances to offer. Second, the visualization designer needs to consider the basics of how we perceive, such as the concepts of preattentive attributes and Gestalt principles. The former relates to the fact that objects stand out if they are a different color, orientation, sharpness, etc., or if they move, and that the objects that stand out can be perceived as more important than the objects that don’t. I should note that the use of color in visualizations is tricky, not to mention culturally dependent. The latter relates to the fact that we make assumptions about objects based on how they relate to other objects in the visualization (e.g., these points all belong to the same group because they are close to each other), which can be useful when we are setting up graph axes, table rows, or other design elements.

The idea of using visual weight to depict uncertainty is intriguing but, personally, I’d rather just see all the data points with the regression line (that is, the first figure in here) than something like the first Panel B, which I don’t understand intuitively. I am much more accustomed to seeing something like the first Panel A. In terms of the efficiency argument (you can get more series on a single panel by using different colors and visually weighting the uncertainty), your audience really doesn’t mind looking at four different panels. We are very adept at decoding the quantitative information on these so-called non-aligned scales. And the selection of colors can, again, be tricky: is the red series somehow more important than the blue? I suppose if you have 1.4 million data points, then you’d get a giant mess if you plot them all. In that case, something like this could work, provided it was accompanied by a brief explanation of how to interpret it.

On the topic of uncertainty, there has been discussion of both blur (the degree to which something is, well, blurry) and sketchiness (the degree to which something looks hand-drawn) as a way to convey uncertainty in visualizations. This paper suggests sketchiness helps us understand uncertainty, as does this one. But some people just get annoyed by looking at blurry or sketchy things, and can view them as less authoritative relative to something crisp and clearly computer generated.

K: Growing up, your husband had a set of World Book encyclopedias that were consulted when questions of fact arose.  Was it the same with you?  Some would say that the world is becoming more data-centric.  How do you have your children find answers to questions?

AM: Alas, I never had a set of encyclopedias growing up. This is probably why, to this day, my husband is a more reliable source of facts and factoids than I am.  I agree that we are becoming more data-centric but, echoing the previous question, we generally don’t have a grasp of the uncertainty inherent in various datasets. We might feel better (or worse) knowing that Obama is ahead in the polls by 4%, but do we really understand where that number sits relative to the margin of error? In terms of where our children find answers: Wikipedia first and their parents second. In fact, if we are discussing something at the dinner table and there is general dissatisfaction with the answers we are giving, our children will insist that we “look it up” on our phones immediately. In the old days, we just had to take our parents’ word for it, or wait until we got to the library and could take a peek in that encyclopedia.
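
To make the margin-of-error point concrete, here is a minimal sketch of how a 4-point lead compares with its own margin of error. The numbers are assumptions for illustration only, not from any real poll: a single simple random sample of 1,000 respondents splitting 50% to 46%. The helper name `lead_margin_of_error` is hypothetical, and the formula is the usual normal approximation for the difference of two shares from the same sample.

```python
import math

def lead_margin_of_error(p_a, p_b, n, z=1.96):
    """Approximate margin of error for the lead (p_a - p_b) in one poll of n people.

    Assumes a simple random sample and a normal approximation (z=1.96 for ~95%).
    """
    # Variance of the difference of two shares drawn from the same sample.
    variance = (p_a + p_b - (p_a - p_b) ** 2) / n
    return z * math.sqrt(variance)

# Hypothetical poll (assumed values): 1,000 respondents, 50% vs. 46%.
lead = 0.50 - 0.46
moe = lead_margin_of_error(0.50, 0.46, 1000)
lead_is_clear = abs(lead) > moe

print(f"lead = {lead:.1%}, margin of error on the lead = +/-{moe:.1%}")
print("lead exceeds its margin of error:", lead_is_clear)
```

Under these assumed numbers, the margin of error on the lead is roughly 6 points, so a 4-point lead sits inside it. Note that the margin on the lead is wider than the margin usually quoted for a single candidate's share.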