
Microblogging

May 19, 2013

As noted in my previous post, you seem to be microblogging quite a bit these days.  I am strongly considering jumping on the bandwagon, but I’m not quite sure what to tweet.  Any suggestions?  Do you find that the 140-character limit improves your tweeting?  As part of my guiding philosophy, I might use “May you enjoy the special pleasures of craft—the private satisfaction of doing a task as well as it can be done. @jeffreylehman #dirt” which has 135 characters, but this would violate “Minimize the use of aphoristic quotations. @jeffreylehman #OptimisticHeart”, which comes in at 74 characters.

I suppose one should consider the structure of what Twitter is.

As noted in Dhiraj Murthy’s book, Twitter, [p. 6]:

This structure of channels and consumers of channels of information draws from notions of broadcasting.  Specifically, Twitter has been designed to facilitate interactive multicasting (i.e., the broadcasting of many to many)… Twitter encourages a many-to-many model through both hashtags and retweets.  A “retweet” (commonly abbreviated as “RT”) allows people to “forward” tweets to their followers and is a key way in which Twitter attempts to facilitate the (re)distribution of tweets outside of one’s immediate, more “bounded” network to broader, more unknown audiences.  It is also one of the central mechanisms by which tweets become noticed by others on Twitter.  Specifically, if a tweet is retweeted often enough or by the right person(s), it gathers momentum that can emulate a snowball effect.

Since he describes it that way, I wonder if there is an information theory problem in there.  Anyway, @dhirajmurthy goes on to say [p. 150]:

Of course, if those promoted tweets are significantly retweeted, that will have more direct effects on Twitter’s modes of originating popular discourse.  The terminology used by marketing professionals is between “organic” and “promoted” trending topics.  The label of “organic” implies more of a grassroots development of a topic, whereas the “promoted” version aims to skip the grassroots building of a topic.  As you can imagine, skipping the construction of a support base can have consequences on the popularity of a topic.  Promoted topics have changed Twitter in that they have brought monetization into tweet audience reception.  However, as the statistics from Twitter show, “organic” topics are the most popular.

So I suppose if one wants to be popular on Twitter, one should take advantage of the interactive multicasting nature of the medium and try to be as organic as possible.  Since you’ve been at it for some time, do you agree?
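As a toy illustration of the snowball effect Murthy describes, here is a minimal Python sketch of how retweets extend a tweet beyond one’s immediate followers.  The follower graph, the account names, and the function `retweet_reach` are all hypothetical and invented for this example; nothing here reflects how Twitter actually works internally.

```python
from collections import deque


def retweet_reach(followers, author, retweeters):
    """Return the set of accounts that see a tweet.

    followers  -- dict mapping an account to the accounts that follow it
    author     -- account that posts the original tweet
    retweeters -- accounts that retweet the tweet once they see it
    """
    seen = set()
    broadcast = {author}          # accounts that have already (re)broadcast
    queue = deque([author])       # accounts whose followers still need the tweet
    while queue:
        source = queue.popleft()
        for follower in followers.get(source, []):
            seen.add(follower)
            if follower in retweeters and follower not in broadcast:
                broadcast.add(follower)
                queue.append(follower)
    seen.discard(author)
    return seen


# Hypothetical toy network (illustrative only).
followers = {
    "author": ["a", "b"],
    "a": ["c", "d"],
    "b": ["d"],
    "d": ["e", "f", "g"],
}

print(len(retweet_reach(followers, "author", retweeters=set())))
print(len(retweet_reach(followers, "author", retweeters={"a", "d"})))
```

With no retweets the tweet reaches only the two direct followers; with two well-placed retweeters it reaches all seven other accounts in this toy network, which is the “right person(s)” effect in miniature.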

To be organic, it seems prudent to follow a good number of people on Twitter.  But something like Dunbar’s number must surely come into play.  There is a fairly new paper by Dunbar and several others that discusses the ability to stay in touch.  The title is “Time as a limited resource: Communication strategy in mobile phone networks,” and the authors are Giovanna Miritello, Esteban Moro, Rubén Lara, Rocío Martínez-López, John Belchamber, Sam G.B. Roberts, and Robin I. M. Dunbar.  The main result is that there are time constraints which limit tie strength (as measured by time spent communicating) in large personal networks, and that even high levels of mobile communication do not alter the disparity of time allocation across networks.  This is argued from the fact that, compared to those with smaller networks, those with large networks do not devote proportionally more time to communication and have on average weaker ties.  Of course, time is an inelastic resource, and people only have a limited amount of time in each day to devote to social interaction.

A related paper is titled “Limited communication capacity unveils strategies for human interaction” and is written by several of the same authors, in particular Giovanna Miritello, Rubén Lara, Manuel Cebrian, and Esteban Moro.  Again the underlying theoretical framing is around the fact that time, attention, and cognitive resources are inelastic.  Each person is characterized by a communication capacity and by a communication activity level, which are different from each other.  The authors then get into how social activity is influenced by these limitations, again studying data from Telefonica in Spain.

The flip side to this is of course having the time, attention, and cognitive resources to tweet.  I saw a recent blog post that describes this as the writer’s shuffle: there are only so many words one can write in a day.  

Given my limited cognitive resources, let’s see if I end up tweeting much or not at all.


Creative Thought

March 26, 2013

I haven’t blogged much recently, and it seems you haven’t much either, though you are microblogging quite a bit.

Anyway, I was inspired.  I got my copy of The Atlantic in the mail yesterday and the cover story is about the so-called Touch-Screen Generation, the generation of toddlers that uses iPads.  It is an interesting article overall, and brings forth various issues to consider when one is raising children.  But one specific thing I found rather intriguing is a quote from Frank and Theresa Caplan’s 1973 book, The Power of Play:

What is it that often puts the B student ahead of the A student in adult life, especially in business and creative professions?  Certainly it is more than verbal skill.  To create, one must have a sense of adventure and playfulness.  One needs toughness to experiment and hazard the risk of failure.  One has to be strong enough to start all over again if need be and alert enough to learn from whatever happens.  One needs a strong ego to be propelled forward in one’s drive toward an untried goal.  Above all, one has to possess the ability to play!

Having this sense of whimsy and playfulness is, I think, rather important, so it is good to see other people agree.

This general question of how to promote creativity is an important one, whether for schools and universities, firms, or societies as a whole.  In your microblogging, you had pointed out an article to me that puts forth three broad ideas for achieving a more creative life:

  1. Be mindful and disconnect: one of the key ideas is to walk around.
  2. Delve into the past to create meaningful things: the idea is to understand where things come from and why they exist to then create meaningful new things.
  3. Be masterful: the key idea is to be able to make serendipity work for you every day.

All of these strategies make a great deal of sense to me and I hope to incorporate them more into my own life. 

Claude Shannon also once gave a speech about how to be creative.  Primarily motivated by informational and engineering kinds of questions, the points he raised for creative thinking include: simplification; seeking similar known problems; restating a problem in as many different forms as you can; generalization; structural analysis; and inversion.  Again a solid set of ideas, especially for mathematically-oriented research work.

Some people argue that certain places are more conducive to creative thought than others, and they act as magnets for the so-called creative class.  Places like Fairfield, Iowa.  

In fact there is a whole book on The Rise of the Creative Class, which was originally written more than a decade ago but has recently been reissued.  At least from the summary, it sounds like an engaging book.  Maybe I should get a copy sometime.  In fact the author, Richard Florida, seems to put out quite a nice set of blog posts, which are also on my to-read list.

This blog post has been, so far, about human creativity, but what about computational creativity?  A lot of my work these days, and some of yours as well, has been on building a system that can create novel, flavorful, and healthy culinary recipes.  Though our work is not particularly connected to Watson, the press has been linking it to that general idea.  Notwithstanding, some nice recent articles include these ones.

Since you are not quite as immersed in the project as I am, you are probably better able to disconnect and be mindful.  So let me ask, do you think any of the ideas on leading a creative life or for engaging in creative thinking are useful for computational creativity systems?


Anna Mehrotra

January 3, 2013

Our interview subject is Anna Mehrotra, from Streamline Dataworks, who would love to talk to you about how she can help you with your data.

Kush: You recently founded Streamline Dataworks.  What led you to do so?  I’m guessing that it wasn’t because data scientist has been called the sexiest job of the 21st century, or was it?

Anna Mehrotra: Having a sexy job is a bonus (I think?), but, really, the decision resulted from a two-part process.  First, I realized that I needed to start my own company so that I could have more autonomy over my work life, thereby allowing more room for my family life.  Second, I concluded that I should go into data visualization because, well, I have always been doing it: regardless of my job title (engineer, research scientist, graduate student), I have always enjoyed playing with data and charts.  Further, I have always cared a lot more than most people about identifying the optimal colors for multiseries scatter plots and avoiding the use of default Excel charts.

Lav: Your consultancy seems to be concerned in large part with the insightful visual display of quantitative data. A few years ago I read through all of Edward Tufte’s books and picked up several good ideas (which I believe you also like), but had trouble implementing them with the time/tools I had. Which technological developments from the last few years do you feel are enabling better consumable visual analytics?

AM: One of the best tools to use out-of-the-box is Tableau, including the free Tableau Public. Tableau makes it easy to explore and display your data in an effective, and aesthetically pleasing, way. I like the d3 JavaScript library, because you are only limited by your imagination in terms of what you can create. But you can be very effective with R or even Excel or Illustrator, provided you follow some basic design principles. As Alberto Cairo explains (paraphrasing here): it is your brain (not a particular tool) that generates an effective visualization.

L: You talk about how the brain (of the creator) generates an effective visualization, but how do you think the brain of the visualization consumer should play into it? For example, should psychophysical laws play a role in visualization design? What about visual weight as a depiction of uncertainty, as in waterfall regression?

AM: Good question. The designer of a visualization needs to consider the brain of the consumer in two respects. First, what type of brain is it? Is it the brain of a scientist? Or of someone who is not as familiar with concepts of uncertainty, probability, etc.? Is it the brain of someone who already cares about the topic being presented, or of someone who couldn’t care less? It may be impossible to know the answers to these questions a priori and, indeed, the audience may be varied. However, understanding the audience for your visualization will be helpful for determining what types of graphs to use, how much explanation to provide, and what sort of affordances to offer. Second, the visualization designer needs to consider the basics of how we perceive, such as the concepts of preattentive attributes and Gestalt principles. The former relates to the fact that objects stand out if they are a different color, orientation, sharpness, etc., or if they move, and that the objects that stand out can be perceived as more important than the objects that don’t. I should note that the use of color in visualizations is tricky, not to mention culturally dependent. The latter relates to the fact that we make assumptions about objects based on how they relate to other objects in the visualization (e.g., these points all belong to the same group because they are close to each other), which can be useful when we are setting up graph axes, table rows, or other design elements.

The idea of using visual weight to depict uncertainty is intriguing but, personally, I’d rather just see all the data points with the regression line (that is, the first figure in here) rather than something like the first Panel B, which I don’t understand intuitively. I am much more accustomed to seeing something like the first Panel A. In terms of the efficiency argument (you can get more series on a single panel by using different colors and visually weighting the uncertainty), your audience really doesn’t mind looking at four different panels. We are very adept at decoding the quantitative information on these so-called non-aligned scales. And the selection of colors can, again, be tricky: is the red series somehow more important than the blue? I suppose if you have 1.4 million data points, then you’d get a giant mess if you plot them all. In that case, something like this could work, provided it was accompanied by a brief explanation of how to interpret it.

On the topic of uncertainty, there has been discussion of both blur (the degree to which something is, well, blurry) and sketchiness (the degree to which something looks hand-drawn) as a way to convey uncertainty in visualizations. This paper suggests sketchiness helps us understand uncertainty, as does this one. But, some people just get annoyed by looking at blurry or sketchy things, and can view them as less authoritative relative to something crisp and clearly computer generated.

K: Growing up, your husband had a set of World Book encyclopedias that were consulted when questions of fact arose.  Was it the same with you?  Some would say that the world is becoming more data-centric.  How do you have your children find answers to questions?

AM: Alas, I never had a set of encyclopedias growing up. This is probably why, to this day, my husband is a more reliable source of facts and factoids than I am.  I agree that we are becoming more data-centric but, echoing the previous question, we generally don’t have a grasp of the uncertainty inherent in various datasets. We might feel better (or worse) knowing that Obama is ahead in the polls by 4%, but do we really understand where that number sits relative to the margin of error? In terms of where our children find answers: Wikipedia first and their parents second. In fact, if we are discussing something at the dinner table and there is general dissatisfaction with the answers we are giving, our children will insist that we “look it up” on our phones immediately. In the old days, we just had to take our parents’ word for it, or wait until we got to the library and could take a peek in that encyclopedia.
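To make the margin-of-error point concrete, here is a minimal Python sketch using the standard normal approximation for a simple random sample.  The vote shares (50% and 46%) and the sample size (1,000) are hypothetical numbers chosen only to produce a 4-point lead; they are not taken from any actual poll, and real polls involve weighting and design effects that this ignores.

```python
import math


def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95% margin of error for a single candidate's share."""
    return z * math.sqrt(share * (1 - share) / sample_size)


def lead_margin_of_error(share_a, share_b, sample_size, z=1.96):
    """Approximate 95% margin of error for the lead (share_a - share_b).

    Both shares come from the same sample, so they are negatively
    correlated and the variance of the difference is
    (a + b - (a - b)^2) / n.
    """
    variance = (share_a + share_b - (share_a - share_b) ** 2) / sample_size
    return z * math.sqrt(variance)


# Hypothetical poll: 50% vs. 46% among 1,000 respondents.
single = margin_of_error(0.50, 1000)
lead = lead_margin_of_error(0.50, 0.46, 1000)
print(f"margin of error on one share: {single:.3f}")
print(f"margin of error on the lead:  {lead:.3f}")
```

Under these assumptions the margin on a single share is roughly 3 points, but the margin on the lead itself is roughly 6 points, so a 4-point lead is not clearly distinguishable from a tie.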


Being Social

November 25, 2012

A few weeks ago, we were both at the Interdisciplinary Workshop on Information and Decision in Social Networks (WIDS).  I learned a lot of interesting things, met a lot of interesting people, and generally had a great time.  As I had said in my talk itself, I think conversations are really the whole key to conferences, and I had several really engaging ones. WIDS being in Cambridge, of course I also met a good number of people outside the conference.  Another nice part of the trip was a return to Edgerton House, where Ankur Mani was kind enough to host me.

You had mentioned opinion dynamics in a previous post and your work at WIDS was also on this topic, but with a decision-making twist.  I found it fascinating, but I’ll let you pass on any details you want to.  How many people brought up the election when they talked to you about it?

It seems that using mobile phone data is becoming huge these days.  In one talk, Vincent Blondel talked about a publicly released data set from Ivory Coast that is the subject of a current competition. In my session, a talk from the group of Marta Gonzalez revisited the navigability of small worlds using mobile telephony data.  This same group has been interested in ‘walking around’, looking at it using spectral techniques I like.

Since WIDS was primarily an academic event, there was more talk of life and much less explicit talk of social business, even though it is a potential game-changer for knowledge work.  Notwithstanding, did you pick up any particularly interesting tidbits relevant for business?

To close, let me ask a random question: I know you’re not a huge combinatorics guy, but do you know what the major challenge is in extending this work to larger subgraphs?


Adapting to the External World Internally

October 11, 2012

You had written about how diet rather than inactivity may lead to obesity, but heredity also plays a role, right?  Although genetics is one of the primary modes of hereditary information transfer, another very intriguing mechanism for hereditary information transfer is epigenetics.  As you may recall, when we were at Himanshu’s wedding in Chicago, Ashwin was watching something on the Discovery Channel on how people conceived during famines have impaired glucose tolerance, raised blood pressure, and higher rates of obesity in adulthood.  Further, if a grandfather went through a famine as a teenager, then a grandson would have a higher mortality risk ratio.  Obesity seems to be linked to epigenetic phenomena.  Broadly speaking, it seems epigenetics is a quick way to pass on environmental adaptation to offspring.  Unlike genetics, an information theory of epigenetics seems to be lacking, though there has been some progress in understanding the epigenetic code.

I recently came across a paper entitled “Buyers’ Subjective Perceptions of Price” which appeared in the Journal of Marketing Research in 1973.  As is perhaps obvious, price influences buyers’ buying decisions, but a lot of behavioral experiments show that this influence is not at all straightforward.  In particular, it is really the perception of price that leads to behavior rather than the price itself.  One traditional view, called odd pricing, holds that prices just under a round number (e.g., 99) increase consumer sensitivity.  Although the paper discounts this strategy, it reviews several other techniques and phenomena such as perceptual price-quality relationships.  The most interesting part for me, however, was the invocation of the Weber-Fechner law from psychophysics for pricing and the ‘just noticeable difference’ experimental method from which it was originally derived.  As I think you know, together with John Sun et al., I have written a paper providing an ‘optimization approach to biology’ explanation for the Weber-Fechner law, basically arguing that our internal representation of external stimuli is well-matched to the environment.
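For readers who have not seen the Weber-Fechner law written out, here is a minimal Python sketch of what it would mean for prices.  Weber’s law says the just noticeable difference is a constant fraction of the stimulus, and Fechner’s form says perceived magnitude grows with the logarithm of the stimulus.  The Weber fraction of 0.1 and the function names are assumptions made up for illustration; they are not values from the 1973 paper.

```python
import math


def just_noticeable_difference(price, weber_fraction):
    """Weber's law: the smallest noticeable change is proportional to the price."""
    return weber_fraction * price


def perceived_magnitude(price, reference_price, scale=1.0):
    """Fechner's law: perception grows with the log of price over a reference."""
    return scale * math.log(price / reference_price)


WEBER_FRACTION = 0.1  # hypothetical value, for illustration only

for price in (10.0, 100.0):
    jnd = just_noticeable_difference(price, WEBER_FRACTION)
    print(f"price {price:6.2f}: just noticeable difference {jnd:.2f}")

# A 10% increase is perceived as the same step at either price level.
small = perceived_magnitude(11.0, 10.0)
large = perceived_magnitude(110.0, 100.0)
print(math.isclose(small, large))
```

The point of the sketch is that, under this law, a one-unit increase on a price of 10 is as noticeable as a ten-unit increase on a price of 100, because perception tracks ratios rather than absolute differences.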

So it seems adaptation to the natural world is central to the two most interesting informational systems in biology (at least to me): information transmission through heredity and information within nervous systems.


Social Capital Correlations

September 30, 2012

Top of the morning to you Señor László Cseh.  It sounds like you built up a lot of social capital in Bangalore, Delhi, Amarnath, and Aligarh.  I did go through the social capital statistics by state that you pointed out.  When I just visually inspected the ranked list of states by social capital, an interesting connection jumped to my mind.  

Earlier that week I had been looking at a variety of health and healthcare statistics.  The states with low social capital seemed very much in correspondence with states with high percentages of diagnosed diabetes according to the CDC. Mississippi, Georgia, Louisiana, and Alabama are at one end of both social capital and diabetes.  Montana, Vermont, and the Dakotas are at the other end.

In the same place, the CDC also gives data for obesity and for physical inactivity, which are clearly correlated both with each other and with diabetes.  What are the causal relationships?  (I still intend to put something up here about Rubin-style causal inference.)  Does physical inactivity cause obesity?  Not according to an observational study comparing Westerners with members of one of the last remaining hunter-gatherer societies.  According to the study, calorie expenditure of hunter-gatherers is the same as that of Americans and Europeans, meaning that the obesity problems here are all about diet, not inactivity.  One of the authors writes:

“We’re getting fat because we eat too much, not because we’re sedentary. Physical activity is very important for maintaining physical and mental health, but we aren’t going to Jazzercise our way out of the obesity epidemic.”

So what about social capital and diabetes?  I thought that that would be a pretty neat relationship to uncover.  After I mentioned this thought to you outside the confines of the blog and you did some poking around, you found that exactly this study has already been done. 

Exactly.

Exactly the same.

So why might this be so?  One thought I have is that perhaps in the absence of social capital and the presence of bowling alone, a person has no connections to peers and only connections to advertisers, and is thus only influenced by advertisers.  Influence of junk food advertising perhaps leads to a bad diet.  An opinion dynamics-based hypothesis for such phenomena is discussed in this report from Sandia.
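To make the hypothesis a bit more tangible, here is a toy Python sketch using simple DeGroot-style opinion averaging, in which each agent repeatedly replaces its opinion with a weighted average of the opinions it listens to.  This is my own illustrative stand-in and is not the model from the Sandia report; the agents, the weights, and the 0-to-1 opinion scale (1 meaning fully persuaded by junk food advertising) are all hypothetical.

```python
def degroot_step(opinions, weights):
    """One round of averaging: each agent takes a weighted mean of its sources."""
    return {
        agent: sum(w * opinions[source] for source, w in weights[agent].items())
        for agent in opinions
    }


def run(opinions, weights, steps=50):
    """Apply the averaging step repeatedly and return the final opinions."""
    for _ in range(steps):
        opinions = degroot_step(opinions, weights)
    return opinions


# Hypothetical setup: opinion 1.0 = fully persuaded by the advertiser.
initial = {"advertiser": 1.0, "peer": 0.0, "isolated": 0.0, "connected": 0.0}

# Each row of weights sums to 1.  The advertiser and peer never change.
weights = {
    "advertiser": {"advertiser": 1.0},
    "peer": {"peer": 1.0},
    # No social capital: listens only to self and the advertiser.
    "isolated": {"isolated": 0.5, "advertiser": 0.5},
    # Some social capital: a peer dilutes the advertiser's influence.
    "connected": {"connected": 0.5, "peer": 0.4, "advertiser": 0.1},
}

final = run(initial, weights)
print(round(final["isolated"], 3), round(final["connected"], 3))
```

In this toy setup the isolated agent drifts all the way to the advertiser’s opinion, while the connected agent settles near 0.2, which is the intuition behind the bowling-alone hypothesis and nothing more than that.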

Public health statistics is an interesting topic, no? I’m looking forward to learning more about it starting in a couple of weeks.

Finally, let me say that I’m happy to have you (certainly not a bozo) walking the halls of the Yorktown building, even though that walking purportedly isn’t helping you on the body weight front.


Cloud Factories: Energy and Information

September 23, 2012

The New York Times has, today, published the first article in a series of articles about the power consumption in data centers and other cloud computing infrastructures.  The consumption numbers are rather huge and the efficiency numbers are rather small.  As you know, I’ve been thinking about the relationship between energy and information for some time now, and spoke about it in Cambridge in July.  One of the key points to be made is that information is physical: “bit is it” to quote a famous physicist who also symmetrically said “it is bit.”  As noted in the ISIT 2012 paper as well as in the ISIT 2008 paper, the tradition of separating the study of energy from the study of information goes back almost a century.  I wonder what effect the academic separation between power engineering and radio engineering has had on popular understanding.  Indeed, the NYTimes article points out that:

With no sense that data is physical or that storing it uses up space and energy, those consumers have developed the habit of sending huge data files back and forth, like videos and mass e-mails with photo attachments.

And so about three-quarters of data is created by ordinary consumers.  Going into the history of this some more, I think I’ll quote a bit from a book called Grammatical Man by Jeremy Campbell, which I had first read many years ago.  On p. 193, he says:

Norbert Wiener made it clear at an early stage, however, that there is a critical distinction between power engineering and communication engineering, and this distinction must be grasped if we are to begin to understand how the nervous system works.  A television transmitter, Wiener said, may need large amounts of power to do what it is supposed to do, but it is first and chiefly a device for sending messages.  A dentist’s drill, on the other hand, may use only a tiny fraction of the power needed to drive the transmitter, but the prime consideration in designing the drill is the energy it consumes.  Wiener, no shrinking violet where his own reputation was concerned, claimed credit in his memoirs for first alerting the scientific world to the importance of this distinction, and for showing that control devices, like the ones used for aiming antiaircraft guns at German planes, were as much a part of communications science as the telephone or the radio, even though their function might be to move an object as heavy as a large gun.

He goes on to say on p. 195:

One especially important difference between energy and information is that the first is subject to the laws of conservation, while the second is not: information can be created or destroyed

and then on p. 270:

Even more surprising, Aristotle gives a hint of the peculiar asymmetric relationship between energy and information, a relationship which was brought to light in modern science only when the full implications of Maxwell’s demon were understood in the twentieth century.  The demon needs enormous quantities of information about the particles in the dark chamber of gas in order to reduce the entropy of the gas by even a small amount.  In other words, it is relatively easy, though not costless, to convert orderly energy into information, but difficult and expensive to transform matter into a more orderly state by the use of information.  Aristotle showed that he had a general grasp of this inequality.

Although not a central part of the talk in Cambridge, my mention of results from the thermodynamics of computation seemed to elicit quite a bit of discussion.  Perhaps counterintuitively, several of our past/present colleagues at IBM Research have established that mathematical work does not actually require energy, though there are all kinds of caveats.  As I had once noted in my doctoral thesis:

Physical mechanisms proposed for reversible computing such as ballistic computers, externally clocked Brownian machines, and fully Brownian machines, however require that the system have no faults (contrary to the model in Chapter 5), have infinite storage capability (contrary to the model in Chapter 4), and operate arbitrarily slowly (contrary to the goals in Chapter 3).
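For a sense of scale, here is a small Python sketch of the Landauer bound, the standard result from the thermodynamics of computation that erasing one bit irreversibly dissipates at least kT ln 2 of energy (logically reversible operations are not subject to this floor in principle, which is where the caveats above come in).  The room temperature of 300 K and the one-gigabyte example are my own illustrative choices.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant (exact SI value)


def landauer_limit_joules(temperature_kelvin):
    """Minimum energy dissipated to erase one bit at the given temperature."""
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


def erasure_energy_joules(bits, temperature_kelvin=300.0):
    """Lower bound on the energy needed to erase the given number of bits."""
    return bits * landauer_limit_joules(temperature_kelvin)


per_bit = landauer_limit_joules(300.0)
one_gigabyte = erasure_energy_joules(8e9)  # 8e9 bits, illustrative
print(f"per bit at 300 K: {per_bit:.3e} J")
print(f"erasing one gigabyte: {one_gigabyte:.3e} J")
```

At 300 K the bound is about 2.87e-21 joules per bit, so the theoretical floor for erasing a gigabyte is on the order of 1e-11 joules, many orders of magnitude below what any real data center spends.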

Do you think the series of articles in the New York Times will also bring up the thermodynamics of computing?

