Trusting Data

March 13, 2012

Howdy Señor Evan R. Lawson, CFO of HankMed!  I was recently impressed by the elevators at the Windsor Court apartments in New York City because you punch in your desired destination floor on the panel in the hallway rather than just whether you want to go up or down.  (I’ve been wanting to see that somewhere, anywhere, for several years.  In a system with several elevators, the extra information can help reduce waiting time.)  I was also recently impressed by the tags of items at Macy’s department store because Macy’s has bar-coded down to the individual item; if we each buy exactly the same two-toned shoes (same size, same brand, same design, same everything), the two tags will have different identification numbers.  (Because of this unique identification, if I return an item, I don’t have to have the receipt as my payment transaction information is recorded in the database with the item.)

Why do I mention these two things?  Mainly just because I wanted to, but also because they are related to the idea that when harnessed properly, more bits of data or information can lead to the smoother operation of life.

But data is tricky business and statistics is one of the three kinds of lies. 

One aspect of the SellerScope demo was its interactive nature.  One of the points brought up by John Patterson while we put the interactive visualization together was that users have to have trust.  In the demo, we show predictions of salespeople who are at risk of voluntarily leaving the company.  If one interacts with the data and predictions, one notices that, at least in the fictional company whose data is loaded in the demo, salespeople tend to leave within the first 5 or 6 years of service with the company; after they’ve been around that long, they tend to stay the rest of their careers.  The beauty of the interaction is that if the user discovers that pattern himself or herself, then he or she is much more likely to trust it.

As we’re both well aware, there’s more data being generated than ever before in human history – exponentially more.  But I don’t think that the trust in data is there yet, nor should it be.  I don’t think there can be trust unless there is transparency and the ability for people to interact with data.  In discussing a data error with the 2010 BCS calculation, Jerry Palm wrote:

The BCS has shaky enough credibility with John Q. Footballfan as it is. It needs to verify its data. Right now, nothing is verified. Nothing is accountable. Except Colley. Thank goodness we at least have that.

Also as discussed in this Nature Biotechnology article:

Systems biology aims to provide a mechanistic understanding of biological systems from high-throughput data. Besides its intrinsic scientific value, this understanding will accelerate product design and development, facilitate health policy decisions and may reduce the need for long-term clinical trials. For this to happen, the knowledge generated by systems biology has to become sufficiently trustworthy for the empirical approach underlying long-term clinical trials to be supplanted by an approach in which mechanism and mechanistic understanding is a driver for decisions. This raises fundamental questions of how to evaluate the veracity of predictions from systems biology models and how to construct mechanistic models that best reflect biological phenomena—questions that are of interest to both academia and industry.

One of the movements to give ‘John Q. Footballfan’ the ability to interact with data so that he may trust it is known as Open Data.  My former IBM Research officemate Marc Szeto-Millstone has recently joined a proponent and enabler of this movement, Socrata.  However, sometimes data being completely open is not ideal either, because it diminishes the incentive to invest in obtaining difficult-to-obtain data, especially by companies.  It is very challenging to balance the trust that goes with openness and the financial disincentive that goes with openness, wouldn’t you say?  The authors of the Nature Biotechnology paper have made one attempt at that balance, but I’m not sure that there aren’t other better ways.

One thing I’m pretty sure about is that the recent trend to make the creation of static infographics easier will not solve the trust issue.  (You have posted images generated using Many Eyes and the Google Data Explorer on the blog previously; those platforms are really excellent for what they are intended to do.)  I think that even infographics made by skilled designers will only be trusted if the user can interact with the data and discover things without being explicitly told.  Hopefully within our lifetime, the kinds of lies will drop to two.


Business Made Social

March 3, 2012

You’re welcome, Señor Satnam Singh Bhamara.  That potato episode’s premise is quite interesting.  Another episode, The Gold Job, features Hardison taking the lead and applying gamification to the con.

Gamification was one of the themes of the demonstrations in the Innovation Lab at Lotusphere 2012.  (As part of the Innovation Lab, I presented a demonstration called SellerScope: Interactive Prescriptive Salesforce Analytics, which was put together in conjunction with Moninder Singh and Jamie Rasmussen.)  The idea of gamification is to make tasks for employees less like work and more like an adventure in which you earn points along the way.  Competitive juices are a good motivator, so why not harness them for productive work?

The convention as a whole was a new experience for me.  There were more than 7,000 participants, and the opening session featured a rock band followed by a talk from Michael J. Fox.  One of the evening events was at Sea World, which had been completely rented out for Lotusphere.

From what I understood, once IBM puts its stamp on social technologies (blogs, wikis, Twitter, Facebook, LinkedIn, Flickr, YouTube, Pinterest, and everything else along those lines), large corporations feel a sense of reassurance that yes, now we can adopt these technologies for internal use.  Now that IBM is doing so through its Lotus Connections product line, lots of businesses are on their way to becoming social businesses in 2012, making this the “year of social business.”  IBM’s role in this is giving out security blankets.

The other point that seemed to come across to me was that advanced predictive and prescriptive analytics are not yet part of social business offerings.  This jibes with what Brenda said yesterday at her farewell event, that IBM should and needs to dominate in analytics.  It will be through analytics that IBM’s social business products will differentiate themselves and move beyond their current security blanket status.

I seem to recall that in addition to the signal processing homework problem related to This is Spinal Tap that you mentioned, there was also one called “sampling for fun and profit.”  Coming back to your point about having or not having something to say, the actual social societal world and the social business world are different.  At the end of the day in the business world, the only thing that matters is profit.  In the societal world, life is for fun and profit and lots of other things.  In social business, the conversations, collaboration, etc. matter if they lead to monetary profit somehow, which is quite different from the global village that we call home.  Business is business and life is life; the technologies being developed for both are similar, but their objectives are fundamentally different.  Enough pontification from me as well.

Time for the curtain call of the Scoop and Kris Show in Syracuse.


On Potatoes

March 1, 2012

When I went to England in 2004 (as depicted in the previous post), I had a solid English breakfast at The Cat Tavern (which of course included hash browns) before I left for Stonehenge, and then saw an episode of Hustle on BBC upon returning to Salisbury.  I never did end up watching more episodes of that entertaining show, but recently I’ve been watching the American version, Leverage.  Thanks for getting me into it.

So one recent episode that I saw was “The Hot Potato Job,” which centered around the theft of a genetically-engineered potato that “has extra nutrients inside,” in particular vitamin A.  As you know, I’ve had some past interest in food products that are genetically engineered to have additional properties.  In particular, I helped advise the MIT iGEM team in engineering the yogurt bacterium Lactobacillus bulgaricus to secrete the peptide p1025, which helps prevent dental caries (cavities).  The goal of team biogurt was to create a sustainable method for delivering this peptide, which the method of home yogurt production inherently provides.  Interestingly, as noted in a remarkable paper by Nunn and Qian:

Humans can have healthy diets from consuming potatoes, supplemented with only dairy, which contain the two vitamins not provided for by potatoes, vitamins A and D (Dairy is not actually necessary for vitamin D because humans produce it after absorbing sunlight).

Thus, the Leverage super-tuber with vitamin A is all that would be needed to have a (nutritionally) healthy life.  If this is paired with biogurt, then one will both be set nutritionally and have good teeth.  If some wheat is also thrown in, I argue one can have a psychologically fulfilling life as well, cf. The Kush Show presents Everyday Indian.

Given its ubiquity and presence e.g. in the phalahar diet, it may seem as though the potato has been around everywhere for all time, but it only came to the Old World during the Columbian Exchange.  The etymology of the English word potato is actually rather interesting and its possible influence on the spelling of the word tomato even more interesting.  Anyway, coming back to the main part of the Nunn and Qian paper:

According to our most conservative estimates, the introduction of the potato accounts for approximately one-quarter of the growth in Old World population and urbanization between 1700 and 1900.

Isn’t that astounding?  Much more impact than any technology I can think of at the moment.  Moreover, they say that “the introduction of the potato increased average adult heights by approximately one-half inch.” 

How quickly did the potato spread once it came to Europe?  Nunn and Qian say, for example:

The potato first reached India not long after it arrived in Europe, introduced by either the British or the Portuguese. The earliest known reference to potato cultivation in India is a written account from the 1670s by John Fryer.  By the late eighteenth century there are various accounts of widespread cultivation in many parts of India.

They were not able to do a detailed micro-level study of the spread of this agricultural technology, but it was rather quick.  I would imagine that social learning played a key role, just like with pineapples in Africa.  I wonder what would have happened had there been an information technology like Digital Green in the 16th and 17th centuries.  I don’t suppose it could have spread much faster than another New World import, syphilis (which reached Hungary and Russia by 1497; Africa, the Middle East, and India by 1498; China by 1505; Australia by 1515; and Japan by 1569), could it?

This other paper by Nunn and Qian I have been linking to has a great discussion of other aspects of the Columbian Exchange.  With food, there are nice discussions on chili peppers, tomatoes, chocolate, and vanilla.

The world really was a different place in 1491.