December | 2010 | Information Ashvins

Archive for December, 2010

The Blogosphere

December 27, 2010

Although this blog is written as a conversation between you and me, it is of course embedded in a larger social and literary milieu. As such, I thought I’d take a few lines to respond to the community, if that is all right.

On my Antarctica post, Anand was essentially suggesting a deeper empirical basis for mechanism design and synthetic biology. I think the growth of “big data” will lead to that, but it will require more than analytics that are just data visualization tools.
Notwithstanding, a new reader of this blog pointed me to Information Is Beautiful, which is really very insightful. And speaking of beautiful imagery in science, let me promote the book Portraits of the Mind: Visualizing the Brain from Antiquity to the 21st Century, which also features an image from my recently accepted paper on the C. elegans connectome.
On the Antarctica post, a long-time friend of the blog, let us call him Davy Jones, argued that the distinction between risk and uncertainty may be crucial for understanding the degree to which discovery based on localized models, e.g. inside laboratories, can be meaningful in an unpredictable world; he pointed to an article about scientific refutation.
Davy also suggested that maybe quantum entanglement is my ticket to Antarctica. Although I am very much interested in how measurement affects classical systems (watch out for more later), I am scared of bras and things like that.
I was in Palo Alto for the COCOA2010 Workshop a few weeks ago, where I presented a position paper on using stochastic approaches to understand the coordination of distributed work in global service delivery. While in California I saw another friend of the blog; let us call him SuperGrover. SuperGrover suggested that blogs are better read when they are less philosophical and have a straightforward nugget. I’ll see whether I can take that advice into account.
Murali, who apparently likes to think of new technologies in terms of old, delurks and asks how I decide what to read. Unfortunately there is no particular method to my madness. Tangentially relevant to this blog, I have some general areas of interest, including the history of science, the biological sciences, and the cognitive/economic sciences, and so I pick things from there. For the particular case of The Shape of Life, I was at the National Aquarium in Baltimore during ACC2010 and was thinking about whether the functionality-wiring tradeoff I presented there informs the gross anatomical shape of animals, since shapes like these do kind of look like animals; hence I picked up that book. I didn’t actually follow up on the motivating question, though. Interestingly, the eight major animal body plans in existence today, along with 27 minor ones, all emerged during the Cambrian Explosion, and no new ones have developed since. If anyone has followed up on this question, or does so, I would appreciate hearing about it.

That is all for now. Hopefully I responded to the crowd sufficiently.

Posted in Uncategorized | 2 Comments »

308,745,538

December 21, 2010

As I had foreshadowed, the US Census released its first set of data today: population counts for the several states. The country as a whole had a 9.7% increase in population, and Massachusetts (where I was counted) went from 6.35 million to 6.55 million people. Although the Census hasn’t released more specific geographic data yet, I would guess that Cambridge is still the 4th or 5th most populous city in the Commonwealth, after Boston, Worcester, and Springfield. The District of Columbia finally reversed its decades of population loss by gaining 29,664 people since 2000. Unsurprisingly, the biggest gainers were in the South and West, with Nevada showing a 35.1% increase. California now has a population that exceeds 37 million, though it is still dwarfed by Uttar Pradesh, which is the most populous subnational entity in the world. The only state to lose population was Michigan; I wonder if Dave Bing can do anything about that; maybe bring back Eric Devendorf to promote the state with other prominent residents. Predictably, the political implications of the newly released census data have been analyzed by Nate Silver.
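To make the arithmetic explicit, here is a minimal Python sketch of the percentage-change calculation. The 2000 national count (281,421,906) is not given in the post; it is the Census 2000 resident population, and the Massachusetts figures are the rounded values quoted above, so that result is only approximate.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change going from `old` to `new`."""
    return 100.0 * (new - old) / old


# Resident population of the United States, Census 2000 and Census 2010.
us_2000 = 281_421_906
us_2010 = 308_745_538

# Massachusetts, in millions, rounded as in the post.
ma_2000 = 6.35
ma_2010 = 6.55

print(f"US: {pct_change(us_2000, us_2010):.1f}%")
print(f"Massachusetts: {pct_change(ma_2000, ma_2010):.1f}%")
```

Running it prints 9.7% for the country, matching the reported increase, and about 3.1% for Massachusetts.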

Although the US Census has been a source of data for hundreds of years, and indeed its tabulation was one of the launching points for IBM, the world is increasingly awash in “big data”. (As a side note, I’ve become more and more of a fan of Canadian conventions w.r.t. punctuation outside of quotation marks.) To stick with government as an example, the CIO of the United States, Vivek Kundra, has been big on releasing data through data.gov. As he points out in a recent report, he is also big on cloud computing and cloud-based Infrastructure-as-a-Service (IaaS) offerings. These definitely seem like very interesting areas for research.

Indeed, I have a feeling that the connection between information technology and information theory might grow in the future. You might recall Ronald Kline’s article on the emergence of “Information Technology” as a keyword. Now that Google has put up a source of big data and an analytics tool, one might even try to test Kline’s basic argument roughly and empirically. This is what you get when you plug in terms such as “information theory” and “information technology”:

Just as Kline describes, there was a blossoming of discourse around “information technology” in the mid-1960s, when prominent humanists and social scientists proclaimed the advent of a new type of society based on the processing of information. Moreover, as Kline describes, by the early 1980s several discourse communities, including policy analysts, business writers, managers, information scientists, and social scientists, had adopted the term.

Of course, one can also observe the great flowering of the term “information theory” after 1948 and its seeming decline after 1967. Not only might big data be a boon for historians, as I have suggested here, but it might also be useful for textual analysis of a different kind.
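For the curious, the basic quantity behind such plots can be sketched as the relative frequency of a phrase in each year: its count divided by the total number of word sequences of the same length that year. This is a simplified toy under stated assumptions, not Google’s actual pipeline; the function name `ngram_frequencies` and the three-line corpus are hypothetical, and real text would need proper tokenization.

```python
from collections import Counter


def ngram_frequencies(corpus, phrase):
    """Relative frequency of `phrase` in each year of `corpus` (hypothetical helper).

    corpus: iterable of (year, text) pairs.
    Returns {year: matching n-grams / all n-grams of the same length}.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    hits = Counter()    # occurrences of the phrase per year
    totals = Counter()  # all n-grams of length n per year
    for year, text in corpus:
        tokens = text.lower().split()  # naive whitespace tokenization
        windows = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        totals[year] += len(windows)
        hits[year] += sum(1 for w in windows if w == target)
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Hypothetical toy corpus, for illustration only.
toy_corpus = [
    (1950, "information theory is a new field"),
    (1950, "the theory of information"),
    (1985, "information technology changes business"),
]

for phrase in ("information theory", "information technology"):
    print(phrase, ngram_frequencies(toy_corpus, phrase))
```

Normalizing by the total number of n-grams per year, rather than reporting raw counts, keeps years with many more books from dominating the curve.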

As you know, I once suggested that the phrase “arbitrarily small probability of error” is a formula in information theory, just like the formulae in epics and in Greek mathematics. It is a rather weak test, but let us see what the data shows:

There is seemingly some connection, though it is perhaps not too strong.

In what I had just done, it was nice that an analytics tool was provided with the data. But what should one do when no tool is provided? Our colleagues at the Watson Research Center have developed a tool called Many Eyes, which is something of a general-purpose analytics platform. So let’s see what it can do: I’ll take some data from data.gov, plug it into Many Eyes, and see what happens. I took data from November 1983 on the state-by-state percentage of households with telephone service and got this:

Unfortunately, there is no “imagesc” feature in Many Eyes, so it isn’t too useful for the present data set. I did the same thing for the November 2009 data and got:

Again, it isn’t too useful as presented. Separately, however, one can compute the change and plot that to get:

Now you can see something; applying analytics does require a modicum of skill. Somehow Maryland, Illinois, and Montana had lower telephone penetration in 2009 than in 1983. I wonder why. Excluding Montana, regions that are typically thought of as rural seem to have had the biggest gains, places like West Virginia and the Deep South. Now if only this were put together with real historical and sociological research, perhaps I would have a story to go along with the statistics on the adoption of this technology. I don’t think it is viral, but perhaps you have more insight into how to use data analytics to study the history of science and technology.
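The “compute the change” step amounts to a per-state subtraction followed by a sort. Here is a minimal Python sketch, assuming each snapshot has been loaded into a dictionary mapping a state to its percentage of households with telephone service; the function name and the numbers below are hypothetical placeholders, not the actual data.gov figures.

```python
def rank_changes(before, after):
    """Return (state, change) pairs sorted from largest gain to largest loss.

    before, after: dicts mapping state -> percent of households with telephone
    service. Only states present in both snapshots are compared.
    """
    common = before.keys() & after.keys()
    changes = {state: after[state] - before[state] for state in common}
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)


# Hypothetical placeholder values, NOT the actual data.gov figures.
nov_1983 = {"State A": 85.0, "State B": 95.0, "State C": 92.0}
nov_2009 = {"State A": 96.0, "State B": 94.0, "State C": 97.0}

ranked = rank_changes(nov_1983, nov_2009)
for state, delta in ranked:
    print(f"{state}: {delta:+.1f} points")

# States whose penetration went down between the two snapshots.
decliners = [state for state, delta in ranked if delta < 0]
print("Lower penetration in 2009 than in 1983:", decliners)
```

Sorting by the difference puts the biggest gainers first and any decliners at the end, which is the ordering one would want before plotting.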

Posted in Uncategorized | 6 Comments »

Bhullar Brothers, Please Come to Syracuse

December 9, 2010

Good day Señor Per-Mathias Høgmo. After our time in the Poconos, on Tuesday I went to Armonk for a new hire orientation and then followed that up with a visit to the world’s most famous arena for some unfinished business.

One of the things covered in the orientation was the evolution of the IBM logo and the recognizability of the current eight-bar version. Apparently, IBM is the second-most recognized global brand, partly due to the logo. My officemate Marc Millstone is a proponent of simplicity in design and a devotee of Paul Rand, the man who designed the IBM logo. I wonder if Taniya is a devotee as well. One recent logo that was heavily panned was the new one for the clothing retailer Gap. James Yu, my ECE 425 project partner, rode the wave of the backlash against Gap by creating a little web application that lets people make their own Gap-like logo, perhaps replacing the G in Gap with other letters. It went viral.

Some of the reasons James cites for the spread include timeliness, importance to people, fun, and inclusion in Facebook. Lav, we’re now living in the age of the crowd, aren’t we? And to think, the first time I heard the word viral in a non-disease context was only six years ago.

An interesting viral phenomenon surrounds Jeremy Lin, the former Ivy League cager who has landed with the Golden State Warriors.

I think it would be interesting to start a viral campaign to bring the basketball-playing Bhullar brothers, Sim (Gursimran) and Tanveer, to the Syracuse Orange. What would it take? I think we’d be fine with timeliness, because Bhullar-mania is in its incipient stages. I think we’d be fine with importance to people, because the Orange have quite a large following, but even more importantly, I think the people of India are poised to become crazy for basketball and the Bhullars. Whatever we do, we can include it on the Facebook and other social networks. The key aspect needs to be the fun. What could we do to make a fun viral campaign and bring about some finished business?

Posted in Uncategorized | 7 Comments »

Stories and Statistics

December 3, 2010

Señor Milman Parry, that final hyperlink in your Antarctic post about Stories vs. Statistics was an interesting read for me. It touches on various things that I have interests in, including storytelling (which I learned a bit about from Prof. Minkowski), semantics and pragmatics (which I learned a bit about from Prof. von Fintel), and the foundations of probability (which I learned a bit about from Prof. Fine).

I have recently been writing a magazine article about business analytics, which is more literary and involves more storytelling than my other published articles. I like one of the lines I wrote in the article related to the human X factor in service science that you discussed: “lathes do not tire of going around in circles and voluntarily leave the company, whereas human workers might.” Another human factor in business analytics is the way input is accepted and output is reported, usually through what are known as dashboards. For adoption by the business community, predictive statistical and signal processing methods need to take stories as input and produce stories as output, but sprinkled with numbers that could be examined if desired. I will elaborate on this and other points once this article moves forward in the publication pipeline.

Coming back to “Stories vs. Statistics,” I find it interesting that the author associates ‘gods’ with probability, because that is one view I have as well. Einstein wrote that “He does not throw dice,” but I would say that He is the throw of the dice. (It is a matter of interpretation: aleatoric uncertainty vs. epistemic uncertainty.) Do I have any statistics to support that view? No, but I do have stories.

Posted in Uncategorized | 3 Comments »
