308,745,538
December 21, 2010
As I had foreshadowed, the US Census released its first set of data today: population counts for the several states. The country as a whole had a 9.7% increase in population and Massachusetts (where I was counted) went from 6.35 million to 6.55 million people. Although the Census hasn’t released more specific geographic data yet, I would guess that Cambridge is still the 4th or 5th most populous city in the Commonwealth, after Boston, Worcester, and Springfield. The District of Columbia finally reversed its decades of population loss by gaining 29,664 people since 2000. Unsurprisingly, the biggest gainers were from the South and West, with Nevada showing a 35.1% increase. California now has a population that exceeds 37 million, though it is still dwarfed by Uttar Pradesh, the most populous subnational entity. The only state to lose population was Michigan. I wonder if Dave Bing can do anything about that, maybe by bringing back Eric Devendorf to promote the state alongside other prominent residents. As one would expect, the political implications of the newly released census data have already been analyzed by Nate Silver.
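To put the Massachusetts figures next to the national 9.7%, here is a minimal sketch of the percentage-change arithmetic. It uses only the rounded figures quoted above (6.35 and 6.55 million), so the result is approximate; the function name `percent_change` is just an illustrative choice.

```python
def percent_change(old, new):
    """Return the percentage change going from old to new."""
    if old == 0:
        raise ValueError("old value must be nonzero")
    return (new - old) / old * 100.0


# Rounded Massachusetts populations quoted in the post, in millions.
ma_growth = percent_change(6.35, 6.55)

# The national figure (9.7%) is taken directly from the post for comparison.
print(f"Massachusetts: about {ma_growth:.1f}% growth vs. 9.7% nationally")
```

With these rounded inputs, Massachusetts comes out at roughly a third of the national growth rate.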
Although the US Census has been a source of data for hundreds of years, and indeed its tabulation was one of the launching points for IBM, the world is increasingly awash in “big data”. (As a side note, I’ve been becoming more and more of a fan of Canadian conventions w.r.t. punctuation outside of quotation marks.) As an example sticking to government, the CIO of the United States, Vivek Kundra, has been big on releasing data through data.gov. As he points out in a recent report, he is also big on cloud computing and cloud-based Infrastructure-as-a-Service (IaaS) offerings. These definitely seem like very interesting areas for research.
Indeed, I have a feeling that the connection between information technology and information theory might grow in the future. You might recall Ronald Kline’s article on the emergence of “Information Technology” as a keyword. Now that Google has put up a source of big data and an analytics tool, one might even try to roughly test Kline’s basic argument in an empirical way. This is what you get when you plug in terms such as “information theory” and “information technology”:
Just as Kline describes, there was a blossoming of discourse around “information technology” in the mid-1960s when prominent humanists and social scientists proclaimed the advent of a new type of society based on the processing of information. Moreover, as Kline describes, by the early 1980s, several discourse communities including policy analysts, business writers, managers, information scientists, and social scientists had adopted the term.
Of course, one can also observe the great flowering of the term “information theory” after 1948 and its seeming decline after 1967. Not only might big data be a boon for historians, as I have suggested here, it might also be useful for textual analysis of a different kind.
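For readers curious about what sits behind a curve of this kind, here is a rough sketch of how a phrase's share of all same-length word sequences could be computed per year. This is not a description of how Google's tool works internally; the function names and the tiny corpus are invented purely for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def phrase_share_by_year(corpus, phrase):
    """Map each year to the fraction of its n-grams that equal `phrase`.

    `corpus` is a dict of year -> list of text strings.
    """
    target = tuple(tokenize(phrase))
    n = len(target)
    if n == 0:
        raise ValueError("phrase must contain at least one word")
    shares = {}
    for year, texts in sorted(corpus.items()):
        hits = 0
        total = 0
        for text in texts:
            tokens = tokenize(text)
            # Slide a window of n tokens across the text.
            grams = list(zip(*(tokens[i:] for i in range(n))))
            total += len(grams)
            hits += sum(1 for gram in grams if gram == target)
        shares[year] = hits / total if total else 0.0
    return shares


# Made-up toy corpus, NOT real data: just enough text to exercise the code.
toy_corpus = {
    1950: ["Information theory is a new field.", "We study information theory."],
    1990: ["Information technology changes business."],
}
shares = phrase_share_by_year(toy_corpus, "information theory")
for year, share in shares.items():
    print(year, share)
```

Normalizing by the total number of n-grams in each year, rather than plotting raw counts, is what keeps years with more published text from dominating the picture.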
As you know, I had once suggested that the phrase “arbitrarily small probability of error” is a formula in information theory, just as there are formulae in epics and Greek mathematics. Though a rather weak test, let us see what the data shows:
In what I have just done, it was nice that an analytics tool was provided with the data. But what should one do when there is no tool provided? Our colleagues at the Watson Research Center have developed a tool called Many Eyes that is something of a general analytics platform. So let’s see what it can do: I’ll take some data from data.gov and plug it into Many Eyes and see what happens. I took data from November 1983 on state-by-state percentage of households with telephone service and got this:
Now you can see something; applying analytics does require a modicum of skill. Somehow Maryland, Illinois, and Montana have less telephone penetration in 1983 than 2009. I wonder why. Excluding Montana, regions that are typically thought of as rural seem to have had the biggest gains, places like West Virginia and the Deep South. Now if only this were put together with real historical and sociological research, perhaps I would have a story to go along with the statistics on the adoption of this technology. I don’t think it is viral, but perhaps you have more insight into how to use data analytics to study the history of science and technology.
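If you would rather not eyeball a map, the same comparison can be sketched in a few lines: take two snapshots of penetration by region, compute the change in percentage points, and sort. The function name `rank_changes` and the region names and numbers below are placeholders I made up to show the mechanics; they are not the data.gov figures.

```python
def rank_changes(earlier, later):
    """Return (region, change in percentage points) pairs, smallest change first.

    Both arguments map a region name to the percentage of households
    with telephone service. Only regions present in both are compared.
    """
    common = earlier.keys() & later.keys()
    changes = [(region, later[region] - earlier[region]) for region in common]
    # Sort by change, breaking ties by name so the order is deterministic.
    return sorted(changes, key=lambda pair: (pair[1], pair[0]))


# Placeholder values, NOT real survey data.
earlier_survey = {"Region A": 90.0, "Region B": 85.0, "Region C": 95.0}
later_survey = {"Region A": 95.0, "Region B": 96.0, "Region C": 94.0}

ranked = rank_changes(earlier_survey, later_survey)
declines = [region for region, change in ranked if change < 0]

for region, change in ranked:
    print(f"{region}: {change:+.1f} points")
print("Declined:", declines)
```

Sorting by change puts any regions that lost ground at the top of the list, which is exactly the sort of anomaly worth handing to a historian.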