Glad to see that the work you described previously is now appearing in a journal paper, and also glad to know that it is doing social good. One of the main things you did was look for buildings in satellite imagery, which is really quite a neat thing. As you know, I have been quite intrigued by the science of cities, and perhaps data from satellite imagery can be useful for making empirical statements there. Can one see municipal waste remotely? In anticipation of that, perhaps I can dig through some data on cities that I happen to have, and see if there are interesting statements to be made regarding scaling laws within cities (in contrast to most work, which has focused on scaling laws among cities, though I should note the work of Batty et al.). As examples, I will consider recent data from our hometown of Syracuse, NY and also data from Mohenjo Daro of the Indus Valley civilization.
As you can guess, the Syracuse data was gathered from my service on an IBM Smarter Cities Challenge team, by digging through some old servers held by a not-for-profit partner of the City of Syracuse. The journal paper on that is finally out, but more importantly it seems to be having some social impact. Here is a newer video on impacts of what we did there.
The data on Mohenjo Daro is from actual digging, rather than digging through computers. Built around 2600 BCE, Mohenjo Daro was one of the largest settlements of the ancient Indus Valley Civilization and one of the world’s earliest major urban settlements. Mohenjo Daro was abandoned in the 19th century BCE, and was not rediscovered until 1922. The data I will use was initially mapped by British archaeologists in the 1930s during their excavation of Mohenjo Daro, and was collected in the paper [Anna Sarcina, “A Statistical Assessment of House Patterns at Moenjo Daro,” Mesopotamia Torino, vol. 13-14, pp. 155-199, 1978].
Before getting to the data, though, let me describe some theoretical work on the distribution of house sizes from a recent paper of Bettencourt et al. in a new open access journal from AAAS. From the settlement scaling theory they develop, they make a prediction on the distribution of house areas. In particular, the overall distribution should be approximately lognormal. This prediction is borne out in archeological data on houses in pre-Hispanic cities in Mexico. The basic argument for why the lognormal distribution should arise combines a multiplicative generative process with the central limit theorem: if a quantity is the product of many independent random factors, then its logarithm is a sum of many random terms and is therefore approximately normal. A reference therein attributes the argument back to William Shockley in studying the productivity of scientists, but according to Mitzenmacher it goes back even further. (Service times in call centers also appear to be approximately lognormal, among other phenomena.)
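To make the multiplicative argument concrete, here is a minimal simulation sketch. It is not the model from the paper: the number of steps, the range of the random factors, and the names `simulate_sizes` and `skewness` are all assumptions chosen purely for illustration. Each simulated "house" starts at size 1 and is multiplied by many independent positive factors; the raw sizes come out strongly right-skewed, while their logarithms are close to symmetric, as one would expect for something approximately lognormal.

```python
import math
import random
import statistics


def simulate_sizes(n_houses=5000, n_steps=50, seed=0):
    """Grow each size from 1.0 by multiplying n_steps independent random factors.

    The factor range (0.8 to 1.25) is an arbitrary illustrative choice.
    """
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_houses):
        size = 1.0
        for _ in range(n_steps):
            size *= rng.uniform(0.8, 1.25)
        sizes.append(size)
    return sizes


def skewness(xs):
    """Sample skewness: mean of the cubed standardized values."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


sizes = simulate_sizes()
log_sizes = [math.log(s) for s in sizes]

# Raw sizes are heavily right-skewed; log sizes are nearly symmetric.
print("skewness of sizes:    ", round(skewness(sizes), 3))
print("skewness of log sizes:", round(skewness(log_sizes), 3))
```

Skewness is only a crude summary, of course; it is used here just to show the contrast between the raw and log scales.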
Anyway, coming to our data, let me first show the rank-frequency plot of the surface area (m²) of 183 houses in Mohenjo Daro.
Now I show the rank-frequency plot of the living area (ft²) of 41,804 houses in Syracuse (data from July 2011).
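For anyone who wants to build this kind of plot from their own list of areas, here is a minimal sketch of one common construction: sort the values from largest to smallest and pair each with its rank, then plot rank against value (often on log-log axes). The function name `rank_size` and the sample numbers are made up for illustration; they are not taken from either data set.

```python
def rank_size(values):
    """Pair each value with its rank, where rank 1 is the largest value."""
    ordered = sorted(values, reverse=True)
    return [(rank, value) for rank, value in enumerate(ordered, start=1)]


# Hypothetical house areas, purely illustrative.
areas = [56.0, 130.5, 97.0, 183.0, 75.0, 97.0]

for rank, area in rank_size(areas):
    print(rank, area)
```

Ties simply receive consecutive ranks here, which is usually harmless for a visual check.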
What do you think? Does it look approximately lognormal? I’ll soon write another blog post with some formal statistical analysis, and some other nuggets from these data sets.
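In the meantime, one informal way to eyeball the question is a quantile-quantile comparison on the log scale. The sketch below is only an assumption about how one might do this, not the formal analysis promised above: it fits a normal distribution to the log areas and pairs each observed log value with the corresponding quantile of the fit. The helper name `lognormal_qq` and the synthetic sample are hypothetical. If the data are close to lognormal, the pairs should fall near the diagonal, with the largest departures typically in the tails.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def lognormal_qq(values):
    """Return (theoretical, observed) log-quantile pairs under a lognormal fit."""
    logs = sorted(math.log(v) for v in values)
    n = len(logs)
    fitted = NormalDist(fmean(logs), pstdev(logs))
    # Plotting positions (i + 0.5) / n keep the probabilities strictly inside (0, 1).
    return [(fitted.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(logs)]


# Synthetic stand-in for real house areas, drawn from a lognormal on purpose.
rng = random.Random(1)
sample = [rng.lognormvariate(4.5, 0.6) for _ in range(2000)]

pairs = lognormal_qq(sample)
median_theoretical, median_observed = pairs[len(pairs) // 2]
print("gap at the median:", round(abs(median_theoretical - median_observed), 4))
```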
Incidentally, as requested, I seem to be making creativity a part of my research agenda (from an information theory and statistical signal processing perspective). I spoke about fundamental limits to creativity at the ITA Workshop in San Diego in February (though the talk itself ended up being slightly different from the abstract). I also organized a special session on computational creativity, which was fun.
I think someone should connect creativity and cities in some precise informational way, and perhaps you are the man to do it.