## Flying, flying, digging, digging

October 27, 2015

As you know, I’ve been travelling quite a bit the last month or so. I think I may have put on more miles per unit time than ever before. While flying around, I read a good number of popular books in the broad area of information science that I had been meaning to get to. For example, I read *Bursts* by Albert-László Barabási and learned more about Transylvanian history than I intended. I also read *Social Physics* by my one-time collaborator Sandy Pentland, as well as *The Life and Work of George Boole: A Prelude to the Digital Age* by Desmond MacHale. I had received this last book as a gift for giving one of the big talks at the When Boole Meets Shannon Workshop at University College Cork in early September. An extensive biography, it also emphasizes how Boole’s *The Laws of Thought* makes a strong connection between logic and set theory on the one hand and probability theory on the other, a hundred years before Kolmogorov. When Boole was reading extracts from the book-in-progress to his wife-to-be Mary Everest, [p. 148]:

> She confessed that she felt comforted by the fact that the laws by which the human mind operates were governed by algebraic principles!

Incidentally, in 1868, Mary Boole also wrote a book, *The Message of Psychic Science*, which had the following rather prescient passage inspired by Babbage’s computer and Jevons’ syllogism evaluator [p. 267]:

> Between them they have conclusively proved, by unanswerable logic of facts, that calculation and reasoning, like weaving and ploughing, are work, not for human souls, but for clever combinations of iron and wood. If you spend time doing work that a machine could do faster than yourselves, it should only be for exercise, as you swing dumb-bells; or for amusement as you dig in your garden; or to soothe your nerves by its mechanicalness, as you take up knitting; not in any hope of so working your way to the truth.

Speaking of iron and wood, one last book I read in my travels is *Why Information Grows* by Cesar Hidalgo, and a first thing he discusses is how solids are needed to store information. As he says, close to my heart [p. 34]:

> Schrodinger understood that aperiodicity was needed to store information, since a regular crystal would be unable to carry much information.

Let me list my travel venues for you:

- Wired A.I. 2015 (Tokyo), September 28-29
- SONIC Year 3 Annual Review Meeting (Champaign), September 30 – October 1
- 53rd Annual Allerton Conference on Communication, Control, and Computing (Monticello), September 29 – October 2
- HajekFest: A Workshop on Networks, Games, and Algorithms (Urbana), October 3
- IEEE Information Theory Workshop (Jeju City), October 11-15
- The (curious case of the) Watson Intelligence (Chicago), October 18
- Santa Fe Institute (Santa Fe), October 21
- Los Alamos National Laboratory (Los Alamos), October 22

And now with that travel done, I think I’ll be going hard on writing and maybe even some theorem-proving and data analytics. As we’ve discussed, I find that blogging sometimes jump-starts the writing/doing engines, and so here we go with cities.

As promised previously, I perform some formal tests for lognormal distributions of house sizes in Mohenjo Daro and in Syracuse. As a starting point, I used the `lognfit` function in MATLAB to find the maximum likelihood estimates of the fit parameters and also the 95% confidence intervals. The two parameters are the mean μ and standard deviation σ of the associated normal distribution. The estimated value of σ is the square root of the unbiased estimate of the variance of the log of the data. Rather than showing the rank-frequency plots as in the previous post, let me show the cumulative distribution functions. Note that in the Syracuse data, about 1/5 of houses do not have a listed living area, so I exclude them from this analysis.
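For readers without MATLAB, the point estimates described above can be sketched in plain Python. This is only a minimal sketch under stated assumptions, not the `lognfit` implementation: the names `fit_lognormal` and `lognormal_cdf` and the sample areas are hypothetical, and the confidence intervals are omitted.

```python
import math
import statistics


def fit_lognormal(values):
    """Estimate (mu, sigma) of the normal distribution of log(values)."""
    # Drop missing or non-positive entries, e.g. houses with no listed living area.
    logs = [math.log(v) for v in values if v is not None and v > 0]
    if len(logs) < 2:
        raise ValueError("need at least two positive observations")
    mu = statistics.fmean(logs)     # sample mean of the log-data
    sigma = statistics.stdev(logs)  # square root of the unbiased (n - 1) variance
    return mu, sigma


def lognormal_cdf(x, mu, sigma):
    """CDF of the fitted lognormal, for plotting against the empirical CDF."""
    if x <= 0:
        return 0.0
    return statistics.NormalDist(mu, sigma).cdf(math.log(x))


# Hypothetical living areas (NOT the Mohenjo Daro or Syracuse data);
# None marks a house with no listed area.
areas = [55.0, 80.0, None, 120.0, 95.0, 210.0, 64.0, None, 150.0, 101.0]
mu_hat, sigma_hat = fit_lognormal(areas)
print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
print(f"fitted CDF at 100: {lognormal_cdf(100.0, mu_hat, sigma_hat):.3f}")
```

Filtering out the missing entries before taking logs corresponds to the exclusion of unlisted living areas mentioned above.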

At least visually, these don’t look like the best of fits. To measure the goodness of fit, I use the chi-square goodness-of-fit test as implemented in MATLAB as `chi2gof`. With data `area` already fit using `lognfit` into parameter vector `parmhat`, this is `[h,p] = chi2gof(area,'cdf',@(z)logncdf(z,parmhat(1),parmhat(2)),'nparams',2)`. Despite the visual evidence, the chi-square test does not reject the null hypothesis of lognormality at the 5% significance level for Mohenjo Daro. The chi-square test does reject the null hypothesis of lognormality at the 5% significance level for Syracuse, contrary to the theory of Bettencourt et al. I wonder what the explanation might be for this contrary finding in Syracuse: maybe some data fidelity issues?
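The idea behind the test can be sketched in plain Python as well. This is an illustrative sketch, not a reimplementation of `chi2gof`: it assumes equal-probability bins under the fitted lognormal (MATLAB bins the data differently, so the numbers will not match), the names `chi2_sf` and `chi2_gof_lognormal` are hypothetical, and the data are simulated rather than real house sizes.

```python
import bisect
import math
import random
import statistics


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting points for df = 2 (even) and df = 1 (odd).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y^a e^(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(1.0, q)


def chi2_gof_lognormal(values, mu, sigma, n_bins=10, n_params=2, alpha=0.05):
    """Return (h, p); h is True when lognormality is rejected at level alpha."""
    logs = [math.log(v) for v in values if v is not None and v > 0]
    df = n_bins - 1 - n_params  # bins - 1 - number of fitted parameters
    if not logs or df < 1:
        raise ValueError("need data and more bins than fitted parameters + 1")
    dist = statistics.NormalDist(mu, sigma)
    # Interior bin edges on the log scale, each bin having probability 1/n_bins.
    edges = [dist.inv_cdf(i / n_bins) for i in range(1, n_bins)]
    observed = [0] * n_bins
    for z in logs:
        observed[bisect.bisect_right(edges, z)] += 1
    expected = len(logs) / n_bins
    stat = sum((o - expected) ** 2 / expected for o in observed)
    p = chi2_sf(stat, df)
    return p < alpha, p


# Simulated (hypothetical) areas drawn from a lognormal, then fit and tested.
rng = random.Random(0)
areas = [rng.lognormvariate(4.5, 0.6) for _ in range(500)]
logs = [math.log(a) for a in areas]
mu_hat, sigma_hat = statistics.fmean(logs), statistics.stdev(logs)
h, p = chi2_gof_lognormal(areas, mu_hat, sigma_hat)
print(f"reject lognormality: {h}, p-value: {p:.3f}")
```

The degrees of freedom are reduced by two because both μ and σ were estimated from the same data, which is what the `'nparams',2` argument expresses in the MATLAB call.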

By the way, I also promised some other nuggets and so here is one: the relationship between living area and value in Syracuse.

There is certainly more than just living area that determines value. In fact, the methodology of assessing house value is an interesting one. One more nugget is on when houses that existed in Syracuse in July 2011 were built.

I wonder if there is a way to understand this data through a birth-death process model. There is a nice theoretical paper in this general direction, “Random Fluctuations in the Age-Distribution of a Population Whose Development is Controlled by the Simple ‘Birth-and-Death’ Process,” by David G. Kendall from the *J. Royal Statistical Society* in 1950.

*Dataclysm* is by the author of the OKCupid blog, and in a sense it is an expanded version of that blog. One of the big points it makes is that there will be growing longitudinal data about individuals due to social media such as Facebook. Collections of pennants are eventually taken down from bedroom walls, but nothing is taken down from Facebook walls. It uses culturomics.

As I may have foreshadowed, Ron Kline’s book, *The Cybernetic Moment* (which I helped with a little), also makes some use of culturomics to measure the nature of discourse.

So that is some flying, flying, and digging, digging from me. Hope you’ll contribute to the discourse so future historians have more to study. By the way, the city sizes for the various places (as per Wikipedia today) are, from large to small:

- 13,216,221 – Tokyo
- 2,722,389 – Chicago
- 435,413 – Jeju
- 84,513 – Champaign
- 67,947 – Santa Fe
- 41,250 – Urbana
- 12,019 – Los Alamos
- 5,138 – Monticello

Perhaps data for a statistical assessment?
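One possible assessment, sketched here purely as an illustration, is to compute the lognormal parameters of the eight populations listed above along with a rank-size slope. The choice of these two statistics and the name `rank_size_slope` are my assumptions; with only eight cities picked by a travel itinerary, nothing here should be read as a real test.

```python
import math
import statistics

# City populations exactly as listed above.
populations = {
    "Tokyo": 13_216_221,
    "Chicago": 2_722_389,
    "Jeju": 435_413,
    "Champaign": 84_513,
    "Santa Fe": 67_947,
    "Urbana": 41_250,
    "Los Alamos": 12_019,
    "Monticello": 5_138,
}


def rank_size_slope(sizes):
    """Least-squares slope of log(size) against log(rank), largest ranked 1."""
    ordered = sorted(sizes, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(s) for s in ordered]
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Mean and (unbiased) standard deviation of the log-populations.
log_pops = [math.log(n) for n in populations.values()]
mu_hat = statistics.fmean(log_pops)
sigma_hat = statistics.stdev(log_pops)
slope = rank_size_slope(populations.values())

print(f"log-population mean = {mu_hat:.2f}, std = {sigma_hat:.2f}")
print(f"rank-size slope = {slope:.2f}")
```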
