Yes, sir. We’re young, we’re pretty, we’re fast, and no one can beat us. I should warn you, however, that I don’t really understand duality. In the past, I’ve read that the duality between compression and error-correction “can be pursued further and is related to a duality between past and future and the notions of control and knowledge. Thus we may have knowledge of the past but cannot control it; we may control the future but have no knowledge of it,” but I don’t know what it means.
Should I return to the inner frame of allied informational topics? In your Human Spark post, you took the Bumpus sparrow data and tried to create knowledge from it. In particular, you created two plots that showed the importance of various features in predicting survival. The first used the raw data directly, whereas the second tried to apply an “inflation adjustment” to roughly account for the fact that all parts of a bird typically scale together: every length measurement was divided by total length, while weight, age, and sex were left unchanged. This procedure seems to implicitly assume that birds scale isometrically, but in actuality birds scale allometrically. For example, wing-bone length is related to mass according to a power law. Similarly, there seems to be an allometric relationship between head and body. Indeed, power-law scaling laws are a central part of mathematical biology and are some of the most prominent examples of symbolically stated laws that arise from the optimization approach. If allometry is taken into account, do you think the results would change much?
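To make the allometry point concrete, here is a minimal sketch in Python (NumPy), assuming a simple power law trait = a·size^b. The data are synthetic; the function names, the exponent 0.6, and the size range are hypothetical and not taken from the Bumpus measurements. It compares the ratio adjustment from your post, which amounts to assuming b = 1 for a length measured against total length, with dividing out a fitted power law, which (up to the constant a) is the same as keeping the residual of a log-log regression. That is one common way of handling allometry in morphometrics, not the only one.

```python
import numpy as np

def fit_allometry(size, trait):
    """Least-squares fit of log(trait) = log(a) + b * log(size); returns (a, b)."""
    b, log_a = np.polyfit(np.log(size), np.log(trait), 1)  # slope first, then intercept
    return np.exp(log_a), b

def isometric_adjust(size, trait):
    """Ratio adjustment: implicitly assumes the exponent b equals 1 (isometry)."""
    return trait / size

def allometric_adjust(size, trait):
    """Divide out the fitted power law; up to the constant a, this is exp(log-log residual)."""
    _, b = fit_allometry(size, trait)
    return trait / size**b

def log_corr(size, x):
    """Correlation between log(size) and log(x); near 0 means size has been removed."""
    return np.corrcoef(np.log(size), np.log(x))[0, 1]

rng = np.random.default_rng(0)
size = rng.uniform(150.0, 170.0, 200)  # hypothetical total lengths, mm
true_b = 0.6                            # hypothetical allometric exponent
trait = 2.0 * size**true_b * np.exp(rng.normal(0.0, 0.01, size.shape))

a_hat, b_hat = fit_allometry(size, trait)
print(f"fitted exponent b: {b_hat:.2f}")
print(f"corr after ratio adjustment:      {log_corr(size, isometric_adjust(size, trait)):+.2f}")
print(f"corr after allometric adjustment: {log_corr(size, allometric_adjust(size, trait)):+.2f}")
```

If the true exponent is close to 1 the two adjustments roughly agree; when it is not, the ratio still carries size information, which could plausibly leak into a feature-importance plot.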
In fact, this raises the more general question of whether there are any principled methods for commensurating features. I don’t know much about this topic, but surely it must be a long-standing issue in several fields of study. Perhaps you can enlighten me. In exploratory data analysis, as opposed to supervised learning, principal component analysis (PCA) implicitly assumes that large variance implies importance, but this too could be subject to the vagaries of commensuration, couldn’t it? I recently heard Constantine give a talk about extending PCA to high-dimensional settings with adversarial noise, which was rather interesting. To motivate high dimensions, he used DNA microarrays as an example, but I heard from some biologists last night that these are actually falling out of favor very quickly, so perhaps high-dimensional statistics talks need a new opening example. As you know, PCA is used in population genetics, a field that inherently deals with very high-dimensional data.
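Here is a small sketch of the commensuration worry about PCA, again with synthetic data and hypothetical feature names (length_mm, weight_g, other). Recording a single feature in meters instead of millimeters changes which direction PCA calls most important, while z-scoring every column (equivalently, PCA on the correlation matrix) removes the dependence on units. Standardizing is a common convention rather than a principled answer, since it simply declares every feature to have equal variance.

```python
import numpy as np

def first_pc(X):
    """Leading principal axis of X (rows = samples), via SVD of the column-centered data."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]  # unit vector; its overall sign is arbitrary

def standardize(X):
    """Z-score each column, i.e. give every feature unit sample variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

rng = np.random.default_rng(1)
n = 500
latent = rng.normal(size=n)  # hypothetical shared "body size" factor
# Hypothetical features: two size-related measurements plus one unrelated one.
length_mm = 160 + 3.0 * latent + rng.normal(0, 1.0, n)
weight_g = 25 + 1.0 * latent + rng.normal(0, 0.5, n)
other = rng.normal(0, 0.5, n)
X_mm = np.column_stack([length_mm, weight_g, other])

X_m = X_mm.copy()
X_m[:, 0] /= 1000.0  # identical data, but length now recorded in meters

for label, X in [("length in mm", X_mm),
                 ("length in m", X_m),
                 ("standardized", standardize(X_mm))]:
    # absolute loadings on (length, weight, other)
    print(f"{label:>14}: {np.round(np.abs(first_pc(X)), 2)}")
```

With millimeters the first component is mostly length, with meters it is mostly weight, and after standardization length and weight load about equally.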
Indeed, I think some of the questions about hominids and the evolutionary acquisition of the “human spark” that Alan Alda discussed in the first episode might be answerable using techniques from high-dimensional statistics applied to archeological genetics. That would certainly be a fancier approach than the basic morphological methods he showed (which are also the essence of analyzing the Bumpus data), but whether it would be better remains to be seen.