Archive for April, 2010


Know the Future, Control the Past

April 30, 2010

Yes, sir.  We’re young, we’re pretty, we’re fast, and no one can beat us.  I should warn you, however, that I don’t really understand duality.  In the past, I’ve read that the duality between compression and error-correction “can be pursued further and is related to a duality between past and future and the notions of control and knowledge. Thus we may have knowledge of the past but cannot control it; we may control the future but have no knowledge of it,” but don’t know what it means.

Should I return to the inner frame of allied informational topics?  In your Human Spark post, you took the Bumpus sparrow data and tried to create knowledge from it.  In particular, you created two plots that showed the importance of various features in predicting survival.  The first used the raw data directly whereas the second tried to apply some “inflation-adjustment” to roughly take into account the fact that typically all parts of a bird scale together.  In particular, the inflation adjustment was to divide all length measurements by the total length, leaving weight, age, and sex unchanged.  This procedure seems to implicitly assume that birds scale isometrically but in actuality, birds scale allometrically.  For example, wing-bone length is related to mass according to a power law.  Similarly, there seems to be an allometric relationship between head and body.  Indeed, scaling laws of a power law form are a central part of mathematical biology and are some of the most prominent examples of symbolically stated laws that arise from the optimization approach.  If allometry is taken into account, do you think there would be much change in the results?
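To make the allometry question concrete, here is a little Python sketch (with made-up numbers, not the actual Bumpus measurements) contrasting the ratio-based “inflation adjustment” with an allometric one: regress the log of a feature on the log of total length and keep the residuals, so the scaling exponent is estimated from the data rather than forced to one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic illustration (not the actual Bumpus measurements): suppose a
# length feature scales allometrically with total length,
# feature ~ c * total_length**b with exponent b != 1.
total_length = rng.uniform(150.0, 170.0, size=100)
feature = 0.3 * total_length**0.75 * np.exp(rng.normal(0.0, 0.02, size=100))

# Isometric adjustment: a simple ratio, which implicitly assumes b = 1.
isometric = feature / total_length

# Allometric adjustment: fit log(feature) = log(c) + b * log(total_length)
# and keep the residuals, which are scale-free whatever b turns out to be.
b, log_c = np.polyfit(np.log(total_length), np.log(feature), 1)
allometric = np.log(feature) - (log_c + b * np.log(total_length))
```

My guess is that an adjustment of this sort, applied to each length measurement, would shuffle the importance rankings somewhat without upending them, but that is exactly the empirical question.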

In fact, this raises the more general question of whether there are any principled methods for commensurating features.  I don’t know much about this topic, but surely it must be a long-standing issue in several branches of study.  Perhaps you can enlighten me.  Moving from supervised learning to exploratory data analysis, the method of principal component analysis (PCA) also implicitly assumes that large variance implies importance, but this too could be subject to the vagaries of commensuration, couldn’t it?  I recently heard Constantine give a talk about extending PCA to high-dimensional settings with adversarial noise, which was rather interesting.  To motivate high dimensions, he used DNA microarrays as an example, but I heard from some biologists last night that these are falling out of favor very quickly, so maybe a new example is needed for the beginning of high-dimensional statistics talks.  As you know, PCA is used in population genetics, a field that inherently deals with very high-dimensional data.
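The sensitivity of PCA to units is easy to demonstrate.  In this toy sketch (synthetic data, nothing domain-specific), the same underlying signal is measured twice in different units; PCA on the raw numbers attributes essentially all the variance to the feature with the larger units, while standardizing first restores the symmetry.

```python
import numpy as np

rng = np.random.default_rng(1)

# The same underlying signal measured twice in wildly different units:
# once in millimetres, once in metres.
signal = rng.normal(size=200)
mm = 1000.0 * signal + rng.normal(scale=50.0, size=200)
m = signal + rng.normal(scale=0.05, size=200)
X = np.column_stack([mm, m])

def first_pc(data):
    # leading right-singular vector of the centered data matrix
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

# Raw PCA: the millimetre feature's huge variance dominates the first PC.
raw = first_pc(X)

# After standardizing each feature to unit variance, both load comparably.
standardized = first_pc((X - X.mean(axis=0)) / X.std(axis=0))
```

Standardizing to unit variance is itself just one choice of commensuration, of course, which is rather the point.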

Indeed, I think some of the questions about hominids and the evolutionary acquisition of the “human spark” that Alan Alda discussed in the first episode might be answerable using techniques from high-dimensional statistics for archeological genetics.  Definitely a fancier approach than the basic morphological methods he showed (and that are the essence of analyzing the Bumpus data), but whether it is better remains to be seen.


Information Ashvins

April 23, 2010

Salve, Señor Gaius Mucius Scaevola.  I mentioned, but didn’t really elaborate on the oral storyteller.  The storyteller is the bard, the singer of tales, the epic poet.

Horace wrote this rule for the epic poet in his Ars Poetica: “semper ad eventum in medias res / non secus ac notas auditorem rapit et quae / desperat tractata nitescere posse relinquit.”  The purport of this rule is the following, as given by Thomas Schütte.  “… the mandate to begin in the middle of things is a mandate to try by every means to move the listener, the one who attends or hears the poet’s words.  The listener is addressed ‘as though’ he or she knew the narrative beforehand and this prior knowledge is a resource for the poet, who therefore may act with greater freedom in presenting the narrative in a new order.  But the listener’s prior knowledge also asserts a demand, for the poet must make such history worthwhile and what escapes the poet’s powers cannot be deliberated upon and recaptured; the poet must move on, abandoning what is beyond [his] creative resources.”  Wouldn’t you agree that that’s what we’ve done with this web log so far, at least the in medias res part?

I was exposed to Albert B. Lord’s The Singer of Tales, which is referred to in that post-Paths Ahead note, in a class taught by Prof. Minkowski.  An emphasis in that class, besides formulaic language in epic poetry, was the unusualness of frame stories in the Mahābhārata, which are discussed here.  These frame stories involved conversation or dialogue between pairs: Vaiśampāyana and Janamejaya, Ugraśravas and Śaunaka, Saṃjaya and Dhṛtarāṣṭra, Vasiṣṭha and Parāśara, Lomaśa and Yudhiṣṭhira, and others.  This conversational, dialogic style is a hallmark of Saṃskṛta texts, and I feel it is a good style to have.  The blog has been of this form so far, and I see no reason to change it up.  Do you?

The Minkowski paper I linked to has its own sort of frame.  It begins with a description of a study by that opponent of Krishna Maheshwari, Michael Witzel, which discusses the Jaiminīya Brāhmaṇa and its frame story involving the Aśvins, the divine doctors and twin brothers.  The twin doctors are “young, handsome, brilliant and agile,” basically everything that the Louisville Lip was too. They bring the dawn light and all of the metaphors that that entails.  As discussed here, they are “the personification of coordinated action by a duality.”  “Their harmonious ability to coordinate themselves in good works is a model for all happy dualities.”  Now that we have both become doctors, I hope we will coordinate to do good and do well in the future, especially in the areas of information science, information theory, information systems and allied topics: strive to be Information Ashvins.


Human Spark

April 17, 2010

What up, Señor Trapper John, M. D.?  Thanks for setting the table for me to mention this brief note I wrote after Paths Ahead, which includes some discussion about storytelling and what a solution is.  Less than a week into my foray into the industrial world, I picked up some new jargon related to this discussion of ours: the DIKW pyramid, whose four levels are data, information, knowledge, and wisdom.

Did you see any of the three-part miniseries on PBS hosted by Alan Alda on what makes humans different from other animals, especially other primates, and especially in the area of intelligence?  If you didn’t, you can watch the full episodes online.  I thought it was really interesting.  For a preview of the type of stuff in the miniseries, take a look at this column.  The Discovery Channel series Life features primates this week, which might be interesting as well.

I wouldn’t call myself an expert in dimensionality reduction and supervised classification, but I wouldn’t call myself a birdbrain in the subject either.  Coming to your Bumpus sparrow data challenge and whether brain size predicts survival, let me first describe the dataset a bit.  There are eleven measurement dimensions: sex, age (adult or young, provided only for males), total length, alar extent (wingspan), weight, length of beak and head, length of humerus, length of femur, length of tibiotarsus, width of skull, and length of keel of sternum.  Some sparrows perished and some survived a severe storm.  Presumably two of these features, the length of beak and head and the width of skull, are related to brain size.

The logistic regression done by Janzen and Stern that you linked to was done on males and females separately and did not consider age.  (I suspect this is because categorical variables and missing data are not well handled by logistic regression.)  The technique outputs a weight vector equal in length to the number of features, whose magnitudes may, in certain situations, be interpreted as showing the relative importance of the different features.  For the males, the top two features were total length and weight.  Skull width was sixth, and the length of beak and head came in eighth out of nine.
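To make that interpretation concrete, here is a small sketch on synthetic data (not the real sparrows, and scikit-learn’s LogisticRegression standing in for whatever Janzen and Stern actually ran): when the features are on a common scale, the magnitudes of the fitted weights rank the features’ influence.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Synthetic stand-in for the sparrow data (not the real measurements):
# survival depends strongly on feature 0, moderately on feature 1, and
# not at all on feature 2.
n = 500
X = rng.normal(size=(n, 3))
logit = 2.0 * X[:, 0] + 1.0 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

model = LogisticRegression().fit(X, y)
w = np.abs(model.coef_[0])

# With features on a common scale, |w| ranks the features' influence.
ranking = np.argsort(w)[::-1]
```

Note that the ranking is only meaningful because the features here share a scale, which is the commensuration issue all over again.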

I did a similar analysis using the random forest classifier, which readily handles categorical data and missing values and provides a measure of feature importance based on the classification error on permuted “out-of-bag” data.  Here are the feature importance values I found.


Here again, total length is first, and weight is second.  The length of beak and head is seventh and width of skull is last out of eleven.  This analysis and the logistic regression analysis seem to indicate that the head size features are not predictive of survival.
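For the curious, here is roughly how such an analysis can be set up with scikit-learn, on synthetic stand-in data since I won’t reproduce the sparrow spreadsheet here.  One caveat: scikit-learn’s permutation_importance permutes features on a held-out set rather than on the out-of-bag samples, but the idea is the same: shuffle one feature at a time and see how much the accuracy degrades.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

# Synthetic stand-in for the Bumpus table: survival driven by features 0
# and 1, with feature 2 pure noise.
n = 600
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.7 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Shuffle one feature at a time and record how much accuracy degrades.
result = permutation_importance(forest, X_te, y_te, n_repeats=20, random_state=0)
importance = result.importances_mean
```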

However, the relative body part lengths and widths are obscured in this analysis by the overall scaling of the birds.  I redid the same random forest feature importance calculations, but this time I divided the alar extent, the width of skull, and the five smaller lengths by the total length of each bird, and used those normalized measurements as the features.  The importance values are quite different.

The total length and weight don’t dominate.  The two head size features are right there after alar extent and length of humerus.  Long wings are unsurprisingly really important, but after that, head size is important as well.  I wouldn’t go so far as to say that this shows that large brain size leads to survival, but that possibility isn’t precluded by this simple analysis either.  There is a story to be told, but I don’t see the story being a scientific law.
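In case the normalization step is unclear, here it is as a few lines of NumPy.  The column layout and the numbers below are hypothetical, just to show the mechanics; sex, age, weight, and total length itself are left alone.

```python
import numpy as np

# Hypothetical column layout (the names and numbers are mine, not the
# original table's): 0 sex, 1 age, 2 total length, 3 alar extent,
# 4 weight, 5 beak+head, 6 humerus, 7 femur, 8 tibiotarsus, 9 keel,
# 10 width of skull.
X = np.array([
    [1, 1, 160.0, 245.0, 24.5, 31.2, 18.5, 17.0, 28.0, 21.0, 14.9],
    [0, 0, 155.0, 238.0, 26.1, 30.1, 18.0, 16.5, 27.1, 20.5, 14.5],
])

scaled = X.copy()
length_cols = [3, 5, 6, 7, 8, 9, 10]      # everything measured as a length
scaled[:, length_cols] /= scaled[:, [2]]  # divide by each bird's total length
# sex, age, weight, and total length itself are left unchanged
```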


The Nature of Solutions

April 12, 2010

Oh my, we seem to be heading into Changeux-Connes territory with this conversation turning to mind, matter, and mathematics.  So in some sense, with this Newtonian synthesis concept, you bring up the question of what constitutes a solution to a problem.  As your doctoral advisor Alan Willsky likes to point out with regard to the evolution of Bayesian estimation theory, the Wiener filter is a formula, the Kalman filter is an algorithm, and the particle filter is essentially a simulation.  Incidentally, as I think you know, I have some interest in the history of particle filters but have not yet found a description of why the British defense establishment was originally interested in them.
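Since I called the particle filter “essentially a simulation,” here is a minimal bootstrap particle filter for a toy scalar random-walk model, to show just how literally that description holds: the posterior is represented by nothing more than a cloud of simulated samples.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy scalar model: x_t = x_{t-1} + process noise, y_t = x_t + obs noise.
T, n_particles = 50, 1000
q, r = 0.1, 0.5                    # process and observation noise std devs
x_true = np.cumsum(rng.normal(scale=q, size=T))
y = x_true + rng.normal(scale=r, size=T)

# Bootstrap particle filter: the "solution" is literally a simulation.
particles = rng.normal(scale=1.0, size=n_particles)
estimates = np.empty(T)
for t in range(T):
    particles += rng.normal(scale=q, size=n_particles)    # propagate
    w = np.exp(-0.5 * ((y[t] - particles) / r) ** 2)      # weight by likelihood
    w /= w.sum()
    estimates[t] = w @ particles                          # posterior mean
    particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
```

For this linear-Gaussian toy, of course, the Kalman filter would give the same answer as an algorithm and the Wiener filter as a formula, which is exactly Willsky’s point.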

Pablo Parrilo, in a talk with Stephen Boyd at the Paths Ahead in the Science of Information and Decision Systems symposium, discussed the nature of solutions in an era with huge computational resources.  How do you think solutions specified as convex programming or Monte Carlo simulation (like a particle filter) fit into the combination of the logical and the statistical in the grand scheme of reasoning?

Anyway, as you had suggested, it is superior to be practical, so let me come back to the potentially practical question of learning a scientific law, here in the context of mind and matter.

The physical and biological basis for intelligence has been a question since the time of the ancients.  In particular, people have wondered why some individuals are smarter than others.  Is there something about the brains of the intelligent that is different from the brains of the unintelligent?  This article reviews many facts and theories, including the classical fact that total brain volume is correlated with intelligence.  A more recent finding is that intelligent brains are more efficient, in the sense of information processing as measured through graph-theoretic quantities.  Some of my own work looks at the relationship between brain volume and memory capacity as well as the potential functional significance of structural properties of neuronal networks, so this is all very interesting to me.
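By way of illustration, one graph-theoretic quantity often invoked is global efficiency, the average inverse shortest-path length over node pairs.  A quick sketch with NetworkX on three stylized wiring diagrams (not brain data) shows how topology alone moves the number.

```python
import networkx as nx

# Global efficiency: average inverse shortest-path length over node pairs.
chain = nx.path_graph(10)     # signals must relay node by node
hub = nx.star_graph(9)        # ten nodes; any two at most two hops apart
full = nx.complete_graph(10)  # direct links everywhere

e_chain = nx.global_efficiency(chain)
e_hub = nx.global_efficiency(hub)
e_full = nx.global_efficiency(full)
```

The complete graph achieves the maximum efficiency of one, but at maximal wiring cost, which is why the interesting biological question is about efficiency per unit of wiring.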

The argument that brain volume is correlated with intelligence leads to the argument that brain volume is correlated with survival.  That is, larger brains confer a survival advantage.  But one might ask whether there is any direct way to measure survival of the fittest, a central underpinning of evolutionary theory.  A classic data set that has been analyzed several times (including work using logistic regression) is the Bumpus house sparrow data set (available from The Field Museum in Chicago, whose biological specimen collection I saw in 1999).  During a winter storm, some sparrows were killed and some survived.  It has been argued that the survivors were more fit than the others.  But was it because they had bigger or more efficient brains?

So now with all of that lead-up, finally to the practical question.  Given this data set with measurements in several phenotypic dimensions and binary survival labels, can an expert in dimensionality reduction and supervised classification learn a scientific law to explain it?  Moreover, will the scientific law take the form of a formula, an algorithm, a simulation, a decision boundary, a story, or something else altogether?


Models That Compute Are Superior

April 10, 2010

Howdy, Señor JoJo Morasco.  I start work in Yorktown Heights this week. 

Do you remember one of the epigraphs in my S. M. thesis, which was spoken by Drupada in the Mahābhārata: “Of beings, those that are endowed with life are superior.  Of living beings, those that are endowed with intelligence are superior.  Of intelligent creatures, men are superior.  Of men, the twice-born are superior.  Of the twice-born, students of the Veda are superior.  Of students of the Veda, those of cultured understanding are superior.  Of cultured men, practical persons are superior.”  As you indicated and this confirms, Indic thinkers of old held practical algorithms for computation in the highest regard, even higher than the ultimate for the Greeks: cultured understanding.

Here’s a quotation related to models and practicality that Pat Kreidl sometimes alluded to.  Box and Draper wrote in their book on empirical model-building, “Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.”

You asked how one learns scientific laws.  Schmidt and Lipson have a way that uses the power of evolutionary computation to find scientific laws via symbolic regression.  I don’t know if I’m fully down with this approach though, because symbolically stated laws and rules don’t allow for exceptions and uncertainty. 
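To caricature the approach, here is symbolic regression reduced to its barest form: instead of evolving expression trees the way Schmidt and Lipson do, just enumerate a tiny fixed grammar of candidate laws and keep the best-fitting one.  The grammar and the hidden “law” below are entirely made up.

```python
import numpy as np

rng = np.random.default_rng(6)

# A made-up hidden "law" generating the observations.
x = np.linspace(0.5, 5.0, 40)
y = 2.0 * x**1.5 + rng.normal(scale=0.05, size=x.size)

# A tiny fixed grammar of candidate symbolic forms, each with one free
# scale constant c (real symbolic regression evolves expression trees).
candidates = {
    "c*x": lambda t: t,
    "c*x**2": lambda t: t**2,
    "c*x**1.5": lambda t: t**1.5,
    "c*log(x)": lambda t: np.log(t),
}

def fit_error(basis):
    # best least-squares c for this form, and the residual it leaves
    c = (basis @ y) / (basis @ basis)
    return np.sum((y - c * basis) ** 2)

best = min(candidates, key=lambda name: fit_error(candidates[name](x)))
```

Even in this caricature you can see my worry: the winning form is asserted exactly, with no room left for exceptions or uncertainty.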

One of the parts I thought was interesting in the second paper by Narasimha that you linked to was this: “Nīlakantha (1444-1545 CE), declared that ‘logical reasoning is of little substance, and often indecisive’ — words that would seem to go totally contrary to the approach that was used in Hellenist schools, which followed the Euclidean method of going from well stated axioms through a process of purely logical deduction to theorems or conclusions.”  Old school AI was all about logical deduction, and didn’t really deliver on its promise.  A new push that I think is poised to make an impact is the combination of logical stuff with statistical stuff — kind of in a way the combination of semantics with pragmatics — that this news article calls a grand unified theory of AI.  This combination/unification actually isn’t all too different from the “Newtonian synthesis” of axiomatism and computational positivism described in the Narasimha paper, innit?


Computing Muhurtham

April 6, 2010

Hmm, information geometry.  As you know, there has been recent work in the distributed learning community on finding coherent probability assignments.  Perhaps for similar problems, there is a way to use your differential equation based methods that operate on manifolds.  As Amari argues, information geometry is also useful for understanding the probabilistic structure of data collected in neuroscience and elsewhere.  Graphical models and variational inference are central concepts.

Rather than talking more about modern methods of inference, though, let me step back to inference problems faced by the ancients.  These mathematical problems were often driven by astronomical calendrics, e.g. computing muhurtham.  As noted on p. 4 of (Kim Plofker, Mathematics in India, Princeton: Princeton University Press, 2008):  “it is not really possible to understand the structure and context of mathematics in India without recognizing its close connections to astronomy.  Most authors of major Sanskrit mathematical works also wrote on astronomy, often in the same work.  Astronomical problems drove the development of many mathematical techniques and practices, from ancient times up through the early modern period.”  Moreover, as argued by Sorenson, “The earliest stimulus for the development of estimation was apparently provided by astronomical studies in which planet and comet motion was studied using telescopic measurement data.  The motion of these bodies can be completely characterized by six parameters, and the estimation problem that was considered was that of inferring the values of these parameters from the measurement data.”

Within the tradition of Newton, Gauss fixed a model of astronomical motion and determined methods to estimate the parameters.  This approach is very different from pure machine learning, where no model with a specified parametrization is provided a priori.  The model-based approach raises the question, however, of how the model of astronomical motion is arrived at in the first place and whether it can be trusted.  Indeed, model-based approaches were often not trusted by Indic thinkers of old, as noted in these two papers on the philosophy of computational positivism.  So let me ask you, how does one learn scientific laws?  Moreover, do you think there is value in connecting concepts from the philosophy of science like falsifiability or positivism with formal notions from learning theory like VC dimension?
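Gauss’s recipe can be miniaturized: fix a parametric model of the motion, then estimate its parameters from noisy observations by least squares.  The model below is a toy (not a real orbit), but the division of labor is the point: the symbolic form is assumed a priori, and only the numbers are learned.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy motion model (not a real orbit): angle(t) = a + b*t + c*sin(t) + d*cos(t).
# The symbolic form is fixed in advance; only (a, b, c, d) are estimated.
t = np.linspace(0.0, 10.0, 60)
true = np.array([1.0, 0.3, 0.8, -0.5])
A = np.column_stack([np.ones_like(t), t, np.sin(t), np.cos(t)])
obs = A @ true + rng.normal(scale=0.05, size=t.size)   # noisy "telescope" data

# Least squares recovers the parameters of the assumed model.
est, *_ = np.linalg.lstsq(A, obs, rcond=None)
```

Whether the fixed form itself can be trusted is, of course, exactly the computational-positivist objection.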