Data Yes. Wisdom No.

May 5, 2010

Hymie, this post is brought to you by the decade of smart. 

Rating feature importance is tricky business.  With the sparrows, I did a least squares linear fit of the logarithm of each of the length and width features against the logarithm of weight (albeit a bit suspect statistically), finding the slope to be 0.23 for total length, 0.23 for alar extent, 0.21 for length of beak and head, 0.29 for length of humerus, 0.26 for length of femur, 0.29 for length of tibiotarsus, 0.21 for width of skull, and an outlier 0.43 for length of keel of sternum.  Under isometry a length scales as the cube root of weight, so the slope should be 1/3; all of these measurements have negative allometry except for keel of sternum, which has positive allometry.
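A minimal sketch of that kind of log-log fit, in Python with only the standard library.  The weights and lengths here are synthetic, generated with a made-up exponent of 0.25; none of it is the Bumpus data, and the function name `allometric_exponent` is just an illustrative choice, not necessarily how the fit above was computed.

```python
import math
import random

def allometric_exponent(lengths, weights):
    """Least squares slope b in log(length) = a + b * log(weight)."""
    x = [math.log(w) for w in weights]
    y = [math.log(l) for l in lengths]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Synthetic data (an assumption for illustration, not real measurements):
# length = 10 * weight**TRUE_B, with a little multiplicative noise.
random.seed(0)
TRUE_B = 0.25
weights = [random.uniform(22.0, 31.0) for _ in range(200)]
lengths = [10.0 * w ** TRUE_B * math.exp(random.gauss(0.0, 0.01))
           for w in weights]

b_hat = allometric_exponent(lengths, weights)
print(f"fitted exponent: {b_hat:.3f}, isometric exponent: {1 / 3:.3f}")
```

A fitted slope below 1/3 is read as negative allometry and a slope above 1/3 as positive allometry.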

You asked about principled methods for commensurating features.  I take it that you’re talking about how to do simple scalings of feature dimensions.  In fact, standard classification trees with axis-aligned splits and random forests built from them are not affected by scaling dimensions.  However, other pattern recognition methods most surely are.  I may be wrong, but it is my impression that there are few existing principles.  I invite any reader who knows otherwise to share that with Hymie and me.  A related topic that has only recently started receiving attention is automatically learning kernels.
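The scaling point can be seen on a toy example.  The sketch below is a single decision stump (one axis-aligned split), not a full tree or a random forest, and a one-nearest-neighbor rule stands in for the scale-sensitive methods; the four data points, the query, and the function names are all made up for illustration.

```python
def best_stump(X, y):
    """Return (feature index, rows sent left) for the axis-aligned split
    with the fewest misclassifications under a majority vote per side."""
    n = len(X)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds sit midway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = frozenset(i for i in range(n) if X[i][j] <= threshold)
            right = frozenset(range(n)) - left
            errors = 0
            for side in (left, right):
                labels = [y[i] for i in side]
                errors += len(labels) - max(labels.count(0), labels.count(1))
            if best is None or errors < best[0]:
                best = (errors, j, left)
    return best[1], best[2]

def nearest_neighbor_label(X, y, query):
    """Label of the training row closest to query in Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, query)) for row in X]
    return y[dists.index(min(dists))]

# Toy data (made up): two features, binary labels.
X = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0], [4.0, 40.0]]
y = [0, 0, 1, 1]
query = [2.4, 21.0]

# Rescale the first feature, e.g. a change of units.
X_scaled = [[1000.0 * a, b] for a, b in X]
query_scaled = [1000.0 * query[0], query[1]]

same_partition = best_stump(X, y) == best_stump(X_scaled, y)
label_before = nearest_neighbor_label(X, y, query)
label_after = nearest_neighbor_label(X_scaled, y, query_scaled)
print(same_partition, label_before, label_after)
```

The stump sends the same rows to the same side before and after rescaling, because only the ordering of values within a feature matters to it.  The nearest neighbor of the query, and with it the predicted label, changes once the first feature dominates the distance.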

Anyways, I divided the length features by weight raised to the exponent that was fit and obtained random forest out-of-bag feature importance values as before.  Total length seemed to again be confounding matters, so I took it out.  Here are the importances. 
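The size-adjustment step alone (not the random forest or its importance values) can be sketched as follows.  The five weights and humerus lengths are made-up numbers, not Bumpus measurements, and `fit_exponent` and `size_adjust` are hypothetical helper names.

```python
import math

def fit_exponent(feature, weight):
    """Least squares slope of log(feature) against log(weight)."""
    x = [math.log(w) for w in weight]
    y = [math.log(f) for f in feature]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def size_adjust(feature, weight, exponent):
    """Divide each feature value by weight raised to the given exponent."""
    return [f / w ** exponent for f, w in zip(feature, weight)]

# Made-up illustrative values (not real data).
weight = [24.0, 25.5, 27.0, 28.5, 30.0]
humerus = [0.70, 0.72, 0.73, 0.74, 0.76]

b = fit_exponent(humerus, weight)
adjusted = size_adjust(humerus, weight, b)

# Refitting the adjusted feature against weight gives a slope of zero
# up to rounding, so the log-linear dependence on weight is removed.
residual_slope = fit_exponent(adjusted, weight)
print(f"fitted exponent: {b:.3f}, slope after adjustment: {residual_slope:.1e}")
```

Dividing by weight raised to the fitted exponent, rather than to the isometric 1/3, is what makes the residual slope vanish; the adjusted features can then be passed to whatever classifier is in use.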

The Bumpus sparrow data hasn’t provided me with any wisdom; it’s just too tricky.  There’s a large dataset of body part measurements of beings that possess the human spark that I think would be interesting to look at: the Anthropometric Survey of the United Provinces (1941).  Unlike sparrow survival, however, it is not clear what the response variable to be examined is.  Maybe some unsupervised analysis is the way to go, but I’m not typing up all those numbers.

Also, I’m done with the birds.  Nevermore.






