Remote Sensing

September 7, 2014

Acquiring information from afar is often quite important, whether to reduce the cost of ground investigation, to get a wider view, or perhaps to conceal surveillance activities.  A couple of weeks ago at the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, you had a ‘social good’ paper on using remote sensing data to predict which villages have more poor people than others, based on whether there were more houses with metal roofs or with thatch roofs.  An earlier presentation of this work was given at DataKind, under whose auspices the work was carried out together with the charity GiveDirectly.

Congratulations on the best paper award for this work!

Incidentally, I also enjoyed your work on tennis analytics at the same conference and was therefore glad I attended part of the Large-Scale Sports Analytics workshop, in addition to the data-driven educational assessment workshop I was running.

Coming back to remote sensing, and somewhat related to the last post, remote sensing of waste production can potentially be used to detect alien civilizations.  More apropos to your work, night-time light remote sensing is apparently becoming a common approach to poverty detection, treating night lights as a form of light pollution.  A few papers in a variety of journals on this topic include this, this, and this.  I wonder, though, whether there is a way to measure “signal pollution” as a basis for remote sensing, building on the idea of information metabolism.  With information pollution, maybe it is low-entropy signals one should look for, rather than high-entropy signals.
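To make the low- versus high-entropy idea a bit more tangible, here is a quick Matlab sketch of my own (not from any of the papers above) that compares histogram-based entropy estimates for a structured signal and for noise; the signals and the number of quantization bins are made-up placeholders.

% Synthetic placeholder signals: a nearly deterministic square wave and white noise.
n_bins = 32;                                   % number of quantization bins (arbitrary)
t = linspace(0, 10, 1e4);
signals = {sign(sin(2*pi*t)), randn(size(t))};
names = {'structured', 'noise'};

for k = 1:numel(signals)
    sig = signals{k};
    edges = linspace(min(sig), max(sig), n_bins + 1);
    prob = histc(sig, edges) / numel(sig);     % empirical bin probabilities
    prob = prob(prob > 0);
    fprintf('%s signal: %.2f bits per sample\n', names{k}, -sum(prob .* log2(prob)));
end

The structured signal concentrates its probability mass in a couple of bins and comes out near one bit per sample, while the noise spreads across many bins and comes out much higher.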

Perhaps artistic things you can see from the air?


Information Metabolism and Waste Production

August 6, 2014

Your computational aboriginal art is really quite amazing!  I think you can give AARON a run for its money.  In the post before your artistic one, you brought up the notion of lateral thinking.  I think part of why it is difficult is that we draw heavily on memory as part of perception.  This is well captured in Leslie Valiant’s Probably Approximately Correct book [p. 142]:

In different parts of the world the same word may have different meanings, or the distribution of examples may be different.  In these cases the Invariance Assumption would be violated, and shared meaning would not be achieved.  Misunderstandings would result.  There are pernicious obstacles to shared meaning even beyond those inherent in differences in meaning and distributions.  These further impediments are imposed by the constraint of a limited mind’s eye interacting with an internal memory full of beliefs.  We may all be looking at the same world through our mind’s eyes, but since we have much control of what information to allow in, dependent on our beliefs, we may not see the same world.  In the mind’s eye we process not only the information coming from outside, but also information internally retrieved from our long-term memory.

As you know, I was in Quebec City last week for the workshops following the Computational Neuroscience Meeting.  I spoke about associative memories and how a little bit of circuit noise can improve recall, but I also heard an interesting talk by Byron Yu on how it may be difficult to learn things you are not used to.  Details on learning things off-subspace will soon be published in an experimental neuroscience paper in Nature.

All of this talk about informational inputs, outputs, and internals has gotten me thinking about whether one can define a useful notion of information metabolism, in analogy to metabolism, which (as per Wikipedia) is the set of life-sustaining chemical transformations within the cells of living organisms. These enzyme-catalyzed reactions allow organisms to grow and reproduce, maintain their structures, and respond to their environments.  [There is apparently already Kepinski’s notion of information metabolism, but I believe I want something different.]

In an earlier post, I suppose I did discuss metabolic processes of waste, primarily allometric scaling laws for waste production by mammals.  Though, as I mentioned, some of the renewed interest in scaling laws comes from the science of cities.  One might wonder, then, about scaling laws for waste as a function of city population rather than animal mass.  Luis Bettencourt has called cities a sort of social reactor that is part star and part network, and so understanding physical flows might be inspirational for understanding informational flows.  Of course, sanitation is essential to public health and welfare in its own right.

As it turns out, there are quite a few people interested in scaling laws for waste in cities, and there seems to be a growing debate in the scientific literature.  Let me summarize discussions on air pollution.  

  • Using data from the Emissions Database for Global Atmospheric Research (EDGAR), Marcotullio et al. find that larger cities have more greenhouse gas emissions (CO2, N2O, CH4, and SF6), in that a small increase in population size in any particular area is associated with a disproportionately larger increase in emissions, on average.
  • Curating a variety of data sources on CO2 emissions, Rybski et al. argue that cities in developing countries are different from cities in developed countries.  In particular, in developing countries, large cities emit more CO2 per capita than small cities, with power-law exponent 1.15, whereas in developed countries large cities are more efficient, with power-law exponent 0.80.  (These exponent values seem to have numerical significance, as per Bettencourt.)
  • Fragkias et al. also look at CO2 emissions, focusing on cities (metropolitan statistical areas) in the United States, but do not find much gain in efficiency with size; rather, they find a near-proportional increase.
  • Oliveira et al.  also consider CO2 emissions in American cities (more concentrated than MSAs), but find a strongly superlinear increase, with a power-law exponent of 1.46.

So there seems to be a great deal of uncertainty about what is happening empirically.  Notwithstanding, theories that link emissions to traffic congestion have also been proposed.

To add some more fuel to the fire, I thought I might plot out some data too.  Rather than emissions, I looked at air quality measures.  As an example, I took population data and air quality data for some cities in India and joined them, ignoring cities where either was missing.  Note that there are often multiple measuring stations within a given city; I treat each station as having its own air quality value but the same population value.  Here are the results for sulfur dioxide.
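In case the mechanics are useful, here is roughly what the join and a log-log power-law fit look like in Matlab; the file names and column names are hypothetical placeholders rather than the actual data files I used.

pop = readtable('india_city_population.csv');   % columns: City, Population (hypothetical file)
aq  = readtable('india_so2_stations.csv');      % columns: City, Station, SO2 (hypothetical file)

% Join on city name, dropping cities missing from either table; each station
% keeps its own SO2 value but inherits its city's population.
joined = innerjoin(aq, pop, 'Keys', 'City');

% Fit SO2 ~ Population^beta by least squares on log-log axes.
coeffs = polyfit(log10(joined.Population), log10(joined.SO2), 1);
fprintf('estimated power-law exponent: %.2f\n', coeffs(1));

% Plot the stations and the fitted power law on log-log axes.
loglog(joined.Population, joined.SO2, '.'); hold on;
loglog(joined.Population, 10.^polyval(coeffs, log10(joined.Population)), '-');
xlabel('city population'); ylabel('SO_2');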

So maybe nothing too conclusive there.  I wonder if there are other air quality indicators that have some connection to city population.

Sorry for this information snack full of empty calories.

Aboriginal Art

July 10, 2014

Budyari yaguna, Señor William Dawes.  On my Australian adventure, I came across not only media about creativity but also creative media, especially of the aboriginal variety.  Before getting to that, however, let me give a shout-out to the Bhullar brothers (about whom we never did start that viral campaign) for getting a toehold in the NBA.

On the first day of the trip, we saw a traditional aboriginal didgeridoo and dance performance at Currumbin in Queensland.  Later in the trip, we saw the aboriginal-inspired contemporary dance production Patyegarang at the Sydney Opera House.  Didgeridoo street performers greeted us at Circular Quay before we made our way by ferry to Manly.

Some of the beach-front galleries there featured aboriginal art.  I was especially drawn to the works of Tarisse King, including her Earth Images.  The dot paintings are mesmerizing in a unique way.  They are meant to represent a view of the earth from above.

Tarisse King: Fire

Upon looking at her paintings, I wondered to myself whether something similar could be created using Gaussian processes and morphological image processing.  Here is my attempt at computer art via the following Matlab script:

seed = 1234; %random seed
n = 50; %grid size
r = 10; %repetitions per point
s = 8; %resize scale for skeleton
l = 0.05; %squared exponential kernel parameter

rng(seed);

%create a grid
[X1,X2] = meshgrid(linspace(0,1,n),linspace(0,1,n));
x = [X1(:),X2(:)];

%covariance calculation using squared exponential kernel
K = exp(-squareform(pdist(x).^2/(2*l^2)));
[V,D]=eig(K);
A=V*(D.^(1/2));

%sample from Gaussian process
gaussian_process_sample = A*randn(n^2,1);

%calculate skeleton of the peaks
skeleton = bwmorph(imresize(reshape(real(gaussian_process_sample),n,n),s)>0,'skel',Inf);

%plot the painting on a black background using randomly perturbed copies of points from the Gaussian process sample and overlay the skeleton
figure; hold on;
scatter(repmat(x(:,1),r,1)+randn(r*n^2,1)/100,repmat(x(:,2),r,1)+randn(r*n^2,1)/100,12,repmat(gaussian_process_sample,r,1)+randn(r*n^2,1)/50,'filled');
h = imshow(cat(3,ones(n*s,n*s),ones(n*s,n*s),0.8*ones(n*s,n*s)),'XData',linspace(0,1,n*s),'YData',linspace(0,1,n*s));
set(h,'AlphaData',skeleton);
axis on; axis image; whitebg(gcf,'k'); set(gca,'XTick',[],'YTick',[],'box','on');
colormap([[linspace(0.04,1,24).';ones(40,1)],[zeros(24,1);linspace(0,0.84,40).'],zeros(64,1)]);

[Generated paintings for seed = 1234, seed = 1235, and seed = 1236]

What do you think?

Random Episodic Silent Thought

July 9, 2014

G’day mate.  I had a very nice time in Australia and our olfaction stuff was well received at the SSP Workshop.  While I was away, the computational creativity stuff debuted in its Chef Watson manifestation, but that was only one of many creativity-related things I came across during my trip.

On the flight back, I watched The Lego Movie, which, in addition to featuring a 1980-Something Space Guy like the ones we used to play with at the Mehrotra residence, is a commentary on the value of creativity.  I hadn’t realized beforehand that the movie’s theme was the supremacy of creatively building things over merely following the instructions.  I’m glad I watched it.

I came across articles about creativity in The Atlantic and the New York Times Bits Blog.

Another pleasant viewing experience on the flight was the Australian Broadcasting Corporation’s documentary miniseries Redesign My Brain with Todd Sampson.  It helped me understand how several parts of your research flow together.  The first part of the miniseries utilizes the concept of neuroplasticity to show how Lumosity-like exercises can improve brain function along three dimensions: speed of thought, attention, and memory. I think the first of these can be related to typical Shannon theory, the second to some of your new information theory stuff incorporating Bayesian surprise, and the third to your new associative memory stuff.  The second part of the miniseries is all about human creativity starting with divergent thinking and then moving on to four criteria for creativity: effectiveness, novelty, elegance, and genesis.  The divergent thinking, effectiveness, and novelty are very much part of the computational creativity process we espoused, the Chef Watson app is elegant, and the extension to fashion, business processes, etc. that you talk about is the genesis. 

The last part of the creativity episode is about lateral thinking.  I wonder if and how you can investigate or model lateral thinking using information theory and statistical signal processing, and whether you’d want to include it in your research agenda.

Clusters

June 28, 2014

A friend of the blog was recently asking both of us how to cluster time series (of possibly different lengths), and in response to that query I had looked at the paper “A novel hierarchical clustering algorithm for gene sequences” by Wei, Jiang, Wei, and Wang, who are bioinformatics researchers from China and Canada.  The basic idea is to generate feature vectors from the raw time series data, define a distance function on the feature space, and then use this distance measure to do (hierarchical) clustering.  At the time, I also flipped through a survey article on clustering time series, “Clustering of time series data—a survey” by Liao, who is an industrial engineer from Louisiana.  As he says, the goal of clustering is to identify structure in an unlabeled data set by organizing data into homogeneous groups, where the within-group-object similarity is maximized and the between-group-object similarity is minimized; he points out five broad approaches: partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods.  There are of course applications in all kinds of fields, such as biology, finance, and of course social media analytics, where one might want to cluster Twitter users according to the time series patterns of tweeting sentiment.
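To make that recipe concrete, here is a minimal Matlab sketch of the feature-then-cluster pipeline; the features (mean, standard deviation, lag-1 autocorrelation) are just illustrative stand-ins, not the ones Wei et al. actually use, and the series are synthetic placeholders.

% series is a cell array of time series, possibly of different lengths (placeholders here).
series = {randn(1,100), randn(1,150) + 2, cumsum(randn(1,80))};

% Map each series to a fixed-length feature vector: mean, standard deviation,
% and lag-1 autocorrelation.
features = zeros(numel(series), 3);
for k = 1:numel(series)
    s = series{k};
    c = corrcoef(s(1:end-1), s(2:end));
    features(k,:) = [mean(s), std(s), c(1,2)];
end

% Hierarchical (average-linkage) clustering on Euclidean distances between
% standardized feature vectors.
Z = linkage(pdist(zscore(features)), 'average');
labels = cluster(Z, 'maxclust', 2);
dendrogram(Z);

The choice of features and of the distance is exactly where the notion of similarity sneaks in, which is the point of the discussion below.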

But any technique seems to require some notion of similarity to proceed.  As Leslie Valiant says in his book, Probably Approximately Correct [p. 159]:

PAC learning as we have described it is a model of supervised learning.  One of its strengths is that it is essentially assumption free.  Attempts to formulate analogous theories for unsupervised learning have not been successful.  In unsupervised learning the learner appears to have to make specific assumptions about what similarity means.  If externally provided labels are not available, the learner has to decide which groups of objects are to be categorized as being of one kind, and which of another kind.

I hold the view that supervised learning is a powerful natural phenomenon, while unsupervised learning is not.

So maybe clustering is not a powerful natural phenomenon (but would Rand disagree?), but I’d like to do it anyway.  As some say, clustering is an art rather than a science, but I like art, don’t you? In some sense the question boils down to developing notions of similarity that are appropriate.  Though I must admit I do have some affinity for the notion of “natural kinds” that Bowker and Star sometimes talk about when discussing schemes for classifying various things into categories.  

Let me consider a few examples of clustering to set the stage:

  1. When trying to understand the mapping between neural activity and behavior, it is important to cluster video time series recordings of behavior into a discrete set of “behavioral phenotypes” that can then be understood.  This was done in a paper by Josh Vogelstein et al., summarized here.  An essentially Euclidean notion of similarity was considered.
  2. When trying to understand the nature of the universe, and specifically dark matter, a preprint by my old Edgerton-mate Robyn Sanderson et al. discusses using the Kullback-Leibler divergence to measure things in a probabilistic sense, without having to assert too strong a notion of similarity in the original domain (see the sketch after this list).
  3. To take a completely different example, how might people in different cultures cluster colors into named categories?  In fact this has been studied in a large-scale worldwide study, which has made the raw data available.  How does frequency become a categorical named color, and which color is most similar to another?
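Since the second example leans on the Kullback-Leibler divergence, here is a hedged Matlab sketch of a symmetrized, histogram-based divergence between two samples that could serve as a clustering distance; the shared binning and the symmetrization are simplifications of my own, not anything from the Sanderson et al. preprint.

% Placeholder samples standing in for two objects to be compared.
x = randn(1, 5000);
y = 0.5 + 1.2*randn(1, 5000);

% Histogram estimates of the two distributions on shared bin edges.
edges = linspace(min([x y]), max([x y]), 41);
p = histc(x, edges) / numel(x);
q = histc(y, edges) / numel(y);

% Crude fix for empty bins: only keep bins where both estimates are positive.
keep = (p > 0) & (q > 0);
kl = @(a, b) sum(a(keep) .* log2(a(keep) ./ b(keep)));
d = kl(p, q) + kl(q, p);   % symmetrized divergence, in bits
fprintf('symmetrized KL estimate: %.3f bits\n', d);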

Within their domains, these clusterings seem to be effective, but is there a general methodology?  One idea that has been studied is to ask people what they think of the results of various formal clustering algorithms, a form of anthropocentric data analysis, as it were.  Can this be put together algorithmically with information-theoretic ideas on sampling distortion functions due to Niesen et al.?

Another idea I learned from Jennifer Dy, whom I met in Mysore at the National Academy of Engineering’s Indo-American Frontiers of Engineering Symposium last month, is to actually create several different possible clusterings and then let people decide.  A very intriguing idea.

Finally, one might consider drawing on universal information theory and going from there.  A central construct in universal channel coding is the maximum mutual information (MMI) decoder, which doesn’t require any statistical knowledge, but learns things as it goes along.  Misra and Weissman modified that basic idea to do clustering rather than decoding, in a really neat result.  It didn’t make it into Silicon Valley, as far as I can tell, but really neat.  Applications to dark matter?

You are currently en route to Australia to, among other things, present our joint work on olfactory signal processing at the IEEE Statistical Signal Processing Workshop: one paper on active odor cancellation, and the other on food steganography.  Do let me know of any new tips or tricks you pick up down under, hopefully with some labels rather than forcing me to do unsupervised learning.  Also, what would you cluster angelica seed oil with?

Scaling Laws for Waste

April 13, 2014

It has been a long while since I last wrote a blog post.  In the meantime, a lot of things have happened, and apropos of the previous post, I have switched from the schedule of a research staff member to the schedule of a starting assistant professor.  Definitely some different elements to it.  More time teaching and less time working on a food truck, to say the least.

Since I last posted, I’ve also gotten much more into tweeting, which perhaps does take away some of my impetus for blogging: limited attention, information overload, and all that.  Twitter, as one of the great new communication media, also interests me as an object of research study.  Indeed, some of my undergraduate researchers this summer will be looking at social media analytics.

Since the last post, I’ve also gotten further interested in resource recovery and other problems of environmental engineering, though I am not at all versed in the subject yet.  One of the most valuable resources from which to recover energy, nutrients, water, and solids is animal waste.  Indeed, there have even been wars over the control of guano.

I’ve had some longstanding interest in allometric scaling laws for various things, and I suppose I’ve made you at least somewhat interested too.  When I visited Santa Fe last summer, my interest in the topic was renewed, largely due to Luis Bettencourt’s enthusiasm for scaling laws for cities.  As it turns out, there are a lot of parallels to neurobiological scaling.

With all that as preface, do you have any idea how the amount of waste produced by an animal scales with the size of the animal?  Do you think it would be allometric scaling?

In fact this question for urine has been studied in the literature more extensively than I would have expected.  In the paper, “Scaling of Renal Function of Mammals,” Edwards takes data on the mass [kg] and the urine volume [mL/24 hours] for 30 mammalian species and finds an allometric relation with power law exponent 0.75, which is the same power-law exponent as for metabolic rate as given by Kleiber’s Law. (One theoretical derivation based on elasticity is due to McMahon.)

The same urine volume exponent is presented in a paper, “Scaling of osmotic regulation in mammals and birds,” by Calder and Braun. Turning to water loss through feces, Calder and Braun say:

Fecal losses should, in absence of size-related differences in food quality, digestive efficiency, and/or reabsorption, scale in parallel to the intake that supplies metabolic requirements, but the only allometric expression we have found in the literature has M^0.63 scaling [in mL/day].

where the scaling law is quoted from a paper by Blueweiss, Fox, Kudzma, Nakashima, Peters, and Sams, “Relationships between body size and some life history parameters,” from the journal Oecologia.  The original statement in that paper regarding defecation is measured in units of g/g/day and gives a power-law exponent of −0.37 based on data from mammals, but this measure of [g/g/day] already normalizes once by body weight, which is why there is no inconsistency: 1 − 0.37 = 0.63, assuming constant liquid content.  The original data used by Blueweiss et al. are said to be from a paper by Sacher and Staffeldt, “Relation of gestation time to brain weight for placental mammals: implications for the theory of vertebrate growth,” though I didn’t see it in there.
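Spelled out, the exponent bookkeeping is just the following (assuming, as above, roughly constant liquid content):

\frac{\text{waste}}{\text{body mass}\cdot\text{day}} \propto M^{-0.37} \quad\Longrightarrow\quad \frac{\text{waste}}{\text{day}} \propto M \cdot M^{-0.37} = M^{0.63}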

A contributing factor to all of this is of course food intake, via assimilation efficiency of various foods.  In a paper “Allometry of Food Intake in Free-Ranging Anthropoid Primates” in Folia Primatologica, Barton reports the power-law exponent for daily intake in grams dry weight per 24 hours as probably a little bit more than 0.75 using limited data on 9 species of primates (including humans).  For cattle, Illius in a paper, “Allometry of food intake and grazing behaviour with body size in cattle,” talks about intake with exponent a little bit less than 0.75.

Having talked about urine and feces, what about CO2?  A paper “Direct and indirect metabolic CO2 release by humanity” by Prairie and Duarte quotes allometric laws on respiration and defecation from the book The Ecological Implications of Body Size by Peters, which maybe I should read.

So what does this have to do with information?  I wonder if there is a notion of information metabolism with an associated scaling law like Kleiber’s.  There is a notion of a ‘garbage tape’ in the thermodynamics of computation following Landauer, and so I wonder what fraction of information is put into the garbage tape as a function of the size of the computation.  
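For reference, the Landauer bound I have in mind says that erasing one bit, i.e., sending it to the garbage tape, dissipates at least

E_{\min} = k_B T \ln 2

per erased bit, so counting garbage-tape bits would, in that sense, be counting unavoidable dissipation.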

Anyway, good to get back into blogging, hopefully without too much garbage.  After all, we don’t want too much information pollution, nor municipal solid waste in cities for that matter.

Scheduling Time

November 3, 2013

One of the interesting things at IBM, at least for me, has been how common it is for people to use Lotus Notes to first see when people might be available, and then to schedule meetings.  Unfortunately there is no way to see why someone else has a slot blocked off; only that it is.  That is why I think an improvement would be to allow circles of visibility, so certain people can see why your calendar is blocked off and whether it seems possible to propose a shift.  Though maybe in hierarchical organizations, this would lead to others requiring membership in circles and imposing value judgments on commitments.

Of course the best for me is the graduate student approach of having very little of one’s time scheduled: not just from the scheduling viewpoint itself, but also in the sense of not being so busy that there is no time to think.  As some say, “People with loose, flexible schedules, on the other hand, seem pretty boss”.  A fairly thoughtful self-help article (in contrast perhaps to much self-help on time management) puts forth several good ideas succinctly: slow down, stop trying to be a hero, go home, minimize meetings, go dark, leave the office for lunch, give up on multitasking, and say no.  Doing these things seems to make things nice and slow.

Although some of my work these days is about managing, there is also an element of making things.  That is why I find this article about the difference between the manager’s schedule (which is often partitioned into one-hour blocks) and the maker’s schedule (for people like programmers and writers who generally prefer to use time in units of half a day at least: units of an hour are barely enough time to get started writing) so insightful.  As is discussed:

When you’re operating on the maker’s schedule, meetings are a disaster. A single meeting can blow a whole afternoon, by breaking it into two pieces each too small to do anything hard in. Plus you have to remember to go to the meeting. That’s no problem for someone on the manager’s schedule. There’s always something coming on the next hour; the only question is what. But when someone on the maker’s schedule has a meeting, they have to think about it.

For someone on the maker’s schedule, having a meeting is like throwing an exception. It doesn’t merely cause you to switch from one task to another; it changes the mode in which you work.

I suppose that is why many professors hide away at home or elsewhere when they want to get some serious thinking or writing done:  to avoid handling exceptions.  Can one obtain the managerial benefits of meetings without their negative impacts on making?

To address this, of course, one first needs to determine why we have meetings in the first place.  A central role of meetings is coordination.  But perhaps this role of meetings can be eliminated if there is a possibility of ambient awareness.  This term is something that Clive Thompson uses in his book Smarter Than You Think.  As he says regarding meetings [p. 217]:

But younger workers were completely different.  They found traditional meetings vaguely confrontational and far preferred short, informal gatherings.  Why?  Because they were more accustomed to staying in touch ambiently and sharing information online, accomplishing virtually the tasks that boomers grew up doing physically.  Plus, the younger workers had the intuition—which, frankly, most older workers would agree with—that most meetings are a fantastic waste of time.  When they meet with colleagues or clients, they prefer to do it in a cafe, in clusters small enough—no more than two or three people—that a serious, deep conversation can take place, blended with social interaction, of a sort that is impossible in the classic fifteen-person, all-hands-on-deck conclave.

Besides ongoing coordination, another purpose of meetings is to perform planning in the first place.  Again, though, this raises the question of whether planning is really necessary.  For physical work requiring a great deal of equipment and lead time, planning seems required, but what about knowledge work?  In an article about Shannon, Bob Gallager essentially argues against too much planning, saying:

In graduate school, doctoral students write a detailed proposal saying what research they plan to do. They are then expected to spend a year or more carrying out that research. This is a reasonable approach to experimental research, which requires considerable investment in buying and assembling the experimental apparatus. It is a much less reasonable approach to Shannon-style research, since writing sensibly about uncharted problem areas is quite difficult until the area becomes somewhat organized, and at that time the hardest part of the research is finished.

And yet I did write a doctoral thesis proposal and do have ongoing coordination meetings.  It would be interesting, though, if instead of a doctoral thesis proposal document, I had written ongoing doctoral thesis tweets.  We had previously discussed microblogging a little bit, but this ambient awareness concept of Thompson’s may make it possible to keep people aware of what is going on without having to hold meetings, and also let someone like me write about uncharted problems in somewhat unorganized ways.

Although we often think of having as many followers as possible as a goal, this may not be the best use of microblogging as a cognitive tool.  As Thompson says [p. 234]:

The lesson is that there’s value in obscurity.  People who lust after huge follower counts are thinking like traditional broadcasters.  But when you’re broadcasting, there’s no to and fro.  You gain reach, but lose intimacy.  Ambient awareness, in contrast, is more about conversation and co-presence—and you can’t be co-present with a zillion people.  Having a million followers might be useful for hawking yourself or your ideas, but it’s not always great for thinking.  Indeed, it may not even be that useful for hawking things.

Perhaps obscurity has some connection to allowing oneself to not do anything, too?

Anyway, that was what I wanted to ramble about.  Perhaps I should have scheduled some time with you on Lotus Notes to review things and make sure I had a good plan for this blog post…
