## Vincent Yan Fu Tan

March 15, 2011

We start off the interviews with Vincent Tan, currently a postdoctoral researcher at the University of Wisconsin and formerly a co-blogger of mine on the LIDS blog.  This interview has been edited for length.

Kush: You have done quite a bit of theoretical analysis on learning probabilistic graphical models (Markov random fields), especially in the asymptotic regime. Have any of your results been counterintuitive or surprising? Do you think that counterintuitive results advance the field more than intuitive ones?

Vincent Tan: Most of the results described in my PhD are intuitively pleasing. However, one counterintuitive result, developed in collaboration with Choi, Anandkumar and Willsky, came from a project in which we endeavoured to learn latent tree-structured graphical models. Such models are characterized by the fact that only a subset of variables is observed; the other variables are hidden. Latent variables have proven to be an extremely useful modelling construct. For example, in computer vision, a very active area of research is the use of scene context, which can be regarded as a hidden variable. Indeed, if one knows that an image is that of an airport runway, then one would expect to find airplanes and certainly not animals. We were able to show that there are simple, intuitive algorithms that can consistently recover the latent tree structure under relatively mild assumptions on the underlying model. This result is somewhat counterintuitive because it seems impossible to identify probabilistic models consistently given only partial information, yet we are able to demonstrate that this can indeed be done with low sample and computational complexity.

Counterintuitive results challenge the scientist to examine prevailing assumptions and to think deeply about the consequences of the results. For example, our project on learning latent trees raises the question of whether we can further relax the existing assumptions, or develop new algorithms to learn latent graphical models which have loops and hence greater modeling power. However, greater modeling power does not equate to better predictive power, due in part to overfitting. Of course, counterintuitive results also raise eyebrows in the community and prompt other scientists to take notice of one’s work (which can only bode well for one’s citation index). I believe that both intuitive and counterintuitive results are of value, but care has to be exercised in the application of the latter to ensure that all assumptions made are valid.

Lav: In San Diego last month, Sergio Verdú presented Shannon’s inequality $P_e \ge \frac{1}{6}\frac{ H(X|Y) }{ \log M + \log \log M - \log H(X|Y) }$. Do you think it is good, bad, or ugly?

VT: I thought that the talk was very interesting. It’s a little-known inequality that the information theory community ought to know about. Having said that, though, this inequality seems to be of limited utility given that Fano’s inequality can do the job equally well in many scenarios. If Prof. Verdú had given a concrete example of how “Shannon’s inequality” can be used in a “real-life scenario” (and yield better results than Fano’s), then the information theory community would stand up and take notice. Otherwise, it’s as good as a homework problem. Come to think of it, it should be made a homework problem (with copious amounts of hints, though).
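For readers comparing the two bounds, Fano’s inequality (the standard form, stated here for reference rather than as part of the interview) also lower-bounds the error probability: with entropies in bits,

$$
H(X|Y) \le h_b(P_e) + P_e \log(M-1),
$$

where $h_b$ is the binary entropy function. Since $h_b(P_e) \le 1$, this rearranges to

$$
P_e \ge \frac{H(X|Y) - 1}{\log(M-1)},
$$

which plays the same role as Shannon’s bound above: both convert a large conditional entropy $H(X|Y)$ into an unavoidable error probability over an $M$-ary alphabet.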

K: You have a keen interest in public affairs as evidenced by your many letters in The Straits Times. How was it that you chose to enter into electrical and information sciences rather than pursue other academic interests?

VT: I was supposed to go into public service as a diplomat or a high school teacher after my undergraduate studies at Cambridge. The Public Service Commission of Singapore sponsored my undergraduate studies and expected me to work in the civil service upon graduation. However, I chose to do research in part because I enjoyed the mathematical content as an undergraduate and hoped to do more as a PhD student and later on as a researcher. Hence I decided to take up the A*STAR PhD fellowship to pursue my PhD at MIT. I am not ruling out a return to the public service in future.
