A Carative Approach to AI Governance

December 15, 2022

How goes it Señor? Continuing to share material from talks I’ve given but haven’t necessarily published, here’s what I said at Carnegie Mellon’s Heinz College last month. Titled “A Carative Approach to AI Governance”, it was a refinement of talks I’ve given over the last year. The word ‘carative’ is a contrast to the word ‘curative’ and is related to the holistic nature of nurses caring for patients as opposed to more pointed curing of patients that doctors are trained for. My earlier view on carative AI is in this blog post.


I began with a positionality statement, highlighting:

1. Our par par nana (great great grandfather) Ishwar Das Varshnei being the first Indian to study at MIT in 1904-1905, taking glass manufacturing technology back to India, teaming up with Lokmanya Tilak to start the crowd-funded Paisa Fund Glass Works, and using its factory and school to fight for svarajya, a concept that not only included independence for India from the British, but also equality, liberty, and justice among the people.

2. Our baba (grandfather) Raj Kumar Varshney studying control engineering at Illinois and returning to India to apply the concepts to food production at the Allahabad Agricultural Institute. The institute’s founder Sam Higginbottom was an advisor to Mahatma Gandhi as he sought ways to reduce poverty.

3. My studying electrical engineering, working at a company with artificial intelligence as a focus, and trying to use that technology for equality and reducing poverty.

Further, I contrasted the morally-driven industrialist Henry Heinz (appropriate given the venue where I was speaking) and the muckraking investigative journalist Upton Sinclair in the early days of the processed food industry. I made the case that my worldview, shaped by my forefathers and foremothers, is oriented towards that of Heinz. Heinz tried to work from the inside to improve things constructively, while Sinclair worked from the outside to improve things by critically exposing ills.

Control Theory Perspective on AI Governance

Now to governance. A governor is a device to measure and regulate a machine. It is also known as a controller. Thus governance is the act or process of overseeing the control and direction of something. AI governance has lots of different definitions floating around these days, but they usually refer to responsible or trustworthy AI. The definitions may refer to laws and regulations, and may be quite philosophical. I recently heard 2023 predicted to be the “year of AI governance.” Interestingly, the etymological root for governance and cybernetics is the same Greek word kybernetes (cybernetics is an old word for AI-like things).

But instead of dwelling on laws, philosophy, or etymology for the time being, let’s come back to my engineering roots and look at a typical control system such as the one drawn below.

As an example, you can imagine this to be the thermostat in your home. You set a reference temperature that you want the home to be. If the temperature sensor measures too cold, then the controller in the thermostat triggers the furnace to turn on and stay on until the sensor finds the temperature has reached the reference. This is how control engineers view the world. Baba would think in this way not only for tractors as the system, but also fields of crops as the system.
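The loop is easy to sketch in code. Here is a minimal bang-bang (on/off) thermostat simulation; the specific temperatures and heating/loss rates are made-up numbers for illustration:

```python
def simulate_thermostat(reference=20.0, initial=15.0, steps=30):
    """Bang-bang control loop: the controller compares the sensed
    temperature to the reference and switches the furnace accordingly."""
    temperature = initial
    history = []
    for _ in range(steps):
        furnace_on = temperature < reference   # controller: too cold -> turn on
        if furnace_on:
            temperature += 0.8                 # system input: furnace heats the room
        temperature -= 0.3                     # disturbance: heat loss to outside
        history.append(temperature)            # sensor: measured output fed back
    return history

temps = simulate_thermostat()
```

After enough steps, the measured temperature oscillates in a narrow band around the reference, which is exactly the closed-loop behavior a control engineer expects.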

What does the development lifecycle of machine learning look like to a control engineer? Societal values are the reference, data scientists are the controllers, modeling is the system input, the machine learning model is the system, testing is the sensor, facts are the measured output, and the difference between the values and the facts represents misalignment. There is necessarily a duality between what is valued and what is measured. Moreover, principles and value elicitation lead to the values, which act as the guiding light for the machine learning model.


Schiff et al. survey AI ethics principles coming out of private industry, governments and civil society — most of which come from economically-developed countries and are based in Western philosophy. They find five common coarse-grained principles are espoused: (1) privacy, (2) fairness and justice, (3) safety and reliability, (4) transparency (which includes explainability), and (5) social responsibility and beneficence. There are some differences across sectors, however. Governments tend to further emphasize economic growth and productive employment as well as including discussions of ‘arms races’ between countries. Private industry mainly sticks to the five common principles and is sometimes accused of only putting out principles for the purpose of ethics washing. Civil society emphasizes shifting power to the vulnerable and may base their principles on critical theory.

Value Elicitation

Using the principles as a starting point, value elicitation is a process to specify the desired behavior of the system. Principles plus the context of the use case for which the machine learning model is being developed lead to values. We can think of four levels of value elicitation:
0. Should you work on this problem?
1. Which pillars of trustworthiness are of concern?
2. What are the appropriate metrics for those pillars of trustworthiness?
3. What are acceptable ranges of the metric values?

At Level 0, the Ethical OS helps us ask whether we should even be working on a problem or whether it runs too strongly against our values. Some of the considerations are:
1. Disinformation: does the system help subvert the truth at a large scale?
2. Addiction: does the system keep users engaged with it beyond what is good for them?
3. Economic inequality: does the system serve only well-heeled users or eliminate low-income jobs?
4. Algorithmic bias: does the system amplify social biases?
5. Surveillance state: does the system enable repression of dissent?
6. Loss of data control: does the system cause people to lose control of their personal data and the monetization it might lead to?
7. Surreptitious: does the system do things that the user doesn’t know about?
8. Hate and crime: does the system make bullying, stalking, fraud or theft easier?

There is no right or wrong answer to these considerations. Some people will be comfortable with some and not with others. Elicitation is all about getting those preferences.

At Level 1, assuming that accuracy or other similar predictive performance measures are always important, we ask which among (1) fairness, (2) explainability, (3) uncertainty quantification, (4) distributional robustness, (5) adversarial robustness, and (6) privacy are of concern for the use case. People find it difficult to directly give their preferences about this, so it is more appropriate to ask along dimensions such as whether:
1. Disadvantage: decisions have the possibility of giving systematic disadvantage to certain groups or individuals
2. Human-in-the-loop: the system supports a human decision-maker
3. Regulator: regulators (broadly construed) audit the model
4. Recourse: affected users can challenge decisions
5. Retraining: the model will be frequently retrained
6. People data: the system uses data about people, including possibly sensitive information
7. Security: the data and model are kept behind lock-and-key

The answers to these questions can ground a conditional preference network (CP-net) that eventually determines which pillars of trustworthiness are important for a given use case as follows.

There’s a recent extension of CP-nets called SEP-nets that could even go as far as relating the context of use cases to the appropriate dimensions to get to the pillars of trustworthiness.
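As a loose illustration of the idea (not a real CP-net solver, and the specific rules below are my own invented stand-ins), the Level 1 answers could be mapped to pillars with conditional rules like these:

```python
def pillars_of_concern(answers):
    """Map yes/no answers on the Level 1 dimensions to pillars of
    trustworthiness. The rules are illustrative stand-ins for the
    conditional preferences a real CP-net would encode."""
    pillars = set()
    if answers.get("disadvantage"):
        pillars.add("fairness")
    if answers.get("human_in_the_loop") or answers.get("regulator"):
        pillars.add("explainability")
    if answers.get("recourse"):
        pillars.update({"explainability", "fairness"})
    if answers.get("retraining"):
        pillars.add("distributional robustness")
    if answers.get("people_data"):
        pillars.add("privacy")
    if not answers.get("security"):          # model exposed -> worry about attacks
        pillars.add("adversarial robustness")
    if answers.get("human_in_the_loop"):
        pillars.add("uncertainty quantification")
    return pillars

print(pillars_of_concern({"disadvantage": True, "people_data": True, "security": True}))
```

A real CP-net additionally encodes preferences that are conditional on each other, but even this flat version shows how use-case answers, rather than abstract principles, select the pillars.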

Level 2 elicitation for specific quantitative metrics is interesting. For example, as is well known, there is a boatload of fairness metrics. Elicitation could be done by having a person ponder their use case in terms of worldviews. An alternative, recently proven out in practice, is to do metric elicitation by having people compare confusion matrices in a pairwise fashion. Level 3 elicitation for acceptable metric value thresholds or ranges is hard, and I don’t think there are well-proven methods for it yet. In my book, I suggest creating many different models to visualize what is possible and also posing this as a variation on a trolley problem. It is also not fully clear how to elicit values from a group of people, since typical aggregation methods (e.g. voting) drown out minority voices. Perhaps facilitated participatory design sessions are the only decent way.
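To make the pairwise confusion-matrix idea concrete, here is a toy sketch (not the published elicitation algorithm): record which of two confusion matrices a person prefers and keep only the candidate metrics consistent with their choices.

```python
def metric_value(cm, metric):
    """cm = (tn, fp, fn, tp)."""
    tn, fp, fn, tp = cm
    if metric == "accuracy":
        return (tp + tn) / sum(cm)
    if metric == "recall":
        return tp / (tp + fn)
    if metric == "precision":
        return tp / (tp + fp)

def consistent_metrics(preferences, candidates=("accuracy", "recall", "precision")):
    """Keep the metrics that rank the preferred matrix strictly higher
    in every recorded pairwise comparison (preferred_cm, other_cm)."""
    return [m for m in candidates
            if all(metric_value(a, m) > metric_value(b, m) for a, b in preferences)]

# The person prefers a matrix with fewer false negatives, despite lower accuracy:
prefs = [((80, 10, 2, 8), (86, 4, 6, 4))]
print(consistent_metrics(prefs))  # → ['recall']
```

The comparisons reveal that missed positives bother this person more than overall error, so their elicited metric is recall.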

Testing and Facts

Machine learning testing is different from regular software testing because of the oracle problem: we don’t know what the right answer is supposed to be. The way around it is through metamorphic relations: feeding in two different data points for which we don’t know what the right answer is supposed to be, just that it should be the same for both inputs. Generating test cases that have good coverage is very much an open problem, and not one solved simply through adversarial samples. Testing should be done not only for accuracy, but for all of the metrics from Level 2 value elicitation and accompanied by uncertainty estimates.
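A minimal metamorphic test might look like the following; the stand-in model and the relation (prediction invariance under feature permutation) are mine for illustration:

```python
import numpy as np

def metamorphic_test(predict, seed_input, transform, n_trials=100):
    """Check the relation predict(x) == predict(transform(x)) on many inputs.
    We never need to know the 'right' label (the oracle problem), only that
    the transform should not change it."""
    rng = np.random.default_rng(0)
    failures = 0
    for _ in range(n_trials):
        x = seed_input + rng.normal(scale=0.1, size=seed_input.shape)
        if predict(x) != predict(transform(x)):
            failures += 1
    return failures

# Stand-in model: sign of the mean feature; permuting features can't change the mean.
predict = lambda v: int(v.mean() > 0)
transform = lambda v: v[::-1]
print(metamorphic_test(predict, np.array([0.2, -0.1, 0.3]), transform))  # → 0
```

A violation of the relation flags a bug without ever consulting an oracle, though choosing relations with good coverage remains, as noted, an open problem.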

Test results are also called facts in the parlance of the AI FactSheets initiative. They can be rendered out in a factsheet that allows for good comparison with values and permits understanding of misalignment. Different consumers of factsheets may require different amounts of detail. The system developers can declare that their facts conform to the values with a kind of factsheet known as a supplier’s declaration of conformity (SDoC).

Something’s Missing

Control theory, pillars of trust, quantitative test results, acceptable ranges of metrics and all that jazz is great — it has the flavor of consequentialism. In fact, consequentialism vs. deontology (outcomes vs. actions) frames one of the main debates in AI ethics. For example, we explored it in the context of the Pac-Man game a few years ago.

However, this debate (and AI ethics more generally) is missing something. Consequentialism vs. deontology is a very Western philosophy-centered debate. Birhane et al. ask why we should think that Western philosophy captures everything. They further say “The field would benefit from an increased focus on ethical analysis grounded in concrete use-cases, people’s experiences, and applications as well as from approaches that are sensitive to structural and historical power asymmetries.” To me, they are pointing to care.

Lokmanya Tilak (the person our par par nana teamed up with to start the glass works) based a lot of his political philosophy on the Bhagavadgītā, with a focus on one aspect of it: niṣkāma karma, or desireless action. But there’s a second-order reading of the Bhagavadgītā that goes beyond this. Carpenter writes: “Kṛṣṇa speaks in terms of duties—duties whose claim on us cannot be over-ridden by any other sort of consideration. I want to dispute this characterisation of Kṛṣṇa, firstly, to contest the interpretation of the niṣkāma karma (desireless action) principle on which the charge of deontology rests. But I want to dispute it also, and more importantly, because I think Kṛṣṇa’s moral voice is rather more rich and interesting than our classifications of ‘deontological’ and ‘consequentialist’ (even broad consequentialist) allow.”

Furthermore, she says that “Arjuna is not the exclusive author of his own law; the social order into which he was born, the place he was born into, the endowments with which he came to it, and even his personal history (where this refers only very partially to his own choices), wrote a ‘law’ just for him.” She says that the main message of Kṛṣṇa’s philosophy is svadharma, one’s personal duty based on station, reputation, skill and family — all of these things together are svabhāva. (The sva- prefix is the same as in svarajya, self-.) This is the context and grounding Birhane et al. speak of that arise in other philosophies. There is no one universal, generalizable, abstract law or duty for everyone. This perspective is different from both consequentialism and deontology in that it seeks right outcomes and right actions based on context and abilities. It is also different from moral particularism, because there is still some sense of right and wrong — it is not a complete free-for-all.

Carative Approach to AI

What is AI’s svadharma? The goal of machine learning practitioners is usually generalization and abstraction. We usually want to do the same exact thing without thinking about the differences between decisions in lending, college admissions, hiring, parole, etc. They are all just binary classification problems and we use the same algorithms. We don’t bring in context, so there is no svabhāva and there is thus no svadharma.

To bring in svadharma, what we need instead is a carative approach to AI. We must start with the real-world problem as experienced by the most vulnerable, listen to them and understand their values (this is the context), meet them where they are and work toward a solution to their problem all the way to the end, and conduct a qualitative assessment of the entire solution by interviewing the affected communities. We did precisely that in a project with Simpa Networks on scoring applications for home solar panels in rural Indian villages, and it worked.

This is also the paradigm of action research. In (1) defining the problem, (2) planning how to solve the problem, (3) acting upon the solution, (4) evaluating the solution, and (5) learning and generalizing from it, action research centers the values of members of marginalized groups and includes them as co-creators. In caring professions like nursing, practitioners do the first four steps, but usually not the fifth. That’s the research, and it is not against the spirit of having one’s own duty. We should learn to figure out best practices on pillars of trust, metrics, ranges of metric values, etc. from working on real problems. Although this amounts to an uncountably infinite variety of AI contexts, use cases, and problems, we can pick a few prototypical examples across industries and sectors, and generalize from them, perhaps using SEP-nets.


AI governance is a control problem, but it does not make sense without context. We have been blinded by the use of Western philosophy as a starting point; maybe neither consequentialism nor deontology are the right way to look at precise and grounded applications of a general-purpose technology like AI. A specific AI system’s svadharma comes from its lineage, its creators, its capabilities, and especially its context for use. Carative AI governance implies a focus on the entire development lifecycle, not just modeling, and making choices that center the most marginalized throughout.


Do Asian-American Machine Learning Researchers Work on Social Justice?

December 2, 2022

Sat Sri Akal, Señor Bhagat Singh Thind. Continuing with posts summarizing talks that are not associated with published work, let me describe what I said to the Asian Coalition of Professionals (in a joint colloquium with the math department) at SUNY Albany in October.

The talk actually had two titles: “On the Intersection of Machine Learning and Society” and “Do Asian-American Machine Learning Researchers Work on Social Justice?” I had originally submitted the latter title to the Albany hosts, but was recommended to go with something like the former. The former ended up being a hook to get a broader audience in the room.

After a general introduction to the history of AI, I laid out this taxonomy of machine learning research and its intersection with society. The four leaves are: (1) AI for social good, (2) ethical principles, (3) metrics and algorithms, and (4) feminism, decoloniality, etc.

As an example of (1), I described my project on using satellite imagery to estimate poverty in western Kenya. I also explained how computer scientists and engineers tend to be oriented toward abstraction, scaling, and optimism, whereas social workers tend to be oriented toward accompaniment, theory of change, and critique.

For (2), I overviewed Schiff et al.’s analysis of AI ethics principles amounting to 5 common coarse-grained ones: (a) privacy, (b) fairness and justice, (c) safety and reliability, (d) transparency, and (e) social responsibility and beneficence. Moreover, governments tend to emphasize economic growth and productive employment as well as engage in ‘arms races’; private industry sticks to the five common principles and is sometimes accused of ethics washing; civil society emphasizes shifting power to the vulnerable and has some foundation of its principles in critical theory.

To illustrate (3), I talked about our bias mitigation pre-processing algorithm for improving fairness in machine learning. I also described the AI Fairness 360 open source toolkit and the Enhanced Edition for AI Fairness 360 that contains additional algorithms that IBM can license to customers.

Finally, (4) set up the remainder of the talk. I flashed this photoillustration from Karen Hao’s Technology Review article entitled “Inside the Fight to Reclaim AI from Big Tech’s Control”:

which poses the achievement of ethical AI as a power struggle, and also flashed a screenshot from Kathy Baxter’s Ethical AI Maturity Model, which suggests that ethical AI in an organization begins with “woke” teams. This leaf represents a fight for social justice.

Coming back to my original title: Which machine learning researchers are fighting for social justice in their work? The photoillustration above does not contain anyone of an Asian ethnicity, so should we think there aren’t any doing so? It is also important to note that it is reckless to try to make general statements about many different people and peoples with many different lived experiences.

A couple of years ago, Yu Tao and I conducted an empirical study that showed that Asians are more focused on technical-only machine learning (in contrast to research at the intersection of machine learning and society) compared to Hispanic, Black, and White machine learning researchers. To do so, we collected all 71,605 arXiv papers from March 1997 to September 2020 having to do with machine learning and analyzed the set of 103,094 unique authors — 99,460 of whom had technical-only papers, 1,904 had only papers at the intersection of machine learning and society, and 1,730 had papers of both kinds. We estimated the race/ethnicity and gender of the authors using ethnicolr and genderize.io. (Yes, this kind of inference has limitations.) We found that not only Asians, but also males are significantly (Cochran-Armitage trend test) more focused on technical-only machine learning research. Moreover, within the Hispanic, Black, and White race/ethnicity groups, females were more likely to study the intersection of machine learning and society than males. However, the situation was completely reversed among the Asian race/ethnicity group. Female Asian machine learning researchers are less inclined towards societal topics than male Asian machine learning researchers.
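For the curious, the trend statistic is simple enough to sketch. Below is a stand-alone version using the M² = (n − 1)r² form of the linear trend test, which is asymptotically equivalent to Cochran-Armitage; the counts are hypothetical, not our study’s data.

```python
import math

def trend_test(successes, totals, scores):
    """Linear trend test for a 2xk contingency table. Expands the table to
    individual (score, outcome) pairs, computes their correlation r, and
    forms M^2 = (n - 1) r^2, asymptotically chi-squared with 1 df."""
    xs, ys = [], []
    for s, t, sc in zip(successes, totals, scores):
        xs += [sc] * t
        ys += [1] * s + [0] * (t - s)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    r = cov / math.sqrt(vx * vy)
    m2 = (n - 1) * r * r
    p_value = math.erfc(math.sqrt(m2 / 2))   # chi-squared(1 df) survival function
    return m2, p_value

# Hypothetical: 20%, 40%, 60% societal-paper rates across three ordered groups.
m2, p = trend_test([10, 20, 30], [50, 50, 50], [0, 1, 2])
```

A small p-value says the proportion changes monotonically with the ordered scores, which is exactly the kind of pattern our study tested for.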

Why is that the case? Such data analysis cannot tell us. It cannot uncover the reasons for individual researchers’ motivations. The analysis also did not disentangle the different leaves of machine learning + society, and did not disambiguate Asian-Americans from Asians outside the US. So last year, Yu, April Wang, Dakuo Wang, and I interviewed more than fifteen female Asian-American machine learning researchers to try to find out. We haven’t fully finished the coding of the semi-structured interviews and done the analysis, but let me present some hypotheses.

  • You may be part of a group that has faced marginalization and want to fight for justice for your group.
  • You may be “woke” and believe in fighting for justice for all.
  • You may find the research at the intersection of machine learning and society to contain interesting problems.
  • You may believe that such research is a path to career advancement and funding.
  • You may view conducting research at the intersection of machine learning and society to be a service to society (broadly construed).

The immigration story of the vast majority of Asian-Americans in the United States begins after 1965.

In this group, many Asian-American machine learning researchers had some academic or social capital (and occasionally monetary capital) with which they launched their lives. Despite overt and covert discrimination (which may be discounted by the community), activism and fighting for justice have not been common. So it seems unlikely that the first hypothesis is true. However, things are starting to change, especially with respect to educational opportunities and fighting Asian hate during the Covid-19 pandemic.

It is an interesting question whether Asian-Americans are woke or not, and what influences that.

The intersection of machine learning and society does certainly lead to lots of nice problems to work on, and is starting to lead to best paper awards, grant opportunities, and the like. So it could be a plausible reason for Asian-American researchers to pursue such research.

Service to society is seemingly part of Asian culture, including in diasporas. And this may be what motivates the small numbers of Asian-American researchers that pursue it. Interestingly, Wang et al. write that Chinese “families believe that men should engage in social work with higher status, while women’s duty is to take better care of the family” so could that be a reason that Asian men are more likely to be working on the societal aspects of machine learning than Asian women?

Stay tuned for a paper by Tao, Varshney, Wang, and Wang for further discussion and analysis of our interviews.


Three Levels of AI Auditing

November 29, 2022

The phrase hook, line, and sinker usually refers to fooling or deceiving someone, but I don’t think it has to. It can also just mean convincing someone thoroughly. The question of what an AI audit should precisely be, especially auditing for equity, is receiving greater attention. In my mind, there are three levels of AI auditing, just like the hook, line, and sinker. The first level, the hook, is a journalistic approach with an individual attention-grabbing, persuasive narrative example that is easy to grasp, but may or may not reveal a systemic problem. The second level is an outside-in study of a system that may reveal a pattern, and it might not even be exactly related to the first-level hook. The third level of auditing requires access to the internals of the system and could be quite detailed. The decision maker doesn’t even have to be AI; human decision making can be audited in the same ways.

Let’s look at a few examples to get a better sense of what I mean.

Gender Shades

The first-level audit in Joy Buolamwini’s Gender Shades work was her using a white mask to show how a face tracking algorithm didn’t work well on dark-skinned faces.

Her second-level audit was creating the small Pilot Parliaments Benchmark dataset and using it to report intersectional differences in the gender classification task of commercial face attribute classification APIs.

She didn’t have access to the internals, but with access to some of the embedding spaces used by the models, we did a third-level analysis. We found that skin type and hair length are unlikely to be contributing factors to disparities, but it is more likely that there is some mismatch between how dark-skinned female celebrities present themselves and how dark-skinned female non-celebrities and politicians do, especially in terms of cosmetics.

Apple Card

The discovery of gender bias in the Apple Card began with a single example reported in a tweet, and it hooked a lot of people.

Enough so that the New York State Department of Financial Services launched a detailed third-level investigation, eventually exonerating Apple Card and Goldman Sachs, the financial firm that backed it.


COMPAS

ProPublica understood these levels as they disseminated the findings of their famous study on the COMPAS algorithm for predicting criminal recidivism. The main article included both the hook of individual stories, like those of Bernard Parker and Dylan Fugett, and statistical analysis of a large dataset from Broward County, Florida. More detailed first-level and second-level articles were published alongside.

Northpointe (now Equivant), the maker of COMPAS, did their own analysis to refute ProPublica’s analysis on the same data, so still second-level. (The argument hinged on different definitions of fairness.) The Wisconsin Supreme Court ruled that COMPAS can continue to be used, but under guardrails. I don’t think there has ever been a third-level analysis that breaks open the proprietary nature of the algorithm.

Asylee Hearings

Reuters did the same thing as ProPublica in a story about human (not AI) judgments in asylum cases. A first-level part of the story focuses on two women, Sandra Gutierrez and Ana, who have very similar stories of seeking asylum but received opposite decisions from different judges. A second-level part of the story focuses on the broader pattern across many judges and a large dataset.

Given that all of the data is public, Raman et al. did a third-level in-depth study on the same issue. They found that partisanship and individual variability among judges have a strong role to play in the decisions, without even considering the merits of a case. This is a new study. It will be interesting to track what happens because of it.


There are many other examples of first-level audits (e.g. an illustration of an object classifier labeling black people as gorillas, a super-resolution algorithm making Barack Obama white, differences in media depictions of Meghan Markle and Kate Middleton, language translations from English to Turkish to English showing gender bias, and gender bias in images generated from text prompts). They sometimes lead to second- and third-level audits (e.g. image cropping algorithms that prioritize white people and disparities in internet speed), but often they do not.

So What?

Each of the three levels of audits has a role to play in raising awareness and drawing attention, hypothesizing a pattern, and proving it. To the best of my knowledge, no one has laid out these different levels in this way, but it is important to make the distinction because they lead to different goals, different kinds of analysis, different parties and access involved, and so on. As the field of AI auditing gets more standardized and entrenched, we need to be much more precise in what we’re doing — and only then will we achieve the change we want to see, hook, line, and sinker.


There is No Generic Algorithmic Fairness Problem

November 28, 2022

Hello Lav. I know you’re a proponent of block diagrams and other similar abstractions because they permit a kind of elegance and closure that helps us make progress as scientists. I’m all for it too, except when we need to take that progress all the way down to applications that have very contextual nuances. For example, I’d be very happy if these kinds of diagrams of algorithmic fairness and bias mitigation (drawn by Brian d’Alessandro et al. and Sam Hoffman, respectively) were all we needed, but let me talk through how context matters. (This will be a discussion different from choosing the most appropriate fairness metric, which has its own nuance, and from asking whether fairness should even be viewed as a quantitative endeavor.) This was part of a presentation I made for NIST in August.

Let’s go through a series of seven real-world examples.

The first is clinical prediction of postpartum depression and its fairness across race. This one is almost as generic as you can get because the protected attribute is clear, it is a typical machine learning lifecycle, and so on. But the nuance in this application is the importance of focusing only on the riskiest patients (top decile). Someone who does not know what they are doing might just classify patients as above or below the mean or median risk.
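The difference is a one-line change but matters a lot. With a synthetic, low-skewed risk distribution (the numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
risk = rng.beta(2, 8, size=1000)       # synthetic predicted risks, mostly low

median_flagged = risk > np.median(risk)            # naive split at the median
decile_flagged = risk >= np.quantile(risk, 0.9)    # focus on the riskiest decile

print(median_flagged.sum(), decile_flagged.sum())  # ~500 vs ~100 patients flagged
```

Fairness conclusions drawn over the 500 median-flagged patients can look quite different from those drawn over the 100 patients the clinical program would actually target.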

The second is skin disease diagnosis from medical images. Here the nuance is that although the protected attribute is somewhat clear (skin type), it is not given with the dataset. The images have to be segmented so that only healthy skin is used to estimate a skin color, which is then grouped according to individual typology angle (ITA). The ITA and its groupings are themselves not without controversy.
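Concretely, the ITA is computed from the CIELAB lightness L* and yellow-blue component b* of the estimated skin color. The grouping thresholds below follow the commonly cited Chardon-style convention, which, as noted, is one convention among others:

```python
import math

def ita_degrees(L, b):
    """Individual typology angle: ITA = arctan((L* - 50) / b*) in degrees."""
    return math.degrees(math.atan((L - 50.0) / b))

def ita_group(ita):
    """Commonly used six-way grouping of ITA values (Chardon-style thresholds)."""
    if ita > 55:
        return "very light"
    if ita > 41:
        return "light"
    if ita > 28:
        return "intermediate"
    if ita > 10:
        return "tan"
    if ita > -30:
        return "brown"
    return "dark"

print(ita_group(ita_degrees(L=70.0, b=15.0)))  # → light
```

Everything upstream of these two functions — segmentation, choosing which pixels count as healthy skin, color-space conversion — injects its own error into the protected attribute before any fairness metric is ever computed.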

The third is legal financial obligations (fines and fees) in Jefferson County, Alabama. Here the fairness does not have to do with machine learning or artificial intelligence-based predictors, but with analyzing human decision-making to discover and finely characterize bias issues. All the analysis happens before the d’Alessandro and Hoffman block diagrams.

The fourth is the Ad Council’s “It’s Up to You” campaign for Covid-19 vaccine awareness and wanting all people to receive the messaging equally effectively. In this application having to do with targeted advertising, the labels are highly imbalanced so that a typical classifier would just always predict one of the classes, which also doesn’t really allow for bias mitigation. Here, the class imbalance has to be dealt with first before dealing with algorithmic fairness.
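To see why, consider labels that are roughly 98% one class (the numbers here are invented for illustration): a classifier that always predicts the majority class scores high accuracy while learning nothing, and one common first fix is inverse-frequency reweighting — the same scheme as scikit-learn’s class_weight="balanced".

```python
import numpy as np

rng = np.random.default_rng(0)
y = (rng.random(2000) < 0.02).astype(int)   # ~2% positives, e.g. ad responders

trivial_accuracy = (np.zeros_like(y) == y).mean()   # always predict the majority

# Inverse-frequency weights: n_samples / (n_classes * class_count)
counts = np.bincount(y, minlength=2)
class_weights = len(y) / (2 * counts)
sample_weights = class_weights[y]           # per-example weights for training

print(trivial_accuracy, class_weights)
```

Under these weights, a mistake on the rare class costs dozens of times more than one on the majority class, so the trivial always-negative classifier no longer looks good — and downstream bias mitigation has actual positive predictions to work with.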

The fifth is “bust or boom” prediction in ESPN’s fantasy football site, where the predictions for individual players have unwanted bias with respect to their team membership. The nuance here is that the fairness component (AI Fairness 360 in this case) is just a tiny part of the overall, highly complicated system architecture, and cannot just be thrown in haphazardly.

The sixth is predicting the need for preventative health care management while using health cost or utilization as a proxy, and the racial bias against Black patients in the United States that it yields. This problem became quite well-known due to a study by Obermeyer et al. However, being more nuanced and splitting up different health care costs as proxies (in-patient, out-patient, emergency) yields much less racial bias, whereas a typical data engineering step is to add all health costs together into a single variable.

The seventh is in child mortality prediction in sub-Saharan Africa, where there may be bias in prediction quality across countries. The nuance in this problem is that it also has a significant problem with concept drift over time, so that bias is not the main issue. If drift and bias are not disentangled, then the results will not be meaningful.

Studying a genericized fairness problem is a good thing to do from a pedagogical perspective, but it is only the starting point for working on a real-world problem with its entirety of context.


Consent, Algorithmic Disgorgement and Machine Unlearning

November 28, 2022

Good afternoon Señor Greg Rutkowski. I’ve been giving some talks recently that are not associated with published work. I think it might be useful to have some of that content posted online, so here we go. This one is from a presentation I gave to the Future of Privacy Forum in September.

There is growing interest in the public policy world about algorithmic disgorgement: the required destruction of machine learning models that were trained on data that they weren’t supposed to be trained on. This interest stems from the Federal Trade Commission ordering Weight Watchers to destroy a model trained on data from unconsenting children users of a healthy eating app in March.

To understand the concept and its implications, there are three relevant facts:
1. training machine learning models can take weeks;
2. machine learning models contain imprints of their training data points, which can be extracted using clever techniques; and
3. deleting training data once a model has already been trained is not useful, because the model retains an imprint of that data.

I had not heard the term algorithmic disgorgement before I was asked to do the presentation. As I was doing my research, I came across the article “Algorithmic Destruction” by Tiffany C. Li. One of the nice quotes in that article is “What must be deleted is the siloed nature of scholarship and policymaking on matters of artificial intelligence.” To this point, even though I was speaking to policymakers, I did not want to shy away from a little bit of relevant math to make sure they understand what is really going on. It would be a disservice to them otherwise.

The derivative f′(x) of a function f(x) tells you its slope. For multi-dimensional functions, the vector of slopes is known as the gradient ∇f(x).

If you want to get down from the summit of Mount Everest the fastest you can, you always want to keep going down the steepest part that you can. That is known as gradient descent.

Machine learning models are mathematical functions and many machine learning models are trained using a version of gradient descent.

Specifically, an algorithm takes a labeled training data set {(x1,y1),(x2,y2),(x3,y3),…,(xn,yn)} and produces a model f by minimizing a loss function L(f(x)) via gradient descent. The labeled training data set may be, for example, historical loan approval decisions about real people made by loan officers. When you’re taking small step after small step walking down Mount Everest, you can imagine that it takes a long time. Similarly, despite advances in algorithms and hardware accelerators, it can take weeks to train a large model.
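To make this concrete, here is a minimal sketch with toy numbers of my own invention (a one-parameter linear model, nothing like a real loan-approval model): gradient descent repeatedly takes a small step against the slope of the loss.

```python
import numpy as np

# Toy labeled training data (hypothetical numbers, not real decisions)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

# Model f(x) = w * x; loss L = mean squared error
w = 0.0
learning_rate = 0.01

for step in range(1000):
    gradient = np.mean(2 * (w * x - y) * x)  # dL/dw: the slope under our feet
    w -= learning_rate * gradient            # small step downhill

# w converges to 2.03, the least-squares slope underlying the toy data
```

Each pass of the loop is one small step down the mountain; a large model does the same thing, just with billions of parameters instead of one.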

Different algorithms have different loss functions, which means that they have different Mount Everests underneath and yield different models. One kind of algorithm is a neural network. Depending on the algorithm used to train them, models can carry a small or a large imprint of the training data. Some of the models below are really jagged and bend around individual data points. It is in these models that the imprint is large.

Through sophisticated methods known as model inversion attacks, it is possible to get a good idea of what the training data points were, just from the model. And how do model inversion attacks work? Why of course by gradient descent! (GD in the picture below.) They’re able to figure out a training data point by taking small steps toward what the model is confident about.
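To sketch the intuition (with a tiny hand-set logistic regression of my own, not a real attack on a deployed system): start from a blank input and take gradient steps that increase the model's confidence, and the input walks toward what the model associates with the positive class.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny "trained" logistic-regression model (weights are hypothetical)
w = np.array([2.0, -1.0, 0.5])
b = 0.1

# Start from a blank (all-zeros) candidate input
x = np.zeros(3)
step_size = 0.1

# Gradient steps that increase the model's confidence in the positive class
for _ in range(50):
    p = sigmoid(w @ x + b)
    grad_x = p * (1 - p) * w   # d(confidence) / d(input)
    x += step_size * grad_x    # step toward what the model is most sure about

# x has moved exactly along the direction the model associates with the
# positive class -- a crude reconstruction of a "typical" positive example
```

Real model inversion attacks are far more sophisticated, but the engine is the same: gradients of the model's confidence with respect to the input.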

Once we have a trained model, we’re in a ‘fruit of the poisonous tree’ situation. The training data is the tree and the model is the fruit. Cutting down the tree — deleting the training data — does not help us remove the poison from the fruit — exclude the imprint of tainted training data points.

All these facts seem to imply that if a data point should not have been in the training set, then to guarantee its information cannot be retrieved, we must re-train the model from scratch without the data point in question, which may be computationally unreasonable.

But not so fast my friends! There is a new category of technical approaches known as machine unlearning that can come to the rescue. They are ways to get a new model equivalent to training from scratch without an objectionable data point, but in a way that does not involve having to compute too much. There are two main approaches:
1. being smart about structuring the training process (like a Chinese wall) so that only a small piece of the model has to be retrained, and even then, only from very close to the bottom of Mount Everest; and
2. using gradients!

In the gradient-based approach, you can figure out the influence of specific (tainted) training data points by tracing back how they created the underlying Mount Everest or loss landscape, and zeroing out their influence without having to retrain the model at all. (We have a paper that Prasanna is presenting tomorrow at NeurIPS that does something similar, but for training data points that lead to unfairness, rather than ones that have a consent issue.)
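For simple models, removing a data point's influence without retraining can even be exact. Here is a toy sketch with invented data (the classical rank-one Sherman-Morrison update for linear least squares, a much simpler special case than the deep-learning setting, and not the method in the paper mentioned above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # toy features
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

# Full training: w = (X^T X)^{-1} X^T y
A_inv = np.linalg.inv(X.T @ X)
w = A_inv @ X.T @ y

# "Unlearn" point i with a Sherman-Morrison rank-one update, no retraining
i = 7
xi, yi = X[i], y[i]
A_inv_new = A_inv + np.outer(A_inv @ xi, xi @ A_inv) / (1.0 - xi @ A_inv @ xi)
w_unlearned = A_inv_new @ (X.T @ y - yi * xi)

# Sanity check: identical to retraining from scratch without point i
X_rest = np.delete(X, i, axis=0)
y_rest = np.delete(y, i)
w_retrained = np.linalg.inv(X_rest.T @ X_rest) @ X_rest.T @ y_rest
assert np.allclose(w_unlearned, w_retrained)
```

The point of the sketch: the objectionable point's contribution to the loss landscape can be subtracted out directly, which is exactly the spirit of the gradient-based unlearning approaches for more complicated models.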

To the best of my knowledge, for all the policy discussion happening around algorithmic disgorgement, no one had connected the problem to the technical solution of machine unlearning. Similarly, in the burgeoning technical research direction of machine unlearning, no paper I am aware of uses the specific policy issue or terminology as a starting point. We need more people to heed Tiffany C. Li’s call for breaking the silos between policymaking and AI scholarship. It might also be a good idea to have some standardized traceable protocols for how objectionable data points are tagged and then have their influence removed from models.


Special Place

November 6, 2022

Good day sir. In my previous post, I referred to a couple of recent papers by Chetty et al. that study economic mobility and connectedness in the context of social capital. One of the ‘al.’ in the author list of those papers is Monica Bhole, who was a little younger than us growing up in the Indian community in Syracuse. This community was a special place (and not because it is the birthplace of the Indian-American speller who initiated a long chain of Indian-American winners of the national spelling bee and birthplace of the first Indian-American Miss America). The nurturing we had in this diaspora community and its effect are not captured in the Chetty et al. kind of studies because they miss out on a milieu that is not simply pairwise friendship relations and because they are limited to income as a success variable. Being pioneers thrown into the deep end of a foreign country and culture, having only each other to rely upon, is a unique experience. I think you’ve found some of this unique experience in your cohort of White House Fellows, and I gleaned that this has been a positive for you.

I kind of spoke of this in late August at the memorial service of Mrs. Ashutosh at Drumlins:

This is an event I am sad to attend because of the loss that it represents. But it is also an event I am happy to attend because it gives us all a chance to reunite as an extended family. This diasporic extended family that gave us all the encouragement to become who we were supposed to become, authentically. Poetry, drama, argument, criticism, rhythm, melody, roots, knowledge of the world, and even silent reflection have always been around us because of the tone that auntie set.

But I just as easily could have been speaking about any of the women we have had in our lives, including Mamma, who we lost in May. I was talking with Sunita about this later at Drumlins: the trope of an inauthentic, gossipy, Indian auntie drawn to conspicuous consumption is far removed from our experiences. We have known strong, honest, conscientious, hard-working, interesting, beautiful aunties. We need to tell these stories and share the experiences that shaped us and our values. Journaling and blogging are known to be therapeutic, and I will start doing some more of it now.



October 28, 2022

Hello Mr. White House Fellow. It was nice visiting you in the Washington, DC area a few weeks ago while I was attending EAAMO. I hadn’t expected so many papers there to be focused on school choice and the drawing of attendance boundaries, which are very American problems that have a strong role to play in structural inequity. A series of studies by Chetty et al. shows that the friends and neighbors of one’s family in early life play a very strong role in future earnings — much more than one’s own family characteristics. (Income is not the best proxy for success, but it is something.) Parents want to give their kids the best opportunity they can, and shaping the peer group is the biggest lever they have. The introduction to the page linked above on school choice states that “many still choose their schools by buying a home in their desired district.” That was the case for Sonia and me 22 months ago when we moved into a new home, and I think it was the case for Mamma and Papa when you and I were in early elementary school and we moved into a new home.

There are many similarities in the situation of the two homes. Both are in neighborhoods next to a small body of water (Hardscrabble Lake; Snooks Pond) off of a long-named, hilly and wooded road (Hardscrabble Road; Woodchuck Hill Road) that connects a state route (120; 92) and an unnumbered ‘ville’-ending artery (Pleasantville Road; Jamesville Road). They are both in academically-strong, predominantly white, affluent for their county, New York State public central school districts (Chappaqua; Fayetteville-Manlius) having 3 elementary schools (grades K-4), 2 middle schools (grades 5-8), and 1 high school (grades 9-12). Both are located near the boundary of the district and closer to the high school of a different smaller school district (Briarcliff Manor; Jamesville-Dewitt). They are within the attendance boundary of the one elementary school of three (Roaring Brook; Mott Road) that divides its attendance between middle schools and goes to the older (built 1928; 1932) middle school with neo-classical architecture (Robert E. Bell; Wellwood).

For all the similarities, however, there are differences that I’ve been observing over the last 22 months. Whereas the famous coach who lives in our Fayetteville neighborhood coached college lacrosse (Roy Simmons, Jr.), the famous coaches who lived in our Chappaqua neighborhood coached professional basketball (Stu Jackson, Jeff Van Gundy). On the sidelines of kids’ soccer games, parents in Chappaqua talk about which swimming and tennis club they’re joining. There are a lot of business consultants and finance guys. The school board takes retreats to the luxurious Mohonk Mountain Resort. Despite both districts being filled with mostly upper middle class people (the top 20%), the income distribution is quite different and I think it makes a difference.

Chappaqua Central School District Economic Statistics (from censusreporter.org)

Chappaqua Central School District Economic Statistics

Fayetteville-Manlius Central School District Economic Statistics (from censusreporter.org)

Fayetteville-Manlius Central School District Economic Statistics

Chetty et al.’s work makes clear that at the very top of the socioeconomic status (SES) scale compared to any other regime, “the highest-SES individuals tend to have particularly high-SES friends.” Like in J. M. Barrie’s Peter Pan, it bears out that “Mr. Darling had a passion for being exactly like his neighbours”.

At Mott Road, our music teacher Mrs. Clark, with the help of other teachers and parents, put on a musical production of Peter Pan, for which Mamma took on the challenge of sewing the crocodile costume; all the other costumes were just spare clothes of various kinds contributed by families. This year at Roaring Brook, my kids are participating in a production of Peter Pan as well, but it is being directed and produced by outside professionals with professional costumes, lighting, etc., and we have to pay a not insignificant fee for them to participate. To me, this kind of approach foments greater class hierarchy and divides. I’m sure I was sheltered, naïve, and not woke, but I didn’t feel divides being fomented as much in Fayetteville as in Chappaqua. (“We ought to use the pluperfect and say wakened, but woke is better and was always used by Peter.”)

Now, as an individual actor, being where we are locationally and with present-day structures in place, and wanting to give my kids the best chance they can have, I don’t see other options. Not letting the kids take the opportunity we have put in front of them would be a disservice to them. Despite valuing some amount of equity, I have seemingly cast a hypocritical die as a card-carrying ‘dream hoarder’. Unlike anyone else in the neighborhood, I do my own lawn-mowing, weed-whacking, and hedge-trimming with the hope that my kids see my actions and don’t end up being very entitled, but I don’t think that will do much. I have to let them fly like Wendy, John, and Michael Darling, but I have to keep them grounded in some other way.


How and Why I Independently Published A Book

May 9, 2022

Good afternoon Señor Horace Greeley. Many people have asked, so I’d like to recount how and why I independently wrote and published the book entitled Trustworthy Machine Learning, which is available for free in html and pdf formats at http://www.trustworthymachinelearning.com and as an at-cost paperback at various Amazon marketplaces around the world (USA, Canada, UK, Germany, Netherlands, Japan, …).

Why I Wrote A Book

Writing a book is a big effort and a big commitment, so why do it? Just as you shouldn’t do a startup company only to be able to say you did a startup, you can’t write a book just because you want to have written a book. It has to be because you have something unique to say that the world needs to hear, and it is just bursting out of you.

I’d had a nondescript desire to write a book for a long time. But three years ago, I felt that there was something I needed to say: the approach and worldview for doing data science and machine learning that I had honed over a decade in an environment that few others experienced. And it felt like the deep learning revolution was missing some important things. I was ready to speak.

How It Started

In May 2019, I flew to Madrid to represent Darío at Fundación Innovación Bankinter’s Future Trends Forum. That trip was the only time in my life I’ve sat in business class and it was fortuitous because it happened mere days after I had a painful back spasm. After the meeting concluded, I had a few hours to kill before proceeding onwards to Geneva for the AI for Good Global Summit. Instead of risking my back with any tourism, I sat in a park (the thin green area on the map) and wrote down an entire outline for the book I was imagining. That outline ended up being close to that of the eventual finished product. Look below for exactly what I typed into the notes app of my phone that afternoon.

 Age of Artificial intelligence 
    General purpose technology
 Overview and Limitations 
   Limitations of book
   Biases of author
     Diverse voices

 Detection theory
   Confusion matrix
   Bayesian detection
   Robust (minimax) detection
   Neyman-Pearson detection
   Chernoff-Stein, mutual information theory, kl divergence
 Directed graphical models

 Finite samples
   Administrative data
   Temporal biases
   Cognitive biases/prejudice (quantization)
      Quantization only by words so don't have to introduce quantization and clustering
   Sampling biases
   Causal basis included

Machine learning
 Risk minimization
 Decision stumps
    Trees, Forests
    Margin-based methods
    Neural networks
 Adversarial methods
 Data augmentation
 Causal inference
 Causal discovery

 Epistemic uncertainty in machine learning
 Distribution shift
 Adversarial robustness
   (Causal foundations included in each pillar)

 Explainability and interpretability
   Direct global
   Distillation / simple models
   Post hoc local
 Value alignment
   Unified theory
   Preference elicitation
   Specification gaming

 Professional codes
 Lived experience
 Social good
   Types of problems with examples
 Open platforms 

Summer and Fall of 2019

Once I was back from Europe, the summer was upon us and that meant having our social good student fellows with us and their projects in full steam. That, along with my other work, also meant days full of meetings: a manager’s schedule rather than a maker’s schedule, so I didn’t do anything further on the book all summer. Here is my calendar on one of those summer days (and this wasn’t atypical).

In the fall of 2019, I had the honor of spending three months at IBM Research – Africa, in Nairobi, Kenya. Because of the time difference, I made myself available for meetings only from 8 am to 11 am Eastern, which often meant entire mornings (East Africa Time) with no meetings (except for the nice conversations with the Africa lab researchers). Even though I thought I could use that time to start writing the book, I didn’t. Instead, the sabbatical turned out to be a great time to recover and recharge (while also doing some work on maternal, newborn and child health). Recovery is underappreciated.

Starting to Write

Back home, and with my calendar still mostly bare, I blocked off 90 minutes for writing every day starting on January 2, 2020. I started getting into a flow and put some words and equations down on paper (really this Overleaf). I made good progress on an introduction chapter and a detection theory chapter.

Then in mid-February, Bob Sutor stopped by my office and said that an acquisitions editor for the publisher he worked with on Dancing with Qubits was looking to publish a book on responsible and ethical AI, and connected me with Tushar. Coincidentally, the same week, an acquisitions editor for Manning Publications emailed me cold about my possible interest in writing a book. I had good conversations with both editors and I was naïvely happy at the perfect confluence of events.

I filled out book proposals for both companies. Here is the one I did for Packt:

and here is the one I did for Manning:

I was completely honest in explaining what I wanted to do (mix of math and narrative), who it was for, and so on. I even sent over the couple of chapters I had already written. Both publishers were happy and accepted my proposal. Both made very similar offers in the contractual terms, which wasn’t particularly important for me because I wasn’t doing this for the money. Manning had an early access program through which readers could access chapters as they were being written (which is what I wanted and also why I had made the Overleaf open when I was writing the first two chapters), so I decided to go with them. I signed on the dotted line on March 17, 2020.


Things did not go as I thought they might. Everything had shut down a week earlier because of the Covid-19 pandemic, and the shutdown did not abate in any way. I was sitting on a dilapidated sofa in my basement trying to complete other work, taking the kids outside to kick a soccer ball around once in a while, and plotting out how to get scarce groceries — not exactly conducive to writing. Certainly no more 90 minute blocks of time daily.

More turbulent than that, however, was the publisher trying to shoehorn me into what they wanted. My proposal was very clear that the book would have a decent amount of math and no software code examples, would be a tour of different topics, and would be centered on concepts. But that didn’t seem to matter once things were underway. As I soon learned, Manning religiously follows Bloom’s taxonomy, and understanding concepts is very low on the totem pole. As instructed, I doggedly kept trying to push my text higher in the taxonomy, but it was mostly a farce to me, where I would just use the word “sketch” or “appraise” while still saying what I was going to say. I was also ruthlessly trying to reduce the math at their insistence. For example, the chapter on uncertainty as a concept morphed into evaluating safety.

There was a lot of back and forth, and a lot of frustration. Eventually, on February 16, 2021, the book was available for sale in the $40-$60 range through the early access program with the first four chapters available. We celebrated. I got a lot of positive feedback from people I know.

But the turbulence didn’t calm down. More Bloom, less math, and less of myself. I am not someone who uses the word “grok“. I didn’t want this to be a prescriptive recipe book because I don’t believe that that is what trustworthy machine learning is all about.

The book reached 320 sales by the time the first 12 chapters had been posted, which in my opinion is pretty darn good for something that is not even complete and with an underwhelming marketing effort.

Then came an ending and a rebirth. On September 10, 2021, the acquisitions editor reached out and said that the publisher would be ending the contract and the rights to the content would revert back to me. I guess the sales weren’t what they needed and the content continued to be mismatched from the desires of their typical buyers. This turn of events ended up being more of an emotional relief than anything else.

Did the book improve because of all that back and forth? On balance, I’d say yes. So no hard feelings.


I am not one to leave things unfinished, and I wasn’t going to let the ending of the contract hold me back from finishing the manuscript that I had toiled on for so very long at that point. I vowed to complete the whole thing by the end of the calendar year. In less than 4 months, I wrote the remaining 6 chapters: an unbridled pace much faster than what I had been doing before.

I didn’t think much about what the route to get it out would be in September or October. Tushar reached out and offered to bring it to market through Packt, but I just wanted to focus on finishing it. And I did, on December 30!

By that time, I had made up my mind to post it online with a Creative Commons license to begin with. I created the website http://www.trustworthymachinelearning.com and posted a pdf of version 0.9. I quietly spread the word and kept getting a lot of positive response from acquaintances.

Independently Published

While a diverse panel I had assembled was giving version 0.9 a look over and providing feedback, I did a bunch of soul-searching on what this book was for and why I was doing it. I also pored over what people had written about self-publishing in today’s age. I clearly wasn’t in it for the money — I was more than happy for anyone in the world to learn from it without paying. In fact, empowering people, no matter their station in life, is one of the messages of the book. I wanted its message to ring far and wide.

While everyone has a little vanity in them, like I said at the beginning of this post, I hadn’t written the book just to have written a book. This was also not a book aiming for some kind of book award. I wasn’t going to be using it for an academic tenure or promotion case, or any other stamp of approval. I didn’t want IBM to be involved in any explicit way (Manning had actually sought that out through a sponsorship deal). I enjoy doing a little formatting and aesthetic stuff here and there, and copy-editing. The previous experience hadn’t shown me that a publisher would necessarily do the right kind of marketing. Kindle Direct Publishing is really easy, doesn’t require any capital investment, and has very wide reach.

Putting all of that thinking together, despite not having heard of others in my orbit doing it before, I decided to independently publish the book. It has been up on Amazon since February 16, 2022 at the lowest possible price that Amazon allows for covering their costs. I’ve been very happy with my decision. It suits me and my worldview.


That very day, February 16, I made a social media push about the book, and that very night, I received this very kind email from Michael Hassan Tarawalie:

Dear sir, 

It is an honor to come in contact with you, sir. Am a student at the electrical and electronic department, faculty of engineering Fourah Bay College, University of Sierra Leone.

Sir your book has helped me.

One of the very first citations to the book was in the influential report by NIST entitled “Towards a Standard for Identifying and Managing Bias in Artificial Intelligence”.

There have been several great reviews of the book on Amazon from people I don’t know. It has become almost a cottage industry for people to hold up their copy of the paperback in large meetings I attend on Zoom and for others to post photos holding their copy on social media.

As of today, 481 copies of the book have been printed and shipped across the world in less than 3 months. Even though I’m not tracking it, I’m sure lots of people have accessed the free pdf and used it to uplift themselves.

This is what I wished for.

It always seems impossible until it’s done.

Nelson Mandela

The Pandemic Bandwagon

March 21, 2020

A Wuhan-shake to you señor. Hope you’re doing alright with the shelter-in-place order for Santa Clara County and California. We’re on pause here in New York.

Yesterday morning, I was all psyched up to do a blog post and accompanying twitter thread on 12 Data Science Problems in Mitigating and Managing Pandemics Like COVID-19 that would go through several issues related to the crisis that have some avenue for data science (broadly construed) to contribute to. I’ve been glued to twitter the last few evenings and a lot of different people have been posting various things. I have things to share, I thought, so why not me?

A wise person asked me to reflect on whether it would be a sensible thing to do. She emphasized that “there are so many people who are jumping on the bandwagon trying to help. Some mean well while some are capitalizing on the situation. And of those that mean well, some are offering silly things.” As you’ve told me on occasion, Shannon was wary of the bandwagon as well, and much preferred the “slow tedious process of hypothesis and experimental validation.” He noted that “a few first rate research papers are preferable to a large number that are poorly conceived or half-finished.” What would he have said to streams of consciousness offered up 280 characters at a time? Adam Rogers wrote yesterday afternoon that “the chatter about a promising drug to fight Covid-19 started, as chatter often does (but science does not), on Twitter.”

I woke up this morning wishing for science, not chatter. I realized that I am not among “men ages 24–36 working in tech” predisposed to “armchair epidemiology.” I turned 37 a whopping five months ago!

Rogers continued: “Silicon Valley lionizes people who rush toward solutions and ignore problems; science is designed to find solutions by identifying those problems.”

So let’s talk about problems and how run-of-the-mill data scientists working in isolation, both literally and figuratively, usually lack the requisite problem understanding to make the right contribution.

In dealing with global disease outbreaks, such as the ongoing novel coronavirus pandemic, we can imagine four main opportunities to help: surveillance, testing, management, and cure. We are primarily concerned with zoonotic diseases: diseases that transfer from animals to humans.  By surveilling, we mean tools and techniques for predicting or providing early warnings of outbreaks of novel or known pathogens. By testing, we mean diagnosing individual patients with the disease. By managing, we mean the tools and techniques for better understanding and limiting the spread of the outbreak, providing care, and engaging the citizenry.  By curing, we mean the development of therapeutic agents to administer to infected individuals. In all of these areas, the lone data scientist working without true problem understanding can be misguided at best and detrimental at worst.

Surveillance
  1. Zoonotic pathogen prediction. There are a large number of known pathogens, but for most of them, it is not known whether they can transfer from animals into humans (and develop into outbreaks). It may be possible to predict the likely candidates by training on features of known zoonotic pathogens. We tried doing this a few years ago in partnership with disease ecologist Barbara Han, who defined the relevant features, but didn’t get very far because the features of pathogens are not available in a nice clean tabular dataset; they are locked up inside scientific publications. Extracting knowledge automatically from these very specialized documents requires a lot of expert ecologist-annotated documents, which is not tenable. Even if we were able to pull together a dataset suitable for predicting zoonoses, we wouldn’t know how to make heads or tails of the results without the disease ecologists.
  2. Informed spillover surveillance. Once a pathogen is known as a zoonotic disease and has had an outbreak, it is important to monitor it for future outbreaks or spillovers. Reservoir species harbor pathogens without having symptoms and without dying, waiting for a vector to carry the disease to humans and start another outbreak. In the first year of the IBM Science for Social Good initiative, we partnered with the same disease ecologist to develop algorithms for predicting the reservoir species of primates for Zika virus in the Americas so that populations of those species could be further scrutinized and monitored. Without Barbara, we would have had no clue about what problem to solve, what data sources to trust, how to overcome severe class imbalance in the prediction task (by combining data from other viruses in the same family), and how the predictions could inform policy.
  3. Outbreak early warning. The earlier we know that an outbreak is starting, the earlier actions can be taken to contain it. There are often small signals in various reports and other data that indicate a disaster is beginning. BlueDot knew something was up with the novel coronavirus as early as December 30, 2019, but they’ve been at this for quite a while and have a team that includes veterinarians, doctors, and epidemiologists. Even then, their warnings were not heeded as strongly as they could have been.

Testing
  1. Group testing. There are shortages of COVID-19 tests in certain places. Well-meaning data scientists ask: isn’t there a smart way to test more people with the same number of tests? (I’ve seen it asked several times already, including in an email that a friend from grad school sent us both.) Eventually, someone points out the method of group testing, which has been known since WWII. But even that is not the solution for the current method of testing (PCR). You pointed out in your response to the friend that group testing would require a serological test for COVID-19, which isn’t ready yet. A case of solving a problem with an already-known solution that is actually not a relevant problem.
  2. Deep learning from CT images. Deep neural networks have achieved better accuracy than expert physicians in several medical imaging tasks in radiology, dermatology, ophthalmology, and pathology, so it is natural that several groups would try training them for diagnosing COVID-19. Again, a well-meaning effort, but sometimes not executed very well. For example, this paper uses CT images of COVID-19-confirmed patients from China as the positive class and images of healthy people from the United States as the negative class, which may introduce spurious correlations and artificially inflate the accuracy. Even if this task is done well, will it find its way into clinical practice? That has not yet been the case for the tasks mentioned above, despite the initial demonstrations having happened several years ago.
  3. Classifying breathing patterns. A paper posted to arXiv with the title Abnormal respiratory patterns classifier may contribute to large-scale screening of people infected with COVID-19 in an accurate and unobtrusive manner claims that “According to the latest clinical research, the respiratory pattern of COVID-19 is different from the respiratory patterns of flu and the common cold. One significant symptom that occurs in the COVID-19 is Tachypnea. People infected with COVID-19 have more rapid respiration.” but the authors provide no reference to this clinical research and I haven’t been able to track it down myself. If there isn’t really any distinguishing difference between respiration patterns with flu and COVID-19, then this work is in vain, and could have been avoided by conferring with clinicians.
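For completeness, the arithmetic behind classical (Dorfman) group testing is easy to sketch: pool k samples, test each pool once, and retest individually only in pools that come back positive. Under an idealized perfect test (exactly the assumption that fails here, per the serological-test point above), the expected number of tests per person at prevalence p is 1/k + 1 − (1−p)^k:

```python
def expected_tests_per_person(p, k):
    """Dorfman two-stage pooling: one pooled test shared by k people,
    plus k individual retests whenever the pool is positive."""
    return 1.0 / k + 1.0 - (1.0 - p) ** k

# At 1% prevalence, scan pool sizes for the cheapest scheme
p = 0.01
best_k = min(range(2, 51), key=lambda k: expected_tests_per_person(p, k))
# best_k comes out to 11, needing only ~0.196 tests per person
```

The math is seductive, which is exactly why the idea keeps getting rediscovered; the hard part is whether the actual assay and logistics support pooling at all.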

Management
  1. Spatiotemporal epidemiological modeling. Once an outbreak has started, it is important to model its spread to inform decision making in the response. This is the purview of epidemiology and has a lot of nuance to it. Small differences in the input can yield large differences in the output. This should be left to the experts who have been doing it for many years.
  2. Data-driven decision making. Another aspect of managing an outbreak is collecting primary (e.g. case counts), secondary (e.g. hospital beds, personal protective equipment), and tertiary (e.g. transportation and other infrastructure) information. This is highly challenging and in a disaster situation requires both formal and informal means. During the 2014 Ebola outbreak, we observed a lot of enthusiasm for collecting, collating, and visualizing case counts, but not so much for the secondary and tertiary information, which, according to the true experts, is really the most important for managing the situation. The same emphasis on case counts holds now, though at least there is some attention to the other information. Enthusiasm is great, but better when directed to the important problems.
  3. Engaging the public. In managing outbreaks, it is critical to inform the public of best practices to limit the person-to-person spread of the disease (which may go against cultural norms) and also to receive information from the situation on the ground. This has been done to good effect in the past, such as during the Ebola outbreak, and in certain places now, but seems to be lacking in many others. Misinformation and disinformation on peer-to-peer and social network platforms appear to be rampant, yet there seems to be little ‘tech solutioning’ in this space so far; perhaps the energy is being spent elsewhere.
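The sensitivity noted in the epidemiological modeling item can be illustrated with a toy SIR model (a deliberately simplistic sketch, not a real epidemiological tool): a roughly 15% change in the transmission rate, straddling the epidemic threshold, changes the final attack rate by nearly an order of magnitude.

```python
# Toy SIR model: small input changes can produce large output changes.
# Illustrative only; real epidemiological models are far more nuanced.

def sir_attack(beta, gamma=0.1, s0=0.999, i0=0.001, dt=0.2, days=4000):
    """Euler-integrate SIR dynamics; return the final attack rate (1 - s)."""
    s, i = s0, i0
    for _ in range(int(days / dt)):
        new_infections = beta * s * i * dt
        s -= new_infections
        i += new_infections - gamma * i * dt
    return 1.0 - s

low = sir_attack(beta=0.095)   # basic reproduction number R0 = 0.95
high = sir_attack(beta=0.110)  # basic reproduction number R0 = 1.10
print(f"attack rate: {low:.1%} (R0=0.95) vs {high:.1%} (R0=1.1)")
```

Below the threshold the outbreak fizzles; just above it, a substantial fraction of the population is eventually infected, which is why small errors in estimated parameters can completely change the policy implications.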


  1. Drug repurposing. Interestingly, drugs developed for particular diseases can also have therapeutic effects on other diseases. For example, chloroquine, an old malaria drug, has an effect on certain cancers and anecdotally seems to show an effect on the novel coronavirus. By finding old generic drugs whose safety has already been tested and which may be inexpensive and already in large supply, we can quickly start tamping down an outbreak once the therapeutic effect is confirmed in a large-scale clinical trial. But such repurposing opportunities are difficult to notice at large scale without natural language processing of scientific publications. A consortium recently released a collection of 29,000 scientific publications related to COVID-19 (CORD-19), but there is very little guidance for NLP researchers on what to do with that data and no subject matter expert support. Therefore, it seems unlikely that anything of much use will come out of it.
  2. Novel drug generation and discovery. Repurposing has its limits; we must also discover completely new drugs for new diseases. State-of-the-art generative modeling approaches have begun that journey, but are currently difficult to control. Moreover, consulting subject matter experts is required to figure out which desirable properties to control for in the generation: things like toxicity and solubility. Finally, generating sequences of candidate drugs in silico only makes sense if there is close coupling with laboratories that can actually synthesize and test the candidates.
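As a rough illustration of the literature-mining step in the repurposing item, here is a minimal drug–disease co-mention counter. The abstracts and term lists are invented placeholders, and a real system would use entity linking and relation extraction rather than token overlap; this only shows the shape of the task.

```python
# Crude repurposing leads via co-mention counts in abstracts.
# Abstracts and vocabularies below are invented examples, not CORD-19 data.
import re
from collections import Counter

abstracts = [
    "Chloroquine inhibited viral replication of SARS-CoV-2 in vitro.",
    "Chloroquine remains a first-line treatment for malaria in some regions.",
    "Remdesivir showed activity against SARS-CoV-2 in early studies.",
]
drugs = {"chloroquine", "remdesivir"}
diseases = {"sars-cov-2", "malaria"}

pairs = Counter()
for text in abstracts:
    tokens = set(re.findall(r"[a-z0-9\-]+", text.lower()))
    for drug in drugs & tokens:
        for disease in diseases & tokens:
            pairs[(drug, disease)] += 1

for (drug, disease), n in pairs.most_common():
    print(f"{drug} ~ {disease}: {n} co-mention(s)")
```

Even this toy version makes the need for subject matter experts obvious: deciding which co-mentions are therapeutic claims versus incidental mentions is exactly where NLP alone falls short.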

In my originally envisioned post, I was going to end with a sort of cute twelfth item: staying at home. Apart from lumberjacks, data scientists are among the professions most suited to not spreading the coronavirus, according to this data presented by the New York Times. But in fact, this is not merely a cute conclusion: it is the one contribution that data scientists can truly make well while in isolation off the bandwagon. When the fog clears, however, let’s be deliberate and work across disciplines to create full, well-thought-out, and tested solutions for mitigating and managing global pandemics.



June 3, 2018

Comment ça se plume? The venerable Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL) reconvenes this week. The Great AI War of 2018 revisits New Orleans for another skirmish. 

Following my previous posts on AISTATS paper counts, ICASSP paper counts, ICLR paper counts, and SDM paper counts, below are the numbers for accepted NAACL papers among companies for long papers, short papers, industry papers, and all combined.

| Company | Paper Count (Long) |
| --- | --- |
| Microsoft | 10 |
| Amazon | 5 |
| Facebook | 5 |
| Tencent | 4 |
| DeepMind | 3 |
| Google | 3 |
| JD | 3 |
| Adobe | 2 |
| Elemental Cognition | 2 |
| PolyAI | 2 |
| Siemens | 2 |
| Agolo | 1 |
| Aylien | 1 |
| Bloomberg | 1 |
| Bytedance | 1 |
| Choosito | 1 |
| Data Cowboys | 1 |
| Educational Testing Service | 1 |
| Fuji Xerox | 1 |
| Grammarly | 1 |
| Huawei | 1 |
| Improva | 1 |
| Interactions | 1 |
| Intuit | 1 |
| Philips | 1 |
| Samsung | 1 |
| Snap | 1 |
| Synyi | 1 |
| Thomson Reuters | 1 |
| Tricorn (Beijing) Technology | 1 |
| Company | Paper Count (Short) |
| --- | --- |
| Google | 3 |
| Microsoft | 3 |
| Facebook | 2 |
| Adobe | 1 |
| Alibaba | 1 |
| Amazon | 1 |
| Ant Financial Services | 1 |
| Bloomberg | 1 |
| Educational Testing Service | 1 |
| Infosys | 1 |
| PolyAI | 1 |
| Preferred Networks | 1 |
| Roam Analytics | 1 |
| Robert Bosch | 1 |
| Samsung | 1 |
| Tencent | 1 |
| Thomson Reuters | 1 |
| Volkswagen | 1 |
| Company | Paper Count (Industry) |
| --- | --- |
| Amazon | 6 |
| eBay | 4 |
| Airbnb | 1 |
| Boeing | 1 |
| Clinc | 1 |
| Educational Testing Service | 1 |
| Google | 1 |
| Interactions | 1 |
| Microsoft | 1 |
| Nuance | 1 |
| ZEIT online | 1 |
| Company | Paper Count (Total) |
| --- | --- |
| Microsoft | 14 |
| Amazon | 12 |
| IBM | 10 |
| Facebook | 7 |
| Google | 7 |
| Tencent | 5 |
| eBay | 4 |
| Adobe | 3 |
| DeepMind | 3 |
| Educational Testing Service | 3 |
| JD | 3 |
| PolyAI | 3 |
| Bloomberg | 2 |
| Elemental Cognition | 2 |
| Interactions | 2 |
| Samsung | 2 |
| Siemens | 2 |
| Thomson Reuters | 2 |

My methodology was to click through all the PDFs in the proceedings and manually note the author affiliations.
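Once the affiliations are noted, the tallying is simple bookkeeping. A sketch with invented example data (not the real NAACL records), using a set per paper so a company is counted once even when several coauthors share the affiliation:

```python
# Tally company paper counts per track and in total.
# The affiliation records below are invented examples for illustration.
from collections import Counter

# (paper id, track) -> list of company affiliations noted from the PDF.
affiliations = {
    ("paper-1", "long"): ["Microsoft"],
    ("paper-2", "long"): ["Amazon", "Microsoft"],
    ("paper-3", "short"): ["Google"],
}

by_track = {}
for (paper, track), companies in affiliations.items():
    # set() so a company counts once per paper, even with multiple authors
    by_track.setdefault(track, Counter()).update(set(companies))

total = sum(by_track.values(), Counter())
for company, n in total.most_common():
    print(company, n)
```

Note that under this convention a paper with authors from two companies contributes one count to each, which is why per-track counts need not sum to the number of papers.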