How and Why I Independently Published A Book

May 9, 2022

Good afternoon Señor Horace Greeley. Many people have asked, so I’d like to recount how and why I independently wrote and published the book entitled Trustworthy Machine Learning, which is available for free in HTML and PDF formats at http://www.trustworthymachinelearning.com and as an at-cost paperback at various Amazon marketplaces around the world (USA, Canada, UK, Germany, Netherlands, Japan, …).

Why I Wrote A Book

Writing a book is a big effort and a big commitment, so why do it? Just like you shouldn’t do a startup company just to be able to say you did a startup, it can’t be just because you want to have written a book. It has to be because you have something unique to say that the world needs to hear, and it is just bursting out of you.

I’d had a vague desire to write a book for a long time. But three years ago, I felt that there was something I needed to say. That something was my approach and worldview for doing data science and machine learning, which I had honed over a decade in an environment that few others experienced. And it felt like the deep learning revolution was missing some important things. I was ready to speak.

How It Started

In May 2019, I flew to Madrid to represent Darío at Fundación Innovación Bankinter’s Future Trends Forum. That trip was the only time in my life I’ve sat in business class and it was fortuitous because it happened mere days after I had a painful back spasm. After the meeting concluded, I had a few hours to kill before proceeding onwards to Geneva for the AI for Good Global Summit. Instead of risking my back with any tourism, I sat in a park (the thin green area on the map) and wrote down an entire outline for the book I was imagining. That outline ended up being close to that of the eventual finished product. Look below for exactly what I typed into the notes app of my phone that afternoon.

Introduction
 Age of Artificial intelligence 
    General purpose technology
 Trustworthiness
 Overview and Limitations 
   Overview 
   Limitations of book
   Biases of author
     Diverse voices

Preliminaries
 Uncertainty
   Aleatoric
   Epistemic
 Detection theory
   Confusion matrix
   Costs
   Bayesian detection
   ROC
   Calibration
   Robust (minimax) detection
   Neyman-Pearson detection
   Chernoff-Stein, mutual information theory, kl divergence
 Causality
 Directed graphical models

Data
 Finite samples
 Modalities
 Sources
   Administrative data
   Crowdsourcing
 Biases
   Temporal biases
   Cognitive biases/prejudice (quantization)
      Quantization only by words so don't have to introduce quantization and clustering
   Sampling biases
   Poisoning
 Privacy
   Causal basis included

Machine learning
 Risk minimization
 Decision stumps
    Trees, Forests
    Perceptron
    Margin-based methods
    Neural networks
 Adversarial methods
 Data augmentation
 Causal inference
 Causal discovery

Safety
 Epistemic uncertainty in machine learning
 Distribution shift
 Fairness
 Adversarial robustness
   (Causal foundations included in each pillar)
 Testing

Communication
 Explainability and interpretability
   Direct global
   Distillation / simple models
   Post hoc local
 Value alignment
   Unified theory
   Preference elicitation
   Specification gaming
 Factsheets
 Blockchain

Purpose
 Professional codes
 Lived experience
 Social good
   Types of problems with examples
 Open platforms 

Summer and Fall of 2019

Once I was back from Europe, the summer was upon us and that meant having our social good student fellows with us and their projects in full steam. That, along with my other work, also meant days full of meetings: a manager’s schedule rather than a maker’s schedule, so I didn’t do anything further on the book all summer. Here is my calendar on one of those summer days (and this wasn’t atypical).

In the fall of 2019, I had the honor of spending three months at IBM Research – Africa, in Nairobi, Kenya. Because of the time difference, I made myself available for meetings only from 8 am to 11 am Eastern, which often meant entire mornings (East Africa Time) with no meetings (except for the nice conversations with the Africa lab researchers). Even though I thought I could use that time to start writing the book, I didn’t. Instead, the sabbatical turned out to be a great time to recover and recharge (while also doing some work on maternal, newborn and child health). Recovery is underappreciated.

Starting to Write

Back home, and with my calendar still mostly bare, I blocked off 90 minutes for writing every day starting on January 2, 2020. I started getting into a flow and put some words and equations down on paper (really this Overleaf). I made good progress on an introduction chapter and a detection theory chapter.

Then in mid-February, Bob Sutor stopped by my office and said that an acquisitions editor at Packt, the publisher he had worked with on Dancing with Qubits, was looking to publish a book on responsible and ethical AI, and he connected me with that editor, Tushar. Coincidentally, the same week, an acquisitions editor for Manning Publications emailed me cold about my possible interest in writing a book. I had good conversations with both editors and I was naïvely happy at the perfect confluence of events.

I filled out book proposals for both companies. Here is the one I did for Packt:

and here is the one I did for Manning:

I was completely honest in explaining what I wanted to do (mix of math and narrative), who it was for, and so on. I even sent over the couple of chapters I had already written. Both publishers were happy and accepted my proposal. Both made very similar offers in their contractual terms, which weren’t particularly important to me because I wasn’t doing this for the money. Manning had an early access program through which readers could access chapters as they were being written (which is what I wanted and also why I had made the Overleaf open when I was writing the first two chapters), so I decided to go with them. I signed on the dotted line on March 17, 2020.

Turbulence

Things did not go as I thought they might. Everything had shut down a week earlier because of the Covid-19 pandemic, and the shutdown did not abate in any way. I was sitting on a dilapidated sofa in my basement trying to complete other work, taking the kids outside to kick a soccer ball around once in a while, and plotting out how to get scarce groceries: not exactly conducive to writing. There were certainly no more 90-minute blocks of time daily.

More turbulent than that, however, was the publisher trying to shoehorn me into what they wanted. My proposal was very clear that the book would have a decent amount of math and no software code examples, would be a tour of different topics, and would be centered on concepts. But that didn’t seem to matter once things were underway. As I soon learned, Manning religiously follows Bloom’s taxonomy, and understanding concepts is very low on the totem pole. As instructed, I doggedly kept trying to push my text higher in the taxonomy, but it was mostly a farce to me, where I would just use the word “sketch” or “appraise” while still saying what I was going to say. I was also ruthlessly trying to reduce the math at their insistence. For example, the chapter on uncertainty as a concept morphed into evaluating safety.

There was a lot of back and forth, and a lot of frustration. Eventually, on February 16, 2021, the book was available for sale in the $40-$60 range through the early access program with the first four chapters available. We celebrated. I got a lot of positive feedback from people I know.

But the turbulence didn’t calm down. More Bloom, less math, and less of myself. I am not someone who uses the word “grok”. I didn’t want this to be a prescriptive recipe book because I don’t believe that that is what trustworthy machine learning is all about.

The book reached 320 sales by the time the first 12 chapters had been posted, which in my opinion is pretty darn good for something that is not even complete and with an underwhelming marketing effort.

Then came an ending and a rebirth. On September 10, 2021, the acquisitions editor reached out and said that the publisher would be ending the contract and the rights to the content would revert to me. I guess the sales weren’t what they needed and the content continued to be mismatched with the desires of their typical buyers. This turn of events ended up being more of an emotional relief than anything else.

Did the book improve because of all that back and forth? On balance, I’d say yes. So no hard feelings.

Finishing

I am not one to leave things unfinished, and I wasn’t going to let the ending of the contract hold me back from finishing the manuscript that I had toiled on for so very long at that point. I vowed to complete the whole thing by the end of the calendar year. In less than 4 months, I wrote the remaining 6 chapters: an unbridled pace much faster than what I had been doing before.

In September and October, I didn’t think much about what the route to get it out would be. Tushar reached out and offered to bring it to market through Packt, but I just wanted to focus on finishing it. And I did, on December 30!

By that time, I had made up my mind to post it online with a Creative Commons license to begin with. I created the website http://www.trustworthymachinelearning.com and posted a PDF of version 0.9. I quietly spread the word and kept getting a lot of positive responses from acquaintances.

Independently Published

While a diverse panel I had assembled was giving version 0.9 a look over and providing feedback, I did a bunch of soul-searching on what this book was for and why I was doing it. I also pored over what people had written about self-publishing in today’s age. I clearly wasn’t in it for the money — I was more than happy for anyone in the world to learn from it without paying. In fact, empowering people, no matter their station in life, is one of the messages of the book. I wanted its message to ring far and wide.

While everyone has a little vanity in them, as I said at the beginning of this post, I hadn’t written the book just to have written a book. This was also not a book aiming for some kind of book award. I wasn’t going to be using it for an academic tenure or promotion case, or any other stamp of approval. I didn’t want IBM to be involved in any explicit way (Manning had actually sought that out through a sponsorship deal). I enjoy doing a little formatting and aesthetic stuff here and there, and copy-editing. The previous experience hadn’t shown me that a publisher would necessarily do the right kind of marketing. Kindle Direct Publishing is really easy, doesn’t require any capital investment, and has very wide reach.

Putting all of that thinking together, despite not having heard of others in my orbit doing it before, I decided to independently publish the book. It has been up on Amazon since February 16, 2022 at the lowest possible price that Amazon allows for covering their costs. I’ve been very happy with my decision. It suits me and my worldview.

Afterwards

That very day, February 16, I made a social media push about the book, and that very night, I received this very kind email from Michael Hassan Tarawalie:

Dear sir, 

It is an honor to come in contact with you, sir. Am a student at the electrical and electronic department, faculty of engineering Fourah Bay College, University of Sierra Leone.

Sir your book has helped me.

One of the very first citations to the book was in the influential report by NIST entitled “Towards a Standard for Identifying and Managing Bias in Artificial Intelligence”.

There have been several great reviews of the book on Amazon from people I don’t know. It has become almost a cottage industry for people to hold up their copy of the paperback in large meetings I attend on Zoom and for others to post photos holding their copy on social media.

As of today, 481 copies of the book have been printed and shipped across the world in less than 3 months. Even though I’m not tracking it, I’m sure lots of people have accessed the free PDF and used it to uplift themselves.

This is what I wished for.

It always seems impossible until it’s done.

Nelson Mandela