A short note on “Rebooting AI” by Marcus & Davis

Disclaimer: I received a hard copy of <Rebooting AI> from the publisher, although by then I had already purchased the Kindle version of the book myself on Amazon. I only had a quick look at the book on my flight between UIUC and NYC and wrote this brief note on my flight back to NYC from Chicago. I also felt it would be good to have even a short note by a machine learning researcher to balance all the praise from “Noam Chomsky, Steven Pinker, Garry Kasparov” and others. <Rebooting AI> is a well-written piece (somewhat hastily) summarizing the current state of

Discrepancy between GD-by-GD and GD-by-SGD

The ICLR deadline is approaching, and of course, it’s time to write a short blog post that has absolutely nothing to do with any of my manuscripts in preparation. I’d like to thank Ed Grefenstette, Tim Rocktäschel and Phu Mon Htut for fruitful discussions. Let’s consider the following meta-optimization objective function: $$\mathcal{L}'(D'; \theta_0 - \eta \nabla_{\theta} \mathcal{L}(D; \theta_0))$$ which we want to minimize w.r.t. θ₀. Thanks to the success of MAML and its earlier and more recent variants, it has recently become popular to use gradient descent to minimize such a meta-optimization objective function. The gradient can be written down as* $$\nabla_{\theta_0} \mathcal{L}'(D'; \theta_0 - \eta \nabla_{\theta} \mathcal{L}(D; \theta_0)) =

BERT has a Mouth and must Speak, but it is not an MRF

Update on June 9, 2021: I still don’t know the fate of the hypothetical manuscript by Chandel et al., but I’ve noticed that Kartik Goyal, Chris Dyer & Taylor Berg-Kirkpatrick fixed the issue discussed in this blog post (https://arxiv.org/abs/2106.02736) by using BERT’s conditional as a proposal distribution in Metropolis-Hastings, to sample from the distribution whose potentials are defined by the logits of BERT’s single-token conditionals. It was pointed out by our colleagues at NYU, Chandel, Joseph and Ranganath, that there is an error in the recent technical report <BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model> written

Are we ready for self-driving cars?

Last Monday (April 29), I had the awesome experience of being invited to, and participating in, the debate event organized by the Review and Debates at NYU (http://www.thereviewatnyu.com/). Having been born and raised in South Korea, I can confidently tell you that I cannot remember a single moment when I participated in any kind of formal debate, nor a single occasion on which I was taught how to make an argument for or against any specific topic. My mom often tells me I paint way too gloomy a picture of the Korean K-12 education I had, but it is true that our

On the causality view of “Context-Aware Learning for Neural Machine Translation”

[Notice: what unfortunate timing! This post is definitely NOT an April Fools’ joke.] Sebastien Jean and I had a paper titled <context-aware learning for neural machine translation> rejected from NAACL’19, perhaps understandably, because we did not report any substantial gain in the BLEU score. As I finally found some time to read Pearl’s <Book of Why> due to a personal reason (yes, personal reasons sometimes can help), I thought I would write a short note on how the idea in this paper was originally motivated. As I was never educated in causal inference or learning, I was scared of using a term