perhaps because my post-graduate training and career afterward have almost entirely focused on optimizing a “loss function”, defined as the “average” loss over individual data instances within a large “data” set, by “mechanistically” adjusting the parameters of a large predictor, my own thought process itself started to resemble this process: think of an objective function that is measurable, and figure out a systematic way to optimize it. although it was not long after this started that i began to question this whole process; it probably helped that i was diagnosed with thyroid cancer (pretty rare
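the process this excerpt describes, minimizing the average loss over a data set by mechanistically adjusting a predictor’s parameters, can be sketched in a few lines of gradient descent. this is a generic illustration with made-up numbers, not anything taken from the post:

```python
import numpy as np

# toy data: a linear predictor with known parameters (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

# minimize the average squared loss by repeatedly following its gradient
w = np.zeros(3)
lr = 0.1
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
    w -= lr * grad
```

after enough steps, `w` recovers `true_w`; the “mechanistic” part is that nothing in the loop knows what the parameters mean, only how the average loss changes with them.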
Drug Discovery may be in the Cold War Era
as scientists (yes, i identify as a scientist myself, although i can see how this can be debatable), we are often trained and encouraged to uncover mechanisms behind mysterious phenomena in the universe. depending on which of these mysterious phenomena we consider, we are classified into different buckets: if you are working on biological phenomena, you’re a biologist. if you are working with languages, you’re a linguist (are linguists scientists? a good question, but i will leave it for another post in the future.) if you are working on problem solving, i think you’d be considered a computer
Global AI Frontier Lab at New York University
earlier this year, we officially launched the Global AI Frontier Lab at New York University (NYU), directly under the Office of the President. the Global AI Frontier Lab was created at NYU in collaboration with Korea’s Ministry of Science and ICT (specifically its Institute of Information & Communications Technology Planning and Evaluation, IITP), in order to support research in artificial intelligence (AI) and to facilitate international collaboration between a growing body of AI researchers at NYU and those in Korea. together with Yann, i am co-directing this lab. the Global AI Frontier Lab focuses on three research themes: (1) Fundamental
Softmax forever, or why I like softmax
[UPDATE: Feb 8 2025] my amazing colleague Max Shen noticed a sign mistake in my derivation of the partial derivative of the log-harmonic function below. i taught my first full-semester course on <Natural Language Processing with Distributed Representation> in fall 2015 (whoa, a decade ago!) you can find the lecture notes from this course at https://arxiv.org/abs/1511.07916. in one of the lectures, David Rosenberg, who was teaching machine learning at NYU back then and had absolutely no reason other than kindness to sit in on my course, asked why we use softmax and whether this is the only way to turn
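for context, softmax, the function this post is about, turns an arbitrary score vector into a probability vector. a minimal sketch (the max-subtraction trick here is a standard numerical-stability device, not something claimed in the excerpt):

```python
import numpy as np

def softmax(scores):
    # subtracting the max leaves the result unchanged but avoids overflow in exp
    z = scores - np.max(scores)
    e = np.exp(z)
    return e / e.sum()

p = softmax(np.array([1.0, 2.0, 3.0]))
```

the output is non-negative, sums to one, and preserves the ordering of the input scores, which is part of why it is such a natural choice for producing categorical distributions.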
Amortized Mixture of Gaussians (AMoG): A Proof of Concept for “Learning to X”, or how I re-discovered simulation-based inference
here’s my final hackathon of the year (2024). there are a few concepts in deep learning that i simply love. they include (but are not limited to) autoregressive sequence modeling, mixture density networks, boltzmann machines, variational autoencoders, stochastic gradient descent with adaptive learning rates and, more recently, set transformers. so, as the final hackathon of this year, i’ve decided to see if i can put together a set transformer, an autoregressive transformer decoder and a mixture density network to learn to infer an underlying mixture of Gaussians. i’ve got some help (and also some misleading guidance) from Google Gemini (gemini-exp-1206),
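a mixture of Gaussians, the object this hackathon tries to infer, is itself easy to write down and evaluate; here is a minimal sketch of its log-density in 1-D, computed stably with log-sum-exp (the parameter values below are illustrative, not from the post):

```python
import numpy as np

def mog_logpdf(x, weights, means, stds):
    # per-component log-density, shape (len(x), n_components)
    log_comp = (np.log(weights)
                - 0.5 * np.log(2 * np.pi) - np.log(stds)
                - 0.5 * ((x[:, None] - means) / stds) ** 2)
    # log-sum-exp over components for numerical stability
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()

xs = np.linspace(-10.0, 10.0, 4001)
lp = mog_logpdf(xs, np.array([0.3, 0.7]), np.array([-2.0, 3.0]), np.array([1.0, 0.5]))
```

evaluating the density is the easy direction; what the post attempts is the inverse problem, inferring `weights`, `means` and `stds` from samples, amortized across many such mixtures.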