here’s my final hackathon of the year (2024). there are a few concepts in deep learning that i simply love. they include (but are not limited to) autoregressive sequence modeling, mixture density networks, Boltzmann machines, variational autoencoders, stochastic gradient descent with adaptive learning rates and, more recently, set transformers. so, as the final hackathon of this year, i’ve decided to see if i can put together a set transformer, an autoregressive transformer decoder and a mixture density network to learn to infer an underlying mixture of Gaussians. i’ve got some help (and also misleading guidance) from Google Gemini (gemini-exp-1206),
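as a rough illustration of the last piece of that pipeline, here is a minimal sketch of just the mixture-density-network head and its negative log-likelihood, assuming some set encoder has already produced a per-set context vector; the names (`MDNHead`, `mdn_nll`, `d_context`) and the toy dimensions are mine, not from the actual hackathon code.

```python
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    """Maps a context vector to the parameters of a K-component Gaussian mixture."""
    def __init__(self, d_context: int, n_components: int = 5):
        super().__init__()
        self.n_components = n_components
        self.proj = nn.Linear(d_context, 3 * n_components)  # logits, means, log-stds

    def forward(self, h: torch.Tensor):
        logits, mu, log_sigma = self.proj(h).chunk(3, dim=-1)
        return logits, mu, log_sigma.clamp(-5.0, 5.0)

def mdn_nll(logits, mu, log_sigma, y):
    """Negative log-likelihood of scalar targets y under the predicted mixture."""
    comp = torch.distributions.Normal(mu, log_sigma.exp())
    log_prob = comp.log_prob(y.unsqueeze(-1))              # (batch, K)
    log_mix = torch.log_softmax(logits, dim=-1) + log_prob
    return -torch.logsumexp(log_mix, dim=-1).mean()

# toy usage: pretend the set encoder produced a 64-d context vector per set
h = torch.randn(8, 64)
y = torch.randn(8)
head = MDNHead(64, n_components=5)
loss = mdn_nll(*head(h), y)
loss.backward()
```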
Stochastic variational inference for low-rank stochastic block models, or how i re-discovered SBM unnecessarily
Prologue: a few weeks ago, i listened to Sebastian Seung’s mini-lecture at Flatiron Institute (CCM) about the recently completed fruit fly brain connectome. near the end of the mini-lecture, sebastian talked about the necessity of graph node clustering based on the type-level connectivity patterns instead of node-level connectivity patterns. i thought that would be obviously easy to solve with latent variable modeling and ChatGPT. i was so wrong, because ChatGPT misled me into every possible wrong corner of the solution space over the next two weeks or so. eventually, i implemented a simple variational inference approach to latent variable clustering,
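for concreteness, here is a tiny sketch of the kind of mean-field update one ends up with for a vanilla Bernoulli SBM (not the low-rank stochastic variational version from the post); the function name and the toy graph below are made up purely for illustration, and the block matrix and mixing proportions are held fixed.

```python
import numpy as np

def mean_field_sbm_step(A, q, B, pi):
    """One coordinate-ascent update of q(z_i = k) for a vanilla Bernoulli SBM.

    A:  (N, N) binary adjacency matrix (symmetric, no self-loops)
    q:  (N, K) current variational responsibilities
    B:  (K, K) block connection probabilities (held fixed here)
    pi: (K,) mixing proportions over blocks (held fixed here)
    """
    logB, log1mB = np.log(B), np.log1p(-B)
    # expected log-likelihood of node i's edges under each block assignment
    score = A @ q @ logB.T + (1.0 - A) @ q @ log1mB.T   # (N, K)
    score -= q @ log1mB.T                               # remove the j = i self-term
    score += np.log(pi)
    score -= score.max(axis=1, keepdims=True)           # stabilize before exponentiating
    q_new = np.exp(score)
    return q_new / q_new.sum(axis=1, keepdims=True)

# toy usage on a random symmetric graph with K = 2 blocks
rng = np.random.default_rng(0)
N, K = 30, 2
A = (rng.random((N, N)) < 0.1).astype(float)
A = np.triu(A, 1)
A = A + A.T
q = rng.dirichlet(np.ones(K), size=N)
B = np.array([[0.3, 0.05], [0.05, 0.3]])
pi = np.ones(K) / K
for _ in range(10):
    q = mean_field_sbm_step(A, q, B, pi)
```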
<The Atomic Human> by Neil Lawrence
i can’t recall exactly, but it was sometime in 2013 when Neil Lawrence visited Aalto University (it was january, apparently!). he gave a talk in a pretty small lecture room that was completely packed (and i was there as well). he talked about his years-long effort to introduce a probabilistic interpretation (and thereby extensions) to (hierarchical) unsupervised learning, which was back then being consumed by deep-learning-based approaches. that’s when i first clearly learned the intuition and motivation behind the so-called GP-LVM (Gaussian process latent variable model). that was beautiful, or to be precise, how neil delivered his inspiration, motivation and
An outrageous idea: a society-level forever clinical trial
when i got tenure earlier, i thought that would change how i work and live. it did change, but not because of tenure; it was because of my thyroid cancer (see https://kyunghyuncho.me/sharing-some-good-news-and-some-bad-news/ if you’re curious). when i was promoted to full professor, i thought that would change how i work and live, but to be frank, it didn’t. i did, however, start to think about what i should be able to think about, now that i have become a full professor with tenure, implying (at least in my mind) that i have an obligation not only to carry on
Continued musing on DPO
This post continues from the earlier post on fixing DPO (https://kyunghyuncho.me/a-proper-preference-optimization-loss-and-its-gradient/). by the way, the dinner reservation was at Ramro (https://www.ramronyc.com/, https://maps.app.goo.gl/jwpyPvy2pjNsxS6h9), and i recommend you try it out. a very interesting cuisine! Direct Preference Optimization: let’s start by stating the direct preference optimization (DPO) loss for each example $(x, y_+, y_-)$: \[\log \left( 1 + \exp \left(-\left(\beta \log \frac{\pi(y_+)}{\pi(y_-)}-\gamma \log \frac{\pi_0(y_+)}{\pi_0(y_-)}\right) \right) \right).\] this takes a slightly different form from the original DPO loss. in the original DPO loss, $\gamma = \beta$ was forced, which leaves the scale (or entropy) of the reference model $\pi_0$ uncontrollable. this formulation above is
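to make the decoupling of $\beta$ and $\gamma$ concrete, here is a minimal sketch of the per-example loss above, computed from log-probabilities of $y_+$ and $y_-$ under $\pi$ and $\pi_0$; the function name and the toy numbers are mine, and setting $\gamma = \beta$ recovers the original DPO loss.

```python
import torch
import torch.nn.functional as F

def dpo_style_loss(logp_pos, logp_neg, ref_logp_pos, ref_logp_neg,
                   beta: float = 1.0, gamma: float = 1.0):
    """Per-example loss log(1 + exp(-(beta * log pi(y+)/pi(y-)
    - gamma * log pi0(y+)/pi0(y-)))), with beta and gamma decoupled."""
    margin = beta * (logp_pos - logp_neg) - gamma * (ref_logp_pos - ref_logp_neg)
    # log(1 + exp(-margin)) == softplus(-margin), computed stably
    return F.softplus(-margin)

# toy usage with made-up log-probabilities of y+ and y- under pi and pi0
logp_pos, logp_neg = torch.tensor(-3.0), torch.tensor(-5.0)
ref_pos, ref_neg = torch.tensor(-4.0), torch.tensor(-4.5)
loss = dpo_style_loss(logp_pos, logp_neg, ref_pos, ref_neg, beta=0.1, gamma=0.1)
```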