Blog – Page 3 – Kyunghyun Cho

December 21, 2024 kyunghyuncho

i sensed anxiety and frustration at NeurIPS’24

Personal

last week at NeurIPS’24, one extremely salient thing was the anxiety and frustration felt and expressed by late-year PhD students and postdocs who were confused by the job market that looks and feels so much different from what they expected perhaps when they were applying for PhD programs five or so years ago. and, some of these PhD students and postdocs are my own under my supervision. this makes me reflect upon what is going on or what has been going on in artificial intelligence research and development. this post will be more or less a stream of thoughts rather

November 27, 2024 / November 28, 2024 kyunghyuncho

<The Atomic Human> by Neil Lawrence

Personal, Research

i can’t recall exactly but it was sometime in 2013 when Neil Lawrence visited Aalto University (it was january, apparently!). he gave a talk in a pretty small lecture room which was completely packed (and i was there as well.) he talked about his years-long effort in introducing probabilistic interpretation (and thereby extensions) to (hierarchical) unsupervised learning, which was back then being consumed by deep learning based approaches. that’s when i first learned clearly the intuition and motivation behind so-called GP-LVM (Gaussian process latent variable models). that was beautiful, or to be precise, how neil delivered his inspiration, motivation and

October 31, 2024 kyunghyuncho

An outrageous idea: a society-level forever clinical trial

Personal, Research

when i got tenure earlier, i thought that would change how i work and live. it was true, but it wasn’t because of tenure but because of my thyroid cancer (see https://kyunghyuncho.me/sharing-some-good-news-and-some-bad-news/ if you’re curious.) when i was promoted to become a full professor, i thought that would change how i work and live, but to be frank, it didn’t. though, i started to think about what i should be able to think about, now that i have become a full professor with tenure, implying (at least in my mind) that i have an obligation not only to carry on

May 29, 2024 kyunghyuncho

Continued musing on DPO

Research

This post continues from the earlier post on fixing DPO (https://kyunghyuncho.me/a-proper-preference-optimization-loss-and-its-gradient/). by the way, the dinner reservation was at Ramro (https://www.ramronyc.com/, https://maps.app.goo.gl/jwpyPvy2pjNsxS6h9), and i recommend you try it out. a very interesting cuisine!

Direct Preference Optimization

let’s start by stating the direct preference optimization (DPO) loss for each example $(x,y_+, y_-)$: \[\log \left( 1 + \exp \left(-\left(\beta \log \frac{\pi(y_+)}{\pi(y_-)}-\gamma \log \frac{\pi_0(y_+)}{\pi_0(y_-)}\right) \right) \right).\] this takes a slightly different form from the original DPO loss. in the original DPO loss, $\gamma = \beta$ was forced, which leaves the scale (or entropy) of the reference model $\pi_0$ uncontrollable. this formulation above is
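
as a rough illustration, the per-example loss stated in the excerpt above can be sketched in a few lines of python. this is only a minimal sketch under assumptions: the function names (`softplus`, `dpo_loss`) and the argument names are hypothetical, and the inputs are assumed to be sequence-level log-probabilities $\log \pi(y_\pm)$ and $\log \pi_0(y_\pm)$ that have already been computed elsewhere.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(logp_pos: float, logp_neg: float,
             logp0_pos: float, logp0_neg: float,
             beta: float = 1.0, gamma: float = 1.0) -> float:
    """Per-example loss log(1 + exp(-(beta * a - gamma * b))), where
    a = log pi(y+) - log pi(y-) and b = log pi_0(y+) - log pi_0(y-).

    All argument names are hypothetical; the inputs are assumed to be
    precomputed log-probabilities of the two responses.
    """
    policy_log_ratio = logp_pos - logp_neg        # log pi(y+)/pi(y-)
    reference_log_ratio = logp0_pos - logp0_neg   # log pi_0(y+)/pi_0(y-)
    margin = beta * policy_log_ratio - gamma * reference_log_ratio
    return softplus(-margin)
```

setting `gamma` equal to `beta` corresponds to the constraint that the excerpt attributes to the original DPO loss, while choosing them separately keeps the scale of the reference term adjustable.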

April 27, 2024 / April 28, 2024 kyunghyuncho

Fixing DPO but I have a dinner reservation …

Research

Direct preference optimization (DPO; https://arxiv.org/abs/2305.18290) is all the rage, i heard. i also hear from my students that DPO, which minimizes the following loss, often results in weird behaviours, such as unreasonable preference toward lengthy responses (even when there is no statistical difference in lengths between desirable and undesirable responses.) i won’t go into details of these issues, but i feel like there’s a relatively simple reason behind these pathologies based on basic calculus. \[\mathcal{L}_{\mathrm{dpo}}(\theta) = \log \left(1 + \exp \left(- \log \frac{p_{\theta}(y|x)}{p_{0}(y|x)}+ \log \frac{p_\theta(y'|x)}{p_{0}(y'|x)}\right)\right),\] where $p_0$ is the so-called reference model from which $y$ and $y'$ were drawn independently
