in the previous post (How to think of uncertainty and calibration …), i described a high-level function $U(y, p, \tau)$ that can be used for various purposes, such as (1) retrieving all predictions above some level of certainty and (2) calibrating the predictive distribution. of course, one thing that was swept under the rug was what this predictive distribution $p$ was. in this short follow-up post, i’d like to give some thoughts about what this $p$ is. to be specific, i will use $p(y|x)$ to indicate that this is a distribution over all possible answers $\mathcal{Y}$ returned by a machine
Category: Research
How to think of uncertainty and calibration …
since i started Prescient Design almost exactly a year ago and Prescient Design joined Genentech about 4 months ago, i’ve begun thinking about (but not taking any action on) uncertainty and what it means. as our goal is to research and develop a new framework for de novo protein design that includes not only a computational component but also a wet-lab component, we want to ensure that we balance exploration and exploitation carefully. in doing so, one way that feels natural is to use the level of uncertainty in a design (a novel protein proposed by our algorithm) by our
Manifold mixup: degeneracy?
i’ve been thinking about mixup quite a bit over the past few years since it was proposed in [1710.09412] mixup: Beyond Empirical Risk Minimization (arxiv.org). what a fascinatingly simple and yet intuitively correct idea! we want our model to behave linearly between any pair of training examples, which thus helps our model generalize better to an unseen example which is likely to be close to an interpolated point between some pair of training examples. if we consider the case of regression (oh i hate this name “regression” so much..) we can write this down as minimizing $$-\frac{1}{2} \| \alpha y
Supporting female researchers and researchers from under-represented groups, together with CIFAR
if i had to pick organizations that have impacted my current career path most, CIFAR would be very near (if not at) the top of this list. there are a few reasons behind this. first, CIFAR started a program named “Neural Computation & Adaptive Perception” (NCAP) in 2004, supporting research in artificial neural networks, which has become a dominant paradigm in machine learning as well as, more broadly, artificial intelligence and all adjacent areas, including natural language processing and computer vision. i started my graduate study in 2009 with a focus on restricted Boltzmann machines and graduated in 2014 with a
Restricted Boltzmann machines or contrastive learning?
my inbox started to overflow with emails that urgently require my attention, and my TODO list (which doesn’t exist outside my own brain) started to randomly remove entries to avoid overflowing. of course, this is the perfect time for me to think of some random stuff. this time, this random stuff is contrastive learning. my thought on this stuff was sparked by Lerrel Pinto’s message on #random in our group’s Slack responding to the question “What is wrong with contrastive learning?” thrown by Andrew Gordon Wilson. Lerrel said, “My understanding is that getting negatives for contrastive learning is difficult.”