learning how to write a review is typically not a part of any formal training of PhD students. certainly there are materials online that aim to address this issue by providing various tips & tricks for writing a review, such as Reviewing Advice – ACL-IJCNLP 2021 (aclweb.org), but it’s not easy to learn to write something from a bullet-point list of what should be written. it’s thus often left for student authors to learn to review by reading the reviews of their own papers. this learning-to-review-by-reading-one’s-own-reviews strategy has some downsides. a major one is that people are often

## How to think of uncertainty and calibration … (2)

in the previous post (How to think of uncertainty and calibration …), i described a high-level function $U(y, p, \tau)$ that can be used for various purposes, such as (1) retrieving all predictions above some level of certainty and (2) calibrating the predictive distribution. of course, one thing that was swept under the rug was what this predictive distribution $p$ was. in this short follow-up post, i’d like to give some thoughts about what this $p$ is. to be specific, i will use $p(y|x)$ to indicate that this is a distribution over all possible answers $\mathcal{Y}$ returned by a machine

## How to think of uncertainty and calibration …

since i started Prescient Design almost exactly a year ago and Prescient Design joined Genentech about 4 months ago, i’ve begun thinking about (but not taking any action on) uncertainty and what it means. as our goal is to research and develop a new framework for de novo protein design that includes not only a computational component but also a wet-lab component, we want to ensure that we balance exploration and exploitation carefully. in doing so, one way that feels natural is to use the level of uncertainty in a design (a novel protein proposed by our algorithm) by our

## Manifold mixup: degeneracy?

i’ve been thinking about mixup quite a bit over the past few years since it was proposed in [1710.09412] mixup: Beyond Empirical Risk Minimization (arxiv.org). what a fascinatingly simple and yet intuitively correct idea! we want our model to behave linearly between any pair of training examples, which thus helps our model generalize better to an unseen example which is likely to be close to an interpolated point between some pair of training examples. if we consider the case of regression (oh i hate this name “regression” so much..) we can write this down as minimizing -\frac{1}{2} \| \alpha y

## Supporting female researchers and researchers from under-represented groups, together with CIFAR

if i had to pick organizations that have impacted my current career path most, CIFAR would be very near (if not at) the top of this list. there are a few reasons behind this. first, CIFAR started a program named “Neural Computation & Adaptive Perception” (NCAP) in 2004, supporting research in artificial neural networks, which has become a dominant paradigm in machine learning as well as, more broadly, artificial intelligence and all adjacent areas, including natural language processing and computer vision. i started my graduate study in 2009 with a focus on restricted Boltzmann machines and graduated in 2014 with a