When do duplicates/frequencies matter in classification?

an interesting urban legend, or piece of wisdom, is that a classifier we train will work better on examples that appear more frequently in the training set than on those that are rare. that is, the existence of duplicates or near-duplicates in the training set affects the decision boundary learned by a classifier. for instance, imagine training a face detector for your phone’s camera in order to determine which filter to apply (one optimized for portraits and the other for other types of pictures). if most of the training examples for building such a face detector were taken in bright daylight, one often without

Three faces of sparsity: nonlinear sparse coding

it’s always puzzled me what sparsity means when computation is nonlinear, i.e., when the observation is decoded from a sparse code using nonlinear computation, because the sparse code can very well be turned into a dense code along the nonlinear path from the original sparse code to the observation. this made me write a short note a few years back, and i thought i’d share my thoughts on sparsity here with you: in my mind, there are three ways to define sparse coding. these are equivalent if we constrain the decoder to be linear (i.e., $x = \sum_{i=1}^{d'} z_i

Are JPEG and LM similar to each other? If so, in what sense, and is this the real question to ask?

last night, Douwe Kiela sent me a link to this article by Ted Chiang. i was already quite drunk by then, but quickly read the whole column and posted a tweet about it. Delip Rao then retweeted it and said that he does not “buy his lossy compression analogy for LMs”, in particular in the context of JPEG compression. Delip and i exchanged a few tweets earlier today, and i thought i’d lay out here in a blog post what i described in a tweet, namely why i think LM and JPEG have the same conceptual background: one way in which I

[NeurIPS’22] Chasing reviewers

as some of you may have noticed, i was one of the program chairs of NeurIPS’22, which just ended last Friday (December 9, 2022). it was a two-week-long conference, with the first week in person in New Orleans followed by the virtual week. program chairs were mostly tasked with running the review process for the main track of the conference and inviting keynote speakers, and there were other organizing committee members who took care of various other aspects of the conference, including expos, workshops, tutorials, the datasets and benchmarks track, social events, affinity workshops and many more,

Donation to “청포도” 보호종료아동을 위한 커뮤니티 케어 센터

Prof. Sukyoung Ryu, who’s the Dean of the School of Computing at KAIST (my alma mater) and a professor of computer science, posted on her facebook wall about a donation campaign by the “Green Grapes” Community Care Center for Children Graduating from Social Protection Program (“청포도” 보호종료아동을 위한 커뮤니티 케어 센터; i just made up this translation and am sure it doesn’t do justice to the original Korean name). this center’s main mission is to provide various educational programs as well as run support programs for those who are exiting social support systems for minors, such as foster homes, group
