so, it looks like watermarking is a thing that is coming back to its (controversial) life. the idea of watermarking is to enable content producers to mark their own contents so as to track where those contents are being consumed, without introducing too much disruption. one of the simplest watermarking techniques i've run into quite often is on a plane's entertainment system; when you watch a movie on an airplane, you often notice the airline code (e.g. “DL” in the case of Delta) overlaid on the screen once in a while. i presume the heightened interest in watermarking
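as a toy illustration (my own sketch, not from any particular system; the function names are made up for this post), the same goal of marking content without disrupting the viewer can also be pursued invisibly, e.g. by hiding a short identifier such as an airline code in the least significant bit of each pixel:

```python
import numpy as np

def embed_lsb(image, code):
    # hide an ascii code (e.g. "DL") in the least significant bits of a
    # uint8 image; each pixel changes by at most 1, so the mark is
    # visually imperceptible, unlike the on-screen overlay
    bits = np.unpackbits(np.frombuffer(code.encode("ascii"), dtype=np.uint8))
    flat = image.flatten()  # flatten() returns a copy
    flat[: len(bits)] = (flat[: len(bits)] & 0xFE) | bits
    return flat.reshape(image.shape)

def extract_lsb(image, n_chars):
    # read the hidden code back out of the least significant bits
    bits = image.flatten()[: n_chars * 8] & 1
    return np.packbits(bits).tobytes().decode("ascii")

frame = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
marked = embed_lsb(frame, "DL")
print(extract_lsb(marked, 2))                                 # DL
print(np.abs(marked.astype(int) - frame.astype(int)).max())   # at most 1
```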
Follow-up donation after 2 years
after receiving the Samsung Ho-Am Prize 2.5 years ago (early 2021), i made a few small donations here and there; i donated approximately \$85,000 to KAIST to establish a small scholarship for female students in computer science in honour of my mom, \$85,000 to Soongsil University for my dad, who has since retired after more than 30 years there as a professor of korean literature and language, €30,000 to Aalto University's computer science department for establishing a small scholarship to support non-EU students, \$30,000 CAD to Mila, and \$50,000 USD to CIFAR for supporting female researchers
Expectile regression
i often find myself extremely embarrassed, because i keep learning of concepts in machine learning that i should've known as a professor in machine learning but had never even heard of before. one recent example was expectile regression; i ran into this concept while studying Kostrikov et al. (2021) on implicit Q learning for offline reinforcement learning together with Daekyu, who is visiting me from Samsung. in their paper, Kostrikov et al. present the following loss function to estimate the $\tau$-th expectile of a random variable $X$: $$\arg\min_{m_{\tau}} \mathbb{E}_{x \sim X}\left[ L_2^\tau (x - m_{\tau}) \right],$$ where $L_2^\tau(u) = |\tau - \mathbb{1}(u < 0)|\, u^2$
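to make this concrete, here is a minimal numpy sketch (my own, not code from Kostrikov et al.) that estimates an expectile by gradient descent on this asymmetric squared loss; for $\tau = 0.5$ it recovers the mean, and as $\tau \to 1$ it approaches the maximum:

```python
import numpy as np

def estimate_expectile(x, tau, lr=0.1, n_steps=2000):
    # estimate the tau-th expectile of samples x by minimizing
    # E[L_2^tau(x - m)], where L_2^tau(u) = |tau - 1(u < 0)| u^2
    m = x.mean()  # the mean is the 0.5-expectile, so a sensible start
    for _ in range(n_steps):
        u = x - m
        w = np.abs(tau - (u < 0).astype(float))  # asymmetric weight
        # d/dm of mean(w * u**2) is mean(-2 * w * u), hence the update:
        m += lr * 2.0 * np.mean(w * u)
    return m

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
print(estimate_expectile(x, 0.5))  # ~0.0, i.e. the mean
print(estimate_expectile(x, 0.9))  # above the mean; tends to max(x) as tau -> 1
```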
Defining emergence
so, apparently, emergence has become a hot topic on twitter while i was away in Kigali attending ICLR, moto-taxiing around Kigali, injuring myself and breaking my phone while running, seeing a group of mountain gorillas and being back at AIMS Rwanda after 4 years. the mountain gorillas were majestic. i do not want to discuss any particular paper/tweet/blog, because this topic seems to attract a weird set of people arguing for weird things, when in fact there are just a couple of different views on a single phenomenon, which is only natural in science and engineering. that said, if
When do duplicates/frequencies matter in classification?
an interesting urban legend, or piece of wisdom, is that a classifier we train will work better on examples that appear more frequently in the training set than on those that are rare. that is, the existence of duplicates or near-duplicates in the training set affects the decision boundary learned by a classifier. for instance, imagine training a face detector for your phone's camera in order to determine which filter to apply (one optimized for portraits and the other for other types of pictures). if most of the training examples for building such a face detector were taken in bright daylight, one often without
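as a toy check of the claim above (my own sketch; the data and setup are made up and stand in for the face-detector story), duplicating one class in a 1-d logistic regression visibly shifts the learned decision boundary:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def boundary(neg, pos):
    # fit a 1-d logistic regression and return the input at which the
    # predicted probability crosses 0.5, i.e. where w * x + b = 0
    X = np.concatenate([neg, pos])[:, None]
    y = np.concatenate([np.zeros(len(neg)), np.ones(len(pos))])
    clf = LogisticRegression().fit(X, y)
    return -clf.intercept_[0] / clf.coef_[0, 0]

rng = np.random.default_rng(0)
neg = rng.normal(-1.0, 1.0, size=500)  # "no face" examples
pos = rng.normal(+1.0, 1.0, size=500)  # "face" examples

print(boundary(neg, pos))                              # roughly 0.0
# duplicating the positive class twice over shifts the boundary toward
# the negatives, so more inputs get classified as the frequent class
print(boundary(neg, np.concatenate([pos, pos, pos])))  # below 0.0
```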