A random thought on retrieval-augmented generation

retrieval-augmented generation (RAG) is all the rage in the world of LLMs (i heard.) RAG confuses me quite a bit, since it’s unclear to me how RAG should work. in particular, i have a major confusion about how language models should be trained to be good at retrieval-augmented generation. it’s a simple confusion, and let me describe it here. let $D$ be the entire training corpus i have prepared for training a language model. a naive way to train a language model is to solve \[\max_{\theta} \sum_{x \in D} \log p_{\theta}(x).\] this whole process of learning can be thought of
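as a rough sketch of the naive objective above, here is what $\sum_{x \in D} \log p_{\theta}(x)$ looks like for a toy unigram model (everything here — the example corpus, the unigram parameterization, the function names — is illustrative, not from the post):

```python
import math
from collections import Counter

def log_likelihood(corpus, probs):
    """Sum of log p(x) over sequences x in the corpus D,
    under a toy unigram model with token probabilities `probs`."""
    total = 0.0
    for x in corpus:              # each x is one tokenized sequence
        for token in x:
            total += math.log(probs[token])
    return total

# hypothetical toy corpus D of tokenized sequences
D = [["a", "b"], ["b", "b", "a"]]

# maximum-likelihood unigram estimate: relative token frequency in D
counts = Counter(t for x in D for t in x)
n = sum(counts.values())
theta_mle = {t: c / n for t, c in counts.items()}

print(log_likelihood(D, theta_mle))
```

in practice $p_{\theta}$ is of course an autoregressive neural model and the sum is maximized with stochastic gradient ascent, but the objective being optimized is exactly this sum of log-probabilities.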

[TMLR] how to check the difference between revisions

Together with Openreview, TMLR strives to provide as much useful information as possible to reviewers and action editors in order to improve the quality of reviewing and publication. As part of this effort, we provide a way for reviewers as well as action editors to easily compare revisions of their assigned submission throughout the reviewing process. Here, we give you brief instructions on how to do so. First, go to your assigned submission. Here, I’m using an already-accepted paper at TMLR. On the submission page, you will see the “Show Revisions” button below the title: If you click “Show

[TMLR] how to set your availability as a reviewer/action editor

although this is already documented on TMLR’s homepage (https://tmlr.org) and is quite visible from Openreview’s reviewer/action editor console, i’m writing this short post as one of the Editors-in-Chief of TMLR to encourage reviewers and action editors to set their availability on Openreview and keep it up-to-date. when you go to https://openreview.net/, you will see “TMLR” as one of the active venues, as shown in the screenshot below. if not, you can go directly to the TMLR page at https://openreview.net/group?id=TMLR. when you log in to Openreview at TMLR, you will see a link to your own console. if you’re a