#### Readings
An Explanation of In-context Learning as Implicit Bayesian Inference
The LM learns from in-context examples without being explicitly pretrained to learn, so it is unclear what enables in-context learning. The authors study how in-context learning can emerge when pretraining documents have long-range coherence. To generate coherent next tokens during pretraining, the LM must infer a latent document-level concept. At test time, ==in-context learning occurs when the LM also infers a shared latent concept between examples in a prompt.== They prove when this occurs despite a distribution mismatch between prompts and pretraining data, in a setting where the pretraining distribution is a mixture of HMMs. In contrast to the messy large-scale datasets used to train LMs capable of in-context learning, they generate a small-scale synthetic dataset (GINC) on which Transformers and LSTMs both exhibit in-context learning. Beyond the theory, experiments on GINC reproduce phenomena seen in large-scale real-world LMs, including improved in-context performance with model scaling (despite the same pretraining loss), sensitivity to example order, and instances where zero-shot is better than few-shot in-context learning.
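The Bayesian view can be made concrete with a toy: the next-token prediction is a posterior-weighted average over latent concepts, $p(\text{next} \mid \text{prompt}) = \sum_\theta p(\text{next} \mid \text{prompt}, \theta)\, p(\theta \mid \text{prompt})$. The sketch below is a minimal illustration under loud assumptions: two hand-made HMMs (`CONCEPT_A`, `CONCEPT_B`) stand in for concepts, `PRIOR` stands in for the pretraining mixture weights, and all names and parameters are hypothetical. It is not the paper's GINC construction or proof, only exact Bayesian inference in a tiny mixture of HMMs.

```python
from math import fsum

# Hypothetical toy concepts (assumption): each latent "concept" is a 2-state HMM
# over a 3-token vocabulary. Parameters are made up; this is NOT the GINC dataset.
CONCEPT_A = {  # tends to alternate tokens 0, 1, 0, 1, ...
    "init": [1.0, 0.0],
    "trans": [[0.1, 0.9], [0.9, 0.1]],
    "emit": [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],
}
CONCEPT_B = {  # mostly emits token 2
    "init": [0.5, 0.5],
    "trans": [[0.5, 0.5], [0.5, 0.5]],
    "emit": [[0.1, 0.1, 0.8], [0.1, 0.1, 0.8]],
}
CONCEPTS = [CONCEPT_A, CONCEPT_B]
PRIOR = [0.5, 0.5]  # hypothetical mixture weights p(theta) of the "pretraining" distribution
VOCAB_SIZE = 3


def forward(hmm, tokens):
    """Forward algorithm: return p(tokens | hmm) and p(next hidden state | tokens)."""
    init, trans, emit = hmm["init"], hmm["trans"], hmm["emit"]
    n = len(init)
    pred = list(init)  # p(state_t | tokens before t)
    likelihood = 1.0
    for tok in tokens:
        joint = [pred[j] * emit[j][tok] for j in range(n)]
        p_tok = fsum(joint)  # p(token_t | tokens before t)
        likelihood *= p_tok
        filt = [x / p_tok for x in joint]  # p(state_t | tokens up to t)
        pred = [fsum(filt[i] * trans[i][j] for i in range(n)) for j in range(n)]
    return likelihood, pred


def concept_posterior(prompt):
    """Bayes' rule: p(theta | prompt) is proportional to p(prompt | theta) * p(theta)."""
    unnorm = [p * forward(c, prompt)[0] for p, c in zip(PRIOR, CONCEPTS)]
    z = fsum(unnorm)
    return [u / z for u in unnorm]


def next_token_distribution(prompt):
    """Marginalize over concepts: sum_theta p(next | prompt, theta) * p(theta | prompt)."""
    post = concept_posterior(prompt)
    probs = [0.0] * VOCAB_SIZE
    for w, c in zip(post, CONCEPTS):
        _, pred = forward(c, prompt)
        for j, p_state in enumerate(pred):
            for v in range(VOCAB_SIZE):
                probs[v] += w * p_state * c["emit"][j][v]
    return probs
```

A prompt such as `[0, 1, 0, 1, 0]` is far more likely under `CONCEPT_A`, so `concept_posterior` puts nearly all of its mass there and `next_token_distribution` favors token 1; a prompt of 2s shifts the posterior to `CONCEPT_B`. In this toy, more examples consistent with one concept sharpen the posterior further, which mirrors the intuition in the highlighted sentence.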
GPT, GPT-2
Projects
PCW: ![Screenshot 2023-07-31 at 19.24.47](<../../../../Library/Application Support/typora-user-images/Screenshot 2023-07-31 at 19.24.47.png>)
Permutation Invariance
In ICL there is no natural ordering between context examples, so we have the prior knowledge that the output embeddings $z_1, \ldots, z_n=f\left(\tilde{x}_1, \ldots, \tilde{x}_n\right)$ should be permutation equivariant, i.e., $f\left(\tilde{x}_{\pi(1)}, \ldots, \tilde{x}_{\pi(n)}\right)=\left(z_{\pi(1)}, \ldots, z_{\pi(n)}\right)$ for any permutation $\pi$. The previous approach of concatenating the examples into a single sequence $\tilde{x}=\left[\tilde{x}_1, \ldots, \tilde{x}_n\right]$ introduces an unnatural ordering (i.e., a distribution shift) and sacrifices the symmetries among $\tilde{x}_1, \ldots, \tilde{x}_n$.
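The equivariance property can be checked numerically. This is a minimal sketch under stated assumptions: single-head dot-product self-attention with identity query/key/value projections, and a hypothetical `add_positions` helper as a crude stand-in for the position information that concatenation introduces. Neither is taken from a specific paper or library. Without positions, permuting the examples permutes the outputs in the same way; once position-dependent offsets are added, that symmetry is lost.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(xs):
    """Single-head dot-product self-attention with identity Q/K/V projections
    (a simplifying assumption) and no positional information."""
    d = len(xs[0])
    out = []
    for q in xs:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs])
        out.append([sum(w * v[i] for w, v in zip(weights, xs)) for i in range(d)])
    return out


def add_positions(xs):
    """Hypothetical stand-in for positional encoding: shift each example by an
    amount that depends on its slot in the concatenated sequence."""
    return [[x + 0.5 * pos for x in vec] for pos, vec in enumerate(xs)]


def permute(xs, perm):
    """Return (x_{perm[0]}, ..., x_{perm[n-1]})."""
    return [xs[p] for p in perm]


def allclose(a, b, tol=1e-9):
    return all(abs(x - y) <= tol
               for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


examples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy example embeddings
perm = [2, 0, 1]

# Without positions: f(permuted inputs) equals permuted f(inputs), i.e. equivariant
print(allclose(self_attention(permute(examples, perm)),
               permute(self_attention(examples), perm)))  # True
# With positions assigned after ordering the examples: the symmetry is broken
print(allclose(self_attention(add_positions(permute(examples, perm))),
               permute(self_attention(add_positions(examples)), perm)))  # False
```

Real positional encodings act per token rather than per example, so this toy only shows the direction of the effect, not its size in an actual LM.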
![Screenshot 2023-07-31 at 19.22.39](<../../../../Library/Application Support/typora-user-images/Screenshot 2023-07-31 at 19.22.39.png>)