As the training costs of machine learning models rise [1], continual learning (CL) emerges as a useful countermeasure. In CL, a machine learning model (e.g., an LLM such as GPT) is trained on a continually arriving stream of data (e.g., text data). Crucially, in CL the data cannot be stored, so only the most recent data is available for training. The main challenge is then to train on the current data (often called a task) while not forgetting the knowledge learned from previous tasks. Not forgetting past knowledge is essential because at test time, the model is evaluated on the test data of all seen tasks. This challenge is commonly described as catastrophic forgetting in the literature, and it is part of the stability-plasticity tradeoff.
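To make the evaluation protocol concrete, here is a minimal, self-contained sketch of sequential task training in plain Python. The toy model (a single weight with perceptron-style updates) and the two conflicting tasks are illustrative assumptions, not part of any real CL benchmark; the point is the loop structure: each task's data is discarded after training, and evaluation always covers all tasks seen so far.

```python
import random

random.seed(0)

def make_task(sign):
    # Toy binary task: label is 1 if sign * x > 0.
    # The second task (sign = -1) reverses the first task's rule,
    # so learning it overwrites what was learned before.
    xs = [random.uniform(-1, 1) for _ in range(200)]
    return [(x, 1 if sign * x > 0 else 0) for x in xs]

def predict(w, x):
    return 1 if w * x > 0 else 0

def train(w, data, lr=0.1, epochs=5):
    # Perceptron-style updates using ONLY the current task's data,
    # mimicking the CL constraint that past data cannot be stored.
    for _ in range(epochs):
        for x, y in data:
            if predict(w, x) != y:
                w += lr * (1 if y == 1 else -1) * x
    return w

def accuracy(w, data):
    return sum(predict(w, x) == y for x, y in data) / len(data)

tasks = [make_task(+1), make_task(-1)]
w = random.uniform(-0.1, 0.1)
seen = []
for i, task in enumerate(tasks, 1):
    w = train(w, task)  # only the current task is available
    seen.append(task)
    # Evaluation spans the test data of ALL tasks seen so far.
    accs = [round(accuracy(w, t), 2) for t in seen]
    print(f"after task {i}: accuracies on seen tasks = {accs}")
```

Running this, accuracy on the first task collapses once the model is trained on the second, conflicting task: exactly the catastrophic forgetting described above, in miniature.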
On the one hand, the stability-plasticity tradeoff refers to keeping network parameters (e.g., layer weights) stable so as not to forget (stability). On the other hand, it means allowing parameter changes in order to learn from novel tasks (plasticity). CL methods approach this tradeoff from several directions, which I have written about in a previous article.