Heavily WIP. May never be finished. Oh well!
An (attempted!) PyTorch implementation of "test-time training" language models for long context, where an inner memory module encodes the sequence into its own weights by taking optimization steps on them as it processes tokens. The approach was introduced in Learning to (Learn at Test Time): RNNs with Expressive Hidden States by Sun et al.
This likely won't work because I am not good at computer, but it's worth a shot anyway.
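To make the idea concrete, here's a minimal sketch of a TTT-Linear-style layer, assuming one SGD step per token on a self-supervised reconstruction loss. This is my reading of the paper, not this repo's actual code; the class name, projection names, dimensions, and inner learning rate are all illustrative.

```python
import torch
import torch.nn as nn

class TTTLinearSketch(nn.Module):
    """Sketch of a TTT-Linear-style layer: the hidden state is the weight
    matrix W of an inner linear model, and W is updated by one SGD step
    per token on a self-supervised reconstruction loss."""

    def __init__(self, dim: int, eta: float = 0.1):
        super().__init__()
        self.theta_k = nn.Linear(dim, dim, bias=False)  # "training view" projection
        self.theta_v = nn.Linear(dim, dim, bias=False)  # "label view" projection
        self.theta_q = nn.Linear(dim, dim, bias=False)  # "test view" projection
        self.eta = eta  # inner-loop learning rate (illustrative value)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); W holds the per-sequence fast weights.
        B, T, D = x.shape
        W = torch.zeros(B, D, D, device=x.device, dtype=x.dtype)
        outputs = []
        for t in range(T):
            k, v, q = self.theta_k(x[:, t]), self.theta_v(x[:, t]), self.theta_q(x[:, t])
            # Inner loss 0.5 * ||k @ W - v||^2 has the analytic gradient
            # outer(k, err), so no autograd is needed for the inner step.
            err = torch.bmm(k.unsqueeze(1), W).squeeze(1) - v        # (B, D)
            grad_W = torch.bmm(k.unsqueeze(2), err.unsqueeze(1))     # (B, D, D)
            W = W - self.eta * grad_W                                # one SGD step per token
            outputs.append(torch.bmm(q.unsqueeze(1), W).squeeze(1))  # read with updated W
        return torch.stack(outputs, dim=1)  # (B, T, D)
```

A real implementation updates W over chunks of tokens rather than one at a time so the inner steps parallelize on GPU, which (as I understand it) is what the custom kernels mentioned below provide.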
This framework will use the Hugging Face ecosystem, including the Transformers Trainer. Easier this way.
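Roughly how I expect the training entry point to look. The Trainer and TrainingArguments API is real; `TTTForCausalLM`, `config`, and `build_dataset` are placeholders for things this repo would define.

```python
# Sketch only: TTTForCausalLM / config / build_dataset are hypothetical
# names standing in for this repo's code; the HF API calls are real.
from transformers import Trainer, TrainingArguments

model = TTTForCausalLM(config)   # hypothetical model class
train_dataset = build_dataset()  # hypothetical dataset builder

args = TrainingArguments(
    output_dir="checkpoints",
    per_device_train_batch_size=8,
    learning_rate=3e-4,
    max_steps=10_000,
    bf16=True,
    logging_steps=50,
    report_to="none",
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```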
The usual.
```sh
pip3 install -r requirements.txt
```
Two of the libraries in the requirements file are custom kernels from other authors. Please make sure your GPUs can support them; I'll add alternative native versions later.
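A quick way to sanity-check your hardware first. The sm_80 (Ampere) threshold here is my assumption; check the kernel libraries' own docs for the actual requirement.

```python
import torch

# Assumption: the custom kernels want an Ampere-or-newer GPU (compute
# capability >= 8.0). Adjust the threshold to match the kernel docs.
if not torch.cuda.is_available():
    raise SystemExit("No CUDA device found; the custom kernels will not load.")
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor}")
if (major, minor) < (8, 0):
    print("Warning: this GPU may be too old for the custom kernels.")
```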
- Learning to (Learn at Test Time): RNNs with Expressive Hidden States
- Titans: Learning to Memorize at Test Time
- Test-Time Training Done Right
- ATLAS: Learning to Optimally Memorize the Context at Test Time
- TNT: Improving Chunkwise Training For Test-Time Memorization
- ViT³: Unlocking Test-Time Training in Vision
Will add more papers as I find them.