Have you ever noticed that your AI chatbot gets lost in the middle of a conversation, or simply says it cannot handle prompts that are too long? That is because each model has a limit on how much text it can process at once, and its performance starts to suffer once it goes over that limit, much as if it had some kind of digital attention deficit disorder. But this could soon change thanks to a new method for supercharging LLM capabilities.
Current LLMs have limited context capacities. For example, ChatGPT taps just 8,000 tokens of context, while Claude handles 100,000. Tokens are the basic units of text or code that an LLM uses to process and generate language. This limit restricts how much background information a model can harness when formulating replies. Abacus AI has developed a method that allegedly doubles the usable context length for open-source LLMs like Meta’s Llama without compromising the model’s accuracy in practical application.
Their technique involves “scaling” the position embeddings that track word locations in input texts. On its GitHub page, Abacus AI claims that this scaling method drastically increases the number of tokens that a model can handle.
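To make the idea of “scaling” positions more concrete, here is a minimal sketch in Python. It assumes rotary-style position embeddings (the kind Llama uses) and a simple linear scheme in which each position is divided by the scale factor before the rotation angles are computed. The function name `rotary_angles`, the tiny `dim` of 8, and the 2,048-token training window are illustrative assumptions, not code from the Abacus AI repository, and the team's actual method may differ.

```python
def rotary_angles(position, dim=8, base=10000.0, scale=1.0):
    """Return the rotation angles for one token position.

    Hypothetical helper: scale > 1 compresses positions (position / scale),
    an assumed linear-scaling scheme, not necessarily Abacus AI's exact method.
    """
    scaled_position = position / scale
    # One frequency per pair of embedding dimensions, as in rotary embeddings.
    return [scaled_position / (base ** (2 * i / dim)) for i in range(dim // 2)]


trained_window = 2048  # assumed context length the base model was trained on
scale = 16             # scale factor mentioned in the article

# A token far beyond the trained window...
far_position = 32000
scaled = rotary_angles(far_position, scale=scale)

# ...is mapped onto a position the model has already seen during training.
seen = rotary_angles(far_position / scale)

print(far_position / scale <= trained_window)  # position now falls inside the window
print(scaled == seen)                          # same angles as an in-window position
```

The point of the sketch is only that dividing positions by the scale factor squeezes a long input into the range of positions the model was trained on. The model still has to be fine-tuned to make use of the finer spacing between neighboring tokens.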
The researchers evaluated two scaled Llama variants on tasks like substring location and open-book QA. The scale-16 model maintained accuracy on real-world examples up to 16,000-word contexts, versus only 2,000 words in baseline Llama. It even showed some coherence at 20,000+ words, something that was not possible to achieve with fine-tuning techniques alone.
The significance of context extension cannot be overstated. A narrow context window keeps the model accurate but makes it hard to use for complex tasks that require some background. Conversely, with an expanded context, LLMs can draw on more information, but they either take more time to respond or return subpar results. Handling longer contexts efficiently could enable LLMs to absorb whole documents or multiple documents as background when generating text. This may lead to outputs that are more knowledge-grounded and consistent across long conversations.
However, the gains are not perfectly proportional to the scale factors.
Fine-tuning strategies are still necessary, because scaling alone doesn’t guarantee high-quality outputs. The Abacus team is also exploring advanced position encoding schemes from recent papers to further extend context capacity.
Their work suggests that scaling up existing LLMs is a viable path to expanding usable context length. This could democratize access to Large Language Models capable of handling lots of context at once.
Abacus AI has opened its repository “for research purposes only,” sharing code specific to its fine-tuning projects. This makes it possible for others to iterate further on the work and apply the fine-tuning methods to virtually any open-source large language model.
With applications from personalized chatbots to creative writing aids, more memory-empowered LLMs could soon enable next-generation AI assistants that are conversant across diverse topics. For now, researchers are progressing rapidly to overcome technical constraints in pursuit of artificial general intelligence, meaning generalized human cognitive abilities in an AI model. Maybe someday our digital friends will handle as many tabs as we humans can, but without the headache!
Source: https://decrypt.co/151392/ai-researchers-can-double-llm-efficiency