Published On Sep 8, 2024
In this video, we’ll dive into the white paper that covers the details of our new open model family, Jamba 1.5, including the novel hybrid Transformer-Mamba architecture the models are built on, with a focus on how Transformer, Mamba, and Mixture-of-Experts (MoE) layers are combined. We’ll also cover ExpertsInt8, a new quantization technique that enables more efficient serving.
Join me as we break down the key concepts and benefits of these innovative models.
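To give a feel for the core idea behind int8 expert-weight quantization (this is a minimal illustrative sketch, not the ExpertsInt8 implementation from the paper or vLLM): each weight matrix is stored as int8 values plus a per-output-channel float scale, and dequantized when needed.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-channel int8 quantization of a weight matrix.

    Each output channel (column) gets its own float scale so that the
    largest weight in the channel maps to +/-127.
    """
    scale = np.abs(w).max(axis=0) / 127.0              # one scale per column
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 weight matrix."""
    return q.astype(np.float32) * scale

# Toy expert weight matrix (shapes and values are made up for illustration)
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

Because the scales are per-channel, the reconstruction error for each weight is bounded by half a quantization step in its own channel, which keeps the degradation small even when channels differ widely in magnitude.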
The white paper:
https://arxiv.org/pdf/2408.12570
Previous white paper video:
• Jamba: A Hybrid Transformer-Mamba Lan...
Previous white paper:
https://arxiv.org/pdf/2403.19887
ExpertsInt8 commit in vLLM:
https://github.com/vllm-project/vllm/...
RULER benchmark:
https://arxiv.org/pdf/2404.06654