Published On Mar 12, 2024
This video is about mergekit and how to choose and blend models. It's non-technical, but links to technical papers are included. You need to know how to navigate the terminal, but no programming is required.
🤖 Join my Discord community: / discord
📰 My tutorials on Medium: / mayaakim
🐦 My twitter profile: / maya_akim
To rent a GPU from Massed Compute (mergekit preinstalled) follow the link ⤵️
https://bit.ly/maya-akim
Code for 50% discount: MayaAkim
All links:
mergekit:
https://github.com/arcee-ai/mergekit
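If you want to try it locally, a hedged sketch of the usual install-from-source steps and merge command follows; the config and output paths are placeholders, so substitute your own before running:

```shell
# Clone mergekit and install it into the current Python environment
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .

# Run a merge described by a YAML config (paths are placeholders)
mergekit-yaml path/to/config.yml ./output-model-directory
```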
Open LLM Leaderboard
https://huggingface.co/spaces/Hugging...
my huggingface profile (with model configs you can copy):
https://huggingface.co/mayacinka
Git installation:
https://gitforwindows.org/
Git LFS installation:
https://docs.github.com/en/repositori...
supported architectures for mergekit:
https://github.com/arcee-ai/mergekit/...
best blog about mergekit:
/ merge-large-language-models-with-mergekit
other really good blog about mergekit:
/ merge-large-language-models
Charles Goddard’s blog (author of mergekit):
https://goddard.blog/about/
Mona Lisa with a Mohawk:
https://www.designboom.com/technology...
What is YAML:
https://www.techtarget.com/searchitop...
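For reference, a hedged sketch of what a mergekit YAML configuration can look like; the model names, weights, and merge method below are placeholders for illustration, not a recommended recipe:

```yaml
# Hypothetical example: a linear (weighted average) merge of two 7B models.
# Model names and weights are placeholders.
models:
  - model: mistralai/Mistral-7B-v0.1
    parameters:
      weight: 0.5
  - model: HuggingFaceH4/zephyr-7b-beta
    parameters:
      weight: 0.5
merge_method: linear
dtype: float16
```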
What is Data Contamination:
https://bdtechtalks.com/2023/07/17/ll...
Goodhart's law:
https://www.cna.org/reports/2022/09/g...
LazyMergekit:
https://colab.research.google.com/dri...
Auto evaluation (requires a RunPod profile):
https://colab.research.google.com/dri...
configuration with 14 models merged:
https://huggingface.co/EmbeddedLLM/Mi...
MoE instructions:
https://github.com/arcee-ai/mergekit/...
higher density yields better results:
https://github.com/arcee-ai/mergekit/...
Model family tree (visualization):
https://colab.research.google.com/dri...
https://huggingface.co/spaces/mlabonn...
cost of training Mistral:
https://www.ft.com/content/387eeeab-1...
Leaderboard is disgusting:
/ open_llm_leaderboard_is_disgusting
Merging models with different architectures:
https://arxiv.org/pdf/2401.10491.pdf
merging models with different architectures (FuseLLM):
https://github.com/18907305772/FuseLLM
Blending is all you need:
https://arxiv.org/pdf/2401.02994.pdf
Model soups:
https://arxiv.org/pdf/2203.05482.pdf
TIES-merging research paper:
https://arxiv.org/pdf/2306.01708.pdf
DARE merge research paper:
https://arxiv.org/pdf/2311.03099.pdf
Task arithmetic:
https://arxiv.org/pdf/2212.04089.pdf
Benchmarks
ARC benchmark
https://deepgram.com/learn/arc-llm-be...
https://arxiv.org/pdf/1803.05457.pdf
HellaSwag
https://arxiv.org/pdf/1905.07830.pdf
MMLU
https://arxiv.org/pdf/2009.03300.pdf
TruthfulQA
https://arxiv.org/abs/2109.07958
WinoGrande
https://arxiv.org/pdf/1907.10641.pdf
GSM8K
https://arxiv.org/pdf/2110.14168.pdf
overfitting problem (Ann Lotz):
https://arstechnica.com/tech-policy/2...
Benchmarks are a problem (screenshots):
https://analyticsindiamag.com/the-pro...
/ llm_benchmarks_are_broken_what_can_we_do_t...
/ llm_benchmarks_are_bullshit
Attributions:
https://commons.wikimedia.org/wiki/Fi...
Timecodes:
0:00 - 1:47 - blending intro
1:48 - 3:36 - promise of blending
3:37 - 4:22 - blending steps and requirements
4:23 - 5:05 - all you need is hardware
5:06 - 5:30 - mergekit installation
5:31 - 9:23 - merge methods
10:48 - 13:31 - configurations and yaml
13:32 - 14:38 - how to run merge
14:39 - 14:42 - upload merged model
14:43 - 16:27 - best merge method
16:28 - 20:16 - benchmark problems, overfitting and contamination
#mergekit #llm #localmodels