Published On Mar 12, 2024
This video is about mergekit and how to choose and blend models. It's non-technical, but links to technical papers are included. You need to know how to navigate the terminal, but no programming is required.
🤖 Join my Discord community: / discord
📰 My tutorials on Medium: / mayaakim
🐦 My twitter profile: / maya_akim
To rent a GPU from Massed Compute (mergekit preinstalled) follow the link ⤵️
https://bit.ly/maya-akim
Code for 50% discount: MayaAkim
All links:
mergekit:
https://github.com/arcee-ai/mergekit
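If you want to try it locally, a hedged sketch of the usual install-from-source steps and merge command follows; the config and output paths are placeholders, so substitute your own before running:

```shell
# Clone mergekit and install it into the current Python environment
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .

# Run a merge described by a YAML config (paths are placeholders)
mergekit-yaml path/to/config.yml ./output-model-directory
```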
Open LLM Leaderboard
https://huggingface.co/spaces/Hugging...
my huggingface profile (with model configs you can copy):
https://huggingface.co/mayacinka
Git installation:
https://gitforwindows.org/
Git LFS installation:
https://docs.github.com/en/repositori...
supported architectures for mergekit:
https://github.com/arcee-ai/mergekit/...
best blog about mergekit:
/ merge-large-language-models-with-mergekit
other really good blog about mergekit:
/ merge-large-language-models
Charles Goddard’s blog (author of mergekit):
https://goddard.blog/about/
Mona Lisa with a Mohawk:
https://www.designboom.com/technology...
What is YAML:
https://www.techtarget.com/searchitop...
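For reference, a hedged sketch of what a mergekit YAML configuration can look like; the model names, weights, and merge method below are placeholders for illustration, not a recommended recipe:

```yaml
# Hypothetical example: a linear (weighted average) merge of two 7B models.
# Model names and weights are placeholders.
models:
  - model: mistralai/Mistral-7B-v0.1
    parameters:
      weight: 0.5
  - model: HuggingFaceH4/zephyr-7b-beta
    parameters:
      weight: 0.5
merge_method: linear
dtype: float16
```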
What is Data Contamination:
https://bdtechtalks.com/2023/07/17/ll...
Goodhart's law:
https://www.cna.org/reports/2022/09/g...
LazyMergekit:
https://colab.research.google.com/dri...
Auto evaluation (requires a RunPod profile):
https://colab.research.google.com/dri...
configuration with 14 models merged:
https://huggingface.co/EmbeddedLLM/Mi...
MoE instructions:
https://github.com/arcee-ai/mergekit/...
higher density yields better results:
https://github.com/arcee-ai/mergekit/...
Model family tree (visualization):
https://colab.research.google.com/dri...
https://huggingface.co/spaces/mlabonn...
cost of training Mistral:
https://www.ft.com/content/387eeeab-1...
Leaderboard is disgusting:
/ open_llm_leaderboard_is_disgusting
Merging models with different architectures:
https://arxiv.org/pdf/2401.10491.pdf
merging models with different architectures (FuseLLM):
https://github.com/18907305772/FuseLLM
Blending is all you need:
https://arxiv.org/pdf/2401.02994.pdf
Model soups:
https://arxiv.org/pdf/2203.05482.pdf
TIES-merging research paper:
https://arxiv.org/pdf/2306.01708.pdf
DARE merge research paper:
https://arxiv.org/pdf/2311.03099.pdf
Task arithmetic:
https://arxiv.org/pdf/2212.04089.pdf
Benchmarks
ARC benchmark
https://deepgram.com/learn/arc-llm-be...
https://arxiv.org/pdf/1803.05457.pdf
HellaSwag
https://arxiv.org/pdf/1905.07830.pdf
MMLU
https://arxiv.org/pdf/2009.03300.pdf
TruthfulQA
https://arxiv.org/abs/2109.07958
WinoGrande
https://arxiv.org/pdf/1907.10641.pdf
GSM8K
https://arxiv.org/pdf/2110.14168.pdf
overfitting problem (Ann Lotz):
https://arstechnica.com/tech-policy/2...
Benchmarks are a problem (screenshots):
https://analyticsindiamag.com/the-pro...
/ llm_benchmarks_are_broken_what_can_we_do_t...
/ llm_benchmarks_are_bullshit
Attributions:
https://commons.wikimedia.org/wiki/Fi...
Timecodes:
0:00 - 1:47 - blending intro
1:48 - 3:36 - promise of blending
3:37 - 4:22 - blending steps and requirements
4:23 - 5:05 - all you need is hardware
5:06 - 5:30 - mergekit installation
5:31 - 9:23 - merge methods
10:48 - 13:31 - configurations and yaml
13:32 - 14:38 - how to run merge
14:39 - 14:42 - upload merged model
14:43 - 16:27 - best merge method
16:28 - 20:16 - benchmark problems, overfitting and contamination
#mergekit #llm #localmodels