Using GPT-4o to train a 2,000,000x smaller model (that runs directly on device)

Published May 29, 2024

More here: https://www.edgeimpulse.com/blog/llm-...

The latest generation of LLMs is astonishing: thanks to their multimodal capabilities you can ask questions in natural language about things you can see or hear in the real world ("is there a person without a hard hat standing close to a machine?") and get relatively fast, reliable answers. But these large LLMs have downsides. They're huge, so you need to run them in the cloud, which means high latency (often seconds per inference), high cost (think of the tokens you'll burn running inference 24/7), and high power draw (you need a constant network connection).

In this video we distill knowledge from a large multimodal LLM (GPT-4o) into a tiny model that runs directly on device: ultra-low latency, no network connection required, and small enough to scale down to microcontrollers with kilobytes of RAM if needed. Training required no manual labeling: every label was set by GPT-4o, which also decided when to throw out data, and the auto-labeled dataset was then used to train a transfer learning model with default settings.
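To give a sense of what the labeling step could look like, here's a minimal sketch that asks GPT-4o to label (or discard) an image via the OpenAI Python SDK. The prompt, label set, and "discard" convention are illustrative assumptions, not the actual Edge Impulse labeling block:

```python
# Minimal LLM-as-labeler sketch, assuming the OpenAI Python SDK (v1.x).
# The label set and prompt below are hypothetical; the real Edge Impulse
# labeling block may work differently.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = ["toy", "lamp", "unknown"]  # hypothetical label set for a narrow task

def label_image(path: str) -> str | None:
    """Ask GPT-4o for a label; return None if the image should be discarded."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Label this image as exactly one of {LABELS}. "
                         "Reply 'discard' if it is blurry or ambiguous. "
                         "Answer with a single word."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        max_tokens=5,
    )
    answer = response.choices[0].message.content.strip().lower()
    return None if answer == "discard" else answer
```

Run over an unlabeled dataset, something like this produces all the training labels (and drops low-quality samples) with no human in the loop; the resulting labeled set then feeds into an ordinary transfer learning pipeline.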

One of the models we train has 800K parameters (an NVIDIA TAO model with a MobileNet backbone), a cool 2,200,000x fewer parameters than GPT-4o :-) with similar accuracy on this very narrow, specific task.
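The ratio is simple arithmetic, but note that GPT-4o's parameter count is not public; the teacher size below is just what the quoted ratio implies, not a confirmed figure:

```python
student_params = 800_000          # NVIDIA TAO model, MobileNet backbone
ratio = 2_200_000                 # "2,200,000x fewer parameters"
implied_teacher = student_params * ratio
print(f"{implied_teacher:.2e}")   # 1.76e+12 -> ~1.76 trillion (implied, unconfirmed)
```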

The GPT-4o labeling block and TAO transfer learning models are available to all enterprise customers in Edge Impulse. A 2-week free trial is available; sign up at https://edgeimpulse.com!
