Large Model Robotic Arm 2.0: Seamlessly Performs Complex Tea-Making Tasks with Voice Commands!
Orbbec

Published on Jul 16, 2024

Orbbec's R&D team has integrated multimodal large language models into version 2.0 of its large-model robotic arm demonstration solution. The new solution uses Orbbec's latest depth cameras, the Gemini 335L and the Femto Bolt, and can automatically carry out a series of complex tasks such as making tea, arranging flowers, diffusing essential oils, and playing music in response to voice commands. Version 1.0 of the large-model robotic arm, released at the end of 2023, drew wide attention across the industry.

The 2.0 robotic arm combines multimodal large language models (voice, text, and vision) with robotic arm control to generate spatial semantic information. This lets the arm accurately recognize common everyday objects, including household items, food, and industrial parts, and perform the corresponding actions.

Tea making is a particularly demanding example: the process is long, and it involves many steps that must each be executed precisely and in a logically coherent sequence. The team uses the high-precision Gemini 335L and Femto Bolt cameras to locate target grasping poses accurately. Combining this with the large model's understanding capabilities, and after extensive algorithm optimization and debugging in a simulation environment, the system can now understand, plan, and automatically execute complex tasks such as tea making.

Compared with version 1.0, version 2.0 brings the following upgrades:
● Language Model: Significantly improved natural language understanding, allowing the arm to interpret and execute abstract commands more accurately.
● Planning Ability: Much stronger planning for complex tasks, so the arm can accurately understand and execute high-level instructions.
● Response Speed: Optimized overall execution efficiency, with much less time spent understanding and planning tasks.
● Grasping Ability: Upgraded to a gripper design; together with accurate recognition and classification of different objects, this lets the arm adapt to more diverse tasks and environments.
● Perception Ability: Equipped with the Gemini 335L (binocular stereo 3D camera) and the Femto Bolt (ToF camera), whose complementary sensing provides higher-resolution, more precise 3D perception.