30s Summary
On October 15, Nvidia released a new AI model, Llama-3.1-Nemotron-70B-Instruct, which outperforms leading AI systems like GPT-4o and Claude-3, according to the company. Nvidia enhanced Meta’s open-source model (Llama-3.1-70B-Instruct) using advanced tuning techniques and its high-end AI hardware. The performance of AI models is measured through benchmarking, and Nvidia claims its model surpasses the competition by a wide margin. Though Nvidia’s model does not yet appear on the Chatbot Arena Leaderboard, the company says it scored an 85 on an automated difficulty test, which would position it at the top.
Full Article
On October 15, Nvidia quietly launched a new artificial intelligence (AI) model. The company claims it runs circles around top-of-the-line AI systems like GPT-4o and Claude-3.
Nvidia said in a social media post that its new model, named Llama-3.1-Nemotron-70B-Instruct, ranks at the top of the AI Chatbot Arena on lmarena.AI.
Llama-3.1-Nemotron-70B-Instruct is built on Meta’s open-source Llama-3.1-70B-Instruct. Nvidia added the “Nemotron” part of the name to reflect its contribution to the final product.
Meta’s family of AI models, nicknamed the Llama “herd,” serves as an open-source stepping stone for developers to build on. Nvidia took this as a challenge, setting out to create a model that strives to be more “helpful” than well-known models like OpenAI’s ChatGPT and Anthropic’s Claude-3.
To boost Meta’s original model, Nvidia paired carefully selected datasets with advanced fine-tuning techniques, backed by its high-end AI hardware. The result, the company suggests, is potentially the most “helpful” AI model around.
What makes an AI model “the best” can be somewhat blurry because there is no one-size-fits-all measure. It is not as simple as taking a temperature reading on a thermometer, i.e., one clear “truth.” Assessing an AI model’s performance is more akin to evaluating a person: it relies on comparative testing.
AI benchmarking examines how different AI models handle the same tasks, questions, or problems and compares how useful the results are. This often involves human testers judging machine performance in blind evaluations, since what counts as useful is subjective.
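To make the idea concrete, here is a minimal sketch of how blind head-to-head votes can be turned into a leaderboard, using a simple Elo-style update. This is illustrative only: Chatbot Arena’s actual methodology differs, and the model names and votes below are placeholders.

```python
# Minimal Elo-style rating from pairwise "which answer is better?" votes.
# Each vote records only which model's answer the blind judge preferred.

def elo_ratings(matches, k=32, base=1000.0):
    """matches: list of (winner, loser) model-name pairs."""
    ratings = {}
    for winner, loser in matches:
        rw = ratings.setdefault(winner, base)
        rl = ratings.setdefault(loser, base)
        # Expected score of the winner given the current ratings
        expected_w = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        # Winner gains, loser loses, proportional to how surprising the win was
        ratings[winner] = rw + k * (1.0 - expected_w)
        ratings[loser] = rl - k * (1.0 - expected_w)
    return ratings

# Hypothetical blind-vote outcomes (model names are placeholders):
votes = [("model-A", "model-B"), ("model-A", "model-C"), ("model-B", "model-C")]
leaderboard = sorted(elo_ratings(votes).items(), key=lambda kv: -kv[1])
print(leaderboard)  # model-A ranks first, having won both of its matchups
```

The key point is that no single “ground truth” score exists; the ranking emerges purely from many pairwise human judgments.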
Nvidia suggests that its new Nemotron model outdoes key competitors like GPT-4o and Claude-3 by a wide margin.
The image enclosed shows the Chatbot Arena Leaderboard, where AI models compete on an automated “difficult” test. Nvidia’s Llama-3.1-Nemotron-70B-Instruct does not yet appear on the leaderboard, but if Nvidia’s claimed score of 85 on this test holds up, it would sit at the top of this section.
What’s even more fascinating is that Llama-3.1-70B is only a mid-sized open-source model in Meta’s lineup. Meta also offers a much larger 405B version, the number indicating how many billion parameters the model was trained with. By comparison, GPT-4o is believed to have been built with over a trillion parameters.
Source: Cointelegraph