NVIDIA Corp (NVDA): Tech Industry Artificial Intelligence...


Dallas-Cowboys


Wednesday, 04/10/2024 7:44:54 PM

Elon Musk says the next-generation Grok 3 model will require 100,000 Nvidia H100 GPUs to train
By Anton Shilov, published yesterday
GPU shortages and power are the two main obstacles for AI development.
[Image: Nvidia GH200 SC23 announcement. Credit: Nvidia]
Elon Musk, CEO of Tesla and founder of xAI, made some bold predictions about the development of artificial general intelligence (AGI) and discussed the challenges facing the AI industry, Reuters reports. He predicts that AGI could surpass human intelligence as soon as next year or by 2026, but that training it will take an enormous number of processors, which in turn require huge amounts of electricity.

Musk's venture, xAI, is currently training the second version of its Grok large language model and expects to complete the next training phase by May. Training the Grok 2 model required as many as 20,000 Nvidia H100 GPUs, and Musk anticipates that future iterations will demand even greater resources, with the Grok 3 model needing around 100,000 Nvidia H100 chips to train.

According to Musk, the advancement of AI technology is currently hampered by two main factors: shortages of advanced processors such as Nvidia's H100 (it's not easy to get 100,000 of them quickly) and the availability of electricity.
Nvidia's H100 GPU consumes around 700W when fully utilized, so 100,000 of these GPUs running AI and HPC workloads could draw a whopping 70 megawatts. Since the GPUs also need servers and cooling to operate, it's safe to say that a datacenter with 100,000 Nvidia H100 processors will consume around 100 megawatts of power. That's comparable to the power consumption of a small city.
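The power figures above come from simple multiplication. A minimal sketch of that arithmetic, assuming every GPU draws the full 700W continuously; the "overhead factor" is just the ratio between the article's rough ~100 MW facility figure and the 70 MW GPU-only figure, not a measured value, and the function name `gpu_power_mw` is hypothetical:

```python
# Back-of-the-envelope power estimate for H100 training clusters.
# Assumption: each GPU runs at its full ~700 W draw the whole time.
H100_WATTS = 700            # per-GPU draw at full load, as stated in the article
FACILITY_TOTAL_MW = 100     # article's rough whole-datacenter estimate for 100,000 GPUs


def gpu_power_mw(gpu_count: int, watts_per_gpu: float) -> float:
    """Return the combined GPU-only draw in megawatts (hypothetical helper)."""
    return gpu_count * watts_per_gpu / 1_000_000


grok2_mw = gpu_power_mw(20_000, H100_WATTS)    # Grok 2 cluster size quoted by Musk
grok3_mw = gpu_power_mw(100_000, H100_WATTS)   # Grok 3 cluster size quoted by Musk

# Ratio of whole-facility power to GPU-only power implied by the article's two
# figures (servers, networking, cooling); real overhead varies from site to site.
implied_overhead = FACILITY_TOTAL_MW / grok3_mw

print(f"Grok 2 GPUs: {grok2_mw:.0f} MW")
print(f"Grok 3 GPUs: {grok3_mw:.0f} MW")
print(f"Implied facility overhead: {implied_overhead:.2f}x")
```

Under these assumptions the 20,000-GPU Grok 2 cluster draws about 14 MW for the GPUs alone, and the article's 100 MW estimate implies roughly 1.43x on top of the raw GPU draw for everything else in the facility.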

Musk stressed that while the supply of compute GPUs has been a significant obstacle so far, the supply of electricity will become increasingly critical in the next year or two. This dual constraint underscores the challenges of scaling AI technologies to meet growing computational demands.

Despite the challenges, advancements in compute and memory architectures will enable the training of increasingly massive large language models (LLMs) in the coming years. At GTC 2024, Nvidia revealed its Blackwell B200, a GPU architecture and platform designed to scale to LLMs with trillions of parameters. This will play a critical role in the development of AGI.

In fact, Musk believes that an artificial intelligence smarter than the smartest human will emerge in the next year or two. "If you define AGI as smarter than the smartest human, I think it is probably next year, within two years," Musk said in an interview on X Spaces. That means it's apparently time to go watch Terminator again, and hope that our future AGI overlords will be nicer than Skynet.