How Google makes custom cloud chips that power Apple AI and Gemini

In a sprawling lab at Google's headquarters in Mountain View, Calif., hundreds of server racks hum across several aisles, performing tasks far less universal than running the world's dominant search engine or executing workloads for Google Cloud's millions of customers.

Instead, they run tests on Google's own microchips, called tensor processing units, or TPUs.

Google's TPUs were initially built for internal workloads and have been available to cloud customers since 2018. In July, Apple revealed that it uses TPUs to train the artificial intelligence models that power Apple Intelligence. Google also relies on TPUs to train and run its Gemini chatbot.

“There is a fundamental belief in the world that all artificial intelligence, large language models, are being trained on Nvidia, and of course Nvidia has the lion's share of training volume. But Google took its own path here,” said Daniel Newman, CEO of the Futurum Group.

Google was the first cloud provider to make custom artificial intelligence chips. Three years later, Amazon Web Services announced its first cloud AI chip, Inferentia. Microsoft's first custom AI chip, Maia, wasn't announced until the end of 2023.

But being first in AI chips hasn't translated into leadership in the fierce, across-the-board race for generative AI. Google has faced criticism for botched product launches, and Gemini came out more than a year after OpenAI's ChatGPT.

However, Google Cloud's momentum is partly due to its artificial intelligence products. Google parent company Alphabet reported cloud revenue grew 29% in the latest quarter, with quarterly revenue exceeding $10 billion for the first time.

“The AI cloud era has completely changed the way companies are viewed, and this silicon differentiation, the TPU itself, may be one of the biggest reasons Google has gone from the third cloud to being seen at true parity, and in some eyes maybe even ahead of the other two clouds, for its AI capabilities,” Newman said.

“A simple but powerful thought experiment”

In July, CNBC got its first on-camera look inside Google's chip lab and interviewed Amin Vahdat, the head of custom cloud chips. He was already working at Google in 2014, when the company first considered the idea of making its own chips.

On July 23, 2024, Amin Vahdat, Google's Vice President of Machine Learning, Systems and Cloud AI, demonstrated TPU version 4 at Google's headquarters in Mountain View, California.

Marc Ganley

“It all started with a simple but powerful thought experiment,” Vahdat said. “A number of leads at the company asked the question: What would happen if Google users wanted to interact with Google via voice for just 30 seconds a day? And how much compute power would we need to support our users?”

The group determined Google would need to double the number of computers in its data centers. So they looked for a better solution.
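To make the scale of that question concrete, here is a back-of-envelope sketch in Python. Every constant below is an illustrative assumption rather than a figure from Google; only the 30 seconds of voice per user per day comes from the thought experiment itself.

```python
# Back-of-envelope version of the 2014 thought experiment.
# All constants are illustrative assumptions, not Google's real numbers;
# only the 30 seconds/day of voice input comes from the article.
USERS = 1_000_000_000                 # assumed number of daily voice users
VOICE_SECONDS_PER_USER = 30           # the thought experiment's input
OPS_PER_SECOND_OF_SPEECH = 1e10       # assumed compute cost to recognize 1s of speech
SECONDS_PER_DAY = 86_400

total_ops_per_day = USERS * VOICE_SECONDS_PER_USER * OPS_PER_SECOND_OF_SPEECH
average_ops_per_second = total_ops_per_day / SECONDS_PER_DAY

FLEET_CAPACITY_OPS = 3.5e15           # assumed existing data center capacity, ops/sec

print(f"Extra sustained compute needed: {average_ops_per_second:.2e} ops/sec")
print(f"Relative to the existing fleet: {average_ops_per_second / FLEET_CAPACITY_OPS:.1f}x")
```

With these made-up numbers, the extra demand works out to roughly 1.0x the existing fleet, which is exactly the kind of doubling the group arrived at.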

“We realized we could build custom hardware, not general-purpose hardware, but custom hardware, tensor processing units in this case, to support that much, much more efficiently. In fact, a factor of 100 more efficiently than it would have been otherwise,” Vahdat said.

Google's data centers still rely on general-purpose central processing units (CPUs) and Nvidia's graphics processing units (GPUs). Google's TPUs are a different kind of chip, called an application-specific integrated circuit (ASIC), built for a single defined purpose. The TPU is tailored for AI. Google also makes a video-focused ASIC called the Video Coding Unit.
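For developers, the CPU/GPU/TPU split mostly surfaces as a backend choice. As a minimal sketch, here is how JAX, the Google-built library commonly used to program TPUs, reports whichever accelerators the runtime can see; on a machine without a TPU, it simply falls back to GPU or CPU devices:

```python
import jax

# Enumerate the accelerators visible to the JAX runtime. On a Cloud TPU VM
# this lists TPU cores; on other machines it reports GPU or CPU devices.
for device in jax.devices():
    print(device.platform, device.device_kind)
```

The same high-level program can then be placed on whichever backend is present, which is why the ASIC's specifics are absorbed by the framework rather than by application code.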

Google also produces custom chips for its devices, similar to Apple's custom-silicon strategy. The Tensor G4 powers Google's new AI-enabled Pixel 9, and its new Tensor A1 chip powers the Pixel Buds Pro 2.

The TPU, however, is what set Google apart. Launched in 2015, it was the first of its kind, according to the Futurum Group.

Google coined the name from the algebraic term “tensor,” referring to the large-scale matrix multiplications that happen rapidly in advanced AI applications.
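That connection between the name and the workload is easy to see in code. Below is a minimal JAX sketch (an illustrative example, not Google's code) of the kind of large matrix multiplication that dominates neural networks; when jit-compiled on a TPU, the multiply is handled by the chip's dedicated matrix hardware, while the identical program also runs on CPUs and GPUs:

```python
import jax
import jax.numpy as jnp

@jax.jit  # compiled via XLA; on a TPU the matmul maps onto the matrix units
def dense_layer(x, w):
    # One large matrix multiplication plus a ReLU: the basic
    # building block of neural network layers.
    return jnp.maximum(x @ w, 0.0)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (1024, 512))  # a batch of 1,024 input vectors
w = jax.random.normal(key, (512, 256))   # layer weights
print(dense_layer(x, w).shape)           # (1024, 256)
```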

With the release of its second TPU in 2018, Google expanded its focus from inference to training and made the chips available to cloud customers to run workloads, alongside market-leading chips such as Nvidia's GPUs.
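The difference between those two jobs shows up directly in code: inference only runs a model forward, while training also computes gradients and updates weights. Here is a minimal, illustrative JAX sketch of one training step (not a Google workload):

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Inference: run the model forward and score the prediction.
    pred = x @ w
    return jnp.mean((pred - y) ** 2)

# Training adds gradient computation, the extra work the
# second-generation TPU was designed to handle.
grad_fn = jax.jit(jax.grad(loss))

key = jax.random.PRNGKey(0)
w = jax.random.normal(key, (8, 1))
x = jax.random.normal(key, (32, 8))
y = jax.random.normal(key, (32, 1))

w = w - 0.1 * grad_fn(w, x, y)  # one gradient-descent update of the weights
```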

“If you use GPUs, they're more programmable and more flexible. But supply has been tight,” said Stacy Rasgon, senior semiconductor analyst at Bernstein Research.

The artificial intelligence boom has sent Nvidia's stock price soaring, with the chipmaker's market value jumping to $3 trillion in June, surpassing Alphabet and vying with Apple and Microsoft to become the world's most valuable public company.

“Frankly, these specialized AI accelerators are not nearly as flexible or powerful as Nvidia's platform, and that's what the market is waiting to see: Can anyone make a difference in this space?” Newman said.

Now that we know Apple is using Google's TPUs to train its AI models, the real test will come as these full AI capabilities roll out on iPhone and Mac next year.

Broadcom and TSMC

Developing an alternative to Nvidia's AI engines is no easy task. Google's sixth-generation TPU, called Trillium, is set to launch later this year.

Google demonstrated the sixth version of its TPU, Trillium, to CNBC on July 23, 2024 in Mountain View, California.

Marc Ganley

“It's expensive. You need a lot of scale,” Rasgon said. “So it's not something that everyone can do. But these hyperscalers, they have the scale, the capital and the resources to go down this path.”

The process is too complex and costly for even a very large enterprise to complete alone. Since the first TPU, Google has partnered with Broadcom, a chip developer that also helps Meta design its AI chips. Broadcom says it has spent more than $3 billion to make these partnerships happen.

“AI chips — they're very complex. There's a lot of stuff in them. So Google brings the compute,” Rasgon said. “Broadcom does all the peripheral stuff. They do the I/O and the SerDes, all the different pieces that go around that compute. They also do the packaging.”

The final design is then sent to a fabrication facility, or fab, for manufacturing, primarily at facilities owned by the world's largest chipmaker, Taiwan Semiconductor Manufacturing Company, which produces 92% of the world's most advanced semiconductors.

Asked whether Google has any safeguards in place should the worst happen geopolitically between China and Taiwan, Vahdat said: “It's certainly something that we prepare for and we think about as well, but we're hopeful that it's not something we'll have to trigger.”

Guarding against those risks is the primary reason the White House is handing out $52 billion in CHIPS Act funding to companies building fabs in the U.S., with the biggest portions so far going to Intel, TSMC and Samsung.

Processors and power

Google showed off its new Axion CPU to CNBC.

Marc Ganley

“Now we're able to bring in that last piece of the puzzle, the CPU,” Vahdat said. “Many of our internal services, whether it's BigQuery, whether it's Spanner, YouTube advertising and more, are running on Axion.”

Google was late to the CPU game. Amazon launched its Graviton processor in 2018. Alibaba launched its server chip in 2021.

Asked why Google didn't make a CPU sooner, Vahdat said: “Our focus has been on where we can deliver the most value for our customers, and there it has been the TPU, our video coding units and our networking. We really felt the time was now.”

These processors from non-chipmakers, Google's included, are all made possible by Arm chip architecture, a more customizable, power-efficient alternative to the traditional x86 model used by Intel and AMD. Power efficiency is crucial: by 2027, AI servers are projected to consume as much power every year as a country the size of Argentina. Google's latest environmental report showed emissions rose nearly 50% from 2019 to 2023, partly due to the growth of the data centers that power AI.

“Without the efficiency of these chips, the numbers could have ended up in a very different place,” Vahdat said. “We remain committed to actually driving the carbon footprint of our infrastructure, around the clock, toward zero.”

It takes massive amounts of water to cool the servers that train and run AI. That's why Google's third-generation TPU started using direct-to-chip cooling, which uses far less water. It's also how Nvidia is cooling its latest Blackwell GPUs.

Despite challenges ranging from geopolitics to electricity and water, Google remains committed to developing its generative artificial intelligence tools and manufacturing its own chips.

“I've never seen anything like it, and there's no sign of it slowing down,” Vahdat said. “Hardware is going to play a very important role there.”
