Update, 9 June 2021: Google reports this week in the journal Nature that its next generation AI chip, succeeding the TPU version 4, was designed in part using an AI that researchers described to IEEE Spectrum last year. The researchers have made some improvements since Spectrum last spoke to them. The AI now needs fewer than six hours to generate chip floorplans that match or beat human-produced designs in power consumption, performance, and area. Expert humans typically need months of iteration to do this task.
Original blog post from 23 March 2020 follows:
There’s been a lot of intense and well-funded work developing chips that are specially designed to perform AI algorithms faster and more efficiently. The trouble is that it takes years to design a chip, and the universe of machine learning algorithms moves a lot faster than that. Ideally you want a chip that’s optimized to do today’s AI, not the AI of two to five years ago. Google’s solution: have an AI design the AI chip.
“We believe that it is AI itself that will provide the means to shorten the chip design cycle, creating a symbiotic relationship between hardware and AI, with each fueling advances in the other,” the Google researchers write in a paper describing the work, posted today to arXiv.
“We have already seen that there are algorithms or neural network architectures that… don’t perform as well on existing generations of accelerators, because the accelerators were designed like two years ago, and back then these neural nets didn’t exist,” says Azalia Mirhoseini, a senior research scientist at Google. “If we reduce the design cycle, we can bridge the gap.”
Mirhoseini and senior software engineer Anna Goldie have come up with a neural network that learns to do a particularly time-consuming part of design called placement. After studying chip designs long enough, it can produce a design for a Google Tensor Processing Unit in less than 24 hours that beats several weeks’ worth of design effort by human experts in terms of power, performance, and area.
Placement is so complex and time-consuming because it involves positioning blocks of logic and memory, or clusters of those blocks called macros, so that performance is maximized while power consumption and chip area are minimized. Heightening the challenge, all of this must happen while obeying rules about the density of interconnects. Goldie and Mirhoseini targeted chip placement because, even with today’s advanced tools, it takes a human expert weeks of iteration to produce an acceptable design.
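To make the trade-off concrete, here is a toy sketch, not Google’s tooling. It assumes a hypothetical four-macro netlist, scores placements with half-perimeter wirelength (HPWL), a common wirelength proxy in placement, and enforces a made-up density rule (at most two macros per 2 x 2 tile). The names `hpwl`, `density_ok`, and `best_legal_placement` are illustrative only.

```python
from itertools import permutations

# Hypothetical toy netlist: four macros and the nets that connect them.
MACROS = ["A", "B", "C", "D"]
NETS = [("A", "B"), ("B", "C", "D"), ("A", "D")]


def hpwl(placement, nets=NETS):
    """Sum over nets of the half-perimeter of each net's bounding box."""
    total = 0
    for net in nets:
        xs = [placement[m][0] for m in net]
        ys = [placement[m][1] for m in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def density_ok(placement, tile=2, max_per_tile=2):
    """Hypothetical density rule: no tile x tile region may hold too many macros."""
    counts = {}
    for x, y in placement.values():
        key = (x // tile, y // tile)
        counts[key] = counts.get(key, 0) + 1
    return all(c <= max_per_tile for c in counts.values())


def best_legal_placement(grid=4):
    """Exhaustively try every assignment of macros to distinct grid cells.

    Feasible only at toy sizes: the number of assignments grows
    combinatorially with the number of blocks and cells.
    """
    cells = [(x, y) for x in range(grid) for y in range(grid)]
    best, best_cost = None, float("inf")
    for combo in permutations(cells, len(MACROS)):
        placement = dict(zip(MACROS, combo))
        if not density_ok(placement):
            continue  # violates the density rule, so it is not a legal design
        cost = hpwl(placement)
        if cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost


compact = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}
spread = {"A": (0, 0), "B": (3, 0), "C": (3, 3), "D": (0, 3)}
print(hpwl(compact), density_ok(compact))  # → 4 False
print(hpwl(spread), density_ok(spread))    # → 12 True
print(best_legal_placement()[1])           # → 4
```

The tightly packed placement has the shortest wires but breaks the density rule, while the spread-out one is legal but triples the wirelength. Even four blocks on a 16-cell grid mean 43,680 assignments to check, which is why exhaustive search is hopeless for a real chip.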
Goldie and Mirhoseini modeled chip placement as a reinforcement learning problem. Reinforcement learning systems, unlike typical deep learning systems, do not train on a large set of labeled data. Instead, they learn by doing, adjusting the parameters in their networks according to a reward signal that scores how well each attempt turned out. In this case, the reward was a proxy measure combining power reduction, performance improvement, and area reduction. As a result, the placement-bot gets better at its task the more designs it completes.
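The learn-by-doing loop can be illustrated with a deliberately tiny sketch. It is not Google’s method: it swaps in plain REINFORCE, a basic policy-gradient algorithm, with one table of logits per placement step instead of a neural network. The four-macro netlist, the 4 x 4 grid, and the reward (negative half-perimeter wirelength, standing in for the power, performance, and area proxy described above) are all hypothetical assumptions.

```python
import math
import random

# Hypothetical toy problem: place four macros, one per step, on a 4 x 4 grid.
MACROS = ["A", "B", "C", "D"]
NETS = [("A", "B"), ("B", "C", "D"), ("A", "D")]
GRID = 4
CELLS = [(x, y) for x in range(GRID) for y in range(GRID)]


def proxy_reward(placement):
    """Hypothetical proxy reward: negative half-perimeter wirelength."""
    cost = 0
    for net in NETS:
        xs = [placement[m][0] for m in net]
        ys = [placement[m][1] for m in net]
        cost += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return -cost


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rollout(theta, rng):
    """Place macros one at a time, sampling each cell from a masked softmax."""
    placement, trajectory, used = {}, [], set()
    for step, macro in enumerate(MACROS):
        legal = [i for i, c in enumerate(CELLS) if c not in used]
        probs = softmax([theta[step][i] for i in legal])
        choice = rng.choices(range(len(legal)), weights=probs)[0]
        cell = CELLS[legal[choice]]
        placement[macro] = cell
        used.add(cell)
        trajectory.append((step, legal, probs, choice))
    return placement, trajectory


def train(episodes=2000, lr=0.1, seed=0):
    """REINFORCE with a running-mean baseline.

    Each finished placement nudges the policy toward choices that earned
    an above-average reward and away from below-average ones.
    """
    rng = random.Random(seed)
    theta = [[0.0] * len(CELLS) for _ in MACROS]  # one logit per (step, cell)
    baseline = None
    for _ in range(episodes):
        placement, trajectory = rollout(theta, rng)
        reward = proxy_reward(placement)
        baseline = reward if baseline is None else 0.95 * baseline + 0.05 * reward
        advantage = reward - baseline
        for step, legal, probs, choice in trajectory:
            # Gradient of log softmax w.r.t. the logits: one_hot(choice) - probs.
            for j, i in enumerate(legal):
                grad = (1.0 if j == choice else 0.0) - probs[j]
                theta[step][i] += lr * advantage * grad
    return theta


def greedy_placement(theta):
    """Pick the highest-logit free cell at each step, without sampling."""
    placement, used = {}, set()
    for step, macro in enumerate(MACROS):
        legal = [i for i, c in enumerate(CELLS) if c not in used]
        best = max(legal, key=lambda i: theta[step][i])
        placement[macro] = CELLS[best]
        used.add(CELLS[best])
    return placement


theta = train()
placement = greedy_placement(theta)
print(placement, proxy_reward(placement))
```

The reward arrives only once a whole placement is finished, so credit is spread back over every step of that episode; the running-mean baseline keeps updates small when a result is merely typical.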
The team hopes AI systems like theirs will lead to the design of “more chips in the same time period, and also chips that run faster, use less power, cost less to build, and use less area,” says Goldie.
Source: IEEE Spectrum