Typically, code’s performance on a given computer chip is estimated using hand-built performance models of that architecture, which compilers then rely on to automatically optimize the code for it. But those models are imperfect, and reality can diverge significantly from their estimates, leading to inefficiencies and shortfalls. Now, MIT researchers have demonstrated a tool that uses machine learning to predict code performance on computer chips more accurately, easily, and quickly.
In essence, the new automated pipeline – called “Ithemal” – trains itself on labeled snippets of code (“basic blocks”) and uses that training to predict how long a chip will take to execute other, unseen blocks. This lets developers optimize their code for diverse “black box” chip designs whose exact specifications are often unknown. The researchers also assembled a benchmark suite of some 300,000 basic blocks, releasing the open-source dataset as “BHive.”
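The idea of learning timings from labeled basic blocks can be illustrated with a deliberately simplified sketch. Ithemal itself uses a neural network over the instruction sequence; the toy model below is not its method – it merely represents each block as opcode counts (a hypothetical feature set) and learns a per-opcode cost from measured cycle counts, then predicts timings for unseen blocks:

```python
# Toy illustration of data-driven throughput prediction.
# Assumptions: blocks are lists of opcode names, labels are measured
# cycle counts, and cost is roughly additive per opcode. Ithemal's
# real model is a neural network over full instruction sequences.
from collections import Counter

OPCODES = ["mov", "add", "mul", "load"]

def featurize(block):
    """Opcode-count feature vector for a basic block."""
    counts = Counter(block)
    return [counts[op] for op in OPCODES]

def train(blocks, cycles, lr=0.05, epochs=2000):
    """Fit per-opcode costs by stochastic gradient descent on squared error."""
    w = [0.0] * len(OPCODES)
    for _ in range(epochs):
        for x, y in zip(map(featurize, blocks), cycles):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def predict(w, block):
    return sum(wi * xi for wi, xi in zip(w, featurize(block)))

# Labeled training data: (basic block, measured cycle count).
blocks = [["mov", "add"], ["mul", "mul"], ["load", "add", "add"], ["mov", "load"]]
cycles = [2.0, 6.0, 6.0, 5.0]  # consistent with mov=1, add=1, mul=3, load=4

w = train(blocks, cycles)
print(round(predict(w, ["mul", "load", "add"]), 1))  # ≈ 8.0 (3 + 4 + 1)
```

The point of the sketch is the workflow, not the model: measure real blocks, train on the labels, and the documentation-free predictor falls out of the data.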
“Modern computer processors are opaque, horrendously complicated, and difficult to understand. It is also incredibly challenging to write computer code that executes as fast as possible for these processors,” said co-author Michael Carbin, an assistant professor at MIT and a researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL). “This tool is a big step forward toward fully modeling the performance of these chips for improved efficiency.”
By way of example, MIT notes that Intel provides thousands of pages of documentation describing its chips’ architectures, as well as its own performance models. “Intel’s documents are neither error-free nor complete, and Intel will omit certain things, because it’s proprietary,” said co-author Charith Mendis. “However, when you use data, you don’t need to know the documentation. If there’s something hidden you can learn it directly from the data.” Indeed, when Ithemal was tested on Intel chips, it predicted code performance better than one of Intel’s own models. More broadly, the researchers found that Ithemal cut prediction error rates by around 50 percent compared with traditional models.
Flexibility is also key to the tool. “If you want to train a model on some new architecture, you just collect more data from that architecture, run it through our profiler, use that information to train Ithemal, and now you have a model that predicts performance,” Mendis said. Along those lines, the team also recently introduced “Vemal,” a technique for automatically generating an algorithm that vectorizes code – converting it to operate on multiple data elements at once for parallel computing – again outperforming popular hand-crafted solutions.
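Vectorization here means rewriting scalar code so that a single instruction processes several data elements at once (SIMD). Vemal emits real machine-level vector instructions; the Python sketch below is only a conceptual stand-in, counting “issue slots” to show why the transformation pays off:

```python
# Conceptual illustration of vectorization (not Vemal's actual output).
# Each function returns (results, number_of_"instructions"_issued).

def scalar_add(a, b):
    """Scalar loop: one element per 'instruction' -> len(a) operations."""
    out, ops = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        ops += 1
    return out, ops

def simd_add(a, b, width=4):
    """Vectorized loop: 'width' elements per 'instruction', as a SIMD
    vectorizer would emit -> roughly len(a)/width operations."""
    out, ops = [], 0
    for i in range(0, len(a), width):
        out.extend(x + y for x, y in zip(a[i:i + width], b[i:i + width]))
        ops += 1
    return out, ops

a, b = [1, 2, 3, 4, 5, 6, 7, 8], [10, 20, 30, 40, 50, 60, 70, 80]
print(scalar_add(a, b)[1], simd_add(a, b)[1])  # 8 vs 2 "instructions"
```

Deciding where and how to apply this transformation safely is the hard, traditionally hand-tuned part that the team’s approach learns to generate automatically.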
Moving forward, the research team is working to make Ithemal itself less of a black box. “Our model is saying it takes a processor, say, 10 cycles to execute a basic block. Now, we’re trying to figure out why,” Carbin said. “That’s a fine level of granularity that would be amazing for these types of tools.”