The use of optical neural networks (ONNs) in AI is an attractive idea that has long spurred significant work. Potential advantages include low power consumption, high speed, and the ability to handle greater complexity. Achieving this ideal is challenging, however, not least because manufacturing imperfections can degrade accuracy. Intel proposed an approach to mitigating this problem in a blog posted last week and a paper published earlier in the month.
This excerpt from the blog, written by Casimir Wierzynski, senior director in the office of the CTO, artificial intelligence products group, nicely captures the work:
“We considered two architectures for building an optical neural network engine out of MZIs (Mach-Zehnder interferometers). One of them, which we called GridNet, arranges the MZIs in a grid; the other, which we called FFTNet, arranges the MZIs in a butterfly-like pattern modelled after architectures for computing Fast Fourier Transforms (but in our case the weights are learned from data, so the computation will not, in general, be an actual FFT). We then trained these two architectures in a software simulation on a benchmark deep learning task of handwritten digit recognition (MNIST).
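The blog does not spell out the MZI mathematics, but in the ONN literature each MZI is typically modelled as a 2×2 unitary built from two 50:50 beamsplitters and two tunable phase shifters; cascades of such unitaries then implement the network's weight matrices. A minimal sketch of that standard model (the convention and parameter names theta/phi are one common choice, not taken from the paper):

```python
import numpy as np

def mzi(theta, phi):
    """2x2 transfer matrix of an ideal Mach-Zehnder interferometer:
    two 50:50 beamsplitters around an internal phase shifter (theta),
    preceded by an external input phase shifter (phi)."""
    bs = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)   # 50:50 beamsplitter
    internal = np.diag([np.exp(1j * theta), 1.0])    # internal phase shift
    external = np.diag([np.exp(1j * phi), 1.0])      # input phase shift
    return bs @ internal @ bs @ external

U = mzi(0.7, 1.3)
# An ideal (lossless) MZI is unitary: U @ U^dagger = identity
print(np.allclose(U @ U.conj().T, np.eye(2)))  # True
```

Because each device only mixes two optical modes, many MZIs must be arranged in a mesh (grid or butterfly) to realize a full matrix-vector product, which is exactly the architectural choice GridNet and FFTNet make differently.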
“We found that in the case of double-precision floating point accuracy, GridNet achieved higher accuracy than FFTNet (~98% vs ~95%). However, we found that FFTNet was significantly more robust to manufacturing imprecision, which we simulated by adding noise to the amount of phase-shifting and transmittance of each MZI. After setting these noise levels to realistic levels, GridNet’s performance fell below 50% while FFTNet’s remained nearly constant.
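The robustness test described above amounts to perturbing each device's settings and watching how far the realized transfer matrix drifts from the trained one. A toy version of that perturbation, assuming zero-mean Gaussian noise on both phase settings of a single MZI (the MZI model and noise distribution here are illustrative, not the paper's exact parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

def mzi(theta, phi):
    """Ideal 2x2 MZI transfer matrix (one common convention)."""
    bs = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)
    return bs @ np.diag([np.exp(1j * theta), 1.0]) \
              @ bs @ np.diag([np.exp(1j * phi), 1.0])

def noisy_mzi(theta, phi, sigma):
    """Same MZI with Gaussian noise added to both phase settings --
    a toy stand-in for the manufacturing imprecision being simulated."""
    return mzi(theta + rng.normal(0, sigma), phi + rng.normal(0, sigma))

ideal = mzi(0.7, 1.3)
# Mean deviation from the ideal transfer matrix grows with the noise level
for sigma in (0.01, 0.1, 0.5):
    err = np.mean([np.linalg.norm(noisy_mzi(0.7, 1.3, sigma) - ideal)
                   for _ in range(200)])
    print(f"sigma={sigma}: mean deviation {err:.3f}")
```

In a full mesh these per-device errors compound through every layer, which is why architecture matters: the paper's finding is that the butterfly-style FFTNet arrangement tolerates this accumulation far better than the grid.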
“If ONNs are to become a viable piece of the AI hardware ecosystem, they will need to scale up to larger circuits and industrial manufacturing techniques. Our finding addresses both of these issues. Larger circuits will require more devices, such as MZIs, per chip. Therefore, attempting to “fine tune” each device on a chip post-manufacturing will be a growing challenge. A more scalable strategy will be to train ONNs in software, then mass produce circuits based on those parameters. Our results suggest that choosing the right architecture in advance can greatly increase the probability that the resulting circuits will achieve their desired performance even in the face of manufacturing variations.”
A good deal more detail can be found in the paper, “Design of optical neural networks with component imprecisions,” published in Optics Express. Wierzynski was one of the authors, along with Intel colleagues and researchers from UC Berkeley.
The researchers write in their conclusion: “[Our results] provide clear guidelines for the architectural design of efficient, fault-resistant ONNs. In looking forward, it would be important to investigate algorithmic and training strategies as well. A central problem in deep learning is to design neural networks complex enough to model the data while being regularized to prevent over-fitting of noise in the training set. To this end, a wide variety of regularization techniques such as Dropout, Dropconnect, data augmentation, etc. have been developed. This problem parallels the trade-off between an ONN’s expressivity and its robustness to imprecisions presented here. Indeed, an important conclusion is that in addition to architecture, even minor changes in the configuration of ONNs also have a great effect on the network’s robustness to faulty components.”
Link to Intel AI blog: https://www.intel.ai/optical-neural-networks/#gs.fe8yao
Link to paper: https://www.osapublishing.org/oe/fulltext.cfm?uri=oe-27-10-14009&id=411885