With the proliferation of ultrahigh-speed mobile networks and internet-connected devices, along with the rise of artificial intelligence (AI)¹, the world is generating exponentially increasing amounts of data that need to be processed in a fast and efficient way. Highly parallelized, fast and scalable hardware is therefore becoming progressively more important². Here we demonstrate a computationally specific integrated photonic hardware accelerator (tensor core) that is capable of operating at speeds of trillions of multiply-accumulate operations per second (10¹² MAC operations per second or tera-MACs per second). The tensor core can be considered as the optical analogue of an application-specific integrated circuit (ASIC). It achieves parallelized photonic in-memory computing using phase-change-material memory arrays and photonic chip-based optical frequency combs (soliton microcombs³). The computation is reduced to measuring the optical transmission of reconfigurable and non-resonant passive components and can operate at a bandwidth exceeding 14 gigahertz, limited only by the speed of the modulators and photodetectors. Given recent advances in hybrid integration of soliton microcombs at microwave line rates³,⁴,⁵, ultralow-loss silicon nitride waveguides⁶,⁷, and high-speed on-chip detectors and modulators, our approach provides a path towards full complementary metal–oxide–semiconductor (CMOS) wafer-scale integration of the photonic tensor core. Although we focus on convolutional processing, more generally our results indicate the potential of integrated photonics for parallel, fast, and efficient computational hardware in data-heavy AI applications such as autonomous driving, live video processing, and next-generation cloud computing services.
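As a rough illustration of the multiply-accumulate (MAC) workload described above, the following Python sketch expresses a multi-kernel convolution as repeated matrix-vector multiplication against a fixed weight matrix and counts the MAC operations involved. It is a purely numerical toy, not a model of the photonic hardware: the function names (`image_patches`, `convolve_as_matvec`, `mac_rate`), the image and kernel sizes, and the matrix dimensions passed to `mac_rate` are all hypothetical choices made for illustration. Only the 14 GHz figure is taken from the text.

```python
# Illustrative sketch only: a pure-Python view of convolution as
# matrix-vector multiplication. All names and sizes are hypothetical.

def image_patches(image, k):
    """Flatten every k x k patch of a 2D image (list of rows) into a vector."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            patches.append([image[r + i][c + j] for i in range(k) for j in range(k)])
    return patches


def convolve_as_matvec(image, kernels):
    """Apply several k x k kernels to an image; return (outputs, mac_count).

    Each flattened kernel forms one row of a fixed weight matrix, so every
    image patch is processed by a single matrix-vector multiplication.
    """
    k = len(kernels[0])
    weights = [[kern[i][j] for i in range(k) for j in range(k)] for kern in kernels]
    outputs, macs = [], 0
    for patch in image_patches(image, k):
        result = []
        for w_row in weights:
            acc = 0.0
            for w, x in zip(w_row, patch):
                acc += w * x  # one multiply-accumulate (MAC) operation
                macs += 1
            result.append(acc)
        outputs.append(result)
    return outputs, macs


def mac_rate(matrix_rows, matrix_cols, parallel_vectors, symbol_rate_hz):
    """Idealized MAC/s if every matrix element is used once per input symbol."""
    return matrix_rows * matrix_cols * parallel_vectors * symbol_rate_hz


if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    kernels = [[[1, 0], [0, 0]], [[1, 1], [1, 1]]]  # two hypothetical 2 x 2 kernels
    outputs, macs = convolve_as_matvec(image, kernels)
    print(len(outputs), "patches,", macs, "MAC operations")
    # Hypothetical 9 x 4 matrix with 4 parallel input vectors at 14 GHz
    print(mac_rate(9, 4, 4, 14e9), "MAC/s (idealized estimate)")
```

Under these assumed dimensions the idealized estimate is on the order of 10¹² MAC operations per second, which only shows how a rate of that magnitude can follow from a small matrix driven at gigahertz speeds; it is not a reconstruction of the reported measurement.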
All data used in this study are available from the corresponding author upon reasonable request.
Batra, G., Jacobson, Z., Madhav, S., Queirolo, A. & Santhanam, N. Artificial-intelligence hardware: new opportunities for semiconductor companies. https://www.mckinsey.com/industries/semiconductors/our-insights/artificial-intelligence-hardware-new-opportunities-for-semiconductor-companies (McKinsey & Company, 2019).
Ben-Nun, T. & Hoefler, T. Demystifying parallel and distributed deep learning: an in-depth concurrency analysis. ACM Comput. Surv. 52, https://doi.org/10.1145/3320060 (2019).
Herr, T. et al. Temporal solitons in optical microresonators. Nat. Photon. 8, 145–152 (2014).
Herr, T., Gorodetsky, M. L. & Kippenberg, T. J. Dissipative Kerr solitons in optical microresonators. In Nonlinear Optical Cavity Dynamics From Microresonators to Fiber Lasers (ed. Grelu, P.) Vol. 8083, Ch. 6, 129–162 (Wiley, 2015).
Raja, A. S. et al. Electrically pumped photonic integrated soliton microcomb. Nat. Commun. 10, 680 (2019).
Pfeiffer, M. H. P. et al. Photonic damascene process for integrated high-Q microresonator based nonlinear photonics. Optica 3, 20–25 (2016).
Liu, J. et al. Ultralow-power chip-based soliton microcombs for photonic integration. Optica 5, 1347–1353 (2018).
Machine Learning on AWS https://aws.amazon.com/machine-learning/ (accessed 12 October 2020).
Google Cloud AI And Machine Learning Products https://cloud.google.com/products/machine-learning/ (accessed 12 October 2020).
Zhang, C. et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks. In ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays (FPGA ’15) https://doi.org/10.1145/2684746.2689060 (2015).
Jouppi, N. P. et al. In-datacenter performance analysis of a tensor processing unit. Proc. ISCA ’17 https://doi.org/10.1145/3079856.3080246 (2017).
Wang, P. S., Liu, Y., Guo, Y. X., Sun, C. Y. & Tong, X. O-CNN: octree-based convolutional neural networks for 3D shape analysis. ACM Trans. Graph. 36, https://doi.org/10.1145/3072959.3073608 (2017).
Miller, D. A. B. Attojoule optoelectronics for low-energy information processing and communications. J. Lightwave Technol. 35, 346–396 (2017).
Agrawal, S. R. et al. A many-core architecture for in-memory data processing. In Proc. 50th Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO-50 ’17) 245–258, https://doi.org/10.1145/3123939.3123985 (IEEE/ACM, 2017).
Miller, D. A. B. Are optical transistors the logical next step? Nat. Photon. 4, 3–5 (2010).
Ielmini, D. & Wong, H. S. P. In-memory computing with resistive switching devices. Nat. Electron. 1, 333–343 (2018).
Le Gallo, M. et al. Mixed-precision in-memory computing. Nat. Electron. 1, 246–253 (2018).
Boybat, I. et al. Neuromorphic computing with multi-memristive synapses. Nat. Commun. 9, 2514 (2018).
Sebastian, A., Le Gallo, M., Khaddam-Aljameh, R. & Eleftheriou, E. Memory devices and applications for in-memory computing. Nat. Nanotechnol. 15, 529–544 (2020).
Hu, M. et al. Dot-product engine for neuromorphic computing: programming 1T1M crossbar to accelerate matrix-vector multiplication. In Proc. 53rd Annu. Design Automation Conf. (DAC ’16) https://doi.org/10.1145/2897937.2898010 (ACM Digital Library, 2016).
Gong, N. et al. Signal and noise extraction from analog memory elements for neuromorphic computing. Nat. Commun. 9, 2102 (2018).
Joshi, V. et al. Accurate deep neural network inference using computational phase-change memory. Nat. Commun. 11, 2473 (2020).
Yang, T. Y., Park, I. M., Kim, B. J. & Joo, Y. C. Atomic migration in molten and crystalline Ge2Sb2Te5 under high electric field. Appl. Phys. Lett. 95, 032104 (2009).
Koelmans, W. W. et al. Projected phase-change memory devices. Nat. Commun. 6, 8181 (2015).
Kim, S. et al. A phase change memory cell with metallic surfactant layer as a resistance drift stabilizer. In 2013 IEEE Int. Electron Devices Meeting https://doi.org/10.1109/IEDM.2013.6724727 (IEEE, 2013).
Bell, T. E. Optical computing: a field in flux: a worldwide race is on to develop machines that compute with photons instead of electrons but what is the best approach? IEEE Spectr. 23, 34–38 (1986).
Hamerly, R., Bernstein, L., Sludds, A., Soljačić, M. & Englund, D. Large-scale optical neural networks based on photoelectric multiplication. Phys. Rev. X 9, 021032 (2019).
Silva, A. et al. Performing mathematical operations with metamaterials. Science 343, 160–163 (2014).
Lin, X. et al. All-optical machine learning using diffractive deep neural networks. Science 361, 1004–1008 (2018).
Colburn, S., Chu, Y., Shlizerman, E. & Majumdar, A. Optical frontend for a convolutional neural network. Appl. Opt. 58, 3179–3186 (2019).
Shen, Y. et al. Deep learning with coherent nanophotonic circuits. Nat. Photon. 11, 441–446 (2017).
Tait, A. N. et al. Silicon photonic modulator neuron. Phys. Rev. Appl. 11, 064043 (2019).
Pérez, D. et al. Multipurpose silicon photonics signal processor core. Nat. Commun. 8, 636 (2017).
Galal, S. & Horowitz, M. Energy-efficient floating-point unit design. IEEE Trans. Comput. 60, 913–922 (2011).
Bangari, V. et al. Digital electronics and analog photonics for convolutional neural networks (DEAP-CNNs). IEEE J. Sel. Top. Quantum Electron. 26, https://doi.org/10.1109/JSTQE.2019.2945540 (2020).
LeCun, Y., Cortes, C. & Burges, C. J. C. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist.
Stern, B., Ji, X., Okawachi, Y., Gaeta, A. L. & Lipson, M. Battery-operated integrated frequency comb generator. Nature 562, 401–405 (2018).
Jones, R. et al. Heterogeneously integrated InP/silicon photonics: fabricating fully functional transceivers. IEEE Nanotechnol. Mag. 13, 17–26 (2019).
Marin-Palomo, P. et al. Microresonator-based solitons for massively parallel coherent optical communications. Nature 546, 274–279 (2017).
Spencer, D. T. et al. An optical-frequency synthesizer using integrated photonics. Nature 557, 81–85 (2018).
Riemensberger, J. et al. Massively parallel coherent laser ranging using soliton microcombs. Nature 581, 164–170 (2020).
Moss, D. J., Morandotti, R., Gaeta, A. L. & Lipson, M. New CMOS-compatible platforms based on silicon nitride and Hydex for nonlinear optics. Nat. Photon. 7, 597–607 (2013).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In 2016 Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR) https://doi.org/10.1109/CVPR.2016.90 (IEEE, 2016).
Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. In 3rd Int. Conf. Learning Representations (ICLR 2015) (eds Bengio, Y. & LeCun, Y.) 4 (2015); https://arxiv.org/abs/1409.1556.
Al-Ashrafy, M., Salem, A. & Anis, W. An efficient implementation of floating point multiplier. In 2011 Saudi Int. Electronics, Communications and Photonics Conf. (SIECPC) https://doi.org/10.1109/SIECPC.2011.5876905 (2011).
Gao, L., Chen, P. Y. & Yu, S. Demonstration of convolution kernel operation on resistive cross-point array. IEEE Electron Device Lett. 37, 870–873 (2016).
Shafiee, A. et al. ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars. In Proc. 2016 43rd Int. Symp. Computer Architecture (ISCA 2016) https://doi.org/10.1109/ISCA.2016.12 (2016).
Li, X. et al. Fast and reliable storage using a 5 bit, nonvolatile photonic memory cell. Optica 6, 1–6 (2019).
Ríos, C. et al. Integrated all-photonic non-volatile multi-level memory. Nat. Photon. 9, 725–732 (2015).
Feldmann, J. et al. Calculating with light using a chip-scale all-optical abacus. Nat. Commun. 8, 1256 (2017).
Gehring, H. et al. Low-loss fiber-to-chip couplers with ultrawide optical bandwidth. APL Photon. 4, 010801 (2019).
Gehring, H., Eich, A., Schuck, C. & Pernice, W. H. P. Broadband out-of-plane coupling at visible wavelengths. Opt. Lett. 44, 5089 (2019).
Nahmias, M. A. et al. Photonic multiply-accumulate operations for neural networks. IEEE J. Sel. Top. Quantum Electron. https://doi.org/10.1109/jstqe.2019.2941485 (2019).
Gehring, H., Blaicher, M., Hartmann, W. & Pernice, W. H. P. Python based open source design framework for integrated nanophotonic and superconducting circuitry with 2D-3D-hybrid integration. OSA Continuum 2, 3091–3101 (2019).
Guo, H. et al. Universal dynamics and deterministic switching of dissipative Kerr solitons in optical microresonators. Nat. Phys. 13, 94–102 (2017).
Karpov, M. et al. Dynamics of soliton crystals in optical microresonators. Nat. Phys. 15, 1071–1077 (2019).
Fialka, O. & Čadík, M. FFT and convolution performance in image filtering on GPU. In Proc. 10th Int. Conf. Information Visualisation (IV’06) https://doi.org/10.1109/IV.2006.53 (IEEE, 2006).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Commun. ACM 60, https://doi.org/10.1145/3065386 (2017).
Szegedy, C. et al. Going deeper with convolutions. In Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR) https://doi.org/10.1109/CVPR.2015.7298594 (IEEE, 2015).
Ríos, C. et al. In-memory computing on a photonic platform. Sci. Adv. 5, eaau5759 (2019).
Gaeta, A. L., Lipson, M. & Kippenberg, T. J. Photonic-chip-based frequency combs. Nat. Photon. 13, 158–169 (2019).
Ma, Y. et al. Ultralow loss single layer submicron silicon waveguide crossing for SOI optical interconnect. Opt. Express 21, 29374–29382 (2013).
Lu, Z. et al. Broadband silicon photonic directional coupler using asymmetric-waveguide based phase control. Opt. Express 23, 3795–3808 (2015).
Farmakidis, N. et al. Plasmonic nanogap enhanced phase change devices with dual electrical-optical functionality. Sci. Adv. 5, eaaw2687 (2019).
Zhang, H. et al. Miniature multilevel optical memristive switch using phase change material. ACS Photon. 6, 2205–2212 (2019).
Atabaki, A. H. et al. Integrating photonics with silicon nanoelectronics for the next generation of systems on a chip. Nature 556, 349–354 (2018).
Wang, X. & Liu, J. Emerging technologies in Si active photonics. J. Semicond. 39, 061001 (2018).
Sun, J., Timurdogan, E., Yaacobi, A., Hosseini, E. S. & Watts, M. R. Large-scale nanophotonic phased array. Nature 493, 195–199 (2013).
This research was supported by EPSRC via grants EP/J018694/1, EP/M015173/1 and EP/M015130/1 in the UK and Deutsche Forschungsgemeinschaft (DFG) grant PE 1832/5-1 in Germany. This material is based upon work supported by the Air Force Office of Scientific Research under award number FA9550-19-1-0250. W.H.P.P. gratefully acknowledges support by the European Research Council through grant 724707. We further acknowledge funding for this work from the European Union’s Horizon 2020 Research and Innovation Programme (Fun-COMP project number 780848). A.S. acknowledges support by the European Research Council through grant 682675. H.G. thanks the Studienstiftung des deutschen Volkes for financial support. We thank F. Brückerhoff-Plückelmann, S. Agarwal and W. Zhou for help with sample fabrication and discussions of the experimental results.
The authors declare no competing interests.
Peer review information Nature thanks Huaqiang Wu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Supplementary methods and notes. The file contains Supplementary Tables 1–2, Supplementary Figures 1–22 and Supplementary References. It gives further methodological information on the experimental setups and provides additional data to validate and illustrate the main results of the manuscript.
Cite this article
Feldmann, J., Youngblood, N., Karpov, M. et al. Parallel convolutional processing using an integrated photonic tensor core. Nature 589, 52–58 (2021). https://doi.org/10.1038/s41586-020-03070-1