t learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7):1527-1554.
[8] Salakhutdinov R, Hinton G. Deep Boltzmann machines[J]. Journal of Machine Learning Research Proceedings Track, 2009, 9(1):448-455.
[9] Liu Jianwei, Liu Yuan, Luo Xionglin. Research and development on Boltzmann machine[J]. Journal of Computer Research and Development, 2014, 51(1):1-16 (in Chinese).
[10] Bourlard H, Kamp Y. Auto-association by multilayer perceptrons and singular value decomposition[J]. Biological Cybernetics, 1988, 59(4-5):291-294.
[11] Miao Y, Gowayyed M, Metze F. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding[C]// IEEE Workshop on Automatic Speech Recognition and Understanding. IEEE, 2015:167-174.
[12] Liu S, Du Z, Tao J, et al. Cambricon: An instruction set architecture for neural networks[J]. ACM SIGARCH Computer Architecture News, 2016, 44(3):393-405.
[13] Shen T, Hu F. Acceleration of CNN on GPU[J]. Application of IC, 2017.
[14] Yang S, Qiang L, Hao F, et al. Accelerating CNN's forward process on mobile GPU using OpenCL[C]// Eighth International Conference on Digital Image Processing. 2016:100334W.
[15] Meloni P, Deriu G, Conti F, et al. Curbing the roofline: a scalable and flexible architecture for CNNs on FPGA[C]// ACM, 2016:376-383.
[16] Han X, Zhou D, Wang S, et al. CNN-MERP: An FPGA-based memory-efficient reconfigurable processor for forward and backward propagation of convolutional neural networks[C]// IEEE International Conference on Computer Design. IEEE, 2016:320-327.
[17] Chen T, Du Z, Sun N, et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning[J]. ACM SIGARCH Computer Architecture News, 2014, 49(4):269-284.
[18] Liu D, Chen T, Liu S, et al. PuDianNao: A polyvalent machine learning accelerator[C]// Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems. 2015.
[19] Du Z, et al. ShiDianNao: shifting vision processing closer to the sensor[C]// ISCA '15: Proceedings of the International Symposium on Computer Architecture. 2015:92-104.
[20] Liu Zhong, Tian Xi, Chen Lei. Efficient vectorization method of triangular matrix multiplication supporting in-place calculation[J]. Journal of National University of Defense Technology, 2014, 36(6):7-11, 47 (in Chinese).
[21] Liu Zhong, Chen Yueyue, Chen Haiyan. A vectorization of FIR filter supporting arbitrary coefficients length and data types[J]. Acta Electronica Sinica, 2013, 41(2):346-351 (in Chinese).
[22] Zhang Q Q, Wang C L, Liu Z Y. Accelerating large-scale convolutional neural networks based on convolution in blocks[J]. 2016.
[23] Dongarra J J. An extended set of FORTRAN basic linear algebra subprograms[J]. ACM Transactions on Mathematical Software, 1988, 14(1):1-32.
[24] NVIDIA. cuDNN: GPU accelerated deep learning. 2014.
[25]
[26] Shi W, Cao J, Zhang Q, et al. Edge computing: Vision and challenges[J]. IEEE Internet of Things Journal, 2016, 3(5):637-64