Papers to Read

General Introduction

  1. LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444. nature14539.pdf

[This is a general introduction by three towering figures of the field]

 

  2. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. LeNet 00726791.pdf

    [This is the original LeNet paper by Yann LeCun]

     

  3. Graves, A., Mohamed, A. R., & Hinton, G. (2013, May). Speech recognition with deep recurrent neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on (pp. 6645-6649). IEEE. Speech RNN Hinton 06638947.pdf

[This work boosted Microsoft's speech technology]

 

  4. Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (pp. 3104-3112). sequence-to-sequence-learning-with-neural-networks NIPS 2014 .pdf

[This led to better speech understanding at Google, automatic Gmail replies, and more]

 

  5. Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (pp. 807-814). ReLU icml2010_NairH10.pdf

[ReLU copes with the vanishing gradient problem better than the sigmoid; see the sketch below]
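
A minimal NumPy sketch (not from the paper; the test points are arbitrary) comparing the two activation gradients: the sigmoid's derivative is at most 0.25 and collapses toward zero for large inputs, while ReLU's derivative stays at 1 for any positive pre-activation.

```python
# Illustrative sketch: why sigmoid gradients vanish while ReLU gradients do not.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # at most 0.25, and ~0 for large |x|

def relu_grad(x):
    return (x > 0).astype(float)    # exactly 1 for any positive pre-activation

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print("sigmoid'(x):", sigmoid_grad(x))   # ~4.5e-05 at |x| = 10
print("relu'(x):   ", relu_grad(x))      # 1.0 wherever x > 0
```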

 

  6. Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929-1958. Dropout srivastava14a.pdf

[Dropout (randomly "damaging" parts of the network during training) yields a more robust network; see the sketch below]
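
A minimal sketch of dropout, assuming the common "inverted dropout" variant that rescales surviving activations at training time (the paper instead scales weights at test time); the function name and drop probability here are illustrative only.

```python
# Illustrative sketch of (inverted) dropout: randomly zero units during training
# and rescale the survivors so the expected activation is unchanged.
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p_drop=0.5, training=True):
    if not training or p_drop == 0.0:
        return h
    keep = 1.0 - p_drop
    mask = rng.random(h.shape) < keep   # each unit kept with probability `keep`
    return h * mask / keep              # rescaling replaces the paper's test-time scaling

h = np.ones((2, 6))
print(dropout(h))   # roughly half the entries are 0, the rest are 2.0
```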

 

  7. Ioffe, S., & Szegedy, C. (2015, June). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (pp. 448-456). Batch Normalization icml2015_ioffe15.pdf

    [This technique makes training faster; a sketch of the per-feature normalization follows]
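
A minimal NumPy sketch of the batch-norm transform from the paper for a fully connected layer; the gamma, beta, and batch values are illustrative, and the running statistics used at inference time are omitted.

```python
# Illustrative sketch of batch normalization: normalize each feature over the
# mini-batch, then apply a learned scale (gamma) and shift (beta).
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # ~zero mean, unit variance per feature
    return gamma * x_hat + beta             # learned scale and shift

x = np.random.default_rng(1).normal(5.0, 3.0, size=(8, 4))
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(6))   # ~0 per feature
print(y.std(axis=0).round(3))    # ~1 per feature
```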

     

  8. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems (pp. 2672-2680). Generative Adversarial Nets.pdf

    [Many regard the GAN as one of the most important ideas of the past few years]

     

ImageNet Challenge Winners

 

  1. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105). AlexNet-imagenet-classification-with-deep-convolutional-neural-networks.pdf [AlexNet]

     

  2. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. VGGNet 1409.1556.pdf [VGGNet]

 

  3. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015, June). Going deeper with convolutions. CVPR 2015. GoogLeNet Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf [GoogLeNet]

 

  4. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778). ResNet He_Deep_Residual_Learning_CVPR_2016_paper.pdf [ResNet]

 

DianNao Family

 

  1. Chen, T., Du, Z., Sun, N., Wang, J., Wu, C., Chen, Y., & Temam, O. (2014, February). DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. In ACM SIGPLAN Notices (Vol. 49, No. 4, pp. 269-284). ACM. DianNao p269-chen.pdf

 

  2. Chen, Y., Luo, T., Liu, S., Zhang, S., He, L., Wang, J., ... & Temam, O. (2014, December). DaDianNao: A machine-learning supercomputer. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (pp. 609-622). IEEE Computer Society. DadianNao p609-chen.pdf

 

  3. Liu, D., Chen, T., Liu, S., Zhou, J., Zhou, S., Temam, O., ... & Chen, Y. (2015, March). PuDianNao: A polyvalent machine learning accelerator. In ACM SIGARCH Computer Architecture News (Vol. 43, No. 1, pp. 369-381). ACM.

 

  4. Du, Z., Fasthuber, R., Chen, T., Ienne, P., Li, L., Luo, T., ... & Temam, O. (2015, June). ShiDianNao: Shifting vision processing closer to the sensor. In ACM SIGARCH Computer Architecture News (Vol. 43, No. 3, pp. 92-104). ACM. ShiDiannao p92-du.pdf

 

  5. Chen, Y., Chen, T., Xu, Z., Sun, N., & Temam, O. (2016). DianNao family: Energy-efficient hardware accelerators for machine learning. Communications of the ACM, 59(11), 105-112. DianNao Family p105-che.pdf

 

  6. Liu, S., Du, Z., Tao, J., Han, D., Luo, T., Xie, Y., ... & Chen, T. (2016, June). Cambricon: An instruction set architecture for neural networks. In Proceedings of the 43rd International Symposium on Computer Architecture (pp. 393-405). IEEE Press. Cambricon 07551409.pdf

 

  7. Zhang, S., Du, Z., Zhang, L., Lan, H., Liu, S., Li, L., ... & Chen, Y. (2016, October). Cambricon-X: An accelerator for sparse neural networks. In Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on (pp. 1-12). IEEE. Cabricon X 07783723.pdf

 

  8. Lu, W., Yan, G., Li, J., Gong, S., Han, Y., & Li, X. (2017, February). FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks. In High Performance Computer Architecture (HPCA), 2017 IEEE International Symposium on (pp. 553-564). IEEE. Flexflow HPCA2017 07920855.pdf

Vivienne Sze (MIT)

  1. Sze, V., Chen, Y. H., Yang, T. J., & Emer, J. (2017). Efficient processing of deep neural networks: A tutorial and survey. arXiv preprint arXiv:1703.09039. MIT Sze Survey 1703.09039.pdf

 

  2. Chen, Y. H., Krishna, T., Emer, J. S., & Sze, V. (2017). Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE Journal of Solid-State Circuits, 52(1), 127-138. Eyeriss 07738524.pdf

 

  3. Chen, Y. H., Emer, J., & Sze, V. (2017). Using Dataflow to Optimize Energy Efficiency of Deep Neural Network Accelerators. IEEE Micro, 37(3), 12-21. MIT DataFlow IEEE Micro 07948671.pdf

     

Song Han and Bill Dally (Stanford)

 

  1. Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., & Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360. SqueezNet 1602.07360.pdf

 

  2. Han, S., Mao, H., & Dally, W. J. (2015). Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149. Deep Compression 1510.00149.pdf

 

  3. Han, S., Pool, J., Tran, J., & Dally, W. (2015). Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems (pp. 1135-1143). Han Song 5784-learning-both-weights-and-connections-for-efficient-neural-network.pdf

 

  4. Han, S., Liu, X., Mao, H., Pu, J., Pedram, A., Horowitz, M. A., & Dally, W. J. (2016, June). EIE: Efficient inference engine on compressed deep neural network. In Proceedings of the 43rd International Symposium on Computer Architecture (pp. 243-254). IEEE Press. EIE p243-han.pdf

 

  5. Han, S., Kang, J., Mao, H., Hu, Y., Li, X., Li, Y., ... & Yang, H. (2017, February). ESE: Efficient speech recognition engine with sparse LSTM on FPGA. In Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 75-84). ACM. ESE Sparse LSTM FPGA.pdf

 

More Compression Approaches

 

  1. Parashar, A., Rhu, M., Mukkara, A., Puglielli, A., Venkatesan, R., Khailany, B., ... & Dally, W. J. (2017). SCNN: An accelerator for compressed-sparse convolutional neural networks. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17) (pp. 27-40). ACM. DOI: https://doi.org/10.1145/3079856.3080254. SCNN ISCA 2017 p27-Parashar.pdf

 

  2. Albericio, J., Judd, P., Hetherington, T., Aamodt, T., Jerger, N. E., & Moshovos, A. (2016, June). Cnvlutin: Ineffectual-neuron-free deep neural network computing. In Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on (pp. 1-13). IEEE. Cnvlutin 07551378.pdf

 

  3. Judd, P., Delmas, A., Sharify, S., & Moshovos, A. (2017). Cnvlutin2: Ineffectual-Activation-and-Weight-Free Deep Neural Network Computing. arXiv preprint arXiv:1705.00125. Cnvlutin2 1705.00125.pdf

     

Google TPU

 

  1. Jouppi, N. P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., ... & Boyle, R. (2017). In-datacenter performance analysis of a tensor processing unit. arXiv preprint arXiv:1704.04760. TPU ISCA 2017 1704.04760.pdf

 

Microsoft

 

  1. Chilimbi, T. M., Suzue, Y., Apacible, J., & Kalyanaraman, K. (2014, October). Project Adam: Building an Efficient and Scalable Deep Learning Training System. In OSDI (Vol. 14, pp. 571-582). Adam osdi14-paper-chilimbi.pdf

 

  2. Ovtcharov, K., Ruwase, O., Kim, J. Y., Fowers, J., Strauss, K., & Chung, E. S. (2015). Accelerating deep convolutional neural networks using specialized hardware. Microsoft Research Whitepaper, 2(11). 15 DCNN hardware.pdf

 

  3. Putnam, A., Caulfield, A. M., Chung, E. S., Chiou, D., Constantinides, K., Demme, J., ... & Haselman, M. (2014, June). A reconfigurable fabric for accelerating large-scale datacenter services. In Computer Architecture (ISCA), 2014 ACM/IEEE 41st International Symposium on (pp. 13-24). IEEE. microsoft catapult 2014 06853195.pdf

 

FPGA

  1. Zhang, C., Li, P., Sun, G., Guan, Y., Xiao, B., & Cong, J. (2015, February). Optimizing FPGA-based accelerator design for deep convolutional neural networks. In Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 161-170). ACM. UCLA Cong p161-zhang.pdf

 

  2. Sharma, H., Park, J., Mahajan, D., Amaro, E., Kim, J. K., Shao, C., ... & Esmaeilzadeh, H. (2016, October). From high-level deep neural models to FPGAs. In Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on (pp. 1-12). IEEE. HL DNN to FPGA 2016 Micro 07783720.pdf

 

  3. Suda, N., Chandra, V., Dasika, G., Mohanty, A., Ma, Y., Vrudhula, S., ... & Cao, Y. (2016, February). Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks. In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 16-25). ACM. OpenCL FPGA p16-suda.pdf

 

  4. Wang, C., Gong, L., Yu, Q., Li, X., Xie, Y., & Zhou, X. (2017). DLAU: A scalable deep learning accelerator unit on FPGA. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 36(3), 513-517. DLAU Yuan Xie 07505926.pdf

 

  5. Peemen, M., Setio, A. A., Mesman, B., & Corporaal, H. (2013, October). Memory-centric accelerator design for convolutional neural networks. In Computer Design (ICCD), 2013 IEEE 31st International Conference on (pp. 13-19). IEEE. Memory Centric CNN 06657019.pdf

 

 

Various Acceleration Approaches

 

  1. Gupta, S., Agrawal, A., Gopalakrishnan, K., & Narayanan, P. (2015). Deep learning with limited numerical precision. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15) (pp. 1737-1746). Num Precision gupta15.pdf

 

  2. Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., & Bengio, Y. (2016). Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:1602.02830. Binary 1602.02830.pdf

 

  3. Vanhoucke, V., Senior, A., & Mao, M. Z. (2011, December). Improving the speed of neural networks on CPUs. In Proc. Deep Learning and Unsupervised Feature Learning NIPS Workshop (Vol. 1, p. 4). CPU Improvement Google VanhouckeNIPS11.pdf

 

  4. Venkataramani, S., Ranjan, A., Banerjee, S., Das, D., Avancha, S., Jagannathan, A., ... & Raghunathan, A. (2017). ScaleDeep: A scalable compute architecture for learning and evaluating deep networks. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17) (pp. 13-26). ACM. DOI: https://doi.org/10.1145/3079856.3080244. ScaleDeep ISCA 2017 p13-Venkataramani.pdf

 

  5. Yu, J., Lukefahr, A., Palframan, D., Dasika, G., Das, R., & Mahlke, S. (2017). Scalpel: Customizing DNN pruning to the underlying hardware parallelism. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17) (pp. 548-560). ACM. DOI: https://doi.org/10.1145/3079856.3080215. Scalpel ISCA 2017 p548-Yu.pdf

 

  6. Judd, P., Albericio, J., Hetherington, T., Aamodt, T. M., & Moshovos, A. (2016, October). Stripes: Bit-serial deep neural network computing. In Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on (pp. 1-12). IEEE. Stripes bit serial Micro 2016 07783722.pdf

     

  7. Kim, Y. D., Park, E., Yoo, S., Choi, T., Yang, L., & Shin, D. (2015). Compression of deep convolutional neural networks for fast and low power mobile applications. arXiv preprint arXiv:1511.06530. Yoo Compressed CNN 1511.06530.pdf

 

  8. Song, L., Wang, Y., Han, Y., Zhao, X., Liu, B., & Li, X. (2016, June). C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization. In Design Automation Conference (DAC), 2016 53rd ACM/EDAC/IEEE (pp. 1-6). IEEE. C-Brain.pdf

 

  9. Bang, S., Wang, J., Li, Z., Gao, C., Kim, Y., Dong, Q., ... & Mudge, T. (2017, February). 14.7 A 288µW programmable deep-learning processor with 270KB on-chip weight storage using non-uniform memory hierarchy for mobile intelligence. In Solid-State Circuits Conference (ISSCC), 2017 IEEE International (pp. 250-251). IEEE. Michigan 2017 ISSCC.pdf

 

  10. Song, L., Qian, X., Li, H., & Chen, Y. (2017, February). PipeLayer: A pipelined ReRAM-based accelerator for deep learning. In High Performance Computer Architecture (HPCA), 2017 IEEE International Symposium on (pp. 541-552). IEEE. PipeLayer.pdf

 

  11. Aydonat, U., O'Connell, S., Capalija, D., Ling, A. C., & Chiu, G. R. (2017). An OpenCL(TM) Deep Learning Accelerator on Arria 10. arXiv preprint arXiv:1701.03534. OpenCL Toronto.pdf

 

  12. Venkatesh, G., Nurvitadhi, E., & Marr, D. (2017, March). Accelerating Deep Convolutional Networks using low-precision and sparsity. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on (pp. 2861-2865). IEEE. Sparsity Intel.pdf

 

  13. You, Y. (2016). Sparsity Analysis of Deep Learning Models and Corresponding Accelerator Design on FPGA (thesis). sparsity Analysis.pdf

 

  14. Wei, X., Yu, C. H., Zhang, P., Chen, Y., Wang, Y., Hu, H., ... & Cong, J. (2017, June). Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs. In Design Automation Conference (DAC), 2017 54th ACM/EDAC/IEEE (pp. 1-6). IEEE. Systolic Array Synthesis Cong.pdf

 

  15. Fang, J., Fu, H., Zhao, W., Chen, B., Zheng, W., & Yang, G. (2017, May). swDNN: A Library for Accelerating Deep Learning Applications on Sunway TaihuLight. In Parallel and Distributed Processing Symposium (IPDPS), 2017 IEEE International (pp. 615-624). IEEE. swDNN Taihu Light.pdf

 

  16. Guan, Y., Yuan, Z., Sun, G., & Cong, J. (2017, January). FPGA-based accelerator for long short-term memory recurrent neural networks. In Design Automation Conference (ASP-DAC), 2017 22nd Asia and South Pacific (pp. 629-634). IEEE. LSTM FPGA Cong.pdf

 

  17. Edstrom, J., Gong, Y., Chen, D., Wang, J., & Gong, N. (2017). Data-Driven Intelligent Efficient Synaptic Storage for Deep Learning. IEEE Transactions on Circuits and Systems II: Express Briefs. Eff Synap Storage.pdf

 

  18. Du, L., Du, Y., Li, Y., Su, J., Kuan, Y. C., Liu, C. C., & Chang, M. C. F. (2017). A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things. IEEE Transactions on Circuits and Systems I: Regular Papers. Reconfig DCNN Frank Chang.pdf

 

 

 

Additional References

****************************************************************************************************

  1. Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117. Schmidhuber Overview NN 1-s2.0-S0893608014002135-main.pdf

     

  2. LeCun, Y., Boser, B. E., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W. E., & Jackel, L. D. (1990). Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems (pp. 396-404). handwritten-digit-recognition-with-a-back-propagation-network.pdf

 

  3. Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555. RNN Comparison 1412.3555.pdf

 

  4. Goldberg, Y., & Levy, O. (2014). word2vec Explained: Deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722. word2vec explained 1402.3722.pdf

 

  5. Lin, H. W., Tegmark, M., & Rolnick, D. (2016). Why does deep and cheap learning work so well? Journal of Statistical Physics, 1-25. Why DL Work 10.1007\s10955-017-1836-5.pdf

 

  6. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., & Manzagol, P. A. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec), 3371-3408. vincent10a.pdf

 

  7. Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828. Learning Representation 06472238.pdf

 

  8. Ota, K., Dao, M. S., Mezaris, V., & De Natale, F. G. (2017). Deep Learning for Mobile Multimedia: A Survey. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 13(3s), 34. DL mobile survey.pdf

 

Video

 

 

4-1 Deep Learning Algorithms, Optimization Methods, and Hardware Accelerators (Prof. Sungjoo Yoo) https://youtu.be/ebqVpK4c3cw

 

4-2 Example of Object Detection Result (Prof. Sungjoo Yoo) https://youtu.be/MEgwTaUdmqw

 

4-3 Convolution with Matrix Multiplication (Prof. Sungjoo Yoo) https://youtu.be/2ExjsudgDU4

 

4-4 High Performance Accelerator Architecture (Prof. Sungjoo Yoo) https://youtu.be/hAZ2t0a7rdU