Session 8 - Convolutional Neural Networks
Time: Thursday, 2019-04-11, 10:30AM - 12:00PM
Room: Wilhelm-Köhler-Saal, S1|03/283
Session chair: Andreas Koch
Filter-wise Pruning Approach to FPGA Implementation of Fully Convolutional Network for Semantic Segmentation
Masayuki Shimoda, Youki Sada, Hiroki Nakahara
This paper presents a hardware-aware sparse fully convolutional network (SFCN) for semantic segmentation on an FPGA. Semantic segmentation has attracted interest because self-driving cars must recognize roads and obstacles at the pixel level. However, such systems are hard to implement on embedded platforms, since the weights of an SFCN are too numerous to store in limited on-chip memory. To realize a good trade-off between speed and accuracy, we construct an AlexNet-based SFCN without skip connections or deconvolution layers, reducing both computation cost and latency. Furthermore, we propose a filter-wise pruning technique that sorts the weights of each filter by their absolute values and prunes a preset percentage of them, filter by filter, starting from the smallest. This is well suited to hardware implementation because every filter ends up with the same number of computations. We trained the AlexNet-based SFCN on the CamVid image dataset and implemented it on a Xilinx ZCU102 evaluation board. The results show that the FPGA system is 10.14 times faster than a mobile GPU implementation, and its performance per unit of power consumption is 24.49 times higher than the GPU counterpart.
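The abstract describes filter-wise pruning concretely enough to sketch: within each filter, weights are ranked by absolute value and the smallest fixed fraction is zeroed, so every filter retains the same number of multiply-accumulates. Below is a minimal NumPy sketch of that idea; the function name, tensor layout, and prune ratio are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def filter_wise_prune(weights, prune_ratio):
    """Zero out the smallest-magnitude weights of each filter independently.

    weights: conv-layer tensor of shape (num_filters, channels, kh, kw)
    prune_ratio: fraction of weights to prune per filter (assumed value)

    Because every filter loses the same fraction of weights, every filter
    keeps the same number of multiply-accumulates, which keeps a hardware
    pipeline balanced -- the property the paper exploits.
    """
    pruned = weights.copy()
    for f in range(pruned.shape[0]):
        flat = pruned[f].reshape(-1)               # view into the copy
        k = int(len(flat) * prune_ratio)           # weights to remove
        if k == 0:
            continue
        # indices of the k smallest absolute values in this filter
        drop = np.argsort(np.abs(flat))[:k]
        flat[drop] = 0.0                           # prune in place
    return pruned

# Example: prune 80% of each 64x3x3 filter in a toy layer
layer = np.random.randn(128, 64, 3, 3).astype(np.float32)
sparse = filter_wise_prune(layer, 0.8)
kept = np.count_nonzero(sparse.reshape(128, -1), axis=1)
assert np.all(kept == kept[0])  # every filter has the same nonzero count
```

The final assertion checks the property that motivates the technique: uniform per-filter sparsity, so no filter becomes the pipeline's straggler.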
Exploring Data Size to Run Convolutional Neural Networks in Low Density FPGAs
Ana Goncalves, Tiago Peres, Mário Véstias
Convolutional Neural Networks (CNNs) obtain very good results in several computer vision applications at the cost of high computational and memory requirements. Therefore, CNNs typically run on high-performance platforms. However, CNNs can be very useful in embedded systems, and executing them right next to the data source has many advantages: it avoids data communication and enables real-time decisions, turning these systems into smart sensors. In this paper, we explore data quantization for fast CNN inference in low density FPGAs. We redesign LiteCNN, an architecture for real-time inference of large CNNs in low density FPGAs, to support hybrid quantization. We study the impact of quantization on the area, performance and accuracy of LiteCNN. LiteCNN with improved quantization of activations and weights improves on the best state-of-the-art results for CNN inference in low density FPGAs. With our proposal, it is possible to infer an image in AlexNet in 7.4 ms on a ZYNQ7020 and in 14.8 ms on a ZYNQ7010 with 3% accuracy degradation. Other delay-versus-accuracy trade-offs were identified, permitting the designer to choose the most appropriate one.
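The abstract does not spell out the quantization scheme, so the sketch below shows one common form such work takes: uniform symmetric fixed-point quantization, applied with different bit widths to weights and activations ("hybrid"). All names and the 8-bit/4-bit split are assumptions for illustration.

```python
import numpy as np

def quantize_fixed_point(x, bits):
    """Uniform symmetric quantization of a tensor to `bits` bits.

    Returns integer codes and the scale needed to dequantize,
    i.e. x is approximately codes * scale. A hybrid scheme simply
    calls this with different bit widths for weights and activations.
    """
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
    scale = np.max(np.abs(x)) / qmax if np.any(x) else 1.0
    codes = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return codes, scale

# Hybrid example: 8-bit weights, 4-bit activations (bit widths assumed)
w = np.random.randn(64, 64).astype(np.float32)
a = np.abs(np.random.randn(64)).astype(np.float32)
wq, w_scale = quantize_fixed_point(w, bits=8)
aq, a_scale = quantize_fixed_point(a, bits=4)
# Integer matrix-vector product; one rescale recovers the real-valued result
y = (wq @ aq) * (w_scale * a_scale)
rel_err = np.linalg.norm(y - w @ a) / np.linalg.norm(w @ a)
print(f"relative error at 8-bit weights / 4-bit activations: {rel_err:.3f}")
```

The point of the integer product followed by a single rescale is exactly what makes low-bit quantization attractive on an FPGA: narrow multipliers and small on-chip buffers, at the price of the accuracy loss the paper measures.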
Faster Convolutional Neural Networks in Low Density FPGAs using Block Pruning
Tiago Peres, Ana Goncalves, Mário Véstias
Convolutional Neural Networks (CNNs) are achieving promising results in several computer vision applications. Running these models is computationally very intensive and requires a large amount of memory to store weights and activations. CNNs therefore typically run on high-performance platforms. However, the classification capabilities of CNNs are very useful in many applications running on embedded platforms close to where data is produced, since this avoids data communication for cloud processing and permits real-time decisions, turning these systems into smart embedded systems. In this paper, we improve the inference of large CNNs in low density FPGAs using pruning. We propose block pruning and apply it to LiteCNN, an architecture for CNN inference that achieves high performance in low density FPGAs. With the proposed LiteCNN optimizations, we obtain an architecture for CNN inference with an average performance of 275 GOPs for 8-bit data on a XC7Z020 FPGA. With our proposal, it is possible to infer an image in AlexNet in 5.1 ms on a ZYNQ7020 and in 13.2 ms on a ZYNQ7010 with only 2.4% accuracy degradation.
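The abstract names block pruning but does not define its granularity. A common reading is that weights are removed in fixed-size aligned groups rather than individually, so the hardware can skip a whole block with a single index. The sketch below assumes 1-D blocks ranked by L1 norm; block size, prune ratio, and function name are all illustrative assumptions.

```python
import numpy as np

def block_prune(weights, block_size, prune_ratio):
    """Prune whole blocks of weights instead of individual values.

    weights: 1-D weight vector (length a multiple of block_size)
    block_size: number of consecutive weights removed together
    prune_ratio: fraction of blocks to zero out (assumed value)

    Zeroing aligned blocks lets a datapath skip a whole group of
    multiply-accumulates at once, which is cheaper in hardware than
    tracking each surviving weight individually.
    """
    assert weights.size % block_size == 0
    blocks = weights.reshape(-1, block_size)       # view, modified in place
    n_drop = int(blocks.shape[0] * prune_ratio)
    # Rank blocks by L1 norm and zero out the weakest ones
    order = np.argsort(np.abs(blocks).sum(axis=1))
    blocks[order[:n_drop]] = 0.0
    return blocks.reshape(-1)

# Example: 8-wide blocks, half of the 64 blocks pruned (parameters assumed)
w = np.random.randn(512).astype(np.float32)
w_sparse = block_prune(w.copy(), block_size=8, prune_ratio=0.5)
print(np.count_nonzero(w_sparse))  # 256 weights remain: 32 of 64 blocks
```

Compared with the filter-wise scheme of the first paper, block pruning trades some flexibility in which weights survive for an even simpler sparsity pattern, which is what allows LiteCNN to keep its dense, high-throughput datapath.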