ICASSP 2008 - 2008 IEEE International Conference on Acoustics, Speech, and Signal Processing - March 30 - April 4, 2008 - Las Vegas, Nevada, U.S.A.

T-10: Multimedia Signal Processing on Processors with Many Cores

Monday Morning, March 31
09:00 - 12:00

Presented by

Yen-Kuang Chen, Intel Corporation, USA and S. Y. Kung, Princeton University, USA

Abstract

This tutorial covers algorithm design and algorithmic-level optimization for future processors with many cores. To obtain the best performance from multimedia applications on future personal computers, we must carefully consider the interplay between microprocessors and algorithms/applications. Beyond increases in clock frequency, the performance of personal computers has improved significantly through the introduction of multiple cores (e.g., the latest Intel Core 2 Quad processor). Going forward, we expect the number of processing cores in a single personal computer to keep increasing (e.g., Intel's latest 80-core research prototype, which delivers teraflop-level performance). One of the most effective ways to harness the computational capability of multi-core processors is to exploit the thread-level parallelism in the applications. Because computation and memory have a symbiotic relationship, achieving the highest level of computational performance requires ensuring the best memory performance. Hence, we must design or choose algorithms for maximal thread-level parallelism and cache locality. These implications apply to developing algorithms/applications not only for personal computers but also for SoCs with multiple cores.
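As a rough illustration of designing for both thread-level parallelism and cache locality, the C++ sketch below computes a matrix-matrix product (one of the tutorial's design examples). Each thread owns a contiguous band of output rows, and the loops are tiled so that blocks of the operands are reused while they are still in cache. This is not the presenters' code: the function name `blocked_matmul`, the row-band partitioning, and the default tile size of 64 are illustrative assumptions, and a well-tuned tile size depends on the cache sizes of the target processor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: C = A * B for n x n row-major matrices.
// Each thread owns a contiguous band of rows of C (SPMD style), so no two
// threads write the same output element and no locks are needed. Inside a
// band, the i/k/j loops are tiled so a tile of B stays cache-resident while
// it is reused across several rows of A.
void blocked_matmul(const std::vector<double>& A, const std::vector<double>& B,
                    std::vector<double>& C, std::size_t n,
                    unsigned num_threads, std::size_t tile = 64) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    assert(tile > 0);
    std::fill(C.begin(), C.end(), 0.0);
    num_threads = std::max(1u, num_threads);  // hardware_concurrency() may report 0

    auto worker = [&](std::size_t row_begin, std::size_t row_end) {
        for (std::size_t ii = row_begin; ii < row_end; ii += tile)
            for (std::size_t kk = 0; kk < n; kk += tile)
                for (std::size_t jj = 0; jj < n; jj += tile)
                    for (std::size_t i = ii; i < std::min(ii + tile, row_end); ++i)
                        for (std::size_t k = kk; k < std::min(kk + tile, n); ++k) {
                            const double a = A[i * n + k];  // reused across the j loop
                            for (std::size_t j = jj; j < std::min(jj + tile, n); ++j)
                                C[i * n + j] += a * B[k * n + j];
                        }
    };

    // Static partition: split the rows of C into num_threads contiguous bands.
    std::vector<std::thread> threads;
    const std::size_t band = (n + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * band;
        const std::size_t end = std::min(n, begin + band);
        if (begin >= end) break;
        threads.emplace_back(worker, begin, end);
    }
    for (auto& th : threads) th.join();
}
```

For example, `blocked_matmul(A, B, C, n, std::thread::hardware_concurrency())` runs one worker per hardware thread reported by the standard library. Keeping each thread's writes in its own contiguous rows also avoids different cores repeatedly writing to the same cache lines.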

Outline

  1. Overview, motivation, and introduction
    1. Sequential vs. parallel processing in personal computers & SoCs
    2. Thread-level parallelism (Hyper-Threading Technology, Dual Core)
  2. Symbiotic design of multi-threaded algorithms
    1. Partition the application into multiple threads using SPMD (design example: H.264 encoder)
    2. Avoid sequential dependencies (design example: Canny edge detector)
    3. Balance load dynamically for better parallelism (design examples: MPEG-2 video decoder and articulated body tracking)
    4. Reduce overheads (design examples: two alternative graph mining algorithms, Hough transform)
    5. Take advantage of shared caches to improve cache effectiveness (design examples: SVM-based face detection, image remapping, matrix-matrix multiplication)
  3. Future implications and conclusions
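The dynamic load-balancing point in the outline can be illustrated with a minimal C++ sketch. It assumes a workload of independent tasks whose costs vary, for example image regions of uneven complexity. Rather than splitting the indices statically, each worker claims the next chunk from a shared atomic counter, so faster threads pick up more work. The name `parallel_for_dynamic` and its `chunk` parameter are hypothetical choices for illustration, not an API from the tutorial.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: runs task(i) for every i in [0, num_tasks) on
// num_threads workers. Instead of a fixed split, each worker repeatedly
// claims the next `chunk` indices from a shared atomic counter, so a thread
// that finishes cheap items early simply grabs more work.
void parallel_for_dynamic(std::size_t num_tasks, unsigned num_threads,
                          std::size_t chunk,
                          const std::function<void(std::size_t)>& task) {
    assert(chunk > 0);
    std::atomic<std::size_t> next{0};

    auto worker = [&] {
        for (;;) {
            // fetch_add returns the old value: this worker owns [begin, begin + chunk).
            const std::size_t begin = next.fetch_add(chunk);
            if (begin >= num_tasks) return;  // no work left
            const std::size_t end = std::min(num_tasks, begin + chunk);
            for (std::size_t i = begin; i < end; ++i) task(i);
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < std::max(1u, num_threads); ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
}
```

The chunk size is the main design knob: a small `chunk` balances uneven work more finely but touches the shared counter more often, while a larger one reduces that synchronization overhead at the cost of coarser balancing.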

Speaker Biographies

Yen-Kuang Chen received his Ph.D. from Princeton University and is a Principal Engineer in the Corporate Technology Group, Intel Corporation. His research interests include developing innovative multimedia applications, studying performance bottlenecks in current computers, and designing next-generation microprocessors and platforms. In particular, he is currently analyzing emerging multimedia applications and providing input to the definition of next-generation CPUs and GPUs with many cores. He is one of the key contributors to Supplemental Streaming SIMD Extensions 3 in Intel® Core™ 2 Duo processors. He holds 10+ US patents and has 25+ pending patent applications and 75+ technical publications. He is an associate editor of the Journal of VLSI Signal Processing Systems (including special issues on “System-on-a-Chip for Multimedia Systems”, “Design and Programming of Signal Processors for Multimedia Communication”, and “Multi-core Enabled Multimedia Applications & Architectures”) and of IEEE Transactions on Circuits and Systems I. He has served on the program committees of 20+ international conferences and workshops on multimedia, video communication, image processing, VLSI circuits and systems, parallel processing, and software optimization. He was an invited participant in the 2002 Frontiers of Engineering Symposium (National Academy of Engineering) and the 2003 German-American Frontiers of Engineering Symposium (Alexander von Humboldt Foundation). He is an IEEE Senior Member and an ACM Senior Member.

S. Y. Kung is a Professor in the Department of Electrical Engineering at Princeton University. His research areas include VLSI array processors, system modeling and identification, neural networks, wireless communication, sensor array processing, multimedia signal processing, bioinformatic data mining, and biometric authentication. He was a founding member of several Technical Committees (TCs) of the IEEE Signal Processing Society and was appointed the first Associate Editor in the VLSI area (1984) and later the first Associate Editor in neural networks (1991) for the IEEE Transactions on Signal Processing. He has been an IEEE Fellow since 1988. He served as a Member of the Board of Governors of the IEEE Signal Processing Society (1989-1991). Since 1990, he has been the Editor-in-Chief of the Journal of VLSI Signal Processing Systems. He received the IEEE Signal Processing Society's Technical Achievement Award for contributions to "parallel processing and neural network algorithms for signal processing" (1992); was a Distinguished Lecturer of the IEEE Signal Processing Society (1994); received the IEEE Signal Processing Society's Best Paper Award for his publication on principal component neural networks (1996); and received the IEEE Third Millennium Medal (2000). He has authored and co-authored more than 400 technical publications and numerous textbooks, including "VLSI and Modern Signal Processing", Prentice-Hall (1985); "VLSI Array Processors", Prentice-Hall (1988); "Digital Neural Networks", Prentice-Hall (1993); "Principal Component Neural Networks", John Wiley (1996); and "Biometric Authentication: A Machine Learning Approach", Prentice-Hall (2004).


© 2010 Conference Management Services, Inc. | email: webmaster@icassp2008.com | Last updated Thursday, January 24, 2008