Bio: Wei Gao is currently an Assistant Professor at the School of Electronic and Computer Engineering, Peking University, China. He received the Ph.D. degree in computer science from City University of Hong Kong in February 2017. In 2016, he was a Visiting Scholar with the University of California, Los Angeles, CA, USA. From 2017 to 2019, he was a Postdoctoral Fellow at City University of Hong Kong and a Research Fellow at Nanyang Technological University, Singapore. He has been a Senior Member of IEEE since January 2022.
His research interests include multimedia coding, multimedia processing, and artificial intelligence, especially 3D point cloud compression and processing. He has published over 180 high-quality research papers in TPAMI, TIP, TCSVT, TMM, TNNLS, CVPR, ICCV, ECCV, ACM MM, AAAI, NeurIPS, etc. (Google Scholar citations: 6500+, h-index: 40). He was selected for the Stanford University "World's Top 2% Scientists" list. He has authored four books published by Springer Nature, including "AI-based 3D Point Cloud Coding", "AI-based Image and Video Coding", "Point Cloud Compression: Technologies and Standardization", and "Deep Learning for 3D Point Clouds", for which he received the 2025 Springer Nature China New Development Awards. He has been actively participating in the development of 3D visual data coding standards. He is a member of the expert groups of the MPEG and AVS standards, and a member of the IEEE Data Compression Standards Committee (DCSC). He is currently the Executive Lead of the AVS AI-based Point Cloud Coding Standard, a Co-Chair of the AVS Digital Human Standard Ad Hoc Group, and a Co-Chair of the IEEE P3366.3 Gaussian Splatting Compression Standard. He has filed over 100 patent applications and submitted over 70 standard proposals on point cloud coding, video coding, and point cloud quality assessment. He was a recipient of the AVS Industrial Technology Innovation Team Award in 2023 for contributions to the point cloud compression standard. He has also led the establishment of several open-source projects, including OpenPointCloud, OpenAICoding, and OpenDatasets. He founded the MMCAL Lab in 2019 and now leads the research team in developing advanced technologies for immersive media and 3D vision.
He is currently an Associate Editor of IEEE TCSVT, IEEE TMM, and ACM TOMM, and has served as Guest Editor for two special issues. He is an Elected Member of IEEE VSPC-TC, IEEE MSA-TC, and APSIPA IVM-TC. He organized workshops and special sessions at ACM MM 2025, IEEE ICME 2023, ACM MM 2022, IEEE VCIP 2022, and IEEE ICME 2021. He was a tutorial speaker at ACM MM 2025, ACM MM 2024, IEEE ICIP 2024, IEEE IJCNN 2024, and IEEE ICME 2023. He regularly serves as a Reviewer for IEEE Transactions, such as TPAMI, TIP, TVCG, TCSVT, TMM, and TNNLS, and as an Area Chair and TPC Member for several prestigious international conferences.
Abstract: In the current AI era, 3D visual data has empowered emerging applications such as immersive media, autonomous driving, and intelligent robotics. 3D visual data coding can efficiently relieve the burden of storage and transmission, and has thus attracted much attention and effort from both the academic and industrial communities. Current 3D visual solutions are diversified by their specific capturing conditions, rendering techniques, and application demands, forcing researchers to spend considerable effort adapting to different practical scenarios across representations, including multi-view, point clouds, meshes, and Gaussian splats. Is there a unified coding method, or even a unified representation format, for 3D visual data? We expect point clouds to serve as this direct representation, given their strong relations to, and feasible conversions from, the other formats. In this talk, we will present research works from my group, together with a discussion of unified 3D visual data coding via point clouds. First, for rate-distortion optimization, efficient context models and networks can be devised for AI-based point cloud coding, including uniform contexts, checkerboard contexts, and cross-modality contexts. Next, several representative works on quality assessment and perception modeling will be presented, and the challenges of coding oriented to human and machine perception are identified together with effective solutions. Afterwards, for complexity optimization, we will illustrate the use of dynamic networks for coding complexity control and rate-distortion control. Finally, we will give a unified point cloud coding method and framework for encoding any type of point cloud data as well as the extended point cloud modality, i.e., Gaussian splats.
Through this talk, we hope to draw the audience's attention to developing more efficient, powerful, and unified coding solutions for 3D visual data, and thereby accelerate practical applications in immersive media, spatial intelligence, and embodied intelligence.
Bio: Heming Sun received the B.E. degree in Electronic Engineering from Shanghai Jiao Tong University, Shanghai, China, in 2011, and the M.E. degrees from Waseda University, Japan, and Shanghai Jiao Tong University, China, in 2012 and 2014, respectively, through a double-degree program. He earned his Ph.D. degree from Waseda University in 2017 through the Graduate Program for Embodiment Informatics.
He was a Researcher at NEC Central Research Laboratories from 2017 to 2018, and an Assistant Professor at Waseda University from 2018 to 2023. He is currently an Associate Professor at Yokohama National University. From 2019 to 2023, he was also a Researcher with the Japan Science and Technology Agency (JST) PRESTO program.
His research interests include algorithms and VLSI architectures for image/video processing and neural networks. He has published over 100 journal and conference papers, primarily in venues associated with the IEEE Circuits and Systems Society (e.g., TCSVT, TCAS-I, TCAS-II, ISCAS, VCIP), the IEEE Solid-State Circuits Society (e.g., JSSC, ISSCC, ASSCC), and the IEEE Signal Processing and Computer Societies (e.g., TIP, CVPR, ICIP).
He contributed to the design of the 8K HEVC decoder chip, which earned the ISSCC 2016 Takuo Sugano Award for Outstanding Far-East Paper. Additionally, his module design (system design of de-quantization and inverse transform) received the VLSI Design and Education Center (VDEC) Design Award from the University of Tokyo. Dr. Sun has received several awards, including the VCIP Best Paper Award, PCS Top-10 Best Paper, and the VCIP Best Demo Award, all as first author. In recognition of his outstanding research contributions, he was honored with the IPSJ/IEEE Computer Society Young Computer Researcher Award and the Young Scientists’ Award from the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology. The latter is regarded as the most prestigious honor for young researchers under the age of 40 in Japan.
In terms of academic service, he organized two special sessions on learned codecs at the Picture Coding Symposium in 2019 and 2022. He has delivered five tutorials on learned codecs, covering both algorithmic and architectural aspects, at ISCAS 2021, WACV 2023, ICCV 2023, EUSIPCO 2025, and APSIPA 2025. Additionally, he was invited by the Information Processing Society of Japan to give a talk titled “Deep Learning Method for Image Compression.” Dr. Sun has been serving as the VSPC-TC Membership Subcommittee Chair since 2021, and is an Associate Editor for TCSVT as well as a Guest Editor for JETCAS. He has also made significant contributions to flagship CAS Society journals and conferences as a Review Committee Member (RCM), area chair, and reviewer, and was recognized as a VCIP Best Reviewer.
Summary: This talk presents research that bridges traditional and learned image and video coding, spanning from algorithmic innovation to hardware realization.
The first part of the lecture introduces fast algorithms and low-cost architectural techniques in the traditional coding domain, covering key components such as intra prediction and transform. These studies established a solid foundation for the subsequent development of neural network–based compression techniques.
The second part highlights advances in learned image and video coding, where neural networks significantly improve rate–distortion efficiency. I will discuss neural approaches for enhanced intra prediction, CNN- and Transformer-based frameworks achieving state-of-the-art performance, and quantization-aware, fixed-point designs that enable cross-platform consistency. Furthermore, an FPGA-based learned codec system demonstrating real-time performance is presented. Algorithm–architecture co-optimization is also explored to balance throughput and coding efficiency.
Finally, the talk concludes with ongoing efforts toward turning learned codecs into practical systems through algorithm–architecture co-design.