The 11th IEEE International Workshop on
Analysis and Modeling of Faces and Gestures
In conjunction with ICCV 2023
October 2 starting from 8:45 AM (Local, Paris)
Papers
Face, gesture, and cross-modality (e.g., voice and face) technologies have advanced rapidly, thanks largely to deep learning (dating back to AlexNet in 2012) and large-scale, labeled datasets. Continued progress keeps pushing renowned public databases toward saturation, calling for the compilation of ever more challenging image collections. In practice, and even widely in applied research, using off-the-shelf deep learning models has become the norm: numerous pre-trained networks (e.g., VGG-Face, ResNet) are available for download and readily deployed to new, unseen data, as sketched in the example below. We have almost grown "spoiled" by this luxury, which, in all actuality, has allowed many truths to stay hidden. Theoretically, what makes today's neural networks more discriminative than ever before remains, in all fairness, unclear; to most practitioners and researchers alike, they act as black boxes. More troublesome is the absence of tools to quantitatively and qualitatively characterize existing deep models, which could yield deeper insight into these all-too-familiar black boxes. With the frontier moving forward at rates incomparable to any spurt of the past, challenges such as high variations in illumination, pose, and age now confront us, and state-of-the-art deep learning models often fail when faced with them owing to the difficulties of modeling structured data and visual dynamics.
Alongside the effort spent on conventional face recognition is the research done across modality learning, such as face and voice, gestures in imagery, and video motion, along with several other tasks. This line of work has attracted attention from industry and academic researchers across many domains, and there has also been a push to advance these technologies for social media-based applications. Regardless of the exact domain and purpose, the following capabilities must be supported: face and body tracking (e.g., facial expression analysis, face detection, gesture recognition); lip reading and voice understanding; face and body characterization (e.g., behavioral understanding, emotion recognition); face, body, and gesture characteristic analysis (e.g., gait, age, gender, ethnicity recognition); group understanding via social cues (e.g., kinship, non-blood relationships, personality); and visual sentiment analysis (e.g., temperament, arrangement). The ability to create effective models for these problems thus carries significant value for both the scientific community and the commercial market, with applications spanning human-computer interaction, social media analytics, video indexing, visual surveillance, and internet vision. Researchers have made significant progress on many of these problems, especially considering the off-the-shelf, cost-efficient vision products available these days, e.g., Intel RealSense, SHORE, and Affdex. Nonetheless, serious challenges remain, and they are only amplified under the unconstrained imaging conditions captured by different sources focused on non-cooperative subjects. These latter challenges especially grabbed our interest, as we seek to bring together cutting-edge techniques and recent advances in deep learning to solve such challenges in the wild.
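To make the "off-the-shelf" workflow above concrete, here is a minimal sketch in Python: it downloads a pre-trained torchvision ResNet-50 (standing in for any downloadable backbone such as VGG-Face) and uses it, unchanged, as a feature extractor on unseen images. The image paths and the cosine-similarity matching are hypothetical illustrations, not part of any AMFG system.

```python
# Minimal sketch of the off-the-shelf workflow: download a pre-trained
# backbone and deploy it, without any training, as a feature extractor
# for new, unseen face images. ResNet-50 here stands in for any
# pre-trained network (e.g., VGG-Face).
import torch
from torchvision import models, transforms
from PIL import Image

# Load ImageNet-pre-trained weights; no training happens in this script.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()  # drop the classifier; keep 2048-d embeddings
backbone.eval()

# Standard ImageNet preprocessing expected by the pre-trained weights.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(path: str) -> torch.Tensor:
    """Map one image file to a feature vector usable for matching or clustering."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return backbone(x).squeeze(0)

# Hypothetical usage: similarity between two face crops is the entire "deployment".
# a, b = embed("face_a.jpg"), embed("face_b.jpg")
# print(torch.nn.functional.cosine_similarity(a, b, dim=0))
```

This convenience is exactly the double-edged sword discussed above: the model works out of the box, but nothing in this workflow explains why the embeddings are discriminative or when they will fail.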
This one-day serial workshop (AMFG2023) provides a forum for researchers to review recent progress in the recognition, analysis, and modeling of faces, bodies, and gestures, while embracing the most advanced deep learning systems available for face and gesture analysis, particularly in unconstrained environments such as social media and across modalities such as face-to-voice. The workshop includes invited keynote talks and peer-reviewed papers (oral and poster). We call for original, high-quality contributions on the following topics:
- Novel deep model, deep learning survey, or comparative study for face/gesture recognition;
- Data-driven or physics-based generative models for faces, poses, and gestures;
- Deep learning for internet-scale soft biometrics and profiling: age, gender, ethnicity, personality, kinship, occupation, beauty ranking, and fashion classification by facial or body descriptor;
- Deep learning for detection and recognition of faces and bodies with large 3D rotation, illumination change, partial occlusion, unknown/changing background, and aging (i.e., in the wild), especially face and gesture recognition robust to large 3D rotations;
- Motion analysis, tracking, and extraction of face and body models captured from several non-overlapping views;
- Face, gait, and action recognition in low-quality (e.g., blurred), or low-resolution video from fixed or mobile device cameras;
- AutoML for face and gesture analysis;
- Social/psychological-based studies that aid in understanding computational modeling and building better automated face and gesture systems with interactive features;
- Multimedia learning models involving faces and gestures (e.g., voice, wearable IMUs, and face);
- Trustworthy learning for face and gesture analysis, e.g., fairness, explainability, and transparency;
- Other applications that involve face and gesture analysis.
AMFG was first held in conjunction with the 2003 ICCV in Nice, France. So far, it has been successfully held ten times. The
homepages of the previous AMFG workshops are as follows:
- AMFG2003@ICCV
- AMFG2005@ICCV
- AMFG2007@ICCV
- AMFG2010@CVPR
- AMFG2013@CVPR
- AMFG2015@CVPR
- AMFG2017@CVPR
- AMFG2018@CVPR
- AMFG2019@CVPR
- AMFG2021@CVPR
Face and gesture (hand) modeling are long-standing problems in the computer vision community
and have been widely explored and studied in many other workshops in recent years, yet
with different emphases compared with AMFG, as follows:
- Face recognition towards security concerns, such as fairness in ChaLearn2020@ECCV, face anti-spoofing in ChaLearn2021@ICCV, and masked face recognition in MFR2021@ICCV
- Hands modeling for action understanding, such as HANDS2022@ECCV and HBHA2022@ECCV
- Face and gesture modeling in VR/AR, such as WCPA2022@ECCV and CV4ARVR2022@CVPR
This workshop focuses on fundamental research centered on faces and gestures, and thus
provides theoretical and technical support for the applications above. The topics covered by AMFG will
also benefit the community in a more generalized context, including human-computer interaction,
multimodal learning, egocentric vision, artificial ethics, robotics, etc.
Dates
[ 07/23/2023 ] Submission Deadline (Updated)
[ 08/21/2023 ] Camera-Ready Due
Author Guidelines
Submissions are handled via the
workshop’s CMT website:
https://cmt3.research.microsoft.com/AMFG2023
Following the guidelines of ICCV 2023:
https://iccv2023.thecvf.com/submission.guidelines-361600-2-20-16.php
- 8 pages (excluding references)
- Anonymous
- Using ICCV LaTeX templates
Organizers
Committee
- Ali Pourramezan Fard, University of Denver, USA
- Bin Sun, Northeastern University, USA
- Brian DeCann, Systems & Technology Research, USA
- Can Qin, Northeastern University, USA
- Chang Liu, Northeastern University, USA
- Chih-Ting Liu, Amazon, USA
- Huan Wang, Northeastern University, USA
- Jianglin Lu, Northeastern University, USA
- Jun Li, Nanjing University of Science and Technology, China
- Miao Xin, Institute of Automation, Chinese Academy of Sciences, China
- Ronghang Zhu, University of Georgia, USA
- Shuangjun Liu, Northeastern University, USA
- Siyu Xia, Southeast University, China
- Weijun Tan, Linksprite Technologies, USA
- Xu Ma, Northeastern University, USA
- Yizhou Wang, Northeastern University, USA
- Yue Bai, Northeastern University, USA
- Zaid Khan, Northeastern University, USA
- Zheng Zhang, Harbin Institute of Technology, Shenzhen, China
- Zhi Xu, Northeastern University, USA
- Zhongliang Zhou, University of Georgia, USA
Keynotes
With renowned public databases nearing saturation and off-the-shelf deep learning models becoming ubiquitous, AMFG2023 offers a fresh perspective. It encourages participants to delve deeper into the neural network “black box,” providing both a lens to dissect current models and a canvas for innovation. The emphasis is on real-world challenges, such as lighting, pose, and age variations, where conventional models often fall short.
This one-day event promises an engaging blend of keynote sessions and peer-reviewed presentations, fostering a collaborative space for visionaries across industries. From novel deep models, AutoML, and internet-scale biometrics to trustworthy learning and multimedia models, the thematic spectrum is vast and captivating.
Join us as we unravel the complexities of today’s face and gesture analysis, pushing boundaries and forging new paths in the wild, unconstrained territories of deep learning. At AMFG2023, the future is not just seen but shaped.
Program on October 2 (Local, Paris)
8:45 | Chairs’ Opening Remarks
9:00 | Ting-Chun Wang, NVIDIA, Invited Talk #1: Neural Face Synthesis and Animation
10:00 | In Kyu Park, Inha University, *Oral Presentation #1: M2C: Concise Music Representation for 3D Dance Generation
10:15 | Maksym Ivashechkin, University of Surrey, *Oral Presentation #2: Denoising Diffusion for 3D Hand Pose Estimation from Images
10:30 | Coffee Break I
11:00 | Giorgos Karvounas, CSD-UOC and ICS-FORTH, *Oral Presentation #3: Dynamic Multiview Refinement of 3D Hand Datasets using Differentiable Ray Tracing
11:15 | Noha Sarhan, University of Hamburg, *Oral Presentation #4: Unraveling a Decade: A Comprehensive Survey on Isolated Sign Language Recognition
11:30 | Manuel Kansy, ETH Zurich, *Oral Presentation #5: Controllable Inversion of Black-Box Face Recognition Models via Diffusion
11:45 | Pietro Melzi, Universidad Autonoma de Madrid, *Oral Presentation #6: GANDiffFace: Controllable Generation of Synthetic Datasets for Face Recognition with Realistic Variations
12:00 | Ce Zheng, University of Central Florida, *Oral Presentation #7: POSTER: A Pyramid Cross-Fusion Transformer Network for Facial Expression Recognition
12:15 | Weng-Tai Su, National Tsing Hua University, *Oral Presentation #8: Kinship Representation Learning with Face Componential Relation
12:30 | †Lunch Break (AMFG Poster Session)
13:45 | Chi Xu, Osaka University, *Oral Presentation #9: Occluded Gait Recognition via Silhouette Registration Guided by Automated Occlusion Degree Estimation
14:00 | Ammar Qammaz, CSD-UOC and ICS-FORTH, *Oral Presentation #10: A Unified Approach for Occlusion Tolerant 3D Facial Pose Capture and Gaze Estimation using MocapNETs
14:15 | Rama Chellappa, Johns Hopkins University, Invited Talk #2: Person Recognition and Identification at Altitude and Range
15:15 | Coffee Break II
15:45 | Raja Kumar, University of California Santa Cruz, *Oral Presentation #11: Disjoint Pose and Shape for 3D Face Reconstruction
16:00 | Andreas Döring, University of Bonn, *Oral Presentation #12: A Gated Attention Transformer for Multi-Person Pose Tracking
16:15 | Cédric Rommel, valeo.ai, *Oral Presentation #13: DiffHPE: Robust, Coherent 3D Human Pose Lifting with Diffusion
16:30 | Bernhard Egger, Friedrich-Alexander-University Erlangen-Nuremberg, *Oral Presentation #14: PoseBias: On Dataset Bias and Task Difficulty – Is there an Optimal Camera Position for Facial Image Analysis?
16:45 | Xiaoming Liu, Michigan State University, Invited Talk #3: Person Identification at a (Far) Distance
17:45 | Best Paper Announcement and Conclusion
† All oral papers will also be presented during the poster session for continued conversation and additional questions.