In conjunction with ICCV 2025
October 19 or 20 (TBD)
Motivation
The domains of face, gesture, and cross-modal recognition have experienced tremendous progress fueled by deep learning and large-scale annotated datasets. From the early days of AlexNet to today's transformer-based architectures, performance across public benchmarks has improved dramatically. However, this success has come at a cost: decreased model explainability, limited generalization to unconstrained environments, and a growing dependence on opaque, pre-trained systems.
Despite saturated performance on traditional benchmarks, real-world applications often expose critical weaknesses: accuracy degrades under extreme pose variation, poor lighting, partial occlusion, or unpredictable subject behavior. Addressing these limitations requires robust training strategies, better data representations, and deeper, more interpretable models.
Meanwhile, multimodal learning has surged, integrating signals such as voice, face, and gesture to power applications in social media, HCI, surveillance, and affective computing. The next generation of systems must go beyond recognition: they must reason, adapt, and function reliably in complex, real-world conditions.
Topics of Interest
Core Vision Tasks
- 2D and 3D tracking of faces, hands, bodies, and actions across time
- Robust recognition across pose, occlusion, age, illumination, and resolution
- Segmentation and parsing of face and body parts for fine-grained analysis
Generative & Neural Rendering
- Neural rendering of expressions, dance, or gestures in AR/VR environments
- Controllable diffusion models and GANs for person-specific synthesis
- Cross-domain generative modeling (e.g., sketch-to-photo, video-to-avatar)
Learning Paradigms
- Few-shot, zero-shot, continual, and domain-adaptive learning techniques
- Vision-language foundation models for face/gesture understanding
- Large Vision Models (LVMs) and large language-vision models (LLVMs) as foundation or fine-tuned systems
- AutoML and architecture search for face/gesture pipelines
Soft Biometrics & Identity Understanding
- Emotion, personality, attention, fatigue, and sentiment analysis
- Social signal processing and behavioral trait inference
- Explainable and trustworthy models for identity-related inference
Multimodal and Cross-Modal Analysis
- Multimodal transformers for joint face-body-speech analysis
- Cross-modal generation: text-to-face, speech-to-gesture, etc.
- Alignment and synchronization across modalities (e.g., lip-sync)
Applications, Benchmarks, and Analysis
- Deployment studies and case reports in real-world scenarios
- Failure analysis, uncertainty estimation, and model auditing
- Interactive and interruptible AI systems for decision support
Nature-Inspired & Cognitive Systems
- Vision systems for ethology, animal behavior, and neuroscience
- Cognitive modeling of gaze, micro-expressions, and attentional cues
- Integrating affective computing with behavioral science
Ethics, Fairness, and Society
- Interpretability and transparency in face/gesture pipelines
- Regulatory frameworks and societal impacts of face technologies
- AI for accessibility, assistive tech, and inclusive HCI
Related Workshops
The first AMFG was held in conjunction with ICCV 2003 in Nice, France, and the workshop has since been held successfully ten times. Links to the most recent editions are below:
Past AMFG Workshops
Face and gesture (hand) modeling are long-standing problems in the computer vision community. While many related workshops have emerged, AMFG retains a distinct focus. Other workshops with complementary emphases include:
Complementary Workshops
- Face recognition with a security focus: ChaLearn2020@ECCV, ChaLearn2021@ICCV, MFR2021@ICCV
- Hand modeling for action understanding: HANDS2022@ECCV, HBHA2022@ECCV
- Face and gesture modeling in VR/AR: WCPA2022@ECCV, CV4ARVR2022@CVPR
- Recognizing Families In the Wild (RFIW): RFIW2020, RFIW2019, RFIW2018, RFIW2017
AMFG continues to provide theoretical and technical depth in face and gesture research. Its impact spans broader domains, including human-computer interaction, multimodal learning, egocentric vision, AI ethics, and robotics.
Important Dates
- Submission Deadline: 07/04/2025
- Notification: 07/10/2025
- Camera-Ready Due: 08/18/2025
Submissions are handled via the workshop's OpenReview page.
Follow the official ICCV 2025 guidelines:
ICCV Submission Guidelines
- 8 pages (excluding references)
- Anonymous submission
- Use the ICCV LaTeX templates (a minimal preamble sketch follows below)
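For orientation only, here is a minimal sketch of what an anonymized submission preamble typically looks like. It assumes the `iccv` style file and its `review` option follow the pattern of recent ICCV author kits; the actual class file, options, and macros must come from the official ICCV 2025 kit, not from this sketch.

```latex
% Minimal sketch of an anonymized ICCV-style submission (NOT the official kit).
% ASSUMPTION: the `iccv` style file and its `review` option follow the pattern
% of recent ICCV author kits; replace with the official ICCV 2025 template.
\documentclass[10pt,twocolumn,letterpaper]{article}
\usepackage[review]{iccv} % review mode: anonymized, line-numbered draft
\usepackage{times}
\usepackage{graphicx}
\usepackage{amsmath}

\title{Your AMFG 2025 Paper Title}
\author{Anonymous ICCV submission} % no identifying information during review

\begin{document}
\maketitle

\begin{abstract}
One-paragraph summary of the contribution.
\end{abstract}

\section{Introduction}
Main text is limited to 8 pages; references do not count toward the limit.

{\small
\bibliographystyle{ieee_fullname} % style shipped with recent ICCV/CVPR kits
\bibliography{refs}               % refs.bib holding your BibTeX entries
}
\end{document}
```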