The 10th IEEE International Workshop on
Analysis and Modeling of Faces and Gestures
In conjunction with CVPR 2021
19 June 2021 (Saturday) starting from 3PM EST (Live)
Papers
We have experienced rapid advances in face, gesture, and cross-modality (e.g., voice and face) technologies, thanks to deep learning (dating back to AlexNet in 2012) and large-scale, labeled datasets. Progress in deep learning continues to push renowned public databases toward saturation, which calls for ever more challenging image collections to be compiled as databases. In practice, and widely in applied research, using off-the-shelf deep learning models has become the norm: numerous pre-trained networks (e.g., VGG-Face, ResNet, among others) are available for download and are readily deployed to new, unseen data. We have almost grown “spoiled” by such luxury, which has, in fact, kept many truths hidden from us. Theoretically, what makes neural networks more discriminant than ever before remains, in all fairness, unclear; rather, they act as a black box to most practitioners and even researchers alike. More troublesome is the absence of tools to quantitatively and qualitatively characterize existing deep models, which could in itself yield greater insights about these all-too-familiar black boxes. With the frontier moving forward at rates incomparable to any spurt of the past, challenges such as high variations in illumination, pose, age, etc., now confront us. However, state-of-the-art deep learning models often fail when faced with such challenges, owing to the difficulties of modeling structured data and visual dynamics.
Alongside the effort spent on conventional face recognition is research on cross-modality learning, such as face and voice, gestures in imagery, and motion in videos, along with several other tasks. This line of work has attracted attention from industry and academic researchers across many domains, and there has also been a push to advance these technologies for social-media-based applications. Regardless of the exact domain and purpose, the following capabilities must be satisfied: face and body tracking (e.g., facial expression analysis, face detection, gesture recognition); lip reading and voice understanding; face and body characterization (e.g., behavioral understanding, emotion recognition); face, body, and gesture characteristic analysis (e.g., gait, age, gender, ethnicity recognition); group understanding via social cues (e.g., kinship, non-blood relationships, personality); and visual sentiment analysis (e.g., temperament, arrangement). The ability to build effective models for visual certainty thus has significant value for both the scientific community and the commercial market, with applications spanning human-computer interaction, social media analytics, video indexing, visual surveillance, and internet vision. Researchers have made significant progress on many of these problems, especially given the off-the-shelf, cost-efficient vision hardware products available today (e.g., Intel RealSense, Magic Leap, SHORE, and Affdex). Nonetheless, serious challenges remain, and they are only amplified under the unconstrained imaging conditions captured by different sources focused on non-cooperative subjects.
It is these latter challenges that especially grab our interest, as we seek to bring together cutting-edge techniques and recent advances in deep learning to solve challenges in the wild. This one-day recurring workshop (AMFG 2021) provides a forum for researchers to review recent progress in the recognition, analysis, and modeling of face, body, and gesture, while embracing the most advanced deep learning systems available for face and gesture analysis, particularly under unconstrained environments such as social media and across modalities such as face to voice. The workshop includes up to three keynotes and peer-reviewed papers (oral and poster). Original high-quality contributions are solicited on the following topics:
- Novel deep model, deep learning survey, or comparative study for face/gesture recognition;
- Deep learning methodology, theory, as applied to social media analytics;
- Data-driven or physics-based generative models for faces, poses, and gestures;
- Deep learning for internet-scale soft biometrics and profiling: age, gender, ethnicity, personality, kinship, occupation, beauty ranking, and fashion classification by facial or body descriptor;
- Deep learning for detection and recognition of faces and bodies with large 3D rotation, illumination change, partial occlusion, unknown/changing background, and aging (i.e., in the wild), with particular emphasis on face and gesture recognition robust to large 3D rotation;
- Motion analysis, tracking and extraction of face and body models captured from several non-overlapping views;
- Face, gait, and action recognition in low-quality (e.g., blurred), or low-resolution video from fixed or mobile device cameras;
- AutoML for face and gesture analysis;
- Mathematical models and algorithms, sensors and modalities for face & body gesture and action representation, analysis, and recognition for cross-domain social media;
- Social/psychological based studies that aid in understanding computational modeling and building better automated face and gesture systems with interactive features;
- Multimedia learning models involving faces and gestures (e.g., voice, wearable IMUs, and face);
- Social applications involving detection, tracking & recognition of face, body, and action;
- Face and gesture analysis for sentiment analysis in social context;
- Other applications involving face and gesture analysis in social media content.
Previous AMFG Workshops
The first AMFG workshop was held in 2003, in conjunction with ICCV 2003 in Nice, France. Since then, it has been successfully held nine times. The homepages of the previous five AMFG workshops are as follows:
AMFG2013: http://www.northeastern.edu/smilelab/AMFG2013/home.html
AMFG2015: http://www.northeastern.edu/smilelab/AMFG2015/home.html
AMFG2017: https://web.northeastern.edu/smilelab/AMFG2017/index.html
AMFG2018: https://fulab.sites.northeastern.edu/amfg2018/
AMFG2019: https://fulab.sites.northeastern.edu/amfg2019/
Dates
Submission Deadline: 03/10/2021 (extended to 04/07/2021)
Camera-Ready Due: 04/05/2021 (extended to 04/20/2021)
Author Guidelines
Submissions are handled via the
workshop's CMT website:
https://cmt3.research.microsoft.com/BAMFG2021/Submission/Index
Following the guidelines of CVPR 2021:
http://cvpr2021.thecvf.com/node/33#submission-guidelines
- 8 pages (+ references)
- Anonymous
- Using the CVPR LaTeX/Word templates
Organizers
Chairs
Mike Jones, Mitsubishi Electric Research Laboratories (MERL), Cambridge, USA.
Sheng Li, University of Georgia, Athens, GA, USA.
Committee
- Handong Zhao, Adobe Research, USA
- Bineng Zhong, Huaqiao University
- Chengcheng Jia, Huawei, USA
- Junchi Yan, Shanghai Jiao Tong University
- Jun Li, Nanjing University of Science and Technology
- Hong Pan, Southeast University, China
- Shuyang Wang, Shiseido Americas
- Samson Timoner, ISM Connect
- Ali Pourramezan Fard, University of Denver
- Anirudh Tunga, Purdue University
- Juan Wachs, Purdue University
- Ankith Jain Rakesh Kumar, University of California Riverside
- Bin Sun, Northeastern University
- Marah Halawa, Technische Universität Berlin
- Nima Aghli, Florida Institute of Technology
- René Haas, IT University of Copenhagen
- Ronghang Zhu, University of Georgia
- Zhongliang Zhou, University of Georgia
- Aleix Martinez, The Ohio State University
- Yingli Tian, City University of New York
- Chengjun Liu, New Jersey Institute of Technology
- Liang Zheng, Australian National University
- Thomas Moeslund, Aalborg University, Denmark
- Kai Qin, Swinburne University of Technology
- Binod Bhattarai, Imperial College London
- Can Qin, Northeastern University
- Chao Gou, Sun Yat-Sen University
- Dimitrios Mallis, University of Nottingham
- Haoxiang Li, Wormpex AI Research
- Jihyun Lee, KAIST
- Kushajveer Singh, University of Georgia
- Songyao Jiang, Northeastern University
- Taotao Jing, Tulane University
- Weijun Tan, Linksprite Technologies
- Zaid Khan, Northeastern University
- Zheng Zhang, Harbin Institute of Technology, Shenzhen
Keynotes
Sergey Tulyakov, Snap Inc.
Title: Representations for Content Creation, Manipulation and Animation
Abstract.
“What I cannot create, I do not understand,” said the famous writing on Dr. Feynman’s blackboard. The ability to create or to change objects requires us to understand their structure and factors of variation. For example, to draw a face, an artist must know its composition and have a good command of drawing skills (the latter is particularly challenging for the presenter). Animation additionally requires knowledge of the rigid and non-rigid motion patterns of the object. This talk shows that the generation, manipulation, and animation skills of deep generative models substantially benefit from such understanding. Moreover, we see that the better the models can explain the data they see during training, the higher the quality of the content they are able to generate. Understanding and generation form a loop in which improved understanding improves generation, which in turn improves understanding even more. To show this, I detail our work on video synthesis and prediction and on image animation by motion retargeting. I will further introduce a new direction in video generation that allows the user to play videos as they are generated. In each of these works, the internal representation was designed to facilitate better understanding of the task, resulting in improved generation abilities. Without a single labeled example, our models are able to understand factors of variation, object parts, their shapes, and their motion patterns, and to perform creative manipulations previously available only to trained professionals equipped with specialized software and hardware.
Bio.
Sergey Tulyakov is a Principal Research Scientist heading the Creative Vision team at Snap Research. His work focuses on creating methods for manipulating the world via computer vision and machine learning. This includes human and object understanding, photorealistic manipulation and animation, video synthesis, prediction, and retargeting. He pioneered the unsupervised image animation domain with MonkeyNet and the First Order Motion Model, which sparked a number of startups in the domain. His work on Interactive Video Stylization received the Best in Show Award at SIGGRAPH Real-Time Live! 2020. He has published 30+ top conference papers, journal articles, and patents, resulting in multiple innovative products, including Snapchat Pet Tracking, OurBaby Snappable, real-time neural lenses (gender swap, baby face, aging lens, face animation), and many others. Before joining Snap Inc., Sergey was with Carnegie Mellon University, Microsoft, and NVIDIA. He holds a PhD from the University of Trento, Italy.
David Bau, MIT Computer Science and Artificial Intelligence Laboratory.
Title: Cracking Open AI for Insight and Creativity
Abstract.
When we treat a model as a black box, we can overlook much of the knowledge that the model contains. For example, when using inputs or outputs to understand how a neural network responds to a person's hair, we could examine hair color or length because that is what we can easily count externally. But our simplistic external view might miss a model's internal ability to reason about a wider variety of more subtle hairstyles. In this talk I will discuss the benefits of cracking open deep networks to understand them from the inside. Drawing examples from state-of-the-art generative models, I will talk about how a direct look at model internals can help us understand the way a model decomposes problems, what a model is blind to, and how the rules of a model can be modified without training. We will dissect and manipulate models of scenes, animals, and people, and we will see how opening up models can enable new insights, new applications, and new research questions.
Bio.
David Bau received his A.B. in Mathematics from Harvard, his M.S. in Computer Science from Cornell, and is completing his Ph.D. in EECS at MIT. David has pioneered methods for the dissection, visualization, and interactive manipulation of deep networks in computer vision, and he is the creator of Network Dissection and GAN Paint, which enable a person to directly edit the internals of state-of-the-art neural networks. David is coauthor of a widely used graduate textbook, Numerical Linear Algebra. Prior to his research at MIT, he was an engineer at Google, where he built Image Search ranking algorithms, Hangouts real-time communications, and the Pencil Code educational programming system. David plans to begin as an Assistant Professor at Northeastern University's Khoury College of Computer Sciences next year.
Ming-Yu Liu, NVIDIA.
Title: Face-vid2vid and GANcraft
Abstract.
In this talk, I will present our face-vid2vid and GANcraft works. face-vid2vid is a neural talking-head rendering engine we developed for video-conferencing applications. It learns a keypoint representation and decomposition in an unsupervised manner. The representation and decomposition allow us to extract a compact representation from a video frame on the sender side and reconstruct it faithfully on the receiver side. Empirically, we find that it achieves a 10x bandwidth saving compared to H.264. We also find that it enables other applications, such as character animation and face redirection. GANcraft is a neural rendering engine for Minecraft worlds. It learns to map blocky Minecraft worlds to photorealistic real worlds. The key innovation is a way to marry GANs and NeRF. Through pseudo-image supervision, we overcome the challenge of having no corresponding real-world imagery for a Minecraft world.
Bio.
I am a Distinguished Research Scientist and a Manager with NVIDIA Research. My research group focuses on deep generative models and their applications. We have created several high-impact research works, some of which have enabled exciting new products, including NVIDIA GauGAN and NVIDIA Maxine. I am looking for talent to join my team; please reach out to my NVIDIA email if you are also passionate about generative models and their applications.
Prior to NVIDIA, I was a Principal Research Scientist with Mitsubishi Electric Research Laboratories (MERL). I received my Ph.D. from the University of Maryland, College Park, MD, USA, in 2012, advised by Prof. Rama Chellappa.
Schedule: 19 June 2021 (Eastern Time, ET)
3:00 PM | Welcome message
3:10 PM | Keynote 1: David Bau
3:55 PM | Xin, Miao; Mo, Shentong; Lin, Yuanze. EVA-GCN: Head Pose Estimation Based on Graph Convolutional Networks (Best Paper)
4:00 PM | Chu, Hau; Lee, Jia-Hong; Lee, Yao-Chih; Hsu, Ching-Hsien; Li, Jia-Da; Chen, Chu-Song. Part-aware Measurement for Robust Multi-View Multi-Human 3D Pose Estimation and Tracking
4:05 PM | Liu, Rushuai; Tan, Weijun. EQFace: A Simple Explicit Quality Network for Face Recognition
4:10 PM | Liu, Chih-Ting; Chen, Jun-Cheng; Chen, Chu-Song; Chien, Shao-Yi. Video-based Person Re-identification without Bells and Whistles
4:15 PM | Lee, Jihyun; Bhattarai, Binod; Kim, Tae-Kyun. Face Parsing from RGB and Depth Using Cross-Domain Mutual Learning
4:20 PM | Rakesh Kumar, Ankith Jain; Bhanu, Bir. Micro-Expression Classification based on Landmark Relations with Graph Attention Convolutional Network
4:25 PM | Pourramezan Fard, Ali; Abdollahi, Hojjat; Mahoor, Mohammad. ASMNet: A Lightweight Deep Neural Network for Face Alignment and Pose Estimation
4:30 PM | Coffee Break
5:00 PM | Keynote 2: Sergey Tulyakov
5:45 PM | Banerjee, Sandipan; Joshi, Ajjen; Mahajan, Prashant; Bhattacharya, Sneha; Kyal, Survi; Mishra, Taniya. LEGAN: Disentangled Manipulation of Directional Lighting and Facial Expressions whilst Leveraging Human Perceptual Judgements (Runner Up)
5:50 PM | Vyas, Kathan; Jiang, Le; Liu, Shuangjun; Ostadabbas, Sarah. An Efficient 3D Synthetic Model Generation Pipeline for Human Pose Data Augmentation
5:55 PM | Keynote 3: Ming-Yu Liu
6:40 PM | Awards ceremony
6:45 PM | Workshop adjourned