| Main Authors: | Li, Yifan, Dao, Anh, Bao, Wentao, Tan, Zhen, Chen, Tianlong, Liu, Huan, Kong, Yu |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.05052 |
Similar Items
Window Token Concatenation for Efficient Visual Large Language Models
by: Li, Yifan, et al.
Published: (2025)
IndustryEQA: Pushing the Frontiers of Embodied Question Answering in Industrial Scenarios
by: Li, Yifan, et al.
Published: (2025)
Visual Large Language Models for Generalized and Specialized Applications
by: Li, Yifan, et al.
Published: (2025)
Weakly Supervised Learning for Facial Affective Behavior Analysis : A Review
by: Praveen, R. Gnana, et al.
Published: (2021)
Robust Light-Weight Facial Affective Behavior Recognition with CLIP
by: Lin, Li, et al.
Published: (2024)
Scalable Audio-Visual Masked Autoencoders for Efficient Affective Video Facial Analysis
by: Wu, Xuecheng, et al.
Published: (2025)
IndustryNav: Exploring Spatial Reasoning of Embodied Agents in Dynamic Industrial Navigation
by: Li, Yifan, et al.
Published: (2025)
EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning
by: Xing, Bohao, et al.
Published: (2024)
Open Set Face Forgery Detection via Dual-Level Evidence Collection
by: Cai, Zhongyi, et al.
Published: (2025)
Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment
by: Chen, Yuxiao, et al.
Published: (2024)
Prompting Language-Informed Distribution for Compositional Zero-Shot Learning
by: Bao, Wentao, et al.
Published: (2023)
Affective Behaviour Analysis via Progressive Learning
by: Liu, Chen, et al.
Published: (2024)
Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness
by: Zhao, Jiaxing, et al.
Published: (2025)
Solution for 8th Competition on Affective & Behavior Analysis in-the-wild
by: Yu, Jun, et al.
Published: (2025)
Task-Aware Resolution Optimization for Visual Large Language Models
by: Luo, Weiqing, et al.
Published: (2025)
FairSkin: Fair Diffusion for Skin Disease Image Generation
by: Zhang, Ruichen, et al.
Published: (2024)
MissBench: Benchmarking Multimodal Affective Analysis under Imbalanced Missing Modalities
by: Pham, Tien Anh, et al.
Published: (2026)
Affective Behavior Analysis using Task-adaptive and AU-assisted Graph Network
by: Li, Xiaodong, et al.
Published: (2024)
Affective Behaviour Analysis via Integrating Multi-Modal Knowledge
by: Zhang, Wei, et al.
Published: (2024)
The 6th Affective Behavior Analysis in-the-wild (ABAW) Competition
by: Kollias, Dimitrios, et al.
Published: (2024)
Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection
by: Bao, Wentao, et al.
Published: (2024)
CausalAffect: Causal Discovery for Facial Affective Understanding
by: Hu, Guanyu, et al.
Published: (2025)
CoIDO: Efficient Data Selection for Visual Instruction Tuning via Coupled Importance-Diversity Optimization
by: Yan, Yichen, et al.
Published: (2025)
A Generative Framework for Self-Supervised Facial Representation Learning
by: He, Ruian, et al.
Published: (2023)
To See is Not to Learn: Protecting Multimodal Data from Unauthorized Fine-Tuning of Large Vision-Language Model
by: Zhao, Chengshuai, et al.
Published: (2026)
Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation
by: Dao, Quan, et al.
Published: (2024)
3DMIT: 3D Multi-modal Instruction Tuning for Scene Understanding
by: Li, Zeju, et al.
Published: (2024)
VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer
by: Chen, Liyang, et al.
Published: (2023)
A$^{3}$lign-DFER: Pioneering Comprehensive Dynamic Affective Alignment for Dynamic Facial Expression Recognition with CLIP
by: Tao, Zeng, et al.
Published: (2024)
Unified Multi-modal Diagnostic Framework with Reconstruction Pre-training and Heterogeneity-combat Tuning
by: Zhang, Yupei, et al.
Published: (2024)
Technical Approach for the EMI Challenge in the 8th Affective Behavior Analysis in-the-Wild Competition
by: Yu, Jun, et al.
Published: (2025)
CFCPalsy: Facial Image Synthesis with Cross-Fusion Cycle Diffusion Model for Facial Paralysis Individuals
by: Gao, Weixiang, et al.
Published: (2024)
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
by: Liu, Ruyang, et al.
Published: (2023)
Streaming Video Instruction Tuning
by: Xia, Jiaer, et al.
Published: (2025)
PPBoost: Progressive Prompt Boosting for Text-Driven Medical Image Segmentation
by: Li, Xuchen, et al.
Published: (2025)
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
by: Du, Yifan, et al.
Published: (2023)
Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning
by: Chaubey, Ashutosh, et al.
Published: (2025)
AICA-Bench: Holistically Examining the Capabilities of VLMs in Affective Image Content Analysis
by: She, Dong, et al.
Published: (2026)
StreamingTalker: Audio-driven 3D Facial Animation with Autoregressive Diffusion Model
by: Yang, Yifan, et al.
Published: (2025)
Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning
by: Gou, Yunhao, et al.
Published: (2023)