:: Library Catalog

Bibliographic Details
Main Authors:	Kang, Caixin; Yan, Tianyu; Gong, Sitong; Zhang, Mingfang; Ouyang, Liangyang; Liu, Ruicong; Zheng, Bo; Lu, Huchuan; Zhang, Kaipeng; Sato, Yoichi; Huang, Yifei
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Computer Vision and Pattern Recognition; Computers and Society
Online Access:	https://arxiv.org/abs/2605.22109

Similar Items

Can MLLMs Read the Room? A Multimodal Benchmark for Assessing Deception in Multi-Party Social Interactions
by: Kang, Caixin, et al.
Published: (2025)

Can MLLMs Read the Room? A Multimodal Benchmark for Verifying Truthfulness in Multi-Party Social Interactions
by: Kang, Caixin, et al.
Published: (2025)

SocialDirector: Training-Free Social Interaction Control for Multi-Person Video Generation
by: Ouyang, Liangyang, et al.
Published: (2026)

SFHand: Learning Embodied Manipulation by Streaming Egocentric 3D Hand Forecasting
by: Liu, Ruicong, et al.
Published: (2025)

Multi-speaker Attention Alignment for Multimodal Social Interaction
by: Ouyang, Liangyang, et al.
Published: (2025)

Egocentric Action-aware Inertial Localization in Point Clouds with Vision-Language Guidance
by: Zhang, Mingfang, et al.
Published: (2025)

Living the Novel: A System for Generating Self-Training Timeline-Aware Conversational Agents from Novels
by: Huang, Yifei, et al.
Published: (2025)

ActionVOS: Actions as Prompts for Video Object Segmentation
by: Ouyang, Liangyang, et al.
Published: (2024)

Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
by: Zhang, Mingfang, et al.
Published: (2024)

Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation
by: Liu, Ruicong, et al.
Published: (2024)

Leveraging RGB Images for Pre-Training of Event-Based Hand Pose Estimation
by: Liu, Ruicong, et al.
Published: (2025)

CaST-Bench: Benchmarking Causal Chain-Grounded Spatio-Temporal Reasoning for Video Question Answering
by: Zhang, Mingfang, et al.
Published: (2026)

Leadership Assessment in Pediatric Intensive Care Unit Team Training
by: Ouyang, Liangyang, et al.
Published: (2025)

Pre-Training for 3D Hand Pose Estimation with Contrastive Learning on Large-Scale Hand Images in the Wild
by: Lin, Nie, et al.
Published: (2024)

LORE: Latent Optimization for Precise Semantic Control in Rectified Flow-based Image Editing
by: Ouyang, Liangyang, et al.
Published: (2025)

Complementary and Contrastive Learning for Audio-Visual Segmentation
by: Gong, Sitong, et al.
Published: (2025)

Towards Interactive Intelligence for Digital Humans
by: Cai, Yiyi, et al.
Published: (2025)

AssemblyHands-X: Modeling 3D Hand-Body Coordination for Understanding Bimanual Human Activities
by: Banno, Tatsuro, et al.
Published: (2025)

SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training
by: Lin, Nie, et al.
Published: (2025)

The N-Body Problem: Parallel Execution from Single-Person Egocentric Video
by: Zhu, Zhifan, et al.
Published: (2025)

Parameter Aware Mamba Model for Multi-task Dense Prediction
by: Yu, Xinzhuo, et al.
Published: (2025)

Reinforcing Video Reasoning Segmentation to Think Before It Segments
by: Gong, Sitong, et al.
Published: (2025)

The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
by: Gong, Sitong, et al.
Published: (2025)

AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation
by: Gong, Sitong, et al.
Published: (2025)

Enhancing Impression Change Prediction in Speed Dating Simulations Based on Speakers' Personalities
by: Matsuo, Kazuya, et al.
Published: (2025)

Can MLLMs Understand the Deep Implication Behind Chinese Images?
by: Zhang, Chenhao, et al.
Published: (2024)

MLLMs-Augmented Visual-Language Representation Learning
by: Liu, Yanqing, et al.
Published: (2023)

Enhancing Representation Learning of EEG Data with Masked Autoencoders
by: Zhou, Yifei, et al.
Published: (2024)

Prompt and Prejudice
by: Berlincioni, Lorenzo, et al.
Published: (2024)

Linking Perception, Confidence and Accuracy in MLLMs
by: Du, Yuetian, et al.
Published: (2026)

Subjective Face Transform using Human First Impressions
by: Roygaga, Chaitanya, et al.
Published: (2023)

Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation
by: Wang, Haowei, et al.
Published: (2023)

Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM
by: Zhang, Pingping, et al.
Published: (2024)

Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection
by: Gao, Shixuan, et al.
Published: (2024)

Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans
by: Qiu, Yansheng, et al.
Published: (2025)

Can Impressions of Music be Extracted from Thumbnail Images?
by: Harada, Takashi, et al.
Published: (2025)

Can MLLMs Perform Text-to-Image In-Context Learning?
by: Zeng, Yuchen, et al.
Published: (2024)

LATex: Leveraging Attribute-based Text Knowledge for Aerial-Ground Person Re-Identification
by: Zhang, Pingping, et al.
Published: (2025)

X-ReID: Multi-granularity Information Interaction for Video-Based Visible-Infrared Person Re-Identification
by: Yu, Chenyang, et al.
Published: (2025)

Coarse-to-Fine Personalized LLM Impressions for Streamlined Radiology Reports
by: Sun, Chengbo, et al.
Published: (2025)