:: Library Catalog

Bibliographic Details
Main Authors:	Yu, Jun; Zhang, Yunxiang; Zheng, Naixiang; Zhu, Lingsi; Wang, Guoyuan
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.11306

Similar Items

Solution to the 10th ABAW Expression Recognition Challenge: A Robust Multimodal Framework with Safe Cross-Attention and Modality Dropout
by: Yu, Jun, et al.
Published: (2026)

Anchoring Emotions in Text: Robust Multimodal Fusion for Mimicry Intensity Estimation
by: Zhu, Lingsi, et al.
Published: (2026)

Technical Approach for the EMI Challenge in the 8th Affective Behavior Analysis in-the-Wild Competition
by: Yu, Jun, et al.
Published: (2025)

Solution for 8th Competition on Affective & Behavior Analysis in-the-wild
by: Yu, Jun, et al.
Published: (2025)

A2Mamba: Attention-augmented State Space Models for Visual Recognition
by: Lou, Meng, et al.
Published: (2025)

SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
by: Fu, Yunxiang, et al.
Published: (2024)

Robust Domain Adaptive Object Detection with Unified Multi-Granularity Alignment
by: Zhang, Libo, et al.
Published: (2023)

WildPose: A Unified Framework for Robust Pose Estimation in the Wild
by: Zheng, Jianhao, et al.
Published: (2026)

SAMamba: Adaptive State Space Modeling with Hierarchical Vision for Infrared Small Target Detection
by: Xu, Wenhao, et al.
Published: (2025)

TrackletGait: A Robust Framework for Gait Recognition in the Wild
by: Zhang, Shaoxiong, et al.
Published: (2025)

SparX: A Sparse Cross-Layer Connection Mechanism for Hierarchical Vision Mamba and Transformer Networks
by: Lou, Meng, et al.
Published: (2024)

VL-Mamba: Exploring State Space Models for Multimodal Learning
by: Qiao, Yanyuan, et al.
Published: (2024)

Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Masked Contrastive Learning
by: Huang, Weijian, et al.
Published: (2023)

RPBG: Towards Robust Neural Point-based Graphics in the Wild
by: Zhu, Qingtian, et al.
Published: (2024)

Zero-Shot Chinese Character Recognition with Hierarchical Multi-Granularity Image-Text Aligning
by: Zhu, Yinglian, et al.
Published: (2025)

MIAR: Modality Interaction and Alignment Representation Fusion for Multimodal Emotion
by: Zhu, Jichao, et al.
Published: (2026)

Multimodal Instruction Tuning with Hybrid State Space Models
by: Zhou, Jianing, et al.
Published: (2024)

RSGMamba: Reliability-Aware Self-Gated State Space Model for Multimodal Semantic Segmentation
by: Xu, Guoan, et al.
Published: (2026)

AUD-TGN: Advancing Action Unit Detection with Temporal Convolution and GPT-2 in Wild Audiovisual Contexts
by: Yu, Jun, et al.
Published: (2024)

DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance
by: Yang, Zhao, et al.
Published: (2025)

MTGA: Multi-View Temporal Granularity Aligned Aggregation for Event-Based Lip-Reading
by: Zhang, Wenhao, et al.
Published: (2024)

FED-PsyAU: Privacy-Preserving Micro-Expression Recognition via Psychological AU Coordination and Dynamic Facial Motion Modeling
by: Li, Jingting, et al.
Published: (2025)

Cross Domain Object Detection via Multi-Granularity Confidence Alignment based Mean Teacher
by: Chen, Jiangming, et al.
Published: (2024)

UCS: A Universal Model for Curvilinear Structure Segmentation
by: Zhu, Kai, et al.
Published: (2025)

Exploring State Space Model in Wavelet Domain: An Infrared and Visible Image Fusion Network via Wavelet Transform and State Space Model
by: Zhang, Tianpei, et al.
Published: (2025)

It Takes Two: Accurate Gait Recognition in the Wild via Cross-granularity Alignment
by: Zheng, Jinkai, et al.
Published: (2024)

VADTree: Explainable Training-Free Video Anomaly Detection via Hierarchical Granularity-Aware Tree
by: Li, Wenlong, et al.
Published: (2025)

Hierarchical Disentanglement-Alignment Network for Robust SAR Vehicle Recognition
by: Li, Weijie, et al.
Published: (2023)

UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
by: Li, Teng, et al.
Published: (2025)

Interactive Multimodal Fusion with Temporal Modeling
by: Yu, Jun, et al.
Published: (2025)

Large Language Models Facilitate Vision Reflection in Image Classification
by: An, Guoyuan, et al.
Published: (2025)

NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild
by: Gushchin, Aleksandr, et al.
Published: (2026)

DA-Mamba: Learning Domain-Aware State Space Model for Global-Local Alignment in Domain Adaptive Object Detection
by: Li, Haochen, et al.
Published: (2026)

Detect Anything 3D in the Wild
by: Zhang, Hanxue, et al.
Published: (2025)

ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
by: Ge, Chunjiang, et al.
Published: (2024)

Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation
by: Liao, Bencheng, et al.
Published: (2025)

Hierarchical Feature Learning for Medical Point Clouds via State Space Model
by: Zhang, Guoqing, et al.
Published: (2025)

Deep Models, Shallow Alignment: Uncovering the Granularity Mismatch in Neural Decoding
by: Du, Yang, et al.
Published: (2026)

AdaMHF: Adaptive Multimodal Hierarchical Fusion for Survival Prediction
by: Zhang, Shuaiyu, et al.
Published: (2025)

Hierarchical Semantic Alignment for Image Clustering
by: Zhu, Xingyu, et al.
Published: (2025)