:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Wenhao; Wang, Jun; Luo, Yong; Yu, Lei; Yu, Wei; He, Zheng; Shen, Jialie
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2404.11979

Similar Items

Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder
by: Wang, He, et al.
Published: (2024)

MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo
by: Yuan, Zhenlong, et al.
Published: (2024)

Wasserstein-Aligned Hyperbolic Multi-View Clustering
by: Wang, Rui, et al.
Published: (2025)

Zero-Shot Chinese Character Recognition with Hierarchical Multi-Granularity Image-Text Aligning
by: Zhu, Yinglian, et al.
Published: (2025)

Event-based Motion Deblurring via Multi-Temporal Granularity Fusion
by: Lin, Xiaopeng, et al.
Published: (2024)

Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration
by: Zhou, Ziheng, et al.
Published: (2024)

EventMamba: Enhancing Spatio-Temporal Locality with State Space Models for Event-Based Video Reconstruction
by: Ge, Chengjie, et al.
Published: (2025)

Template-Based Feature Aggregation Network for Industrial Anomaly Detection
by: Luo, Wei, et al.
Published: (2026)

Multi-Granularity Hand Action Detection
by: Zhe, Ting, et al.
Published: (2023)

VALLR: Visual ASR Language Model for Lip Reading
by: Thomas, Marshall, et al.
Published: (2025)

RaCMC: Residual-Aware Compensation Network with Multi-Granularity Constraints for Fake News Detection
by: Yu, Xinquan, et al.
Published: (2024)

Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-Syncing DeepFakes
by: Liu, Weifeng, et al.
Published: (2024)

LASER: Lip Landmark Assisted Speaker Detection for Robustness
by: Nguyen, Le Thien Phuc, et al.
Published: (2025)

Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
by: Yeo, Jeong Hun, et al.
Published: (2024)

GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo
by: Wu, Jiang, et al.
Published: (2024)

Interactive Multimodal Fusion with Temporal Modeling
by: Yu, Jun, et al.
Published: (2025)

Rethinking Event-Based Object Detection through Representation-Level Temporal Aggregation and Model-Level Hypergraph Reasoning
by: Wang, Meisen, et al.
Published: (2026)

Landmark-Guided Cross-Speaker Lip Reading with Mutual Information Regularization
by: Wu, Linzhi, et al.
Published: (2024)

SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
by: Park, Young-Hu, et al.
Published: (2025)

Optimized View and Geometry Distillation from Multi-view Diffuser
by: Zhang, Youjia, et al.
Published: (2023)

Image Quality Assessment: Investigating Causal Perceptual Effects with Abductive Counterfactual Inference
by: Shen, Wenhao, et al.
Published: (2024)

View Transformation Robustness for Multi-View 3D Object Reconstruction with Reconstruction Error-Guided View Selection
by: Zhang, Qi, et al.
Published: (2024)

Exploring Spectral Characteristics for Single Image Reflection Removal
by: Guo, Pengbo, et al.
Published: (2025)

EventSTU: Event-Guided Efficient Spatio-Temporal Understanding for Video Large Language Models
by: Xu, Wenhao, et al.
Published: (2025)

Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields
by: Min, Zhiyuan, et al.
Published: (2023)

MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences
by: Wang, Weitao, et al.
Published: (2024)

Hierarchical Granularity Alignment and State Space Modeling for Robust Multimodal AU Detection in the Wild
by: Yu, Jun, et al.
Published: (2026)

TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification
by: Liu, Qinying, et al.
Published: (2023)

VFM-Loc: Zero-Shot Cross-View Geo-Localization via Aligning Discriminative Visual Hierarchies
by: Lu, Jun, et al.
Published: (2026)

Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading
by: Luo, Songtao, et al.
Published: (2023)

PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity
by: Yuan, Yuqian, et al.
Published: (2025)

AlignCVC: Aligning Cross-View Consistency for Single-Image-to-3D Generation
by: Liang, Xinyue, et al.
Published: (2025)

Self-Navigated Residual Mamba for Universal Industrial Anomaly Detection
by: Li, Hanxi, et al.
Published: (2025)

SAUGE: Taming SAM for Uncertainty-Aligned Multi-Granularity Edge Detection
by: Liufu, Xing, et al.
Published: (2024)

Removing Averaging: Personalized Lip-Sync Driven Characters Based on Identity Adapter
by: Zhu, Yanyu, et al.
Published: (2025)

Natural Human Motion Recovery by Aligning High-Order Temporal Dynamics from Monocular Videos
by: Wei, Dingkun, et al.
Published: (2026)

Noise-Started One-Step Real-World Super-Resolution via LR-Conditioned SplitMeanFlow and GAN Refinement
by: Zhu, Wei, et al.
Published: (2026)

Learning Parallax for Stereo Event-based Motion Deblurring
by: Lin, Mingyuan, et al.
Published: (2023)

Learning to Fuse and Reconstruct Multi-View Graphs for Diabetic Retinopathy Grading
by: Li, Haoran, et al.
Published: (2026)

RAISECity: A Multimodal Agent Framework for Reality-Aligned 3D World Generation at City-Scale
by: Wang, Shengyuan, et al.
Published: (2025)