| Main Authors: | Zhang, Wenhao, Wang, Jun, Luo, Yong, Yu, Lei, Yu, Wei, He, Zheng, Shen, Jialie |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.11979 |
Similar Items
Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder
by: Wang, He, et al.
Published: (2024)
MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo
by: Yuan, Zhenlong, et al.
Published: (2024)
Wasserstein-Aligned Hyperbolic Multi-View Clustering
by: Wang, Rui, et al.
Published: (2025)
Zero-Shot Chinese Character Recognition with Hierarchical Multi-Granularity Image-Text Aligning
by: Zhu, Yinglian, et al.
Published: (2025)
Event-based Motion Deblurring via Multi-Temporal Granularity Fusion
by: Lin, Xiaopeng, et al.
Published: (2024)
Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration
by: Zhou, Ziheng, et al.
Published: (2024)
EventMamba: Enhancing Spatio-Temporal Locality with State Space Models for Event-Based Video Reconstruction
by: Ge, Chengjie, et al.
Published: (2025)
Template-Based Feature Aggregation Network for Industrial Anomaly Detection
by: Luo, Wei, et al.
Published: (2026)
Multi-Granularity Hand Action Detection
by: Zhe, Ting, et al.
Published: (2023)
VALLR: Visual ASR Language Model for Lip Reading
by: Thomas, Marshall, et al.
Published: (2025)
RaCMC: Residual-Aware Compensation Network with Multi-Granularity Constraints for Fake News Detection
by: Yu, Xinquan, et al.
Published: (2024)
Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-Syncing DeepFakes
by: Liu, Weifeng, et al.
Published: (2024)
LASER: Lip Landmark Assisted Speaker Detection for Robustness
by: Nguyen, Le Thien Phuc, et al.
Published: (2025)
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
by: Yeo, Jeong Hun, et al.
Published: (2024)
GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo
by: Wu, Jiang, et al.
Published: (2024)
Interactive Multimodal Fusion with Temporal Modeling
by: Yu, Jun, et al.
Published: (2025)
Rethinking Event-Based Object Detection through Representation-Level Temporal Aggregation and Model-Level Hypergraph Reasoning
by: Wang, Meisen, et al.
Published: (2026)
Landmark-Guided Cross-Speaker Lip Reading with Mutual Information Regularization
by: Wu, Linzhi, et al.
Published: (2024)
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
by: Park, Young-Hu, et al.
Published: (2025)
Optimized View and Geometry Distillation from Multi-view Diffuser
by: Zhang, Youjia, et al.
Published: (2023)
Image Quality Assessment: Investigating Causal Perceptual Effects with Abductive Counterfactual Inference
by: Shen, Wenhao, et al.
Published: (2024)
View Transformation Robustness for Multi-View 3D Object Reconstruction with Reconstruction Error-Guided View Selection
by: Zhang, Qi, et al.
Published: (2024)
Exploring Spectral Characteristics for Single Image Reflection Removal
by: Guo, Pengbo, et al.
Published: (2025)
EventSTU: Event-Guided Efficient Spatio-Temporal Understanding for Video Large Language Models
by: Xu, Wenhao, et al.
Published: (2025)
Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields
by: Min, Zhiyuan, et al.
Published: (2023)
MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences
by: Wang, Weitao, et al.
Published: (2024)
Hierarchical Granularity Alignment and State Space Modeling for Robust Multimodal AU Detection in the Wild
by: Yu, Jun, et al.
Published: (2026)
TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification
by: Liu, Qinying, et al.
Published: (2023)
VFM-Loc: Zero-Shot Cross-View Geo-Localization via Aligning Discriminative Visual Hierarchies
by: Lu, Jun, et al.
Published: (2026)
Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading
by: Luo, Songtao, et al.
Published: (2023)
PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity
by: Yuan, Yuqian, et al.
Published: (2025)
AlignCVC: Aligning Cross-View Consistency for Single-Image-to-3D Generation
by: Liang, Xinyue, et al.
Published: (2025)
Self-Navigated Residual Mamba for Universal Industrial Anomaly Detection
by: Li, Hanxi, et al.
Published: (2025)
SAUGE: Taming SAM for Uncertainty-Aligned Multi-Granularity Edge Detection
by: Liufu, Xing, et al.
Published: (2024)
Removing Averaging: Personalized Lip-Sync Driven Characters Based on Identity Adapter
by: Zhu, Yanyu, et al.
Published: (2025)
Natural Human Motion Recovery by Aligning High-Order Temporal Dynamics from Monocular Videos
by: Wei, Dingkun, et al.
Published: (2026)
Noise-Started One-Step Real-World Super-Resolution via LR-Conditioned SplitMeanFlow and GAN Refinement
by: Zhu, Wei, et al.
Published: (2026)
Learning Parallax for Stereo Event-based Motion Deblurring
by: Lin, Mingyuan, et al.
Published: (2023)
Learning to Fuse and Reconstruct Multi-View Graphs for Diabetic Retinopathy Grading
by: Li, Haoran, et al.
Published: (2026)
RAISECity: A Multimodal Agent Framework for Reality-Aligned 3D World Generation at City-Scale
by: Wang, Shengyuan, et al.
Published: (2025)