:: Library Catalog

Bibliographic Details
Main Authors:	Sun, Muyi; Wang, Yixuan; Wang, Hong; Su, Chen; Zhang, Man; Qi, Xingqun; Li, Qi; Sun, Zhenan
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.09809

Similar Items

ReMeREC: Relation-aware and Multi-entity Referring Expression Comprehension
by: Hu, Yizhi, et al.
Published: (2025)

VividListener: Expressive and Controllable Listener Dynamics Modeling for Multi-Modal Responsive Interaction
by: Li, Shiying, et al.
Published: (2025)

DanceEditor: Towards Iterative Editable Music-driven Dance Generation with Open-Vocabulary Descriptions
by: Zhang, Hengyuan, et al.
Published: (2025)

SemiTooth: a Generalizable Semi-supervised Framework for Multi-Source Tooth Segmentation
by: Sun, Muyi, et al.
Published: (2026)

Exploring Task-Solving Paradigm for Generalized Cross-Domain Face Anti-Spoofing via Reinforcement Fine-Tuning
by: Jiang, Fangling, et al.
Published: (2025)

Learning Disentangled Representation for One-shot Progressive Face Swapping
by: Li, Qi, et al.
Published: (2022)

Object-aware Sound Source Localization via Audio-Visual Scene Understanding
by: Um, Sung Jin, et al.
Published: (2025)

UniAlignment: Semantic Alignment for Unified Image Generation, Understanding, Manipulation and Perception
by: Song, Xinyang, et al.
Published: (2025)

Gotta Hear Them All: Towards Sound Source Aware Audio Generation
by: Guo, Wei, et al.
Published: (2024)

Learning Unknown Spoof Prompts for Generalized Face Anti-Spoofing Using Only Real Face Images
by: Jiang, Fangling, et al.
Published: (2025)

LoRA in LoRA: Towards Parameter-Efficient Architecture Expansion for Continual Visual Instruction Tuning
by: Che, Chang, et al.
Published: (2025)

Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models
by: Wang, Wei, et al.
Published: (2024)

Affinity Contrastive Learning for Skeleton-based Human Activity Understanding
by: Liu, Hongda, et al.
Published: (2026)

MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
by: Lin, Haokun, et al.
Published: (2024)

TTP: Test-Time Padding for Adversarial Detection and Robust Adaptation on Vision-Language Models
by: Li, Zhiwei, et al.
Published: (2025)

Learning Geometric Invariance for Gait Recognition
by: Wang, Zengbin, et al.
Published: (2026)

Novel Class Discovery for Ultra-Fine-Grained Visual Categorization
by: Liu, Yu, et al.
Published: (2024)

Pest Manager: A Systematic Framework for Precise Pest Counting and Identification in Invisible Grain Pile Storage Environment
by: Ma, Chuanyang, et al.
Published: (2024)

Towards High Fidelity Face Swapping: A Comprehensive Survey and New Benchmark
by: Li, Qi, et al.
Published: (2026)

Learning Knowledge-based Prompts for Robust 3D Mask Presentation Attack Detection
by: Jiang, Fangling, et al.
Published: (2025)

KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation
by: Wang, Xingrui, et al.
Published: (2025)

EmotionGesture: Audio-Driven Diverse Emotional Co-Speech 3D Gesture Generation
by: Qi, Xingqun, et al.
Published: (2023)

MMGeoLM: Hard Negative Contrastive Learning for Fine-Grained Geometric Understanding in Large Multimodal Models
by: Sun, Kai, et al.
Published: (2025)

Structure-Aware Fine-Grained Gaussian Splatting for Expressive Avatar Reconstruction
by: Su, Yuze, et al.
Published: (2026)

Audio-Visual World Models: Towards Multisensory Imagination in Sight and Sound
by: Wang, Jiahua, et al.
Published: (2025)

DopQ-ViT: Towards Distribution-Friendly and Outlier-Aware Post-Training Quantization for Vision Transformers
by: Yang, Lianwei, et al.
Published: (2024)

Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer
by: Wang, Yaoting, et al.
Published: (2023)

FingerVeinSyn-5M: A Million-Scale Dataset and Benchmark for Finger Vein Recognition
by: Wang, Yinfan, et al.
Published: (2025)

Artificial Immune System of Secure Face Recognition Against Adversarial Attacks
by: Ren, Min, et al.
Published: (2024)

AGC: Adaptive Geodesic Correction for Adversarial Robustness on Vision-Language Models
by: Li, Zhiwei, et al.
Published: (2026)

3SGen: Unified Subject, Style, and Structure-Driven Image Generation with Adaptive Task-specific Memory
by: Song, Xinyang, et al.
Published: (2025)

Context-Semantic Quality Awareness Network for Fine-Grained Visual Categorization
by: Xu, Qin, et al.
Published: (2024)

Towards an Effective Action-Region Tracking Framework for Fine-grained Video Action Recognition
by: Sun, Baoli, et al.
Published: (2025)

Audio-Guided Visual Perception for Audio-Visual Navigation
by: Wang, Yi, et al.
Published: (2025)

Boosting Fine-Grained Visual Anomaly Detection with Coarse-Knowledge-Aware Adversarial Learning
by: Fang, Qingqing, et al.
Published: (2024)

WeakMedSAM: Weakly-Supervised Medical Image Segmentation via SAM with Sub-Class Exploration and Prompt Affinity Mining
by: Wang, Haoran, et al.
Published: (2025)

SeaVIS: Sound-Enhanced Association for Online Audio-Visual Instance Segmentation
by: Zhu, Yingjian, et al.
Published: (2026)

OmniSparse: Training-Aware Fine-Grained Sparse Attention for Long-Video MLLMs
by: Chen, Feng, et al.
Published: (2025)

Detecting Audio-Visual Deepfakes with Fine-Grained Inconsistencies
by: Astrid, Marcella, et al.
Published: (2024)

AURA: A Fine-Grained Benchmark and Decomposed Metric for Audio-Visual Reasoning
by: Galougah, Siminfar Samakoush, et al.
Published: (2025)