:: Library Catalog

Bibliographic Details
Main Authors:	Tilak, Advait, Choi, Jiwon, Mouli, Nazifa, Le, Wei
Format:	Preprint
Published:	2026
Subjects:	Multimedia; Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2605.00873

Similar Items

AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks
by: Ku, Max, et al.
Published: (2024)

TIDE : Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation
by: Huang, Victor Shea-Jay, et al.
Published: (2025)

FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios
by: Zhang, Shiyi, et al.
Published: (2025)

STATUS Bench: A Rigorous Benchmark for Evaluating Object State Understanding in Vision-Language Models
by: Ukai, Mahiro, et al.
Published: (2025)

Feature CAM: Interpretable AI in Image Classification
by: Clement, Frincy, et al.
Published: (2024)

RacketVision: A Multiple Racket Sports Benchmark for Unified Ball and Racket Analysis
by: Dong, Linfeng, et al.
Published: (2025)

On Semiotic-Grounded Interpretive Evaluation of Generative Art
by: Jiang, Ruixiang, et al.
Published: (2026)

AVBench: Human-Aligned and Automated Evaluation Benchmark for Audio-Video Generative Models
by: Yang, Jialiang, et al.
Published: (2026)

A Benchmark for Ultra-High-Resolution Remote Sensing MLLMs
by: Dang, Yunkai, et al.
Published: (2025)

EgoEsportsQA: An Egocentric Video Benchmark for Perception and Reasoning in Esports
by: Ma, Jianzhe, et al.
Published: (2026)

Pistachio: Towards Synthetic, Balanced, and Long-Form Video Anomaly Benchmarks
by: Li, Jie, et al.
Published: (2025)

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
by: Yu, Jiashuo, et al.
Published: (2025)

SoccerHigh: A Benchmark Dataset for Automatic Soccer Video Summarization
by: Díaz-Juan, Artur, et al.
Published: (2025)

PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media
by: Li, Fuhao, et al.
Published: (2026)

Enhancing Lie Detection Accuracy: A Comparative Study of Classic ML, CNN, and GCN Models using Audio-Visual Features
by: Abdelwahab, Abdelrahman, et al.
Published: (2024)

PointCoT: A Multi-modal Benchmark for Explicit 3D Geometric Reasoning
by: Zhang, Dongxu, et al.
Published: (2026)

Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision
by: Wu, Haoning, et al.
Published: (2023)

PaveBench: A Versatile Benchmark for Pavement Distress Perception and Interactive Vision-Language Analysis
by: Li, Dexiang, et al.
Published: (2026)

Beyond Final Answers: CRYSTAL Benchmark for Transparent Multimodal Reasoning Evaluation
by: Barrios, Wayner, et al.
Published: (2026)

EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models
by: Du, Mengfei, et al.
Published: (2024)

AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
by: Choi, Jeongsoo, et al.
Published: (2025)

POINTS: Improving Your Vision-language Model with Affordable Strategies
by: Liu, Yuan, et al.
Published: (2024)

VDE Bench: Evaluating The Capability of Image Editing Models to Modify Visual Documents
by: Yi, Hongzhu, et al.
Published: (2026)

Customizable Perturbation Synthesis for Robust SLAM Benchmarking
by: Xu, Xiaohao, et al.
Published: (2024)

Boosting Omni-Modal Language Models: Staged Post-Training with Visually Debiased Evaluation
by: Liu, Che, et al.
Published: (2026)

Epistemic-aware Vision-Language Foundation Model for Fetal Ultrasound Interpretation
by: He, Xiao, et al.
Published: (2025)

VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation
by: Ku, Max, et al.
Published: (2023)

HeCoFuse: Cross-Modal Complementary V2X Cooperative Perception with Heterogeneous Sensors
by: Wei, Chuheng, et al.
Published: (2025)

Is It Really You? Exploring Biometric Verification Scenarios in Photorealistic Talking-Head Avatar Videos
by: Pedrouzo-Rodriguez, Laura, et al.
Published: (2025)

Federated Prompt-Tuning with Heterogeneous and Incomplete Multimodal Client Data
by: Phung, Thu Hang, et al.
Published: (2026)

CalliffusionV2: Personalized Natural Calligraphy Generation with Flexible Multi-modal Control
by: Liao, Qisheng, et al.
Published: (2024)

Knowledge-enhanced Multi-perspective Video Representation Learning for Scene Recognition
by: Yu, Xuzheng, et al.
Published: (2024)

Diagnosing and Re-learning for Balanced Multimodal Learning
by: Wei, Yake, et al.
Published: (2024)

Anti-Inpainting: A Proactive Defense Approach against Malicious Diffusion-based Inpainters under Unknown Conditions
by: Guo, Yimao, et al.
Published: (2025)

Seeing Culture: A Benchmark for Visual Reasoning and Grounding
by: Satar, Burak, et al.
Published: (2025)

D-Judge: How Far Are We? Assessing the Discrepancies Between AI-synthesized and Natural Images through Multimodal Guidance
by: Liu, Renyang, et al.
Published: (2024)

Rethinking Multi-Condition DiTs: Eliminating Redundant Attention via Position-Alignment and Keyword-Scoping
by: Zhou, Chao, et al.
Published: (2026)

Omni-Dish: Photorealistic and Faithful Image Generation and Editing for Arbitrary Chinese Dishes
by: Liu, Huijie, et al.
Published: (2025)

TGIF2: Extended Text-Guided Inpainting Forgery Dataset & Benchmark
by: Mareen, Hannes, et al.
Published: (2026)

FBHM: Functional Benchmarking and Steering of VLMs for Hateful Meme Detection
by: Bhaskar, Paramananda, et al.
Published: (2026)