:: Library Catalog

Bibliographic Details
Main Authors:	Ke, Yan; Yu, Xin; Du, Heming; Chapman, Scott; Huang, Helen
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.24350

Similar Items

When One Moment Isn't Enough: Multi-Moment Retrieval with Cross-Moment Interactions
by: Cao, Zhuo, et al.
Published: (2025)

DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images
by: Sun, Haoran, et al.
Published: (2025)

Maestro: Self-Improving Text-to-Image Generation via Agent Orchestration
by: Wan, Xingchen, et al.
Published: (2025)

IMAGAgent: Orchestrating Multi-Turn Image Editing via Constraint-Aware Planning and Reflection
by: Shen, Fei, et al.
Published: (2026)

Refine and Align: Confidence Calibration through Multi-Agent Interaction in VQA
by: Pandey, Ayush, et al.
Published: (2025)

SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation
by: Ren, Tianfei, et al.
Published: (2026)

Sparse Mixture-of-Experts for Multi-Channel Imaging: Are All Channel Interactions Required?
by: Yun, Sukwon, et al.
Published: (2025)

M$^3$-VQA: A Benchmark for Multimodal, Multi-Entity, Multi-Hop Visual Question Answering
by: Ma, Jiatong, et al.
Published: (2026)

MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning
by: Cai, Zhixi, et al.
Published: (2026)

AgroVG: A Large-Scale Multi-Source Benchmark for Agricultural Visual Grounding
by: Li, Haocheng, et al.
Published: (2026)

MTabVQA: Evaluating Multi-Tabular Reasoning of Language Models in Visual Space
by: Singh, Anshul, et al.
Published: (2025)

Multi-Agent Image Restoration
by: Jiang, Xu, et al.
Published: (2025)

UniShield: An Adaptive Multi-Agent Framework for Unified Forgery Image Detection and Localization
by: Huang, Qing, et al.
Published: (2025)

Bird-SR: Bidirectional Reward-Guided Diffusion for Real-World Image Super-Resolution
by: Fan, Zihao, et al.
Published: (2026)

PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis
by: Xu, Jiao, et al.
Published: (2026)

Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering
by: Jiang, Bowen, et al.
Published: (2024)

CSAOT: Cooperative Multi-Agent System for Active Object Tracking
by: Nguyen, Hy, et al.
Published: (2025)

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation
by: Zhang, Hongxin, et al.
Published: (2024)

PASSION: Towards Effective Incomplete Multi-Modal Medical Image Segmentation with Imbalanced Missing Rates
by: Shi, Junjie, et al.
Published: (2024)

Realism Control One-step Diffusion for Real-World Image Super-Resolution
by: Wu, Zongliang, et al.
Published: (2025)

MSRAMIE: Multimodal Structured Reasoning Agent for Multi-instruction Image Editing
by: Qiu, Zhaoyuan, et al.
Published: (2026)

Symphony: A Cognitively-Inspired Multi-Agent System for Long-Video Understanding
by: Yan, Haiyang, et al.
Published: (2026)

FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding
by: Cao, Zhuo, et al.
Published: (2024)

Kvasir-VQA: A Text-Image Pair GI Tract Dataset
by: Gautam, Sushant, et al.
Published: (2024)

Gather and Trace: Rethinking Video TextVQA from an Instance-oriented Perspective
by: Zhang, Yan, et al.
Published: (2025)

Dual Causal Inference: Integrating Backdoor Adjustment and Instrumental Variable Learning for Medical VQA
by: Xu, Zibo, et al.
Published: (2026)

LaRE$^2$: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection
by: Luo, Yunpeng, et al.
Published: (2024)

Box-QAymo: Box-Referring VQA Dataset for Autonomous Driving
by: Etchegaray, Djamahl, et al.
Published: (2025)

GenPilot: A Multi-Agent System for Test-Time Prompt Optimization in Image Generation
by: Ye, Wen, et al.
Published: (2025)

PathNavigate: A Training-Free Pathology Agent with Surprise-Guided Scan and Shared Slide Memory for Whole-Slide Image VQA
by: Yang, Chunze, et al.
Published: (2026)

World-To-Image: Grounding Text-to-Image Generation with Agent-Driven World Knowledge
by: Son, Moo Hyun, et al.
Published: (2025)

AgroNVILA: Perception-Reasoning Decoupling for Multi-view Agricultural Multimodal Large Language Models
by: Zhang, Jiarui, et al.
Published: (2026)

ChromouVQA: Benchmarking Vision-Language Models under Chromatic Camouflaged Images
by: Zhang, Yunfei, et al.
Published: (2025)

WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering
by: Chen, Pingyi, et al.
Published: (2024)

ShareVerse: Multi-Agent Consistent Video Generation for Shared World Modeling
by: Zhu, Jiayi, et al.
Published: (2026)

OmniScience: A Large-scale Multi-modal Dataset for Scientific Image Understanding
by: Tao, Haoyi, et al.
Published: (2026)

MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
by: Ding, Yanbo, et al.
Published: (2024)

A Denoising Framework for Real-World Ultra-Low-Dose Lung CT Images Based on an Image Purification Strategy
by: Gong, Guoliang, et al.
Published: (2025)

CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving
by: Huang, Minqing, et al.
Published: (2026)

Multi-Knowledge-oriented Nighttime Haze Imaging Enhancer for Vision-driven Intelligent Systems
by: Chen, Ai, et al.
Published: (2025)