:: Library Catalog

Bibliographic Details
Main Authors:	Hu, Wanpeng; Liu, Haodi; Chen, Lin; Zhou, Feng; Xiao, Changming; Yang, Qi; Zhang, Changshui
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2501.02964

Similar Items

From Text to Mask: Localizing Entities Using the Attention of Text-to-Image Diffusion Models
by: Xiao, Changming, et al.
Published: (2023)

Socratic-MCTS: Test-Time Visual Reasoning by Asking the Right Questions
by: Acuna, David, et al.
Published: (2025)

Instruction-tuned Self-Questioning Framework for Multimodal Reasoning
by: Jang, You-Won, et al.
Published: (2025)

Bridging Vision Language Models and Symbolic Grounding for Video Question Answering
by: Ma, Haodi, et al.
Published: (2025)

A Theoretical View of Linear Backpropagation and Its Convergence
by: Li, Ziang, et al.
Published: (2021)

Unified Multimodal Understanding via Byte-Pair Visual Encoding
by: Zhang, Wanpeng, et al.
Published: (2025)

Socratic-Geo: Synthetic Data Generation and Geometric Reasoning via Multi-Agent Interaction
by: Jiao, Zhengbo, et al.
Published: (2026)

Asking like Socrates: Socrates helps VLMs understand remote sensing images
by: Shao, Run, et al.
Published: (2025)

V-Zero: Self-Improving Multimodal Reasoning with Zero Annotation
by: Wang, Han, et al.
Published: (2026)

Self-Rewarded Multimodal Coherent Reasoning Across Diverse Visual Domains
by: Zhang, Jesen, et al.
Published: (2025)

CogStream: Context-guided Streaming Video Question Answering
by: Zhao, Zicheng, et al.
Published: (2025)

Enhancing Multimodal In-Context Learning via Inductive-Deductive Reasoning
by: Wang, Haoyu, et al.
Published: (2026)

InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts
by: Xie, Tianchi, et al.
Published: (2025)

Progressive Multimodal Search and Reasoning for Knowledge-Intensive Visual Question Answering
by: Choi, Changin, et al.
Published: (2025)

MedLVR: Latent Visual Reasoning for Reliable Medical Visual Question Answering
by: Xi, Suyang, et al.
Published: (2026)

Reflect to Inform: Boosting Multimodal Reasoning via Information-Gain-Driven Verification
by: Lv, Shuai, et al.
Published: (2026)

LAST: Leveraging Tools as Hints to Enhance Spatial Reasoning for Multimodal Large Language Models
by: Tian, Shi-Yu, et al.
Published: (2026)

When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning
by: Wu, Zhengxian, et al.
Published: (2026)

SR-CIS: Self-Reflective Incremental System with Decoupled Memory and Reasoning
by: Qi, Biqing, et al.
Published: (2024)

Octopus: Agentic Multimodal Reasoning with Six-Capability Orchestration
by: Guo, Yifu, et al.
Published: (2025)

Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering
by: Hong, Yuyang, et al.
Published: (2025)

SSL4RL: Revisiting Self-supervised Learning as Intrinsic Reward for Visual-Language Reasoning
by: Guo, Xiaojun, et al.
Published: (2025)

Using Gaussian Splats to Create High-Fidelity Facial Geometry and Texture
by: He, Haodi, et al.
Published: (2025)

Diving into Self-Evolving Training for Multimodal Reasoning
by: Liu, Wei, et al.
Published: (2024)

Mutual Information guided Visual Contrastive Learning
by: Chen, Hanyang, et al.
Published: (2025)

Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild
by: Feng, Yigui, et al.
Published: (2025)

Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs via Visual Perception Reward
by: Xiao, Tong, et al.
Published: (2025)

MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models
by: Cai, Huanqia, et al.
Published: (2025)

Integrating Object Interaction Self-Attention and GAN-Based Debiasing for Visual Question Answering
by: Li, Zhifei, et al.
Published: (2025)

Hand3R: Online 4D Hand-Scene Reconstruction in the Wild
by: Hu, Wendi, et al.
Published: (2026)

Metis-HOME: Hybrid Optimized Mixture-of-Experts for Multimodal Reasoning
by: Lan, Xiaohan, et al.
Published: (2025)

VTPerception-R1: Enhancing Multimodal Reasoning via Explicit Visual and Textual Perceptual Grounding
by: Ding, Yizhuo, et al.
Published: (2025)

Multimodal Mathematical Reasoning Embedded in Aerial Vehicle Imagery: Benchmarking, Analysis, and Exploration
by: Zhou, Yue, et al.
Published: (2025)

The Role of Visual Modality in Multimodal Mathematical Reasoning: Challenges and Insights
by: Liu, Yufang, et al.
Published: (2025)

ClueTracer: Question-to-Vision Clue Tracing for Training-Free Hallucination Suppression in Multimodal Reasoning
by: Xi, Gongli, et al.
Published: (2026)

Learning to See the Elephant in the Room: Self-Supervised Context Reasoning in Humans and AI
by: Liu, Xiao, et al.
Published: (2022)

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
by: Lu, Yujie, et al.
Published: (2024)

MUPA: Towards Multi-Path Agentic Reasoning for Grounded Video Question Answering
by: Dang, Jisheng, et al.
Published: (2025)

DIVER: Dynamic Iterative Visual Evidence Reasoning for Multimodal Fake News Detection
by: Zhou, Weilin, et al.
Published: (2026)

STELAR-VISION: Self-Topology-Aware Efficient Learning for Aligned Reasoning in Vision
by: Li, Chen, et al.
Published: (2025)