Library Catalog

Bibliographic Details
Main Authors:	Song, Ruizhuo; Yuan, Beiming
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2508.15387

Similar Items

Triple-CFN: Separating Concepts and Features Enhances Machine Abstract Reasoning Ability
by: Song, Ruizhuo, et al.
Published: (2024)

Johnny: Structuring Representation Space to Enhance Machine Abstract Reasoning Ability
by: Song, Ruizhuo, et al.
Published: (2025)

Funny-Valen-Tine: Planning Solution Distribution Enhances Machine Abstract Reasoning Ability
by: Song, Ruizhuo, et al.
Published: (2024)

D4C: Improving Negative Example Quality to Enhance Machine Abstract Reasoning Ability
by: Song, Ruizhuo, et al.
Published: (2024)

Solving the Clustering Reasoning Problems by Modeling a Deep-Learning-Based Probabilistic Model
by: Song, Ruizhuo, et al.
Published: (2024)

EiHi Net: Out-of-Distribution Generalization Paradigm
by: Wei, Qinglai, et al.
Published: (2022)

VLM-R$^3$: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought
by: Jiang, Chaoya, et al.
Published: (2025)

Multi-Granularity Mutual Refinement Network for Zero-Shot Learning
by: Wang, Ning, et al.
Published: (2025)

CoT-Pose: Chain-of-Thought Reasoning for 3D Pose Generation from Abstract Prompts
by: Cha, Junuk, et al.
Published: (2025)

DIO: Dataset of 3D Mesh Models of Indoor Objects for Robotics and Computer Vision Applications
by: Nimal, Nillan, et al.
Published: (2024)

VISTA: Enhancing Vision-Text Alignment in MLLMs via Cross-Modal Mutual Information Maximization
by: Li, Mingxiao, et al.
Published: (2025)

Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence
by: Xu, Ruizhuo, et al.
Published: (2024)

Enhancing Advanced Visual Reasoning Ability of Large Language Models
by: Li, Zhiyuan, et al.
Published: (2024)

MIGE: Mutually Enhanced Multimodal Instruction-Based Image Generation and Editing
by: Tian, Xueyun, et al.
Published: (2025)

CoT-Segmenter: Enhancing OOD Detection in Dense Road Scenes via Chain-of-Thought Reasoning
by: Song, Jeonghyo, et al.
Published: (2025)

CLGRPO: Reasoning Ability Enhancement for Small VLMs
by: Wang, Fanyi, et al.
Published: (2025)

CausalSpatial: A Benchmark for Object-Centric Causal Spatial Reasoning
by: Ma, Wenxin, et al.
Published: (2026)

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
by: Wang, Weiyun, et al.
Published: (2024)

MIGA: Mutual Information-Guided Attack on Denoising Models for Semantic Manipulation
by: Li, Guanghao, et al.
Published: (2025)

CaST-Bench: Benchmarking Causal Chain-Grounded Spatio-Temporal Reasoning for Video Question Answering
by: Zhang, Mingfang, et al.
Published: (2026)

Camera-Based Localization and Enhanced Normalized Mutual Information
by: Kunde, Vishnu Teja, et al.
Published: (2024)

TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability
by: Ma, Fengji, et al.
Published: (2024)

Mutual Information Guided Optimal Transport for Unsupervised Visible-Infrared Person Re-identification
by: Zhang, Zhizhong, et al.
Published: (2024)

Depth Map Denoising Network and Lightweight Fusion Network for Enhanced 3D Face Recognition
by: Xu, Ruizhuo, et al.
Published: (2024)

MutualNeRF: Improve the Performance of NeRF under Limited Samples with Mutual Information Theory
by: Wang, Zifan, et al.
Published: (2025)

Mutually Causal Semantic Distillation Network for Zero-Shot Learning
by: Chen, Shiming, et al.
Published: (2026)

MI CAM: Mutual Information Weighted Activation Mapping for Causal Visual Explanations of Convolutional Neural Networks
by: Iyer, Ram S, et al.
Published: (2025)

OrderChain: Towards General Instruct-Tuning for Stimulating the Ordinal Understanding Ability of MLLM
by: Wang, Jinhong, et al.
Published: (2025)

Generating Storytelling Images with Rich Chains-of-Reasoning
by: Song, Xiujie, et al.
Published: (2025)

Enhancing 3D Semantic Scene Completion with a Refinement Module
by: Zhang, Dunxing, et al.
Published: (2025)

$A^2R^2$: Advancing Img2LaTeX Conversion via Visual Reasoning with Attention-Guided Refinement
by: Li, Zhecheng, et al.
Published: (2025)

Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs
by: Zhang, Xintong, et al.
Published: (2025)

Geospatial Chain of Thought Reasoning for Enhanced Visual Question Answering on Satellite Imagery
by: Shanker, Shambhavi, et al.
Published: (2025)

OmniRefiner: Reinforcement-Guided Local Diffusion Refinement
by: Liu, Yaoli, et al.
Published: (2025)

Improving Visual Reasoning with Iterative Evidence Refinement
by: Shi, Zeru, et al.
Published: (2026)

Revisiting Mutual Information Maximization for Generalized Category Discovery
by: Tan, Zhaorui, et al.
Published: (2024)

TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models
by: Wang, Zeqing, et al.
Published: (2025)

Explaining Representation by Mutual Information
by: Gu, Lifeng
Published: (2021)

A Multi-Agent Framework with Structured Reasoning and Reflective Refinement for Multimodal Empathetic Response Generation
by: Wang, Liping, et al.
Published: (2026)

VisualQuest: A Benchmark for Abstract Visual Reasoning in MLLMs
by: Xiao, Kelaiti, et al.
Published: (2025)