:: Library Catalog

Bibliographic Details
Main Authors:	Kong, Xianghao; Chen, Jinyu; Wang, Wenguan; Su, Hang; Hu, Xiaolin; Yang, Yi; Liu, Si
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2407.07433

Similar Items

General and Task-Oriented Video Segmentation
by: Chen, Mu, et al.
Published: (2024)

Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models
by: Yang, Songlin, et al.
Published: (2026)

Navigation Instruction Generation with BEV Perception and Large Language Models
by: Fan, Sheng, et al.
Published: (2024)

RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought
by: Lu, Yi, et al.
Published: (2025)

CoT-Drive: Efficient Motion Forecasting for Autonomous Driving with LLMs and Chain-of-Thought Prompting
by: Liao, Haicheng, et al.
Published: (2025)

Nonverbal Interaction Detection
by: Wei, Jianan, et al.
Published: (2024)

Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction
by: Zhang, Xu, et al.
Published: (2025)

Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data
by: Feng, Tuo, et al.
Published: (2024)

Visual Knowledge in the Big Model Era: Retrospect and Prospect
by: Wang, Wenguan, et al.
Published: (2024)

Towards Data- and Knowledge-Driven Artificial Intelligence: A Survey on Neuro-Symbolic Computing
by: Wang, Wenguan, et al.
Published: (2022)

ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images
by: Kong, Xianghao, et al.
Published: (2025)

Better Eyes, Better Thoughts: Why Vision Chain-of-Thought Fails in Medicine
by: Wu, Yuan, et al.
Published: (2026)

A Survey on 3D Gaussian Splatting
by: Chen, Guikun, et al.
Published: (2024)

Towards Enhanced Image Generation Via Multi-modal Chain of Thought in Unified Generative Models
by: Wang, Yi, et al.
Published: (2025)

GoViG: Goal-Conditioned Visual Navigation Instruction Generation via Multimodal Reasoning
by: Wu, Fengyi, et al.
Published: (2025)

Theorem-Validated Reverse Chain-of-Thought Problem Generation for Geometric Reasoning
by: Deng, Linger, et al.
Published: (2024)

Chain-of-Memory: Enhancing GUI Agents for Cross-Application Navigation
by: Gao, Xinzge, et al.
Published: (2025)

CoRGI: Verified Chain-of-Thought Reasoning with Post-hoc Visual Grounding
by: Yi, Shixin, et al.
Published: (2025)

Let's Think with Images Efficiently! An Interleaved-Modal Chain-of-Thought Reasoning Framework with Dynamic and Precise Visual Thoughts
by: Liu, Xu, et al.
Published: (2026)

CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations
by: Zhang, Yuwei, et al.
Published: (2024)

Compositional Chain-of-Thought Prompting for Large Multimodal Models
by: Mitra, Chancharik, et al.
Published: (2023)

ClinCoT: Clinical-Aware Visual Chain-of-Thought for Medical Vision Language Models
by: Liu, Xiwei, et al.
Published: (2026)

Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization
by: Du, Yifan, et al.
Published: (2025)

DanceCrafter: Fine-Grained Text-Driven Controllable Dance Generation via Choreographic Syntax
by: Yuan, Hang, et al.
Published: (2026)

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
by: Hua, Hang, et al.
Published: (2024)

On the Importance of Backbone to the Adversarial Robustness of Object Detectors
by: Li, Xiao, et al.
Published: (2023)

MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models
by: Zhang, Jusheng, et al.
Published: (2025)

Dynamic Cross-Modal Prompt Generation for Multimodal Continual Instruction Tuning
by: Hu, Tao, et al.
Published: (2026)

SDIGLM: Leveraging Large Language Models and Multi-Modal Chain of Thought for Structural Damage Identification
by: Zhang, Yunkai, et al.
Published: (2025)

Reinforcing Structured Chain-of-Thought for Video Understanding
by: Wang, Peiyao, et al.
Published: (2026)

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
by: Han, Songhao, et al.
Published: (2024)

Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion
by: Ma, Jian, et al.
Published: (2024)

Can Segmentation Models Understand the World? Towards Proactive Affordance Reasoning via Visual Chain-of-Thought
by: Guo, Yuchen, et al.
Published: (2026)

Improving Chain-of-Thought Efficiency for Autoregressive Image Generation
by: Gu, Zeqi, et al.
Published: (2025)

MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation for Effective-and-Efficient Vision-and-Language Navigation
by: Wang, Liuyi, et al.
Published: (2024)

Imitation Game for Adversarial Disillusion with Chain-of-Thought Reasoning in Generative AI
by: Chang, Ching-Chun, et al.
Published: (2025)

CoV: Chain-of-View Prompting for Spatial Reasoning
by: Zhao, Haoyu, et al.
Published: (2026)

ATI: Any Trajectory Instruction for Controllable Video Generation
by: Wang, Angtian, et al.
Published: (2025)

Volumetric Environment Representation for Vision-Language Navigation
by: Liu, Rui, et al.
Published: (2024)

Vision-Language Navigation with Energy-Based Policy
by: Liu, Rui, et al.
Published: (2024)