| Main Authors: | Kong, Xianghao, Chen, Jinyu, Wang, Wenguan, Su, Hang, Hu, Xiaolin, Yang, Yi, Liu, Si |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.07433 |
Similar Items
General and Task-Oriented Video Segmentation
by: Chen, Mu, et al.
Published: (2024)
Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models
by: Yang, Songlin, et al.
Published: (2026)
Navigation Instruction Generation with BEV Perception and Large Language Models
by: Fan, Sheng, et al.
Published: (2024)
RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought
by: Lu, Yi, et al.
Published: (2025)
CoT-Drive: Efficient Motion Forecasting for Autonomous Driving with LLMs and Chain-of-Thought Prompting
by: Liao, Haicheng, et al.
Published: (2025)
Nonverbal Interaction Detection
by: Wei, Jianan, et al.
Published: (2024)
Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction
by: Zhang, Xu, et al.
Published: (2025)
Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data
by: Feng, Tuo, et al.
Published: (2024)
Visual Knowledge in the Big Model Era: Retrospect and Prospect
by: Wang, Wenguan, et al.
Published: (2024)
Towards Data-and Knowledge-Driven Artificial Intelligence: A Survey on Neuro-Symbolic Computing
by: Wang, Wenguan, et al.
Published: (2022)
ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images
by: Kong, Xianghao, et al.
Published: (2025)
Better Eyes, Better Thoughts: Why Vision Chain-of-Thought Fails in Medicine
by: Wu, Yuan, et al.
Published: (2026)
A Survey on 3D Gaussian Splatting
by: Chen, Guikun, et al.
Published: (2024)
Towards Enhanced Image Generation Via Multi-modal Chain of Thought in Unified Generative Models
by: Wang, Yi, et al.
Published: (2025)
GoViG: Goal-Conditioned Visual Navigation Instruction Generation via Multimodal Reasoning
by: Wu, Fengyi, et al.
Published: (2025)
Theorem-Validated Reverse Chain-of-Thought Problem Generation for Geometric Reasoning
by: Deng, Linger, et al.
Published: (2024)
Chain-of-Memory: Enhancing GUI Agents for Cross-Application Navigation
by: Gao, Xinzge, et al.
Published: (2025)
CoRGI: Verified Chain-of-Thought Reasoning with Post-hoc Visual Grounding
by: Yi, Shixin, et al.
Published: (2025)
Let's Think with Images Efficiently! An Interleaved-Modal Chain-of-Thought Reasoning Framework with Dynamic and Precise Visual Thoughts
by: Liu, Xu, et al.
Published: (2026)
CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations
by: Zhang, Yuwei, et al.
Published: (2024)
Compositional Chain-of-Thought Prompting for Large Multimodal Models
by: Mitra, Chancharik, et al.
Published: (2023)
ClinCoT: Clinical-Aware Visual Chain-of-Thought for Medical Vision Language Models
by: Liu, Xiwei, et al.
Published: (2026)
Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization
by: Du, Yifan, et al.
Published: (2025)
DanceCrafter: Fine-Grained Text-Driven Controllable Dance Generation via Choreographic Syntax
by: Yuan, Hang, et al.
Published: (2026)
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
by: Hua, Hang, et al.
Published: (2024)
On the Importance of Backbone to the Adversarial Robustness of Object Detectors
by: Li, Xiao, et al.
Published: (2023)
MM-CoT:A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models
by: Zhang, Jusheng, et al.
Published: (2025)
Dynamic Cross-Modal Prompt Generation for Multimodal Continual Instruction Tuning
by: Hu, Tao, et al.
Published: (2026)
SDIGLM: Leveraging Large Language Models and Multi-Modal Chain of Thought for Structural Damage Identification
by: Zhang, Yunkai, et al.
Published: (2025)
Reinforcing Structured Chain-of-Thought for Video Understanding
by: Wang, Peiyao, et al.
Published: (2026)
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
by: Han, Songhao, et al.
Published: (2024)
Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion
by: Ma, Jian, et al.
Published: (2024)
Can Segmentation Models Understand the World? Towards Proactive Affordance Reasoning via Visual Chain-of-Thought
by: Guo, Yuchen, et al.
Published: (2026)
Improving Chain-of-Thought Efficiency for Autoregressive Image Generation
by: Gu, Zeqi, et al.
Published: (2025)
MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation for Effective-and-Efficient Vision-and-Language Navigation
by: Wang, Liuyi, et al.
Published: (2024)
Imitation Game for Adversarial Disillusion with Chain-of-Thought Reasoning in Generative AI
by: Chang, Ching-Chun, et al.
Published: (2025)
CoV: Chain-of-View Prompting for Spatial Reasoning
by: Zhao, Haoyu, et al.
Published: (2026)
ATI: Any Trajectory Instruction for Controllable Video Generation
by: Wang, Angtian, et al.
Published: (2025)
Volumetric Environment Representation for Vision-Language Navigation
by: Liu, Rui, et al.
Published: (2024)
Vision-Language Navigation with Energy-Based Policy
by: Liu, Rui, et al.
Published: (2024)