| Main Authors: | Wan, Wentao; Kang, Nan; Wang, Zeqing; Yang, Zhuojie; Lin, Liang; Wang, Keze |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2309.09809 |
Similar Items
Enhancing Visual Programming for Visual Reasoning via Probabilistic Graphs
by: Wan, Wentao, et al.
Published: (2025)
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
by: Wang, Zeqing, et al.
Published: (2023)
Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body
by: Wang, Zeqing, et al.
Published: (2024)
PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models
by: Wang, Zeqing, et al.
Published: (2025)
TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models
by: Wang, Zeqing, et al.
Published: (2025)
Adaptive-VoCo: Complexity-Aware Visual Token Compression for Vision-Language Models
by: Guo, Xiaoyang, et al.
Published: (2025)
ResAgent: Entropy-based Prior Point Discovery and Visual Reasoning for Referring Expression Segmentation
by: Wang, Yihao, et al.
Published: (2026)
Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation
by: Xu, Zhihua, et al.
Published: (2025)
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
by: Gao, Minghe, et al.
Published: (2025)
ORACLE: Optimizing Reasoning Abilities of Large Language Models via Constraint-Led Synthetic Data Elicitation
by: Yang, Zhuojie, et al.
Published: (2026)
UnAC: Adaptive Visual Prompting with Abstraction and Stepwise Checking for Complex Multimodal Reasoning
by: Wang, Yifan, et al.
Published: (2026)
Omni-o3: Deep Nested Omnimodal Deduction for Deliberative Audio-Visual Reasoning
by: Zhang, Zhicheng, et al.
Published: (2026)
Self-Rewarded Multimodal Coherent Reasoning Across Diverse Visual Domains
by: Zhang, Jesen, et al.
Published: (2025)
SR-FoT: A Syllogistic-Reasoning Framework of Thought for Large Language Models Tackling Knowledge-based Reasoning Tasks
by: Wan, Wentao, et al.
Published: (2025)
MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models
by: Zhang, Jusheng, et al.
Published: (2025)
Trust-Aware Diversion for Data-Effective Distillation
by: Wu, Zhuojie, et al.
Published: (2025)
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
by: Hu, Yushi, et al.
Published: (2023)
FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models
by: Cai, Kaitong, et al.
Published: (2025)
VisualQuest: A Benchmark for Abstract Visual Reasoning in MLLMs
by: Xiao, Kelaiti, et al.
Published: (2025)
Triage: Hierarchical Visual Budgeting for Efficient Video Reasoning in Vision-Language Models
by: Wang, Anmin, et al.
Published: (2026)
Visual Program Distillation with Template-Based Augmentation
by: Shlapentokh-Rothman, Michal, et al.
Published: (2024)
VideoVerse: Does Your T2V Generator Have World Model Capability to Synthesize Videos?
by: Wang, Zeqing, et al.
Published: (2025)
Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning
by: Ku, Chahyon, et al.
Published: (2023)
InterSketch: An Interleaved Reasoning Model with Self-correcting Visual Sketch and Stepwise Reward
by: Ning, Zhiwei, et al.
Published: (2026)
Competitive Distillation: A Simple Learning Strategy for Improving Visual Classification
by: Shi, Daqian, et al.
Published: (2025)
Mimic Human Cognition, Master Multi-Image Reasoning: A Meta-Action Framework for Enhanced Visual Understanding
by: Yin, Jianghao, et al.
Published: (2026)
Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback
by: Chen, Yang, et al.
Published: (2025)
CausalStep: A Benchmark for Explicit Stepwise Causal Reasoning in Videos
by: Li, Xuchen, et al.
Published: (2025)
Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation
by: Jain, Jitesh, et al.
Published: (2024)
DART: Dual Adaptive Refinement Transfer for Open-Vocabulary Multi-Label Recognition
by: Liu, Haijing, et al.
Published: (2025)
Category-Adaptive Cross-Modal Semantic Refinement and Transfer for Open-Vocabulary Multi-Label Recognition
by: Liu, Haijing, et al.
Published: (2024)
RVTBench: A Benchmark for Visual Reasoning Tasks
by: Shen, Yiqing, et al.
Published: (2025)
VGR: Visual Grounded Reasoning
by: Wang, Jiacong, et al.
Published: (2025)
Progressive Language-guided Visual Learning for Multi-Task Visual Grounding
by: Wang, Jingchao, et al.
Published: (2025)
Smooth and Stepwise Self-Distillation for Object Detection
by: Deng, Jieren, et al.
Published: (2023)
GMC: A General Framework of Multi-stage Context Learning and Utilization for Visual Detection Tasks
by: Wang, Xuan, et al.
Published: (2024)
AbductiveMLLM: Boosting Visual Abductive Reasoning Within MLLMs
by: Chang, Boyu, et al.
Published: (2026)
Visual Reasoning through Tool-supervised Reinforcement Learning
by: Dong, Qihua, et al.
Published: (2026)
3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale
by: Fan, Yijia, et al.
Published: (2025)
Monet: Reasoning in Latent Visual Space Beyond Images and Language
by: Wang, Qixun, et al.
Published: (2025)