| Main Authors: | Wu, Yi-Fu, Lee, Minseung, Ahn, Sungjin |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.01203 |
Similar Items
Dreamweaver: Learning Compositional World Models from Pixels
by: Baek, Junyeob, et al.
Published: (2025)
An Investigation into Pre-Training Object-Centric Representations for Reinforcement Learning
by: Yoon, Jaesik, et al.
Published: (2023)
Learning to Compose: Improving Object Centric Learning by Injecting Compositionality
by: Jung, Whie, et al.
Published: (2024)
Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate
by: Lee, Byung Hyun, et al.
Published: (2025)
Parallelized Spatiotemporal Binding
by: Singh, Gautam, et al.
Published: (2024)
Multimodal Transformer With a Low-Computational-Cost Guarantee
by: Park, Sungjin, et al.
Published: (2024)
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
by: Zhao, Qingqing, et al.
Published: (2025)
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
by: Chen, Yangyi, et al.
Published: (2023)
Chain-of-Thought Prompting for Demographic Inference with Large Multimodal Models
by: Yu, Yongsheng, et al.
Published: (2024)
Training-Free Zero-Shot Anomaly Detection in 3D Brain MRI with 2D Foundation Models
by: Le-Gia, Tai, et al.
Published: (2026)
From Intent to Execution: Multimodal Chain-of-Thought Reinforcement Learning for Precise CAD Code Generation
by: Niu, Ke, et al.
Published: (2025)
Advanced Knowledge Transfer: Refined Feature Distillation for Zero-Shot Quantization in Edge Computing
by: Hong, Inpyo, et al.
Published: (2024)
CATVis: Context-Aware Thought Visualization
by: Mehmood, Tariq, et al.
Published: (2025)
Multilingual OCR-Aware Fine-Tuning and Prompt-Guided Chain-of-Thought Reasoning for Multimodal Large Language Models
by: Xu, Qinwu, et al.
Published: (2026)
S-Chain: Structured Visual Chain-of-Thought For Medicine
by: Le-Duc, Khai, et al.
Published: (2025)
Imagine while Reasoning in Space: Multimodal Visualization-of-Thought
by: Li, Chengzu, et al.
Published: (2025)
Thought Flow Nets: From Single Predictions to Trains of Model Thought
by: Schuff, Hendrik, et al.
Published: (2021)
DiffBlender: Composable and Versatile Multimodal Text-to-Image Diffusion Models
by: Kim, Sungnyun, et al.
Published: (2023)
Understanding normalization in contrastive representation learning and out-of-distribution detection
by: Le-Gia, Tai, et al.
Published: (2023)
Rethinking Fine-Tuning: Unlocking Hidden Capabilities in Vision-Language Models
by: Zhang, Mingyuan, et al.
Published: (2025)
ZIP: An Efficient Zeroth-order Prompt Tuning for Black-box Vision-Language Models
by: Park, Seonghwan, et al.
Published: (2025)
Controllable Feature Whitening for Hyperparameter-Free Bias Mitigation
by: Cho, Yooshin, et al.
Published: (2025)
MoPD: Mixture-of-Prompts Distillation for Vision-Language Models
by: Chen, Yang, et al.
Published: (2024)
In Search of a Data Transformation That Accelerates Neural Field Training
by: Seo, Junwon, et al.
Published: (2023)
Mitigating Sexual Content Generation via Embedding Distortion in Text-conditioned Diffusion Models
by: Ahn, Jaesin, et al.
Published: (2025)
MINR: Implicit Neural Representations with Masked Image Modelling
by: Lee, Sua, et al.
Published: (2025)
Sherlock: Self-Correcting Reasoning in Vision-Language Models
by: Ding, Yi, et al.
Published: (2025)
MTS-DMAE: Dual-Masked Autoencoder for Unsupervised Multivariate Time Series Representation Learning
by: Xu, Yi, et al.
Published: (2025)
FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning
by: Fu, Yuwei, et al.
Published: (2024)
Bridge the Modality and Capability Gaps in Vision-Language Model Selection
by: Yi, Chao, et al.
Published: (2024)
Concept-skill Transferability-based Data Selection for Large Vision-Language Models
by: Lee, Jaewoo, et al.
Published: (2024)
NeurNCD: Novel Class Discovery via Implicit Neural Representation
by: Wang, Junming, et al.
Published: (2025)
Hard Cases Detection in Motion Prediction by Vision-Language Foundation Models
by: Yang, Yi, et al.
Published: (2024)
Neural Predictor-Corrector: Solving Homotopy Problems with Reinforcement Learning
by: Mai, Jiayao, et al.
Published: (2026)
Structuring GUI Elements through Vision Language Models: Towards Action Space Generation
by: Xu, Yi, et al.
Published: (2025)
ReasoningTrack: Chain-of-Thought Reasoning for Long-term Vision-Language Tracking
by: Wang, Xiao, et al.
Published: (2025)
Breast Cancer VLMs: Clinically Practical Vision-Language Train-Inference Models
by: Zheng, Shunjie-Fabian, et al.
Published: (2025)
Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies
by: Su, Hung-Ting, et al.
Published: (2024)
FairCoT: Enhancing Fairness in Text-to-Image Generation via Chain of Thought Reasoning with Multimodal Large Language Models
by: Sahili, Zahraa Al, et al.
Published: (2024)
VisReason: A Large-Scale Dataset for Visual Chain-of-Thought Reasoning
by: Li, Lingxiao, et al.
Published: (2025)