| Main Authors: | Tanaka, Kaito, Tan, Benjamin, Wong, Brian |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.10758 |
Similar Items
Semantic-Preserving Cross-Style Visual Reasoning for Robust Multi-Modal Understanding in Large Vision-Language Models
by: Nakayama, Aya, et al.
Published: (2025)
XDR-LVLM: An Explainable Vision-Language Large Model for Diabetic Retinopathy Diagnosis
by: Ito, Masato, et al.
Published: (2025)
Leveraging Vision-Language Models as Weak Annotators in Active Learning
by: Nguyen, Phuong Ngoc, et al.
Published: (2026)
Learning to Prompt with Text Only Supervision for Vision-Language Models
by: Khattak, Muhammad Uzair, et al.
Published: (2024)
DONUT: A Decoder-Only Model for Trajectory Prediction
by: Knoche, Markus, et al.
Published: (2025)
DecoderTracker: Decoder-Only Method for Multiple-Object Tracking
by: Pan, Liao, et al.
Published: (2023)
Cross-Image Contrastive Decoding: Precise, Lossless Suppression of Language Priors in Large Vision-Language Models
by: Zhao, Jianfei, et al.
Published: (2025)
Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models
by: Park, Minho, et al.
Published: (2024)
MMSpec: Benchmarking Speculative Decoding for Vision-Language Models
by: Shen, Hui, et al.
Published: (2026)
Mixture of Decoding: An Attention-Inspired Adaptive Decoding Strategy to Mitigate Hallucinations in Large Vision-Language Models
by: Chen, Xinlong, et al.
Published: (2025)
OneCAT: Decoder-Only Auto-Regressive Model for Unified Understanding and Generation
by: Li, Han, et al.
Published: (2025)
ViSE: A Systematic Approach to Vision-Only Street-View Extrapolation
by: Tan, Kaiyuan, et al.
Published: (2025)
Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
by: Berasi, Davide, et al.
Published: (2025)
Decoder-Only LLMs are Better Controllers for Diffusion Models
by: Dong, Ziyi, et al.
Published: (2025)
Decoder-Only Image Registration
by: Jia, Xi, et al.
Published: (2024)
Text-Only Data Synthesis for Vision Language Model Training
by: Yu, Xiaomin, et al.
Published: (2025)
Domain Adaptation for Ulcerative Colitis Severity Estimation Using Patient-Level Diagnoses
by: Yamaguchi, Takamasa, et al.
Published: (2025)
ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
by: Kang, Jialiang, et al.
Published: (2025)
Attention-aware Inference Optimizations for Large Vision-Language Models with Memory-efficient Decoding
by: Ilhan, Fatih, et al.
Published: (2026)
Instruction-Following Evaluation of Large Vision-Language Models
by: Shiono, Daiki, et al.
Published: (2025)
Anomaly Object Segmentation with Vision-Language Models for Steel Scrap Recycling
by: Tanaka, Daichi, et al.
Published: (2025)
Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches
by: Yu, Qing, et al.
Published: (2024)
VOSR: A Vision-Only Generative Model for Image Super-Resolution
by: Wu, Rongyuan, et al.
Published: (2026)
Context-Aware Decoding for Faithful Vision-Language Generation
by: Fazli, Mehrdad, et al.
Published: (2026)
VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information
by: Kamoi, Ryo, et al.
Published: (2024)
Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation
by: Zhang, Xin, et al.
Published: (2025)
Bridging Hidden States in Vision-Language Models
by: Fein-Ashley, Benjamin, et al.
Published: (2025)
Orion-Lite: Distilling LLM Reasoning into Efficient Vision-Only Driving Models
by: Gu, Jing, et al.
Published: (2026)
VDG: Vision-Only Dynamic Gaussian for Driving Simulation
by: Li, Hao, et al.
Published: (2024)
TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment
by: Li, Wei, et al.
Published: (2024)
IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding
by: Zhu, Lanyun, et al.
Published: (2024)
SECOND: Mitigating Perceptual Hallucination in Vision-Language Models via Selective and Contrastive Decoding
by: Park, Woohyeon, et al.
Published: (2025)
SAGE: Accelerating Vision-Language Models via Entropy-Guided Adaptive Speculative Decoding
by: Tong, Yujia, et al.
Published: (2026)
Residual Decoding: Mitigating Hallucinations in Large Vision-Language Models via History-Aware Residual Guidance
by: Chen, Xinrong, et al.
Published: (2026)
The Wallpaper is Ugly: Indoor Localization using Vision and Language
by: Pate, Seth, et al.
Published: (2024)
An Application-Agnostic Automatic Target Recognition System Using Vision Language Models
by: Palladino, Anthony, et al.
Published: (2024)
Bridging the Missing-Modality Gap: Improving Text-Only Calibration of Vision Language Models
by: Kim, Mingyeong, et al.
Published: (2026)
SpecVLM: Fast Speculative Decoding in Vision-Language Models
by: Huang, Haiduo, et al.
Published: (2025)
Interactive Post-Training for Vision-Language-Action Models
by: Tan, Shuhan, et al.
Published: (2025)
Modality-Agnostic fMRI Decoding of Vision and Language
by: Nikolaus, Mitja, et al.
Published: (2024)