Library Catalog

Bibliographic Details
Main Authors:	Tanaka, Kaito; Tan, Benjamin; Wong, Brian
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2412.10758

Similar Items

Semantic-Preserving Cross-Style Visual Reasoning for Robust Multi-Modal Understanding in Large Vision-Language Models
by: Nakayama, Aya, et al.
Published: (2025)

XDR-LVLM: An Explainable Vision-Language Large Model for Diabetic Retinopathy Diagnosis
by: Ito, Masato, et al.
Published: (2025)

Leveraging Vision-Language Models as Weak Annotators in Active Learning
by: Nguyen, Phuong Ngoc, et al.
Published: (2026)

Learning to Prompt with Text Only Supervision for Vision-Language Models
by: Khattak, Muhammad Uzair, et al.
Published: (2024)

DONUT: A Decoder-Only Model for Trajectory Prediction
by: Knoche, Markus, et al.
Published: (2025)

DecoderTracker: Decoder-Only Method for Multiple-Object Tracking
by: Pan, Liao, et al.
Published: (2023)

Cross-Image Contrastive Decoding: Precise, Lossless Suppression of Language Priors in Large Vision-Language Models
by: Zhao, Jianfei, et al.
Published: (2025)

Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models
by: Park, Minho, et al.
Published: (2024)

MMSpec: Benchmarking Speculative Decoding for Vision-Language Models
by: Shen, Hui, et al.
Published: (2026)

Mixture of Decoding: An Attention-Inspired Adaptive Decoding Strategy to Mitigate Hallucinations in Large Vision-Language Models
by: Chen, Xinlong, et al.
Published: (2025)

OneCAT: Decoder-Only Auto-Regressive Model for Unified Understanding and Generation
by: Li, Han, et al.
Published: (2025)

ViSE: A Systematic Approach to Vision-Only Street-View Extrapolation
by: Tan, Kaiyuan, et al.
Published: (2025)

Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
by: Berasi, Davide, et al.
Published: (2025)

Decoder-Only LLMs are Better Controllers for Diffusion Models
by: Dong, Ziyi, et al.
Published: (2025)

Decoder-Only Image Registration
by: Jia, Xi, et al.
Published: (2024)

Text-Only Data Synthesis for Vision Language Model Training
by: Yu, Xiaomin, et al.
Published: (2025)

Domain Adaptation for Ulcerative Colitis Severity Estimation Using Patient-Level Diagnoses
by: Yamaguchi, Takamasa, et al.
Published: (2025)

ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
by: Kang, Jialiang, et al.
Published: (2025)

Attention-aware Inference Optimizations for Large Vision-Language Models with Memory-efficient Decoding
by: Ilhan, Fatih, et al.
Published: (2026)

Instruction-Following Evaluation of Large Vision-Language Models
by: Shiono, Daiki, et al.
Published: (2025)

Anomaly Object Segmentation with Vision-Language Models for Steel Scrap Recycling
by: Tanaka, Daichi, et al.
Published: (2025)

Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches
by: Yu, Qing, et al.
Published: (2024)

VOSR: A Vision-Only Generative Model for Image Super-Resolution
by: Wu, Rongyuan, et al.
Published: (2026)

Context-Aware Decoding for Faithful Vision-Language Generation
by: Fazli, Mehrdad, et al.
Published: (2026)

VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information
by: Kamoi, Ryo, et al.
Published: (2024)

Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation
by: Zhang, Xin, et al.
Published: (2025)

Bridging Hidden States in Vision-Language Models
by: Fein-Ashley, Benjamin, et al.
Published: (2025)

Orion-Lite: Distilling LLM Reasoning into Efficient Vision-Only Driving Models
by: Gu, Jing, et al.
Published: (2026)

VDG: Vision-Only Dynamic Gaussian for Driving Simulation
by: Li, Hao, et al.
Published: (2024)

TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment
by: Li, Wei, et al.
Published: (2024)

IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding
by: Zhu, Lanyun, et al.
Published: (2024)

SECOND: Mitigating Perceptual Hallucination in Vision-Language Models via Selective and Contrastive Decoding
by: Park, Woohyeon, et al.
Published: (2025)

SAGE: Accelerating Vision-Language Models via Entropy-Guided Adaptive Speculative Decoding
by: Tong, Yujia, et al.
Published: (2026)

Residual Decoding: Mitigating Hallucinations in Large Vision-Language Models via History-Aware Residual Guidance
by: Chen, Xinrong, et al.
Published: (2026)

The Wallpaper is Ugly: Indoor Localization using Vision and Language
by: Pate, Seth, et al.
Published: (2024)

An Application-Agnostic Automatic Target Recognition System Using Vision Language Models
by: Palladino, Anthony, et al.
Published: (2024)

Bridging the Missing-Modality Gap: Improving Text-Only Calibration of Vision Language Models
by: Kim, Mingyeong, et al.
Published: (2026)

SpecVLM: Fast Speculative Decoding in Vision-Language Models
by: Huang, Haiduo, et al.
Published: (2025)

Interactive Post-Training for Vision-Language-Action Models
by: Tan, Shuhan, et al.
Published: (2025)

Modality-Agnostic fMRI Decoding of Vision and Language
by: Nikolaus, Mitja, et al.
Published: (2024)