| Main Authors: | Parashar, Shubham, Lin, Zhiqiu, Liu, Tian, Dong, Xiangjue, Li, Yanan, Ramanan, Deva, Caverlee, James, Kong, Shu |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.12425 |
Similar Items
Revisiting the Role of Language Priors in Vision-Language Models
by: Lin, Zhiqiu, et al.
Published: (2023)
Language Models as Black-Box Optimizers for Vision-Language Models
by: Liu, Shihong, et al.
Published: (2023)
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
by: Li, Baiqi, et al.
Published: (2024)
Revisiting Few-Shot Object Detection with Vision-Language Models
by: Madan, Anish, et al.
Published: (2023)
Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features
by: Mitra, Chancharik, et al.
Published: (2024)
InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning
by: Wan, Zifu, et al.
Published: (2025)
RefAV: Towards Planning-Centric Scenario Mining
by: Davidson, Cainan, et al.
Published: (2025)
Long-Tailed 3D Detection via Multi-Modal Fusion
by: Ma, Yechi, et al.
Published: (2023)
Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models
by: Robicheaux, Peter, et al.
Published: (2025)
ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models
by: Wan, Zifu, et al.
Published: (2025)
Evaluating Text-to-Visual Generation with Image-to-Text Generation
by: Lin, Zhiqiu, et al.
Published: (2024)
Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models
by: Zhang, Ce, et al.
Published: (2025)
Mechanistic Finetuning of Vision-Language-Action Models via Few-Shot Demonstrations
by: Mitra, Chancharik, et al.
Published: (2025)
Surely Large Multimodal Models (Don't) Excel in Visual Species Recognition?
by: Liu, Tian, et al.
Published: (2025)
Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning
by: Liu, Tian, et al.
Published: (2024)
Solving Semi-Supervised Few-Shot Learning from an Auto-Annotation Perspective
by: Liu, Tian, et al.
Published: (2025)
Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models
by: Lin, Zhiqiu, et al.
Published: (2023)
Predicting Long-horizon Futures by Conditioning on Geometry and Time
by: Khurana, Tarasha, et al.
Published: (2024)
DressRecon: Freeform 4D Human Reconstruction from Monocular Video
by: Tan, Jeff, et al.
Published: (2024)
AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis
by: Vuong, Khiem, et al.
Published: (2025)
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation
by: Li, Baiqi, et al.
Published: (2024)
DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion
by: Zhao, Qitao, et al.
Published: (2025)
Using Diffusion Priors for Video Amodal Segmentation
by: Chen, Kaihua, et al.
Published: (2024)
Reconstruct, Inpaint, Test-Time Finetune: Dynamic Novel-view Synthesis from Monocular Videos
by: Chen, Kaihua, et al.
Published: (2025)
Large Vision-Language Model Alignment and Misalignment: A Survey Through the Lens of Explainability
by: Shu, Dong, et al.
Published: (2025)
Building a Precise Video Language with Human-AI Oversight
by: Lin, Zhiqiu, et al.
Published: (2026)
Rethinking Misalignment in Vision-Language Model Adaptation from a Causal Perspective
by: Zhang, Yanan, et al.
Published: (2024)
Light Up the Shadows: Enhance Long-Tailed Entity Grounding with Concept-Guided Vision-Language Models
by: Zhang, Yikai, et al.
Published: (2024)
$DA^3$: A Distribution-Aware Adversarial Attack against Language Models
by: Wang, Yibo, et al.
Published: (2023)
Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization
by: Diao, Xingjian, et al.
Published: (2026)
A Survey on LLM Inference-Time Self-Improvement
by: Dong, Xiangjue, et al.
Published: (2024)
Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone
by: Ye, Jiacheng, et al.
Published: (2025)
Language Models as Semantic Augmenters for Sequential Recommenders
by: Valizadeh, Mahsa, et al.
Published: (2025)
ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
by: Kang, Jialiang, et al.
Published: (2025)
Towards Understanding Camera Motions in Any Video
by: Lin, Zhiqiu, et al.
Published: (2025)
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
by: Li, Lei, et al.
Published: (2024)
Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models
by: Teleki, Maria, et al.
Published: (2025)
Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving
by: Li, Yue, et al.
Published: (2025)
Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs
by: Fang, Hao, et al.
Published: (2025)
Soft Augmentation for Image Classification
by: Liu, Yang, et al.
Published: (2022)