:: Library Catalog

Bibliographic Details
Main Authors:	Zhao, Shiyu; Wang, Zhenting; Juefei-Xu, Felix; Xia, Xide; Liu, Miao; Wang, Xiaofang; Liang, Mingfu; Zhang, Ning; Metaxas, Dimitris N.; Yu, Licheng
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2412.00556

Similar Items

Apollo: An Exploration of Video Understanding in Large Multimodal Models
by: Zohar, Orr, et al.
Published: (2024)

The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering
by: Li, Zhuowei, et al.
Published: (2025)

LED: LLM Enhanced Open-Vocabulary Object Detection without Human Curated Data Generation
by: Zhou, Yang, et al.
Published: (2025)

APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking
by: Jin, Can, et al.
Published: (2024)

LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation
by: Jin, Can, et al.
Published: (2025)

DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models
by: Wang, Zhenting, et al.
Published: (2023)

MLLM-as-a-Judge for Image Safety without Human Labeling
by: Wang, Zhenting, et al.
Published: (2024)

Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models
by: Dafnis, Konstantinos M., et al.
Published: (2025)

Token-Budget-Aware LLM Reasoning
by: Han, Tingxu, et al.
Published: (2024)

How to Trace Latent Generative Model Generated Images without Artificial Watermark?
by: Wang, Zhenting, et al.
Published: (2024)

Token-Controlled Re-ranking for Sequential Recommendation via LLMs
by: Dai, Wenxi, et al.
Published: (2025)

Evidence Over Plans: Online Trajectory Verification for Skill Distillation
by: Zhou, Yang, et al.
Published: (2026)

BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models
by: Wang, Yibin, et al.
Published: (2024)

Anatomy-VLM: A Fine-grained Vision-Language Model for Medical Interpretation
by: Gu, Difei, et al.
Published: (2025)

QAPruner: Quantization-Aware Vision Token Pruning for Multimodal Large Language Models
by: Wang, Xinhao, et al.
Published: (2026)

Rethinking Token Reduction for Large Vision-Language Models
by: Wang, Yi, et al.
Published: (2026)

MHB: Multimodal Handshape-aware Boundary Detection for Continuous Sign Language Recognition
by: Zhao, Mingyu, et al.
Published: (2025)

TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning
by: Zhang, Tunyu, et al.
Published: (2025)

Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models
by: Zhang, Xinxi, et al.
Published: (2024)

RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment
by: Gu, Difei, et al.
Published: (2025)

MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model
by: Dao, Quan, et al.
Published: (2026)

FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Vision Language Models
by: Fu, Tianyu, et al.
Published: (2024)

Generating Enhanced Negatives for Training Language-Based Object Detectors
by: Zhao, Shiyu, et al.
Published: (2023)

LFTR: Learning-Free Token Reduction for Multimodal Large Language Models
by: Zhao, Zihui, et al.
Published: (2025)

M^3-Bench: Multi-Modal, Multi-Hop, Multi-Threaded Tool-Using MLLM Agent Benchmark
by: Zhou, Yang, et al.
Published: (2025)

Dynamic Token Reduction during Generation for Vision Language Models
by: Liang, Xiaoyu, et al.
Published: (2025)

MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction
by: Wang, Chao, et al.
Published: (2025)

EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
by: Xu, Wujiang, et al.
Published: (2025)

Beyond Explicit Edges: Robust Reasoning over Noisy and Sparse Knowledge Graphs
by: Gao, Hang, et al.
Published: (2026)

Beyond Next-Token Alignment: Distilling Multimodal Large Language Models via Token Interactions
by: Chen, Lin, et al.
Published: (2026)

AVID: Any-Length Video Inpainting with Diffusion Model
by: Zhang, Zhixing, et al.
Published: (2023)

Can Large Vision-Language Models Detect Images Copyright Infringement from GenAI?
by: Xu, Qipan, et al.
Published: (2025)

Aligning Large Language Models with Healthcare Stakeholders: A Pathway to Trustworthy AI Integration
by: Ding, Kexin, et al.
Published: (2025)

Visual Prompting in Multimodal Large Language Models: A Survey
by: Wu, Junda, et al.
Published: (2024)

TokenCom: Vision-Language Model for Multimodal and Multitask Token Communications
by: Jiang, Feibo, et al.
Published: (2026)

Vision Token Reduction via Attention-Driven Self-Compression for Efficient Multimodal Large Language Models
by: Deniz, Omer Faruk, et al.
Published: (2026)

Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models
by: Zeng, Qingcheng, et al.
Published: (2024)

Large Sign Language Models: Toward 3D American Sign Language Translation
by: Zhang, Sen, et al.
Published: (2025)

Score-Guided Diffusion for 3D Human Recovery
by: Stathopoulos, Anastasis, et al.
Published: (2024)

Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models
by: Huang, Yuheng, et al.
Published: (2023)