| Main Authors: | Zhao, Shiyu, Wang, Zhenting, Juefei-Xu, Felix, Xia, Xide, Liu, Miao, Wang, Xiaofang, Liang, Mingfu, Zhang, Ning, Metaxas, Dimitris N., Yu, Licheng |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.00556 |
Similar Items
Apollo: An Exploration of Video Understanding in Large Multimodal Models
by: Zohar, Orr, et al.
Published: (2024)
The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering
by: Li, Zhuowei, et al.
Published: (2025)
LED: LLM Enhanced Open-Vocabulary Object Detection without Human Curated Data Generation
by: Zhou, Yang, et al.
Published: (2025)
APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking
by: Jin, Can, et al.
Published: (2024)
LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation
by: Jin, Can, et al.
Published: (2025)
DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models
by: Wang, Zhenting, et al.
Published: (2023)
MLLM-as-a-Judge for Image Safety without Human Labeling
by: Wang, Zhenting, et al.
Published: (2024)
Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models
by: Dafnis, Konstantinos M., et al.
Published: (2025)
Token-Budget-Aware LLM Reasoning
by: Han, Tingxu, et al.
Published: (2024)
How to Trace Latent Generative Model Generated Images without Artificial Watermark?
by: Wang, Zhenting, et al.
Published: (2024)
Token-Controlled Re-ranking for Sequential Recommendation via LLMs
by: Dai, Wenxi, et al.
Published: (2025)
Evidence Over Plans: Online Trajectory Verification for Skill Distillation
by: Zhou, Yang, et al.
Published: (2026)
BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models
by: Wang, Yibin, et al.
Published: (2024)
Anatomy-VLM: A Fine-grained Vision-Language Model for Medical Interpretation
by: Gu, Difei, et al.
Published: (2025)
QAPruner: Quantization-Aware Vision Token Pruning for Multimodal Large Language Models
by: Wang, Xinhao, et al.
Published: (2026)
Rethinking Token Reduction for Large Vision-Language Models
by: Wang, Yi, et al.
Published: (2026)
MHB: Multimodal Handshape-aware Boundary Detection for Continuous Sign Language Recognition
by: Zhao, Mingyu, et al.
Published: (2025)
TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning
by: Zhang, Tunyu, et al.
Published: (2025)
Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models
by: Zhang, Xinxi, et al.
Published: (2024)
RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment
by: Gu, Difei, et al.
Published: (2025)
MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model
by: Dao, Quan, et al.
Published: (2026)
FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Vision Language Models
by: Fu, Tianyu, et al.
Published: (2024)
Generating Enhanced Negatives for Training Language-Based Object Detectors
by: Zhao, Shiyu, et al.
Published: (2023)
LFTR: Learning-Free Token Reduction for Multimodal Large Language Models
by: Zhao, Zihui, et al.
Published: (2025)
M^3-Bench: Multi-Modal, Multi-Hop, Multi-Threaded Tool-Using MLLM Agent Benchmark
by: Zhou, Yang, et al.
Published: (2025)
Dynamic Token Reduction during Generation for Vision Language Models
by: Liang, Xiaoyu, et al.
Published: (2025)
MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction
by: Wang, Chao, et al.
Published: (2025)
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
by: Xu, Wujiang, et al.
Published: (2025)
Beyond Explicit Edges: Robust Reasoning over Noisy and Sparse Knowledge Graphs
by: Gao, Hang, et al.
Published: (2026)
Beyond Next-Token Alignment: Distilling Multimodal Large Language Models via Token Interactions
by: Chen, Lin, et al.
Published: (2026)
AVID: Any-Length Video Inpainting with Diffusion Model
by: Zhang, Zhixing, et al.
Published: (2023)
Can Large Vision-Language Models Detect Images Copyright Infringement from GenAI?
by: Xu, Qipan, et al.
Published: (2025)
Aligning Large Language Models with Healthcare Stakeholders: A Pathway to Trustworthy AI Integration
by: Ding, Kexin, et al.
Published: (2025)
Visual Prompting in Multimodal Large Language Models: A Survey
by: Wu, Junda, et al.
Published: (2024)
TokenCom: Vision-Language Model for Multimodal and Multitask Token Communications
by: Jiang, Feibo, et al.
Published: (2026)
Vision Token Reduction via Attention-Driven Self-Compression for Efficient Multimodal Large Language Models
by: Deniz, Omer Faruk, et al.
Published: (2026)
Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models
by: Zeng, Qingcheng, et al.
Published: (2024)
Large Sign Language Models: Toward 3D American Sign Language Translation
by: Zhang, Sen, et al.
Published: (2025)
Score-Guided Diffusion for 3D Human Recovery
by: Stathopoulos, Anastasis, et al.
Published: (2024)
Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models
by: Huang, Yuheng, et al.
Published: (2023)