| Main Authors: | Wu, Tianxiang, Nie, Minxin, Cao, Ziqiang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.23089 |
Similar Items
Visual Position Prompt for MLLM based Visual Grounding
by: Tang, Wei, et al.
Published: (2025)
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
by: Wu, Mingrui, et al.
Published: (2024)
Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings
by: Dai, Tianxiang, et al.
Published: (2026)
Modality-Fair Preference Optimization for Trustworthy MLLM Alignment
by: Jiang, Songtao, et al.
Published: (2024)
Elysium: Exploring Object-level Perception in Videos via MLLM
by: Wang, Han, et al.
Published: (2024)
IPCV: Information-Preserving Compression for MLLM Visual Encoders
by: Chen, Yuan, et al.
Published: (2025)
RADAR: Revealing Asymmetric Development of Abilities in MLLM Pre-training
by: Nie, Yunshuang, et al.
Published: (2026)
Calibrating MLLM-as-a-judge via Multimodal Bayesian Prompt Ensembles
by: Slyman, Eric, et al.
Published: (2025)
EmoMM: Benchmarking and Steering MLLM for Multimodal Emotion Recognition under Conflict and Missingness
by: Sun, Yueru, et al.
Published: (2026)
MM-Prompt: Cross-Modal Prompt Tuning for Continual Visual Question Answering
by: Li, Xu, et al.
Published: (2025)
HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts
by: Liu, Xinyu, et al.
Published: (2024)
MM-JudgeBias: A Benchmark for Evaluating Compositional Biases in MLLM-as-a-Judge
by: Lee, Sua, et al.
Published: (2026)
InstructX: Towards Unified Visual Editing with MLLM Guidance
by: Mou, Chong, et al.
Published: (2025)
Structure Causal Models and LLMs Integration in Medical Visual Question Answering
by: Xu, Zibo, et al.
Published: (2025)
The Coherence Trap: When MLLM-Crafted Narratives Exploit Manipulated Visual Contexts
by: Zhang, Yuchen, et al.
Published: (2025)
Robust MLLM Unlearning via Visual Knowledge Distillation
by: Wang, Yuhang, et al.
Published: (2025)
MambaRefine-YOLO: A Dual-Modality Small Object Detector for UAV Imagery
by: Cao, Shuyu, et al.
Published: (2025)
EarthGPT-X: A Spatial MLLM for Multi-level Multi-Source Remote Sensing Imagery Understanding with Visual Prompting
by: Zhang, Wei, et al.
Published: (2025)
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
by: Wu, Yixuan, et al.
Published: (2024)
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
by: Munasinghe, Shehan, et al.
Published: (2024)
dMLLM-TTS: Self-Verified and Efficient Test-Time Scaling for Diffusion Multi-Modal Large Language Models
by: Xin, Yi, et al.
Published: (2025)
AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models
by: Wu, Yongjian, et al.
Published: (2024)
MM-IFEngine: Towards Multimodal Instruction Following
by: Ding, Shengyuan, et al.
Published: (2025)
Evaluating Visual Prompts with Eye-Tracking Data for MLLM-Based Human Activity Recognition
by: Choi, Jae Young, et al.
Published: (2026)
In the Eye of MLLM: Benchmarking Egocentric Video Intent Understanding with Gaze-Guided Prompting
by: Peng, Taiying, et al.
Published: (2025)
SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-Language Pre-trained Models
by: Zhou, Yang, et al.
Published: (2024)
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
by: Fang, Rongyao, et al.
Published: (2024)
AbductiveMLLM: Boosting Visual Abductive Reasoning Within MLLMs
by: Chang, Boyu, et al.
Published: (2026)
PIP: Prototypes-Injected Prompt for Federated Class Incremental Learning
by: Ma'sum, Muhammad Anwar, et al.
Published: (2024)
Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding
by: Zhang, Haoyu, et al.
Published: (2025)
Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment
by: Zhao, Pengfei, et al.
Published: (2025)
D2Pruner: Debiased Importance and Structural Diversity for MLLM Token Pruning
by: Zhang, Evelyn, et al.
Published: (2025)
MLLM-4D: Towards Visual-based Spatial-Temporal Intelligence
by: Yin, Xingyilang, et al.
Published: (2026)
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement
by: Huang, Runhui, et al.
Published: (2025)
Revisiting MLLM Token Technology through the Lens of Classical Visual Coding
by: Liu, Jinming, et al.
Published: (2025)
RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought
by: Lu, Yi, et al.
Published: (2025)
HAMMER: Harnessing MLLM via Cross-Modal Integration for Intention-Driven 3D Affordance Grounding
by: Yao, Lei, et al.
Published: (2026)
CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM
by: Xu, Jingwei, et al.
Published: (2024)
Visual Hallucinations of Multi-modal Large Language Models
by: Huang, Wen, et al.
Published: (2024)
Mitigating Visual Knowledge Forgetting in MLLM Instruction-tuning via Modality-decoupled Gradient Descent
by: Wu, Junda, et al.
Published: (2025)