Library Catalog

Bibliographic Details
Main authors:	Zhang, Zeliang; Sun, Rui; Liu, Jiani; Wu, Qi; Xu, Chenliang
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Computer Vision and Pattern Recognition
Online access:	https://arxiv.org/abs/2604.01514

Similar Items

Can CLIP Count Stars? An Empirical Study on Quantity Bias in CLIP
by: Zhang, Zeliang, et al.
Published: (2024)

Harnessing the Computation Redundancy in ViTs to Boost Adversarial Transferability
by: Liu, Jiani, et al.
Published: (2025)

Scaling Concept With Text-Guided Diffusion Models
by: Huang, Chao, et al.
Published: (2024)

Can VLMs Truly Forget? Benchmarking Training-Free Visual Concept Unlearning
by: Tan, Zhangyun, et al.
Published: (2026)

Targeted Forgetting of Image Subgroups in CLIP Models
by: Zhang, Zeliang, et al.
Published: (2025)

Will the Inclusion of Generated Data Amplify Bias Across Generations in Future Image Classification Models?
by: Zhang, Zeliang, et al.
Published: (2024)

Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives
by: Zhang, Zeliang, et al.
Published: (2025)

Video Understanding with Large Language Models: A Survey
by: Tang, Yolo Y., et al.
Published: (2023)

Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?
by: Feng, Mingqian, et al.
Published: (2024)

Discover and Mitigate Multiple Biased Subgroups in Image Classifiers
by: Zhang, Zeliang, et al.
Published: (2024)

Learning to Transform Dynamically for Better Adversarial Transferability
by: Zhu, Rongyi, et al.
Published: (2024)

Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts
by: Gao, Hongcheng, et al.
Published: (2024)

The Mirage of Performance Gains: Why Contrastive Decoding Fails to Mitigate Object Hallucinations in MLLMs?
by: Yin, Hao, et al.
Published: (2025)

CalibQuant: 1-Bit KV Cache Quantization for Multimodal LLMs
by: Han, Insu, et al.
Published: (2025)

EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models
by: Xing, Shangyu, et al.
Published: (2024)

Navigating Beyond Instructions: Vision-and-Language Navigation in Obstructed Environments
by: Hong, Haodong, et al.
Published: (2024)

Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding
by: Luo, Chuwei, et al.
Published: (2022)

DRIFT: Transferring Reasoning Priors for Efficient MLLM Fine-Tuning
by: Huang, Chao, et al.
Published: (2025)

Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts
by: Hong, Haodong, et al.
Published: (2024)

Hierarchy-Aware Multimodal Unlearning for Medical AI
by: Wu, Fengli, et al.
Published: (2025)

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
by: Luo, Chuwei, et al.
Published: (2024)

LLaVA-Video: Video Instruction Tuning With Synthetic Data
by: Zhang, Yuanhan, et al.
Published: (2024)

Otter: A Multi-Modal Model with In-Context Instruction Tuning
by: Li, Bo, et al.
Published: (2023)

AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
by: Xing, Zhen, et al.
Published: (2024)

Why 1 + 1 < 1 in Visual Token Pruning: Beyond Naive Integration via Multi-Objective Balanced Covering
by: Li, Yangfu, et al.
Published: (2025)

TASO: Task-Aligned Sparse Optimization for Parameter-Efficient Model Adaptation
by: Miao, Daiye, et al.
Published: (2025)

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models
by: Xu, Shilin, et al.
Published: (2025)

Forward Learning for Gradient-based Black-box Saliency Map Generation
by: Zhang, Zeliang, et al.
Published: (2024)

LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
by: You, Zebin, et al.
Published: (2025)

CLEAR: Character Unlearning in Textual and Visual Modalities
by: Dontsov, Alexey, et al.
Published: (2024)

Multimodal Abstractive Summarization of Instructional Videos with Vision-Language Models
by: Nazir, Maham, et al.
Published: (2026)

Rethinking Machine Unlearning in Image Generation Models
by: Liu, Renyang, et al.
Published: (2025)

Maya: An Instruction Finetuned Multilingual Multimodal Model
by: Alam, Nahid, et al.
Published: (2024)

Do we Really Need Visual Instructions? Towards Visual Instruction-Free Fine-tuning for Large Vision-Language Models
by: Liu, Zikang, et al.
Published: (2025)

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
by: Zhang, Yanzhe, et al.
Published: (2023)

ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation
by: Wu, Mengyang, et al.
Published: (2024)

UnlearnCanvas: Stylized Image Dataset for Enhanced Machine Unlearning Evaluation in Diffusion Models
by: Zhang, Yihua, et al.
Published: (2024)

VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
by: Fan, Zhiwen, et al.
Published: (2025)

Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models
by: Zhu, Wanrong, et al.
Published: (2024)

Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
by: Huang, Yupan, et al.
Published: (2023)