| Main authors: | Zhang, Zeliang; Sun, Rui; Liu, Jiani; Wu, Qi; Xu, Chenliang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2604.01514 |
Similar items
Can CLIP Count Stars? An Empirical Study on Quantity Bias in CLIP
by: Zhang, Zeliang, et al.
Published: (2024)
Harnessing the Computation Redundancy in ViTs to Boost Adversarial Transferability
by: Liu, Jiani, et al.
Published: (2025)
Scaling Concept With Text-Guided Diffusion Models
by: Huang, Chao, et al.
Published: (2024)
Can VLMs Truly Forget? Benchmarking Training-Free Visual Concept Unlearning
by: Tan, Zhangyun, et al.
Published: (2026)
Targeted Forgetting of Image Subgroups in CLIP Models
by: Zhang, Zeliang, et al.
Published: (2025)
Will the Inclusion of Generated Data Amplify Bias Across Generations in Future Image Classification Models?
by: Zhang, Zeliang, et al.
Published: (2024)
Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives
by: Zhang, Zeliang, et al.
Published: (2025)
Video Understanding with Large Language Models: A Survey
by: Tang, Yolo Y., et al.
Published: (2023)
Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?
by: Feng, Mingqian, et al.
Published: (2024)
Discover and Mitigate Multiple Biased Subgroups in Image Classifiers
by: Zhang, Zeliang, et al.
Published: (2024)
Learning to Transform Dynamically for Better Adversarial Transferability
by: Zhu, Rongyi, et al.
Published: (2024)
Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts
by: Gao, Hongcheng, et al.
Published: (2024)
The Mirage of Performance Gains: Why Contrastive Decoding Fails to Mitigate Object Hallucinations in MLLMs?
by: Yin, Hao, et al.
Published: (2025)
CalibQuant: 1-Bit KV Cache Quantization for Multimodal LLMs
by: Han, Insu, et al.
Published: (2025)
EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models
by: Xing, Shangyu, et al.
Published: (2024)
Navigating Beyond Instructions: Vision-and-Language Navigation in Obstructed Environments
by: Hong, Haodong, et al.
Published: (2024)
Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding
by: Luo, Chuwei, et al.
Published: (2022)
DRIFT: Transferring Reasoning Priors for Efficient MLLM Fine-Tuning
by: Huang, Chao, et al.
Published: (2025)
Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts
by: Hong, Haodong, et al.
Published: (2024)
Hierarchy-Aware Multimodal Unlearning for Medical AI
by: Wu, Fengli, et al.
Published: (2025)
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
by: Luo, Chuwei, et al.
Published: (2024)
LLaVA-Video: Video Instruction Tuning With Synthetic Data
by: Zhang, Yuanhan, et al.
Published: (2024)
Otter: A Multi-Modal Model with In-Context Instruction Tuning
by: Li, Bo, et al.
Published: (2023)
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
by: Xing, Zhen, et al.
Published: (2024)
Why 1 + 1 < 1 in Visual Token Pruning: Beyond Naive Integration via Multi-Objective Balanced Covering
by: Li, Yangfu, et al.
Published: (2025)
TASO: Task-Aligned Sparse Optimization for Parameter-Efficient Model Adaptation
by: Miao, Daiye, et al.
Published: (2025)
Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models
by: Xu, Shilin, et al.
Published: (2025)
Forward Learning for Gradient-based Black-box Saliency Map Generation
by: Zhang, Zeliang, et al.
Published: (2024)
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
by: You, Zebin, et al.
Published: (2025)
CLEAR: Character Unlearning in Textual and Visual Modalities
by: Dontsov, Alexey, et al.
Published: (2024)
Multimodal Abstractive Summarization of Instructional Videos with Vision-Language Models
by: Nazir, Maham, et al.
Published: (2026)
Rethinking Machine Unlearning in Image Generation Models
by: Liu, Renyang, et al.
Published: (2025)
Maya: An Instruction Finetuned Multilingual Multimodal Model
by: Alam, Nahid, et al.
Published: (2024)
Do we Really Need Visual Instructions? Towards Visual Instruction-Free Fine-tuning for Large Vision-Language Models
by: Liu, Zikang, et al.
Published: (2025)
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
by: Zhang, Yanzhe, et al.
Published: (2023)
ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation
by: Wu, Mengyang, et al.
Published: (2024)
UnlearnCanvas: Stylized Image Dataset for Enhanced Machine Unlearning Evaluation in Diffusion Models
by: Zhang, Yihua, et al.
Published: (2024)
VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
by: Fan, Zhiwen, et al.
Published: (2025)
Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models
by: Zhu, Wanrong, et al.
Published: (2024)
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
by: Huang, Yupan, et al.
Published: (2023)