| Main Authors: | Han, Xiaoqi; Li, Ru; Yi, Ran; Tan, Hongye; Liang, Zhuomin; Gutiérrez-Basulto, Víctor; Pan, Jeff Z. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.13243 |
Similar Items
Multi-level Matching Network for Multimodal Entity Linking
by: Hu, Zhiwei, et al.
Published: (2024)
Multi-level Mixture of Experts for Multimodal Entity Linking
by: Hu, Zhiwei, et al.
Published: (2025)
Consistency-Aware Editing for Entity-level Unlearning in Language Models
by: Han, Xiaoqi, et al.
Published: (2025)
Uncovering Entity Identity Confusion in Multimodal Knowledge Editing
by: Wu, Shu, et al.
Published: (2026)
StyleBooth: Image Style Editing with Multimodal Instruction
by: Han, Zhen, et al.
Published: (2024)
Knowledge-Aware Neuron Interpretation for Scene Classification
by: Guan, Yong, et al.
Published: (2024)
MambaTrans: Multimodal Fusion Image Translation via Large Language Model Priors for Downstream Visual Tasks
by: Xu, Yushen, et al.
Published: (2025)
Look Carefully: Adaptive Visual Reinforcements in Multimodal Large Language Models for Hallucination Mitigation
by: Zhu, Xingyu, et al.
Published: (2026)
Seeing Isn't Believing: Uncovering Blind Spots in Evaluator Vision-Language Models
by: Khan, Mohammed Safi Ur Rahman, et al.
Published: (2026)
Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting
by: Yang, Zeyu, et al.
Published: (2023)
The Blind Spot of Adaptation: Quantifying and Mitigating Forgetting in Fine-tuned Driving Models
by: Mao, Runhao, et al.
Published: (2026)
Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration
by: Pan, Kaihang, et al.
Published: (2024)
BlindDiff: Empowering Degradation Modelling in Diffusion Models for Blind Image Super-Resolution
by: Li, Feng, et al.
Published: (2024)
Uncovering Linguistic Fragility in Vision-Language-Action Models via Diversity-Aware Red Teaming
by: Tong, Baoshun, et al.
Published: (2026)
A Light and Smart Wearable Platform with Multimodal Foundation Model for Enhanced Spatial Reasoning in People with Blindness and Low Vision
by: Magay, Alexey, et al.
Published: (2025)
MotionVerse: A Unified Multimodal Framework for Motion Comprehension, Generation and Editing
by: Hou, Ruibing, et al.
Published: (2025)
Vidi: Large Multimodal Models for Video Understanding and Editing
by: Vidi Team, et al.
Published: (2025)
Mitigating the Impact of Attribute Editing on Face Recognition
by: Banerjee, Sudipta, et al.
Published: (2024)
Uncovering and Mitigating Destructive Multi-Embedding Attacks in Deepfake Proactive Forensics
by: Jia, Lixin, et al.
Published: (2025)
EVLM: Self-Reflective Multimodal Reasoning for Cross-Dimensional Visual Editing
by: Khalid, Umar, et al.
Published: (2024)
DreamOmni2: Multimodal Instruction-based Editing and Generation
by: Xia, Bin, et al.
Published: (2025)
Image Editing As Programs with Diffusion Models
by: Hu, Yujia, et al.
Published: (2025)
HIME: Mitigating Object Hallucinations in LVLMs via Hallucination Insensitivity Model Editing
by: Akl, Ahmed, et al.
Published: (2026)
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
by: Gao, Hongcheng, et al.
Published: (2025)
Tele-Omni: a Unified Multimodal Framework for Video Generation and Editing
by: Liu, Jialun, et al.
Published: (2026)
Query-Kontext: An Unified Multimodal Model for Image Generation and Editing
by: Song, Yuxin, et al.
Published: (2025)
Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing
by: Baldrati, Alberto, et al.
Published: (2024)
Towards Generalized Multi-Image Editing for Unified Multimodal Models
by: Xu, Pengcheng, et al.
Published: (2026)
HOP: Heterogeneous Topology-based Multimodal Entanglement for Co-Speech Gesture Generation
by: Cheng, Hongye, et al.
Published: (2025)
Spatial Blindness in Whole-Slide Multiple Instance Learning
by: Li, Xiangyu, et al.
Published: (2026)
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models
by: Ouyang, Wenqi, et al.
Published: (2024)
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
by: Zhuang, Xianwei, et al.
Published: (2025)
SOEDiff: Efficient Distillation for Small Object Editing
by: Wu, Yiming, et al.
Published: (2024)
GSVA: Generalized Segmentation via Multimodal Large Language Models
by: Xia, Zhuofan, et al.
Published: (2023)
Mitigating Backdoor Attacks using Activation-Guided Model Editing
by: Hsieh, Felix, et al.
Published: (2024)
VACE: All-in-One Video Creation and Editing
by: Jiang, Zeyinzi, et al.
Published: (2025)
VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark
by: Huang, Han, et al.
Published: (2024)
Understanding and Mitigating Hallucinations in Multimodal Chain-of-Thought Models
by: Ma, Ji, et al.
Published: (2026)
VEBench: Benchmarking Large Multimodal Models for Real-World Video Editing
by: Deng, Andong, et al.
Published: (2026)
ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models
by: Srivastava, Ashutosh, et al.
Published: (2024)