| Main Authors: | Tan, Zhiya, Zhang, Xin, Zhou, Joey Tianyi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.02405 |
Similar Items
LaV-CoT: Language-Aware Visual CoT with Multi-Aspect Reward Optimization for Real-World Multilingual VQA
by: Huang, Jing, et al.
Published: (2025)
AEGIS: Authenticity Evaluation Benchmark for AI-Generated Video Sequences
by: Li, Jieyu, et al.
Published: (2025)
Evolution-aware VAriance (EVA) Coreset Selection for Medical Image Classification
by: Hong, Yuxin, et al.
Published: (2024)
Evolving from Single-modal to Multi-modal Facial Deepfake Detection: Progress and Challenges
by: Liu, Ping, et al.
Published: (2024)
Draw2Think: Harnessing Geometry Reasoning through Constraint Engine Interaction
by: Hu, Juncheng, et al.
Published: (2026)
Mitigating Entangled Steering in Large Vision-Language Models for Hallucination Reduction
by: Zhang, Yuanhong, et al.
Published: (2026)
Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator
by: Zhang, Xin, et al.
Published: (2024)
Data-independent Module-aware Pruning for Hierarchical Vision Transformers
by: He, Yang, et al.
Published: (2024)
DDL: A Large-Scale Datasets for Deepfake Detection and Localization in Diversified Real-World Scenarios
by: Miao, Changtao, et al.
Published: (2025)
Multi-modal Attribute Prompting for Vision-Language Models
by: Liu, Xin, et al.
Published: (2024)
Beyond Modality Collapse: Representations Blending for Multimodal Dataset Distillation
by: Zhang, Xin, et al.
Published: (2025)
Diversity-Driven Synthesis: Enhancing Dataset Distillation through Directed Weight Adjustment
by: Du, Jiawei, et al.
Published: (2024)
Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
by: Zhang, Xin, et al.
Published: (2023)
Garment Attribute Manipulation with Multi-level Attention
by: Casula, Vittorio, et al.
Published: (2024)
KPL: Training-Free Medical Knowledge Mining of Vision-Language Models
by: Liu, Jiaxiang, et al.
Published: (2025)
TokenSwap: Backdoor Attack on the Compositional Understanding of Large Vision-Language Models
by: Zhang, Zhifang, et al.
Published: (2025)
SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodel LLMs
by: Deng, Jinhong, et al.
Published: (2025)
Multisize Dataset Condensation
by: He, Yang, et al.
Published: (2024)
PVP: Polar Representation Boost for 3D Semantic Occupancy Prediction
by: Xue, Yujing, et al.
Published: (2024)
Low-Level Dataset Distillation for Medical Image Enhancement
by: Xu, Fengzhi, et al.
Published: (2025)
IMS3: Breaking Distributional Aggregation in Diffusion-Based Dataset Distillation
by: Wang, Chenru, et al.
Published: (2026)
TokenTrace: Multi-Concept Attribution through Watermarked Token Recovery
by: Zhang, Li, et al.
Published: (2026)
SpecFLASH: A Latent-Guided Semi-autoregressive Speculative Decoding Framework for Efficient Multimodal Generation
by: Wang, Zihua, et al.
Published: (2025)
Protect-Your-IP: Scalable Source-Tracing and Attribution against Personalized Generation
by: Li, Runyi, et al.
Published: (2024)
Modest-Align: Data-Efficient Alignment for Vision-Language Models
by: Liu, Jiaxiang, et al.
Published: (2025)
Agentic Spatio-Temporal Grounding via Collaborative Reasoning
by: Zhao, Heng, et al.
Published: (2026)
Multi-View Synergistic Learning with Vision-Language Adaption for Low-Resource Biomedical Image Classification
by: Luo, Xiaoliu, et al.
Published: (2026)
MVAD: A Benchmark Dataset for Multimodal AI-Generated Video-Audio Detection
by: Hu, Mengxue, et al.
Published: (2025)
MedCoT: Medical Chain of Thought via Hierarchical Expert
by: Liu, Jiaxiang, et al.
Published: (2024)
MARVL: Multi-Stage Guidance for Robotic Manipulation via Vision-Language Models
by: Zhou, Xunlan, et al.
Published: (2026)
All-in-One Slider for Attribute Manipulation in Diffusion Models
by: Ye, Weixin, et al.
Published: (2025)
CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner
by: Yan, Tingbing, et al.
Published: (2024)
Video Set Distillation: Information Diversification and Temporal Densification
by: Zhao, Yinjie, et al.
Published: (2024)
Reinforcement-Guided Synthetic Data Generation for Privacy-Sensitive Identity Recognition
by: Jia, Xuemei, et al.
Published: (2026)
IncreFA: Breaking the Static Wall of Generative Model Attribution
by: Qin, Haotian, et al.
Published: (2026)
Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation
by: Zhang, Xin, et al.
Published: (2025)
A Concise but High-performing Network for Image Guided Depth Completion in Autonomous Driving
by: Liu, Moyun, et al.
Published: (2024)
SIGMA: Selective-Interleaved Generation with Multi-Attribute Tokens
by: Zhang, Xiaoyan, et al.
Published: (2026)
Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models
by: Xiong, Lexiang, et al.
Published: (2026)
Will the Inclusion of Generated Data Amplify Bias Across Generations in Future Image Classification Models?
by: Zhang, Zeliang, et al.
Published: (2024)