| Main Authors: | Shao, Jie, Zhu, Ke, Fu, Minghao, Wang, Guo-hua, Wu, Jianxin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2508.09598 |
Similar Items
Rectify the Regression Bias in Long-Tailed Object Detection
by: Zhu, Ke, et al.
Published: (2024)
Disentangled Concepts Speak Louder Than Words: Explainable Video Action Recognition
by: Lee, Jongseo, et al.
Published: (2025)
Quantization without Tears
by: Fu, Minghao, et al.
Published: (2024)
DTL: Disentangled Transfer Learning for Visual Recognition
by: Fu, Minghao, et al.
Published: (2023)
Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning
by: Tang, Ningyuan, et al.
Published: (2024)
DiffuLT: How to Make Diffusion Model Useful for Long-tail Recognition
by: Shao, Jie, et al.
Published: (2024)
When Images Speak Louder: Mitigating Language Bias-induced Hallucinations in VLMs through Cross-Modal Guidance
by: Cao, Jinjin, et al.
Published: (2025)
Minimal Interaction Separated Tuning: A New Paradigm for Visual Adaptation
by: Tang, Ningyuan, et al.
Published: (2024)
QwT-v2: Practical, Effective and Efficient Post-Training Quantization
by: Tang, Ningyuan, et al.
Published: (2025)
Text Speaks Louder than Vision: ASCII Art Reveals Textual Biases in Vision-Language Models
by: Wang, Zhaochen, et al.
Published: (2025)
Skeletons Speak Louder than Text: A Motion-Aware Pretraining Paradigm for Video-Based Person Re-Identification
by: Lin, Rifen, et al.
Published: (2025)
Diffusion Product Quantization
by: Shao, Jie, et al.
Published: (2024)
Continuous Visual Autoregressive Generation via Score Maximization
by: Shao, Chenze, et al.
Published: (2025)
All You Need in Knowledge Distillation Is a Tailored Coordinate System
by: Zhou, Junjie, et al.
Published: (2024)
Teaching LMMs for Image Quality Scoring and Interpreting
by: Zhang, Zicheng, et al.
Published: (2025)
Images Speak Louder than Words: Understanding and Mitigating Bias in Vision-Language Model from a Causal Mediation Perspective
by: Weng, Zhaotian, et al.
Published: (2024)
CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation
by: Fu, Minghao, et al.
Published: (2025)
Stable Score Distillation for High-Quality 3D Generation
by: Tang, Boshi, et al.
Published: (2023)
FaceScore: Benchmarking and Enhancing Face Quality in Human Generation
by: Liao, Zhenyi, et al.
Published: (2024)
ModeDreamer: Mode Guiding Score Distillation for Text-to-3D Generation using Reference Image Prompts
by: Tran, Uy Dieu, et al.
Published: (2024)
Forge-and-Quench: Enhancing Image Generation for Higher Fidelity in Unified Multimodal Models
by: Zeng, Yanbing, et al.
Published: (2026)
Enhancing Image Quality Assessment Ability of LMMs via Retrieval-Augmented Generation
by: Fu, Kang, et al.
Published: (2026)
Enhancing Remote Sensing Vision-Language Models Through MLLM and LLM-Based High-Quality Image-Text Dataset Generation
by: He, Yiguo, et al.
Published: (2025)
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format
by: Wang, Yueqian, et al.
Published: (2024)
A Quality-Guided Mixture of Score-Fusion Experts Framework for Human Recognition
by: Zhu, Jie, et al.
Published: (2025)
Global-Local Image Perceptual Score (GLIPS): Evaluating Photorealistic Quality of AI-Generated Images
by: Aziz, Memoona, et al.
Published: (2024)
TetSphere Splatting: Representing High-Quality Geometry with Lagrangian Volumetric Meshes
by: Guo, Minghao, et al.
Published: (2024)
Understanding the Failure Modes of Out-of-Distribution Generalization
by: Nagarajan, Vaishnavh, et al.
Published: (2020)
Let ViT Speak: Generative Language-Image Pre-training
by: Fang, Yan, et al.
Published: (2026)
More Images, More Problems? A Controlled Analysis of VLM Failure Modes
by: Das, Anurag, et al.
Published: (2026)
Segment Any-Quality Images with Generative Latent Space Enhancement
by: Guo, Guangqian, et al.
Published: (2025)
Score2Instruct: Scaling Up Video Quality-Centric Instructions via Automated Dimension Scoring
by: Xie, Qizhi, et al.
Published: (2025)
Analytic Score Optimization for Multi Dimension Video Quality Assessment
by: Lin, Boda, et al.
Published: (2026)
Radiology Report Generation for Low-Quality X-Ray Images
by: Zhu, Hongze, et al.
Published: (2026)
Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution
by: You, Zhiyuan, et al.
Published: (2025)
Taming Mode Collapse in Score Distillation for Text-to-3D Generation
by: Wang, Peihao, et al.
Published: (2023)
VAP-Diffusion: Enriching Descriptions with MLLMs for Enhanced Medical Image Generation
by: Huang, Peng, et al.
Published: (2025)
CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
by: Zhang, Hui, et al.
Published: (2024)
When Vision Speaks for Sound
by: Wen, Xiaofei, et al.
Published: (2026)
PRIME: Prioritizing Interpretability in Failure Mode Extraction
by: Rezaei, Keivan, et al.
Published: (2023)