:: Library Catalog

Bibliographic Details
Main Authors:	Bi, Hanbo; Yuan, Zhiqiang; Jia, Zexi; Zhang, Jiapei; Li, Chongyang; Luo, Peixiang; Deng, Ying; Duan, Xiaoyue; Zhang, Jinchao
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2508.17714

Similar Items

Less Redundancy: Boosting Practicality of Vision Language Model in Walking Assistants
by: Li, Chongyang, et al.
Published: (2025)

RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation
by: Yuan, Zhiqiang, et al.
Published: (2025)

WalkVLM: Aid Visually Impaired People Walking by Vision Language Model
by: Yuan, Zhiqiang, et al.
Published: (2024)

VSD2M: A Large-scale Vision-language Sticker Dataset for Multi-frame Animated Sticker Generation
by: Yuan, Zhiqiang, et al.
Published: (2024)

CoDA: Color Distribution Probing for Efficient and Generalizable AI-Generated Image Detection
by: Jia, Zexi, et al.
Published: (2026)

A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets
by: Jia, Zexi, et al.
Published: (2025)

From Imitation to Innovation: The Emergence of AI Unique Artistic Styles and the Challenge of Copyright Protection
by: Jia, Zexi, et al.
Published: (2025)

Enhancing Visual Reliance in Text Generation: A Bayesian Perspective on Mitigating Hallucination in Large Vision-Language Models
by: Hu, Nanxing, et al.
Published: (2025)

RVLM: Recursive Vision-Language Models with Adaptive Depth
by: Mayumu, Nicanor, et al.
Published: (2026)

StyleDecoupler: Generalizable Artistic Style Disentanglement
by: Jia, Zexi, et al.
Published: (2026)

Bayesian FFT Modal Identification for Multi-setup Experimental Modal Analysis
by: Wang, Peixiang, et al.
Published: (2024)

Semantic to Structure: Learning Structural Representations for Infringement Detection
by: Huang, Chuanwei, et al.
Published: (2025)

Control-CLIP: Decoupling Category and Style Guidance in CLIP for Specific-Domain Generation
by: Jia, Zexi, et al.
Published: (2025)

Parameter-Efficient Semantic Augmentation for Enhancing Open-Vocabulary Object Detection
by: Cao, Weihao, et al.
Published: (2026)

Tutti: Expressive Multi-Singer Synthesis via Structure-Level Timbre Control and Vocal Texture Modeling
by: Chen, Jiatao, et al.
Published: (2026)

Learning to Detect Relevant Contexts and Knowledge for Response Selection in Retrieval-based Dialogue Systems
by: Hua, Kai, et al.
Published: (2025)

Evaluating Generative Models via One-Dimensional Code Distributions
by: Jia, Zexi, et al.
Published: (2026)

Manifold-Optimal Guidance: A Unified Riemannian Control View of Diffusion Guidance
by: Jia, Zexi, et al.
Published: (2026)

Exploring Specular Reflection Inconsistency for Generalizable Face Forgery Detection
by: Fei, Hongyan, et al.
Published: (2026)

A Synthetic-to-Real Dehazing Method based on Domain Unification
by: Yuan, Zhiqiang, et al.
Published: (2025)

Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects
by: Lu, Junyu, et al.
Published: (2023)

Accelerating Gaussian beam tracing method with dynamic parallelism on graphics processing units
by: Sheng, Zhang, et al.
Published: (2025)

Auditing Stealth Sycophancy in Mental-Health Dialogue: Structured Clinical-State Diagnostics and Clean Matched Benchmarks
by: Han, Tianze, et al.
Published: (2026)

Magic Mushroom: A Customizable Benchmark for Fine-grained Analysis of Retrieval Noise Erosion in RAG Systems
by: Zhang, Yuxin, et al.
Published: (2025)

Language-driven Fine-grained Retrieval
by: Wang, Shijie, et al.
Published: (2025)

DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling
by: Wang, Minzheng, et al.
Published: (2024)

ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation
by: Zhang, Ting, et al.
Published: (2024)

Mobius: A High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task
by: Yang, Yiran, et al.
Published: (2024)

Fine-grained Image Retrieval via Dual-Vision Adaptation
by: Jiang, Xin, et al.
Published: (2025)

Fine-Refine: Iterative Fine-grained Refinement for Mitigating Dialogue Hallucination
by: Chen, Xiangyan, et al.
Published: (2026)

MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples
by: Chen, Tao, et al.
Published: (2023)

FineDialFact: A benchmark for Fine-grained Dialogue Fact Verification
by: Chen, Xiangyan, et al.
Published: (2025)

Modal Fragments
by: Bezhanishvili, Nick, et al.
Published: (2026)

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos
by: Geng, Tiantian, et al.
Published: (2024)

Normalizing Batch Normalization for Long-Tailed Recognition
by: Bao, Yuxiang, et al.
Published: (2025)

RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation
by: Ru, Dongyu, et al.
Published: (2024)

Cross-Modal Adapter for Vision-Language Retrieval
by: Jiang, Haojun, et al.
Published: (2022)

AdaRD-key: Adaptive Relevance-Diversity Keyframe Sampling for Long-form Video understanding
by: Zhang, Xian, et al.
Published: (2025)

Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval
by: Ma, Zehong, et al.
Published: (2025)

From Parameters to Prompts: Understanding and Mitigating the Factuality Gap between Fine-Tuned LLMs
by: Gong, Xuan, et al.
Published: (2025)