| Main Authors: | Bi, Hanbo, Yuan, Zhiqiang, Jia, Zexi, Zhang, Jiapei, Li, Chongyang, Luo, Peixiang, Deng, Ying, Duan, Xiaoyue, Zhang, Jinchao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.17714 |
Similar Items
Less Redundancy: Boosting Practicality of Vision Language Model in Walking Assistants
by: Li, Chongyang, et al.
Published: (2025)
RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation
by: Yuan, Zhiqiang, et al.
Published: (2025)
WalkVLM:Aid Visually Impaired People Walking by Vision Language Model
by: Yuan, Zhiqiang, et al.
Published: (2024)
VSD2M: A Large-scale Vision-language Sticker Dataset for Multi-frame Animated Sticker Generation
by: Yuan, Zhiqiang, et al.
Published: (2024)
CoDA: Color Distribution Probing for Efficient and Generalizable AI-Generated Image Detection
by: Jia, Zexi, et al.
Published: (2026)
A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets
by: Jia, Zexi, et al.
Published: (2025)
From Imitation to Innovation: The Emergence of AI Unique Artistic Styles and the Challenge of Copyright Protection
by: Jia, Zexi, et al.
Published: (2025)
Enhancing Visual Reliance in Text Generation: A Bayesian Perspective on Mitigating Hallucination in Large Vision-Language Models
by: Hu, Nanxing, et al.
Published: (2025)
RVLM: Recursive Vision-Language Models with Adaptive Depth
by: Mayumu, Nicanor, et al.
Published: (2026)
StyleDecoupler: Generalizable Artistic Style Disentanglement
by: Jia, Zexi, et al.
Published: (2026)
Bayesian FFT Modal Identification for Multi-setup Experimental Modal Analysis
by: Wang, Peixiang, et al.
Published: (2024)
Semantic to Structure: Learning Structural Representations for Infringement Detection
by: Huang, Chuanwei, et al.
Published: (2025)
Control-CLIP: Decoupling Category and Style Guidance in CLIP for Specific-Domain Generation
by: Jia, Zexi, et al.
Published: (2025)
Parameter-Efficient Semantic Augmentation for Enhancing Open-Vocabulary Object Detection
by: Cao, Weihao, et al.
Published: (2026)
Tutti: Expressive Multi-Singer Synthesis via Structure-Level Timbre Control and Vocal Texture Modeling
by: Chen, Jiatao, et al.
Published: (2026)
Learning to Detect Relevant Contexts and Knowledge for Response Selection in Retrieval-based Dialogue Systems
by: Hua, Kai, et al.
Published: (2025)
Evaluating Generative Models via One-Dimensional Code Distributions
by: Jia, Zexi, et al.
Published: (2026)
Manifold-Optimal Guidance: A Unified Riemannian Control View of Diffusion Guidance
by: Jia, Zexi, et al.
Published: (2026)
Exploring Specular Reflection Inconsistency for Generalizable Face Forgery Detection
by: Fei, Hongyan, et al.
Published: (2026)
A Synthetic-to-Real Dehazing Method based on Domain Unification
by: Yuan, Zhiqiang, et al.
Published: (2025)
Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects
by: Lu, Junyu, et al.
Published: (2023)
Accelerating Gaussian beam tracing method with dynamic parallelism on graphics processing units
by: Sheng, Zhang, et al.
Published: (2025)
Auditing Stealth Sycophancy in Mental-Health Dialogue: Structured Clinical-State Diagnostics and Clean Matched Benchmarks
by: Han, Tianze, et al.
Published: (2026)
Magic Mushroom: A Customizable Benchmark for Fine-grained Analysis of Retrieval Noise Erosion in RAG Systems
by: Zhang, Yuxin, et al.
Published: (2025)
Language-driven Fine-grained Retrieval
by: Wang, Shijie, et al.
Published: (2025)
DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling
by: Wang, Minzheng, et al.
Published: (2024)
ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation
by: Zhang, Ting, et al.
Published: (2024)
Mobius: A High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task
by: Yang, Yiran, et al.
Published: (2024)
Fine-grained Image Retrieval via Dual-Vision Adaptation
by: Jiang, Xin, et al.
Published: (2025)
Fine-Refine: Iterative Fine-grained Refinement for Mitigating Dialogue Hallucination
by: Chen, Xiangyan, et al.
Published: (2026)
MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples
by: Chen, Tao, et al.
Published: (2023)
FineDialFact: A benchmark for Fine-grained Dialogue Fact Verification
by: Chen, Xiangyan, et al.
Published: (2025)
Modal Fragments
by: Bezhanishvili, Nick, et al.
Published: (2026)
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos
by: Geng, Tiantian, et al.
Published: (2024)
Normalizing Batch Normalization for Long-Tailed Recognition
by: Bao, Yuxiang, et al.
Published: (2025)
RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation
by: Ru, Dongyu, et al.
Published: (2024)
Cross-Modal Adapter for Vision-Language Retrieval
by: Jiang, Haojun, et al.
Published: (2022)
AdaRD-key: Adaptive Relevance-Diversity Keyframe Sampling for Long-form Video understanding
by: Zhang, Xian, et al.
Published: (2025)
Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval
by: Ma, Zehong, et al.
Published: (2025)
From Parameters to Prompts: Understanding and Mitigating the Factuality Gap between Fine-Tuned LLMs
by: Gong, Xuan, et al.
Published: (2025)