:: Library Catalog

Bibliographic Details
Main Authors:	Duan, Jiawei; Hu, Haibo; Ye, Qingqing; Sun, Xinyue
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computer Vision and Pattern Recognition; Databases
Online Access:	https://arxiv.org/abs/2504.05618

Similar Items

MLLM4TS: Leveraging Vision and Multimodal Language Models for General Time-Series Analysis
by: Liu, Qinghua, et al.
Published: (2025)

Technical Report: Quantifying and Analyzing the Generalization Power of a DNN
by: He, Yuxuan, et al.
Published: (2025)

Reducing Hallucination in Vision-Language Models via Stage-wise Preference Optimization under Distribution Shift
by: Xu, Qinwu
Published: (2026)

CataractSAM-2: A Domain-Adapted Model for Anterior Segment Surgery Segmentation and Scalable Ground-Truth Annotation
by: Eslami, Mohammad, et al.
Published: (2026)

Ovis2.5 Technical Report
by: Lu, Shiyin, et al.
Published: (2025)

Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion
by: Celona, Luigi, et al.
Published: (2023)

NeuroLip: An Event-driven Spatiotemporal Learning Framework for Cross-Scene Lip-Motion-based Visual Speaker Recognition
by: Yao, Junguang, et al.
Published: (2026)

Assessing the Impact of Image Dataset Features on Privacy-Preserving Machine Learning
by: Lange, Lucas, et al.
Published: (2024)

Physics-Guided Abnormal Trajectory Gap Detection
by: Sharma, Arun, et al.
Published: (2024)

Unveiling the Pitfalls of Knowledge Editing for Large Language Models
by: Li, Zhoubo, et al.
Published: (2023)

Improving Diagnostic Performance on Small and Imbalanced Datasets Using Class-Based Input Image Composition
by: Azzeddine, Hlali, et al.
Published: (2025)

SciEGQA: A Dataset for Scientific Evidence-Grounded Question Answering and Reasoning
by: Yu, Wenhan, et al.
Published: (2025)

OODBench: Out-of-Distribution Benchmark for Large Vision-Language Models
by: Lin, Ling, et al.
Published: (2026)

A large-scale multicenter breast cancer DCE-MRI benchmark dataset with expert segmentations
by: Garrucho, Lidia, et al.
Published: (2024)

SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset
by: Dai, Josef, et al.
Published: (2024)

3D Primitives are a Spatial Language for VLMs
by: Liu, Junze, et al.
Published: (2026)

Probabilistic Kernel Function for Fast Angle Testing
by: Lu, Kejing, et al.
Published: (2025)

Probabilistic Routing for Graph-Based Approximate Nearest Neighbor Search
by: Lu, Kejing, et al.
Published: (2024)

MotionCFG: Boosting Motion Dynamics via Stochastic Concept Perturbation
by: Kim, Byungjun, et al.
Published: (2026)

Divide, Weight, and Route: Difficulty-Aware Optimization with Dynamic Expert Fusion for Long-tailed Recognition
by: Wei, Xiaolei, et al.
Published: (2025)

Ovis-U1 Technical Report
by: Wang, Guo-Hua, et al.
Published: (2025)

HiPath: Hierarchical Vision-Language Alignment for Structured Pathology Report Prediction
by: Yuan, Ruicheng, et al.
Published: (2026)

Geometric 4D Stitching for Grounded 4D Generation
by: Park, Sunwoo, et al.
Published: (2026)

Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report
by: Shanghai AI Lab, et al.
Published: (2025)

UI-Venus-1.5 Technical Report
by: Venus Team, et al.
Published: (2026)

On the Adversarial Robustness of Large Vision-Language Models under Visual Token Compression
by: Zhang, Xinwei, et al.
Published: (2026)

Where and How to Perturb: On the Design of Perturbation Guidance in Diffusion and Flow Models
by: Ahn, Donghoon, et al.
Published: (2025)

Crafting Adversarial Inputs for Large Vision-Language Models Using Black-Box Optimization
by: Guan, Jiwei, et al.
Published: (2026)

On-Demand Multi-Task Sparsity for Efficient Large-Model Deployment on Edge Devices
by: Huang, Lianming, et al.
Published: (2025)

HunyuanOCR Technical Report
by: Hunyuan Vision Team, et al.
Published: (2025)

Seed1.5-VL Technical Report
by: Guo, Dong, et al.
Published: (2025)

Qwen3-VL Technical Report
by: Bai, Shuai, et al.
Published: (2025)

Fight Perturbations with Perturbations: Defending Adversarial Attacks via Neuron Influence
by: Chen, Ruoxi, et al.
Published: (2021)

DP$^2$O-SR: Direct Perceptual Preference Optimization for Real-World Image Super-Resolution
by: Wu, Rongyuan, et al.
Published: (2025)

SODIUM: From Open Web Data to Queryable Databases
by: Hu, Chuxuan, et al.
Published: (2026)

CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI
by: Wang, Zi, et al.
Published: (2024)

Perturbing the Gradient for Alleviating Meta Overfitting
by: Gogoi, Manas, et al.
Published: (2024)

H2OVL-Mississippi Vision Language Models Technical Report
by: Galib, Shaikat, et al.
Published: (2024)

Ovis-Image Technical Report
by: Wang, Guo-Hua, et al.
Published: (2025)

GR-3 Technical Report
by: Cheang, Chilam, et al.
Published: (2025)