| Main Authors: | Duan, Jiawei; Hu, Haibo; Ye, Qingqing; Sun, Xinyue |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.05618 |
Similar Items
MLLM4TS: Leveraging Vision and Multimodal Language Models for General Time-Series Analysis
by: Liu, Qinghua, et al.
Published: (2025)
Technical Report: Quantifying and Analyzing the Generalization Power of a DNN
by: He, Yuxuan, et al.
Published: (2025)
Reducing Hallucination in Vision-Language Models via Stage-wise Preference Optimization under Distribution Shift
by: Xu, Qinwu
Published: (2026)
CataractSAM-2: A Domain-Adapted Model for Anterior Segment Surgery Segmentation and Scalable Ground-Truth Annotation
by: Eslami, Mohammad, et al.
Published: (2026)
Ovis2.5 Technical Report
by: Lu, Shiyin, et al.
Published: (2025)
Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion
by: Celona, Luigi, et al.
Published: (2023)
NeuroLip: An Event-driven Spatiotemporal Learning Framework for Cross-Scene Lip-Motion-based Visual Speaker Recognition
by: Yao, Junguang, et al.
Published: (2026)
Assessing the Impact of Image Dataset Features on Privacy-Preserving Machine Learning
by: Lange, Lucas, et al.
Published: (2024)
Physics-Guided Abnormal Trajectory Gap Detection
by: Sharma, Arun, et al.
Published: (2024)
Unveiling the Pitfalls of Knowledge Editing for Large Language Models
by: Li, Zhoubo, et al.
Published: (2023)
Improving Diagnostic Performance on Small and Imbalanced Datasets Using Class-Based Input Image Composition
by: Azzeddine, Hlali, et al.
Published: (2025)
SciEGQA: A Dataset for Scientific Evidence-Grounded Question Answering and Reasoning
by: Yu, Wenhan, et al.
Published: (2025)
OODBench: Out-of-Distribution Benchmark for Large Vision-Language Models
by: Lin, Ling, et al.
Published: (2026)
A large-scale multicenter breast cancer DCE-MRI benchmark dataset with expert segmentations
by: Garrucho, Lidia, et al.
Published: (2024)
SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset
by: Dai, Josef, et al.
Published: (2024)
3D Primitives are a Spatial Language for VLMs
by: Liu, Junze, et al.
Published: (2026)
Probabilistic Kernel Function for Fast Angle Testing
by: Lu, Kejing, et al.
Published: (2025)
Probabilistic Routing for Graph-Based Approximate Nearest Neighbor Search
by: Lu, Kejing, et al.
Published: (2024)
MotionCFG: Boosting Motion Dynamics via Stochastic Concept Perturbation
by: Kim, Byungjun, et al.
Published: (2026)
Divide, Weight, and Route: Difficulty-Aware Optimization with Dynamic Expert Fusion for Long-tailed Recognition
by: Wei, Xiaolei, et al.
Published: (2025)
Ovis-U1 Technical Report
by: Wang, Guo-Hua, et al.
Published: (2025)
HiPath: Hierarchical Vision-Language Alignment for Structured Pathology Report Prediction
by: Yuan, Ruicheng, et al.
Published: (2026)
Geometric 4D Stitching for Grounded 4D Generation
by: Park, Sunwoo, et al.
Published: (2026)
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report
by: Shanghai AI Lab, et al.
Published: (2025)
UI-Venus-1.5 Technical Report
by: Venus Team, et al.
Published: (2026)
On the Adversarial Robustness of Large Vision-Language Models under Visual Token Compression
by: Zhang, Xinwei, et al.
Published: (2026)
Where and How to Perturb: On the Design of Perturbation Guidance in Diffusion and Flow Models
by: Ahn, Donghoon, et al.
Published: (2025)
Crafting Adversarial Inputs for Large Vision-Language Models Using Black-Box Optimization
by: Guan, Jiwei, et al.
Published: (2026)
On-Demand Multi-Task Sparsity for Efficient Large-Model Deployment on Edge Devices
by: Huang, Lianming, et al.
Published: (2025)
HunyuanOCR Technical Report
by: Hunyuan Vision Team, et al.
Published: (2025)
Seed1.5-VL Technical Report
by: Guo, Dong, et al.
Published: (2025)
Qwen3-VL Technical Report
by: Bai, Shuai, et al.
Published: (2025)
Fight Perturbations with Perturbations: Defending Adversarial Attacks via Neuron Influence
by: Chen, Ruoxi, et al.
Published: (2021)
DP$^2$O-SR: Direct Perceptual Preference Optimization for Real-World Image Super-Resolution
by: Wu, Rongyuan, et al.
Published: (2025)
SODIUM: From Open Web Data to Queryable Databases
by: Hu, Chuxuan, et al.
Published: (2026)
CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI
by: Wang, Zi, et al.
Published: (2024)
Perturbing the Gradient for Alleviating Meta Overfitting
by: Gogoi, Manas, et al.
Published: (2024)
H2OVL-Mississippi Vision Language Models Technical Report
by: Galib, Shaikat, et al.
Published: (2024)
Ovis-Image Technical Report
by: Wang, Guo-Hua, et al.
Published: (2025)
GR-3 Technical Report
by: Cheang, Chilam, et al.
Published: (2025)