Library Catalog

Bibliographic Details
Main Authors:	Li, Han; Han, Hu; Zhou, S. Kevin
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2505.18741

Similar Items

Decompose the model: Mechanistic interpretability in image models with Generalized Integrated Gradients (GIG)
by: Kim, Yearim, et al.
Published: (2024)

Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training
by: Zhang, Xinsong, et al.
Published: (2025)

Consistency-diversity-realism Pareto fronts of conditional image generative models
by: Astolfi, Pietro, et al.
Published: (2024)

Near, far: Patch-ordering enhances vision foundation models' scene understanding
by: Pariza, Valentinos, et al.
Published: (2024)

Geospatial foundation models for image analysis: evaluating and enhancing NASA-IBM Prithvi's domain adaptability
by: Hsu, Chia-Yu, et al.
Published: (2024)

LGMSNet: Thinning a medical image segmentation model via dual-level multiscale fusion
by: Dong, Chengqi, et al.
Published: (2025)

How to select slices for annotation to train best-performing deep learning segmentation models for cross-sectional medical images?
by: Zhang, Yixin, et al.
Published: (2024)

Toward explainable AI approaches for breast imaging: adapting foundation models to diverse populations
by: Cavalcante, Guilherme J., et al.
Published: (2025)

VecSet-Edit: Unleashing Pre-trained LRM for Mesh Editing from Single Image
by: Hsiao, Teng-Fang, et al.
Published: (2026)

Self-supervised pre-training with diffusion model for few-shot landmark detection in x-ray images
by: Di Via, Roberto, et al.
Published: (2024)

DepthSeg: Depth prompting in remote sensing semantic segmentation
by: Zhou, Ning, et al.
Published: (2025)

Augmenting Prototype Network with TransMix for Few-shot Hyperspectral Image Classification
by: Liu, Chun, et al.
Published: (2024)

MARIO: A Mixed Annotation Framework For Polyp Segmentation
by: Li, Haoyang, et al.
Published: (2025)

A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model
by: Xu, Yingxue, et al.
Published: (2024)

Generalized Denoising Diffusion Codebook Models (gDDCM): Tokenizing images using a pre-trained diffusion model
by: Kong, Fei
Published: (2025)

Adaptive Channel Allocation for Robust Differentiable Architecture Search
by: Li, Chao, et al.
Published: (2022)

ASR-enhanced Multimodal Representation Learning for Cross-Domain Product Retrieval
by: Zhao, Ruixiang, et al.
Published: (2024)

MVP-CBM: Multi-layer Visual Preference-enhanced Concept Bottleneck Model for Explainable Medical Image Classification
by: Wang, Chunjiang, et al.
Published: (2025)

Slight Corruption in Pre-training Data Makes Better Diffusion Models
by: Chen, Hao, et al.
Published: (2024)

ScaleNet: Scaling up Pretrained Neural Networks with Incremental Parameters
by: Hao, Zhiwei, et al.
Published: (2025)

VITAL: Vision-Encoder-centered Pre-training for LMMs in Visual Quality Assessment
by: Jia, Ziheng, et al.
Published: (2025)

MoETTA: Test-Time Adaptation Under Mixed Distribution Shifts with MoE-LayerNorm
by: Fan, Xiao, et al.
Published: (2025)

Respect the model: Fine-grained and Robust Explanation with Sharing Ratio Decomposition
by: Han, Sangyu, et al.
Published: (2024)

MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings
by: Chen, Haonan, et al.
Published: (2025)

Dual-Stage Reweighted MoE for Long-Tailed Egocentric Mistake Detection
by: Han, Boyu, et al.
Published: (2025)

NutrifyAI: An AI-Powered System for Real-Time Food Detection, Nutritional Analysis, and Personalized Meal Recommendations
by: Han, Michelle, et al.
Published: (2024)

StyleAutoEncoder for manipulating image attributes using pre-trained StyleGAN
by: Bedychaj, Andrzej, et al.
Published: (2024)

Wholly-WOOD: Wholly Leveraging Diversified-quality Labels for Weakly-supervised Oriented Object Detection
by: Yu, Yi, et al.
Published: (2025)

Edge Approximation Text Detector
by: Yang, Chuang, et al.
Published: (2025)

BioVessel-Net and RetinaMix: Unsupervised Retinal Vessel Segmentation from OCTA Images
by: Huang, Cheng, et al.
Published: (2025)

Information transmission: Inferring change area from change moment in time series remote sensing images
by: Li, Jialu, et al.
Published: (2025)

MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding
by: Wang, Jiaze, et al.
Published: (2024)

Generating Transferrable Adversarial Examples via Local Mixing and Logits Optimization for Remote Sensing Object Recognition
by: Liu, Chun, et al.
Published: (2025)

Precise localization of corneal reflections in eye images using deep learning trained on synthetic data
by: Byrne, Sean Anthony, et al.
Published: (2023)

ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
by: Hao, Shaozhe, et al.
Published: (2024)

MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention
by: Pang, Yuqi, et al.
Published: (2025)

Life-IQA: Boosting Blind Image Quality Assessment through GCN-enhanced Layer Interaction and MoE-based Feature Decoupling
by: Tang, Long, et al.
Published: (2025)

Unleashing the Intrinsic Visual Representation Capability of Multimodal Large Language Models
by: Li, Hengzhuang, et al.
Published: (2025)

RADAR: Revealing Asymmetric Development of Abilities in MLLM Pre-training
by: Nie, Yunshuang, et al.
Published: (2026)

Vision Language Model-Empowered Contract Theory for AIGC Task Allocation in Teleoperation
by: Zhan, Zijun, et al.
Published: (2024)