| Main Authors: | Li, Han; Han, Hu; Zhou, S. Kevin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.18741 |
Similar Items
Decompose the model: Mechanistic interpretability in image models with Generalized Integrated Gradients (GIG)
by: Kim, Yearim, et al.
Published: (2024)
Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training
by: Zhang, Xinsong, et al.
Published: (2025)
Consistency-diversity-realism Pareto fronts of conditional image generative models
by: Astolfi, Pietro, et al.
Published: (2024)
Near, far: Patch-ordering enhances vision foundation models' scene understanding
by: Pariza, Valentinos, et al.
Published: (2024)
Geospatial foundation models for image analysis: evaluating and enhancing NASA-IBM Prithvi's domain adaptability
by: Hsu, Chia-Yu, et al.
Published: (2024)
LGMSNet: Thinning a medical image segmentation model via dual-level multiscale fusion
by: Dong, Chengqi, et al.
Published: (2025)
How to select slices for annotation to train best-performing deep learning segmentation models for cross-sectional medical images?
by: Zhang, Yixin, et al.
Published: (2024)
Toward explainable AI approaches for breast imaging: adapting foundation models to diverse populations
by: Cavalcante, Guilherme J., et al.
Published: (2025)
VecSet-Edit: Unleashing Pre-trained LRM for Mesh Editing from Single Image
by: Hsiao, Teng-Fang, et al.
Published: (2026)
Self-supervised pre-training with diffusion model for few-shot landmark detection in x-ray images
by: Di Via, Roberto, et al.
Published: (2024)
DepthSeg: Depth prompting in remote sensing semantic segmentation
by: Zhou, Ning, et al.
Published: (2025)
Augmenting Prototype Network with TransMix for Few-shot Hyperspectral Image Classification
by: Liu, Chun, et al.
Published: (2024)
MARIO: A Mixed Annotation Framework For Polyp Segmentation
by: Li, Haoyang, et al.
Published: (2025)
A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model
by: Xu, Yingxue, et al.
Published: (2024)
Generalized Denoising Diffusion Codebook Models (gDDCM): Tokenizing images using a pre-trained diffusion model
by: Kong, Fei
Published: (2025)
Adaptive Channel Allocation for Robust Differentiable Architecture Search
by: Li, Chao, et al.
Published: (2022)
ASR-enhanced Multimodal Representation Learning for Cross-Domain Product Retrieval
by: Zhao, Ruixiang, et al.
Published: (2024)
MVP-CBM:Multi-layer Visual Preference-enhanced Concept Bottleneck Model for Explainable Medical Image Classification
by: Wang, Chunjiang, et al.
Published: (2025)
Slight Corruption in Pre-training Data Makes Better Diffusion Models
by: Chen, Hao, et al.
Published: (2024)
ScaleNet: Scaling up Pretrained Neural Networks with Incremental Parameters
by: Hao, Zhiwei, et al.
Published: (2025)
VITAL: Vision-Encoder-centered Pre-training for LMMs in Visual Quality Assessment
by: Jia, Ziheng, et al.
Published: (2025)
MoETTA: Test-Time Adaptation Under Mixed Distribution Shifts with MoE-LayerNorm
by: Fan, Xiao, et al.
Published: (2025)
Respect the model: Fine-grained and Robust Explanation with Sharing Ratio Decomposition
by: Han, Sangyu, et al.
Published: (2024)
MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings
by: Chen, Haonan, et al.
Published: (2025)
Dual-Stage Reweighted MoE for Long-Tailed Egocentric Mistake Detection
by: Han, Boyu, et al.
Published: (2025)
NutrifyAI: An AI-Powered System for Real-Time Food Detection, Nutritional Analysis, and Personalized Meal Recommendations
by: Han, Michelle, et al.
Published: (2024)
StyleAutoEncoder for manipulating image attributes using pre-trained StyleGAN
by: Bedychaj, Andrzej, et al.
Published: (2024)
Wholly-WOOD: Wholly Leveraging Diversified-quality Labels for Weakly-supervised Oriented Object Detection
by: Yu, Yi, et al.
Published: (2025)
Edge Approximation Text Detector
by: Yang, Chuang, et al.
Published: (2025)
BioVessel-Net and RetinaMix: Unsupervised Retinal Vessel Segmentation from OCTA Images
by: Huang, Cheng, et al.
Published: (2025)
Information transmission: Inferring change area from change moment in time series remote sensing images
by: Li, Jialu, et al.
Published: (2025)
MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding
by: Wang, Jiaze, et al.
Published: (2024)
Generating Transferrable Adversarial Examples via Local Mixing and Logits Optimization for Remote Sensing Object Recognition
by: Liu, Chun, et al.
Published: (2025)
Precise localization of corneal reflections in eye images using deep learning trained on synthetic data
by: Byrne, Sean Anthony, et al.
Published: (2023)
ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
by: Hao, Shaozhe, et al.
Published: (2024)
MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention
by: Pang, Yuqi, et al.
Published: (2025)
Life-IQA: Boosting Blind Image Quality Assessment through GCN-enhanced Layer Interaction and MoE-based Feature Decoupling
by: Tang, Long, et al.
Published: (2025)
Unleashing the Intrinsic Visual Representation Capability of Multimodal Large Language Models
by: Li, Hengzhuang, et al.
Published: (2025)
RADAR: Revealing Asymmetric Development of Abilities in MLLM Pre-training
by: Nie, Yunshuang, et al.
Published: (2026)
Vision Language Model-Empowered Contract Theory for AIGC Task Allocation in Teleoperation
by: Zhan, Zijun, et al.
Published: (2024)