:: Library Catalog

Bibliographic Details
Main Authors:	Sepehri, Mohammad Shahab; Fabian, Zalan; Soltanolkotabi, Maryam; Soltanolkotabi, Mahdi
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2409.15477

Similar Items

Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models
by: Sepehri, Mohammad Shahab, et al.
Published: (2024)

Hyperphantasia: A Benchmark for Evaluating the Mental Visualization Capabilities of Multimodal LLMs
by: Sepehri, Mohammad Shahab, et al.
Published: (2025)

ConceptMix++: Leveling the Playing Field in Text-to-Image Benchmarking via Iterative Prompt Optimization
by: Gan, Haosheng, et al.
Published: (2025)

MosaicMRI: A Diverse Dataset and Benchmark for Raw Musculoskeletal MRI
by: Arguello, Paula, et al.
Published: (2026)

Emergence and Evolution of Interpretable Concepts in Diffusion Models
by: Tinaz, Berk, et al.
Published: (2025)

DiracDiffusion: Denoising and Incremental Reconstruction with Assured Data-Consistency
by: Fabian, Zalan, et al.
Published: (2023)

ATHENA: Adaptive Test-Time Steering for Improving Count Fidelity in Diffusion Models
by: Sepehri, Mohammad Shahab, et al.
Published: (2026)

HARMONY: Hidden Activation Representations and Model Output-Aware Uncertainty Estimation for Vision-Language Models
by: Mushtaq, Erum, et al.
Published: (2025)

Adapt and Diffuse: Sample-adaptive Reconstruction via Latent Diffusion Models
by: Fabian, Zalan, et al.
Published: (2023)

CryptoMamba: Leveraging State Space Models for Accurate Bitcoin Price Prediction
by: Sepehri, Mohammad Shahab, et al.
Published: (2025)

Gradient Descent Provably Solves Nonlinear Tomographic Reconstruction
by: Fridovich-Keil, Sara, et al.
Published: (2023)

Theoretical Insights into Overparameterized Models in Multi-Task and Replay-Based Continual Learning
by: Banayeeanzade, Amin, et al.
Published: (2024)

LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction
by: Mehradfar, Asal, et al.
Published: (2025)

Don't trust your eyes: on the (un)reliability of feature visualizations
by: Geirhos, Robert, et al.
Published: (2023)

Training Dynamics of Softmax Self-Attention: Fast Global Convergence via Preconditioning
by: Goel, Gautam, et al.
Published: (2026)

Asymmetric Prompt Weighting for Reinforcement Learning with Verifiable Rewards
by: Heckel, Reinhard, et al.
Published: (2026)

Bias-constrained multimodal intelligence for equitable and reliable clinical AI
by: Li, Cheng, et al.
Published: (2026)

Do not trust what you trust: Miscalibration in Semi-supervised Learning
by: Mishra, Shambhavi, et al.
Published: (2024)

If you're waiting for a sign... that might not be it! Mitigating Trust Boundary Confusion from Visual Injections on Vision-Language Agentic Systems
by: Chang, Jiamin, et al.
Published: (2026)

AI-assisted prostate cancer detection and localisation on biparametric MR by classifying radiologist-positives
by: Wu, Xiangcen, et al.
Published: (2024)

A multimodal vision foundation model for generalizable knee pathology
by: Yu, Kang, et al.
Published: (2026)

Are foundation models efficient for medical image segmentation?
by: Ferreira, Danielle, et al.
Published: (2023)

Fine-tuning can cripple your foundation model; preserving features may be the solution
by: Mukhoti, Jishnu, et al.
Published: (2023)

Leveraging AI multimodal geospatial foundation models for improved near-real-time flood mapping at a global scale
by: Tulbure, Mirela G., et al.
Published: (2025)

MIMO: A medical vision language model with visual referring multimodal input and pixel grounding multimodal output
by: Chen, Yanyuan, et al.
Published: (2025)

ENSAM: an efficient foundation model for interactive segmentation of 3D medical images
by: Stenhede, Elias, et al.
Published: (2025)

FoMo4Wheat: Toward reliable crop vision foundation models with globally curated data
by: Han, Bing, et al.
Published: (2025)

Stability properties of gradient flow dynamics for the symmetric low-rank matrix factorization problem
by: Mohammadi, Hesameddin, et al.
Published: (2024)

MiSuRe is all you need to explain your image segmentation
by: Hasany, Syed Nouman, et al.
Published: (2024)

GlitchBench: Can large multimodal models detect video game glitches?
by: Taesiri, Mohammad Reza, et al.
Published: (2023)

MediAug: Exploring Visual Augmentation in Medical Imaging
by: Qi, Xuyin, et al.
Published: (2025)

The Rich and the Simple: On the Implicit Bias of Adam and SGD
by: Vasudeva, Bhavya, et al.
Published: (2025)

Learning to Recall with Transformers Beyond Orthogonal Embeddings
by: Vural, Nuri Mert, et al.
Published: (2026)

PAST: A multimodal single-cell foundation model for histopathology and spatial transcriptomics in cancer
by: Yang, Changchun, et al.
Published: (2025)

MedDINOv3: How to adapt vision foundation models for medical image segmentation?
by: Li, Yuheng, et al.
Published: (2025)

Can you SPLICE it together? A Human Curated Benchmark for Probing Visual Reasoning in VLMs
by: Ballout, Mohamad, et al.
Published: (2025)

PB-IAD: Utilizing multimodal foundation models for semantic industrial anomaly detection in dynamic manufacturing environments
by: Hofmann, Bernd, et al.
Published: (2025)

Closing the gap in multimodal medical representation alignment
by: Grassucci, Eleonora, et al.
Published: (2026)

Visual concept ranking uncovers medical shortcuts used by large multimodal models
by: Janizek, Joseph D., et al.
Published: (2026)

When Eyes and Ears Disagree: Can MLLMs Discern Audio-Visual Confusion?
by: Ye, Qilang, et al.
Published: (2025)