| Main Authors: | Schmidt, Carlos; Reiß, Simon |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.06748 |
Similar Items
Conquering the Retina: Bringing Visual in-Context Learning to OCT
by: Negrini, Alessio, et al.
Published: (2025)
Is Visual in-Context Learning for Compositional Medical Tasks within Reach?
by: Reiß, Simon, et al.
Published: (2025)
Probing Intrinsic Medical Task Relationships: A Contrastive Learning Perspective
by: Muth, Jonas, et al.
Published: (2026)
GazeGen: Gaze-Driven User Interaction for Visual Content Generation
by: Hsieh, He-Yen, et al.
Published: (2024)
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning
by: Bai, Yang, et al.
Published: (2024)
From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos
by: Chen, Yin, et al.
Published: (2023)
Generative Multimodal Models are In-Context Learners
by: Sun, Quan, et al.
Published: (2023)
Video Diffusion Transformers are In-Context Learners
by: Fei, Zhengcong, et al.
Published: (2024)
From Prediction to Explanation: Multimodal, Explainable, and Interactive Deepfake Detection Framework for Non-Expert Users
by: Tariq, Shahroz, et al.
Published: (2025)
From CNN to CNN + RNN: Adapting Visualization Techniques for Time-Series Anomaly Detection
by: Poirier, Fabien
Published: (2024)
CaMML: Context-Aware Multimodal Learner for Large Models
by: Chen, Yixin, et al.
Published: (2024)
DINO-Tok: Adapting DINO for Visual Tokenizers
by: Jia, Mingkai, et al.
Published: (2025)
Complexity Experts are Task-Discriminative Learners for Any Image Restoration
by: Zamfir, Eduard, et al.
Published: (2024)
MoVA: Adapting Mixture of Vision Experts to Multimodal Context
by: Zong, Zhuofan, et al.
Published: (2024)
Rejuvenating image-GPT as Strong Visual Representation Learners
by: Ren, Sucheng, et al.
Published: (2023)
Every Component Counts: Rethinking the Measure of Success for Medical Semantic Segmentation in Multi-Instance Segmentation Tasks
by: Jaus, Alexander, et al.
Published: (2024)
T2T-VICL: Unlocking the Boundaries of Cross-Task Visual In-Context Learning via Implicit Text-Driven VLMs
by: Xia, Shao-Jun, et al.
Published: (2025)
Q-Adapt: Adapting LMM for Visual Quality Assessment with Progressive Instruction Tuning
by: Lu, Yiting, et al.
Published: (2025)
From Snapshots to Symphonies: The Evolution of Protein Prediction from Static Structures to Generative Dynamics and Multimodal Interactions
by: Chen, Jingzhi, et al.
Published: (2026)
AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection
by: Gao, Bin-Bin, et al.
Published: (2025)
User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning
by: Wang, Xuan, et al.
Published: (2023)
VIAssist: Adapting Multi-modal Large Language Models for Users with Visual Impairments
by: Yang, Bufang, et al.
Published: (2024)
VRSO: Visual-Centric Reconstruction for Static Object Annotation
by: Yu, Chenyao, et al.
Published: (2024)
Vision-Language In-Context Learning Driven Few-Shot Visual Inspection Model
by: Ueno, Shiryu, et al.
Published: (2025)
Adapt PointFormer: 3D Point Cloud Analysis via Adapting 2D Visual Transformers
by: Li, Mengke, et al.
Published: (2024)
Exploring Task-Level Optimal Prompts for Visual In-Context Learning
by: Zhu, Yan, et al.
Published: (2025)
Self-Adapting Large Visual-Language Models to Edge Devices across Visual Modalities
by: Cai, Kaiwen, et al.
Published: (2024)
Enhancing Visual Forced Alignment with Local Context-Aware Feature Extraction and Multi-Task Learning
by: He, Yi, et al.
Published: (2025)
GMC: A General Framework of Multi-stage Context Learning and Utilization for Visual Detection Tasks
by: Wang, Xuan, et al.
Published: (2024)
Foreign object segmentation in chest x-rays through anatomy-guided shape insertion
by: Seibold, Constantin, et al.
Published: (2025)
CLIP Brings Better Features to Visual Aesthetics Learners
by: Xu, Liwu, et al.
Published: (2023)
SGDM: Static-Guided Dynamic Module Make Stronger Visual Models
by: Xing, Wenjie, et al.
Published: (2024)
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images
by: Huang, Chaoqin, et al.
Published: (2024)
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding
by: Xiao, Linhui, et al.
Published: (2023)
FontCrafter: High-Fidelity Element-Driven Artistic Font Creation with Visual In-Context Generation
by: Luo, Wuyang, et al.
Published: (2026)
Unimodal and Multimodal Static Facial Expression Recognition for Virtual Reality Users with EmoHeVRDB
by: Ortmann, Thorben, et al.
Published: (2024)
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
by: Zhang, Wenqi, et al.
Published: (2025)
VICR: Visual In-Context Restoration for Real-World Image Super-Resolution
by: Zhang, Qichang, et al.
Published: (2026)
Beyond Static Perception: Integrating Temporal Context into VLMs for Cloth Folding
by: Barbany, Oriol, et al.
Published: (2025)
Visualizing the Invisible: Enhancing Radiologist Performance in Breast Mammography via Task-Driven Chromatic Encoding
by: Ye, Hui, et al.
Published: (2026)