| Main Authors: | O'Brien, Darrin; Gajulapalli, Dhikshith; Xia, Eric |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.14880 |
Similar Items
ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections
by: Bini, Massimo, et al.
Published: (2024)
Vision-Language Models Create Cross-Modal Task Representations
by: Luo, Grace, et al.
Published: (2024)
Orthogonal Finetuning Made Scalable
by: Qiu, Zeju, et al.
Published: (2025)
Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
by: Liu, Weiyang, et al.
Published: (2023)
Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models
by: Maharana, Adyasha, et al.
Published: (2023)
MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling
by: Teo, Rachel S. Y., et al.
Published: (2025)
Linear Alignment of Vision-language Models for Image Captioning
by: Paischer, Fabian, et al.
Published: (2023)
CROME: Cross-Modal Adapters for Efficient Multimodal LLM
by: Ebrahimi, Sayna, et al.
Published: (2024)
VLSM-Adapter: Finetuning Vision-Language Segmentation Efficiently with Lightweight Blocks
by: Dhakal, Manish, et al.
Published: (2024)
LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation
by: Lee, Suhyeon, et al.
Published: (2023)
BYOM: Building Your Own Multi-Task Model For Free
by: Jiang, Weisen, et al.
Published: (2023)
Multi-Task Model Merging via Adaptive Weight Disentanglement
by: Xiong, Feng, et al.
Published: (2024)
How LoRA Remembers? A Parametric Memory Law for LLM Finetuning
by: Xu, Ziwen, et al.
Published: (2026)
RadZero: Similarity-Based Cross-Attention for Explainable Vision-Language Alignment in Chest X-ray with Zero-Shot Multi-Task Capability
by: Park, Jonggwon, et al.
Published: (2025)
Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic
by: He, Yifei, et al.
Published: (2024)
MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks
by: Wu, Yiming, et al.
Published: (2024)
X-VILA: Cross-Modality Alignment for Large Language Model
by: Ye, Hanrong, et al.
Published: (2024)
Efficient Stitchable Task Adaptation
by: He, Haoyu, et al.
Published: (2023)
How Vision-Language Tasks Benefit from Large Pre-trained Models: A Survey
by: Qi, Yayun, et al.
Published: (2024)
Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data
by: Zhang, Yuhui, et al.
Published: (2024)
Revisiting Mixout: An Overlooked Path to Robust Finetuning
by: Aminbeidokhti, Masih, et al.
Published: (2025)
MapIQ: Evaluating Multimodal Large Language Models for Map Question Answering
by: Srivastava, Varun, et al.
Published: (2025)
Mind the Gap Between Prototypes and Images in Cross-domain Finetuning
by: Tian, Hongduan, et al.
Published: (2024)
Transformer-VQ: Linear-Time Transformers via Vector Quantization
by: Lingle, Lucas D.
Published: (2023)
Understanding Space Is Rocket Science -- Only Top Reasoning Models Can Solve Spatial Understanding Tasks
by: Hoehing, Nils, et al.
Published: (2025)
Transferability-Guided Cross-Domain Cross-Task Transfer Learning
by: Tan, Yang, et al.
Published: (2022)
Neural Style Transfer for Synthesising a Dataset of Ancient Egyptian Hieroglyphs
by: Creed, Lewis Matheson
Published: (2025)
GenSim: Generating Robotic Simulation Tasks via Large Language Models
by: Wang, Lirui, et al.
Published: (2023)
MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation
by: Joshi, Siddharth, et al.
Published: (2025)
Task-Specific Directions: Definition, Exploration, and Utilization in Parameter Efficient Fine-Tuning
by: Si, Chongjie, et al.
Published: (2024)
Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks
by: Luo, Yaxin, et al.
Published: (2026)
Monkey Jump : MoE-Style PEFT for Efficient Multi-Task Learning
by: Prottasha, Nusrat Jahan, et al.
Published: (2026)
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
by: Koh, Jing Yu, et al.
Published: (2024)
Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models
by: Girrbach, Leander, et al.
Published: (2025)
VLM Judges Can Rank but Cannot Score: Task-Dependent Uncertainty in Multimodal Evaluation
by: Kumar, Divake, et al.
Published: (2026)
Crossing Language Borders: A Pipeline for Indonesian Manhwa Translation
by: Narasimhan, Nithyasri, et al.
Published: (2025)
Cross-modal Causal Relation Alignment for Video Question Grounding
by: Chen, Weixing, et al.
Published: (2025)
Text-to-Image Cross-Modal Generation: A Systematic Review
by: Żelaszczyk, Maciej, et al.
Published: (2024)
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
by: Xia, Peng, et al.
Published: (2024)
The Double Dilemma in Multi-Task Radiology Report Generation: A Gradient Dynamics Analysis and Solution
by: Zhang, Erjian, et al.
Published: (2026)