| Main Authors: | Chowdhury, Sanjoy, Yang, Karren D., Liu, Xudong, Faghri, Fartash, Vasu, Pavan Kumar Anasosalu, Tuzel, Oncel, Manocha, Dinesh, Li, Chun-Liang, Vemulapalli, Raviteja |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.16250 |
Similar Items
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
by: Vasu, Pavan Kumar Anasosalu, et al.
Published: (2023)
CLIP with Quality Captions: A Strong Pretraining for Vision Tasks
by: Vasu, Pavan Kumar Anasosalu, et al.
Published: (2024)
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
by: Wang, Haoxiang, et al.
Published: (2023)
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
by: Hsieh, Cheng-Yu, et al.
Published: (2025)
MobileCLIP2: Improving Multi-Modal Reinforced Training
by: Faghri, Fartash, et al.
Published: (2025)
Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models
by: Vemulapalli, Raviteja, et al.
Published: (2023)
MUSCLE: A Model Update Strategy for Compatible LLM Evolution
by: Echterhoff, Jessica, et al.
Published: (2024)
VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models
by: Vasu, Pavan Kumar Anasosalu, et al.
Published: (2026)
TiC-CLIP: Continual Training of CLIP Models
by: Garg, Saurabh, et al.
Published: (2023)
Proxy-FDA: Proxy-based Feature Distribution Alignment for Fine-tuning Vision Foundation Models without Forgetting
by: Huang, Chen, et al.
Published: (2025)
TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining
by: Li, Jeffrey, et al.
Published: (2025)
FastVLM: Efficient Vision Encoding for Vision Language Models
by: Vasu, Pavan Kumar Anasosalu, et al.
Published: (2024)
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
by: Pouransari, Hadi, et al.
Published: (2024)
GAMEOPT+: Improving Fuel Efficiency in Unregulated Heterogeneous Traffic Intersections via Optimal Multi-agent Cooperative Control
by: Suriyarachchi, Nilesh, et al.
Published: (2024)
Uncovering the Representation Geometry of Minimal Cores in Overcomplete Reasoning Traces
by: Chowdhury, Sanjoy, et al.
Published: (2026)
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
by: Mehta, Sachin, et al.
Published: (2024)
Learning from Self Critique and Refinement for Faithful LLM Summarization
by: Hu, Ting-Yao, et al.
Published: (2025)
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
by: Hsieh, Yu-Guan, et al.
Published: (2024)
AgentWebBench: Benchmarking Multi-Agent Coordination in Agentic Web
by: Zhong, Shanshan, et al.
Published: (2026)
Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning
by: Yu, Peihong, et al.
Published: (2024)
ClawMobile: Rethinking Smartphone-Native Agentic Systems
by: Du, Hongchao, et al.
Published: (2026)
CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale
by: Hyun, Jonathan, et al.
Published: (2025)
Agentization of Digital Assets for the Agentic Web: Concepts, Techniques, and Benchmark
by: Chen, Linyao, et al.
Published: (2026)
Beyond Black-Box Benchmarking: Observability, Analytics, and Optimization of Agentic Systems
by: Moshkovich, Dany, et al.
Published: (2025)
Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis
by: Dorbala, Vishnu Sashank, et al.
Published: (2024)
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
by: Chowdhury, Sanjoy, et al.
Published: (2024)
AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs
by: Chowdhury, Sanjoy, et al.
Published: (2025)
LVRPO: Language-Visual Alignment with GRPO for Multimodal Understanding and Generation
by: Mo, Shentong, et al.
Published: (2026)
Multi-Agent Game Generation and Evaluation via Audio-Visual Recordings
by: Jolicoeur-Martineau, Alexia
Published: (2025)
ATOD: An Evaluation Framework and Benchmark for Agentic Task-Oriented Dialogue Systems
by: Zhang, Yifei, et al.
Published: (2026)
Multi-Agent Medical Decision Consensus Matrix System: An Intelligent Collaborative Framework for Oncology MDT Consultations
by: Han, Xudong, et al.
Published: (2025)
LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces
by: Feng, Yukang, et al.
Published: (2026)
Benchmarking Agentic Workflow Generation
by: Qiao, Shuofei, et al.
Published: (2024)
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
by: Chowdhury, Sanjoy, et al.
Published: (2025)
Multi-Agentic Approach for History Matching of Oil Reservoirs
by: Samigullin, Linar, et al.
Published: (2026)
RobustFlow: Towards Robust Agentic Workflow Generation
by: Xu, Shengxiang, et al.
Published: (2025)
CARES: Collaborative Agentic Reasoning for Error Detection in Surgery
by: Low, Chang Han, et al.
Published: (2025)
Towards Understanding, Analyzing, and Optimizing Agentic AI Execution: A CPU-Centric Perspective
by: Raj, Ritik, et al.
Published: (2025)
SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models
by: Saha, Dipayan, et al.
Published: (2025)
Agentic SPARQL: Evaluating SPARQL-MCP-powered Intelligent Agents on the Federated KGQA Benchmark
by: Dobriy, Daniel, et al.
Published: (2026)