:: Library Catalog

Bibliographic Details
Main Authors:	Vasu, Pavan Kumar Anasosalu, Faghri, Fartash, Li, Chun-Liang, Koc, Cem, True, Nate, Antony, Albert, Santhanam, Gokul, Gabriel, James, Grasch, Peter, Tuzel, Oncel, Pouransari, Hadi
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2412.13303

Similar Items

CLIP with Quality Captions: A Strong Pretraining for Vision Tasks
by: Vasu, Pavan Kumar Anasosalu, et al.
Published: (2024)

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
by: Vasu, Pavan Kumar Anasosalu, et al.
Published: (2023)

MobileCLIP2: Improving Multi-Modal Reinforced Training
by: Faghri, Fartash, et al.
Published: (2025)

VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models
by: Vasu, Pavan Kumar Anasosalu, et al.
Published: (2026)

SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
by: Wang, Haoxiang, et al.
Published: (2023)

FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
by: Hsieh, Cheng-Yu, et al.
Published: (2025)

Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
by: Pouransari, Hadi, et al.
Published: (2024)

Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models
by: Vemulapalli, Raviteja, et al.
Published: (2023)

MUSCLE: A Model Update Strategy for Compatible LLM Evolution
by: Echterhoff, Jessica, et al.
Published: (2024)

Proxy-FDA: Proxy-based Feature Distribution Alignment for Fine-tuning Vision Foundation Models without Forgetting
by: Huang, Chen, et al.
Published: (2025)

AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding
by: Chowdhury, Sanjoy, et al.
Published: (2025)

TiC-CLIP: Continual Training of CLIP Models
by: Garg, Saurabh, et al.
Published: (2023)

Learning from Self Critique and Refinement for Faithful LLM Summarization
by: Hu, Ting-Yao, et al.
Published: (2025)

TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining
by: Li, Jeffrey, et al.
Published: (2025)

Pretraining with hierarchical memories: separating long-tail and common knowledge
by: Pouransari, Hadi, et al.
Published: (2025)

FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference
by: Bajpai, Divya Jyoti, et al.
Published: (2025)

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
by: Mehta, Sachin, et al.
Published: (2024)

Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
by: Hsieh, Yu-Guan, et al.
Published: (2024)

Learning to Reason for Hallucination Span Detection
by: Su, Hsuan, et al.
Published: (2025)

Beyond a Single Extractor: Re-thinking HTML-to-Text Extraction for LLM Pretraining
by: Li, Jeffrey, et al.
Published: (2026)

RayRoPE: Projective Ray Positional Encoding for Multi-view Attention
by: Wu, Yu, et al.
Published: (2026)

Mutual Reinforcement of LLM Dialogue Synthesis and Summarization Capabilities for Few-Shot Dialogue Summarization
by: Lu, Yen-Ju, et al.
Published: (2025)

LiTo: Surface Light Field Tokenization
by: Chang, Jen-Hao Rick, et al.
Published: (2026)

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
by: Mirzadeh, Iman, et al.
Published: (2024)

SD-VLM: Spatial Measuring and Understanding with Depth-Encoded Vision-Language Models
by: Chen, Pingyi, et al.
Published: (2025)

SpecVLM: Fast Speculative Decoding in Vision-Language Models
by: Huang, Haiduo, et al.
Published: (2025)

Computational Bottlenecks of Training Small-scale Large Language Models
by: Ashkboos, Saleh, et al.
Published: (2024)

Co‐Agent Assisted Peroxide Vulcanization of Halogen‐Free Flame Retardant EPDM Compounds for Cable Sheathing
by: Gürcan Gül, et al.
Published: (2025)

Local-to-Global Logical Explanations for Deep Vision Models
by: Vasu, Bhavan, et al.
Published: (2026)

Barriers for Learning in an Evolving World: Mathematical Understanding of Loss of Plasticity
by: Joudaki, Amir, et al.
Published: (2025)

Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization
by: Samragh, Mohammad, et al.
Published: (2024)

Description of a new species of bat, Vespertilio longicrus, from Puget Sound
by: True, Frederick W.
Published: (1887)

Presentación (Presentation)
by: Latimer, Tirza True
Published: (2013)

TrajTok: Learning Trajectory Tokens enables better Video Understanding
by: Zheng, Chenhao, et al.
Published: (2026)

Velox: Learning Representations of 4D Geometry and Appearance
by: Malik, Anagh, et al.
Published: (2026)

El uso de las redes sociales y la cultura popular para una mejor comprensión intercultural (The use of social media and popular culture for better intercultural understanding)
by: Tuzel, Sait
Published: (2017)

MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices
by: Chu, Xiangxiang, et al.
Published: (2023)

EO-VLM: VLM-Guided Energy Overload Attacks on Vision Models
by: Seo, Minjae, et al.
Published: (2025)

A Simple and Fast $(3+\varepsilon)$-approximation for Constrained Correlation Clustering
by: Veldt, Nate
Published: (2025)

Adapting Vision-Language Models for E-commerce Understanding at Scale
by: Nulli, Matteo, et al.
Published: (2026)