| Main Authors: | Mou, Shancong, Vemulapalli, Raviteja, Li, Shiyu, Liu, Yuxuan, Thomas, C, Cao, Meng, Bai, Haoping, Tuzel, Oncel, Huang, Ping, Shan, Jiulong, Shi, Jianjun |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.18490 |
Similar Items
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
by: Vasu, Pavan Kumar Anasosalu, et al.
Published: (2023)
Learning from Self Critique and Refinement for Faithful LLM Summarization
by: Hu, Ting-Yao, et al.
Published: (2025)
Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models
by: Vemulapalli, Raviteja, et al.
Published: (2023)
Additive Tensor Decomposition Considering Structural Data Information
by: Mou, Shancong, et al.
Published: (2020)
TiC-CLIP: Continual Training of CLIP Models
by: Garg, Saurabh, et al.
Published: (2023)
MUSCLE: A Model Update Strategy for Compatible LLM Evolution
by: Echterhoff, Jessica, et al.
Published: (2024)
Proxy-FDA: Proxy-based Feature Distribution Alignment for Fine-tuning Vision Foundation Models without Forgetting
by: Huang, Chen, et al.
Published: (2025)
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
by: Wang, Haoxiang, et al.
Published: (2023)
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
by: Hsieh, Cheng-Yu, et al.
Published: (2025)
ASTRA-bench: Evaluating Tool-Use Agent Reasoning and Action Planning with Personal User Context
by: Xiu, Zidi, et al.
Published: (2026)
AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding
by: Chowdhury, Sanjoy, et al.
Published: (2025)
TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining
by: Li, Jeffrey, et al.
Published: (2025)
Mutual Reinforcement of LLM Dialogue Synthesis and Summarization Capabilities for Few-Shot Dialogue Summarization
by: Lu, Yen-Ju, et al.
Published: (2025)
Direct Large Language Model Alignment Through Self-Rewarding Contrastive Prompt Distillation
by: Liu, Aiwei, et al.
Published: (2024)
Learning to Reason for Hallucination Span Detection
by: Su, Hsuan, et al.
Published: (2025)
ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models
by: Zhu, Jingyuan, et al.
Published: (2024)
TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights
by: Liu, Aiwei, et al.
Published: (2024)
Natural Hypergradient Descent: Algorithm Design, Convergence Analysis, and Parallel Implementation
by: Kong, Deyi, et al.
Published: (2026)
Uni-3DAD: GAN-Inversion Aided Universal 3D Anomaly Detection on Model-free Products
by: Liu, Jiayu, et al.
Published: (2024)
CLIP with Quality Captions: A Strong Pretraining for Vision Tasks
by: Vasu, Pavan Kumar Anasosalu, et al.
Published: (2024)
LiTo: Surface Light Field Tokenization
by: Chang, Jen-Hao Rick, et al.
Published: (2026)
Pretraining with hierarchical memories: separating long-tail and common knowledge
by: Pouransari, Hadi, et al.
Published: (2025)
COMPASS: Benchmarking Constrained Optimization in LLM Agents
by: Qin, Tian, et al.
Published: (2025)
VeCLIP: Improving CLIP Training via Visual-enriched Captions
by: Lai, Zhengfeng, et al.
Published: (2023)
DeltaSeg: Tiered Attention and Deep Delta Learning for Multi-Class Structural Defect Segmentation
by: Noguera, Enrique Hernandez, et al.
Published: (2026)
SynthSeg-Agents: Multi-Agent Synthetic Data Generation for Zero-Shot Weakly Supervised Semantic Segmentation
by: Wu, Wangyu, et al.
Published: (2025)
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
by: Mirzadeh, Iman, et al.
Published: (2024)
RayRoPE: Projective Ray Positional Encoding for Multi-view Attention
by: Wu, Yu, et al.
Published: (2026)
LangDA: Building Context-Awareness via Language for Domain Adaptive Semantic Segmentation
by: Liu, Chang, et al.
Published: (2025)
Novel-View Acoustic Synthesis from 3D Reconstructed Rooms
by: Ahn, Byeongjoo, et al.
Published: (2023)
Efficient ConvBN Blocks for Transfer Learning and Beyond
by: You, Kaichao, et al.
Published: (2023)
MR. Judge: Multimodal Reasoner as a Judge
by: Pi, Renjie, et al.
Published: (2025)
BiSeg-SAM: Weakly-Supervised Post-Processing Framework for Boosting Binary Segmentation in Segment Anything Models
by: Su, Encheng, et al.
Published: (2025)
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
by: Mehta, Sachin, et al.
Published: (2024)
SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images
by: Li, Kaiyu, et al.
Published: (2024)
Coding for Synthesis Defects
by: Lu, Ziyang, et al.
Published: (2024)
VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models
by: Vasu, Pavan Kumar Anasosalu, et al.
Published: (2026)
CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning
by: Cao, Qingqing, et al.
Published: (2024)
Velox: Learning Representations of 4D Geometry and Appearance
by: Malik, Anagh, et al.
Published: (2026)
El uso de las redes sociales y la cultura popular para una mejor comprensión intercultural
by: Sait Tuzel
Published: (2017)