Library Catalog

Bibliographic Details
Main Authors:	Mou, Shancong, Vemulapalli, Raviteja, Li, Shiyu, Liu, Yuxuan, Thomas, C, Cao, Meng, Bai, Haoping, Tuzel, Oncel, Huang, Ping, Shan, Jiulong, Shi, Jianjun
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2410.18490

Similar Items

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
by: Vasu, Pavan Kumar Anasosalu, et al.
Published: (2023)

Learning from Self Critique and Refinement for Faithful LLM Summarization
by: Hu, Ting-Yao, et al.
Published: (2025)

Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models
by: Vemulapalli, Raviteja, et al.
Published: (2023)

Additive Tensor Decomposition Considering Structural Data Information
by: Mou, Shancong, et al.
Published: (2020)

TiC-CLIP: Continual Training of CLIP Models
by: Garg, Saurabh, et al.
Published: (2023)

MUSCLE: A Model Update Strategy for Compatible LLM Evolution
by: Echterhoff, Jessica, et al.
Published: (2024)

Proxy-FDA: Proxy-based Feature Distribution Alignment for Fine-tuning Vision Foundation Models without Forgetting
by: Huang, Chen, et al.
Published: (2025)

SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
by: Wang, Haoxiang, et al.
Published: (2023)

FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
by: Hsieh, Cheng-Yu, et al.
Published: (2025)

ASTRA-bench: Evaluating Tool-Use Agent Reasoning and Action Planning with Personal User Context
by: Xiu, Zidi, et al.
Published: (2026)

AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding
by: Chowdhury, Sanjoy, et al.
Published: (2025)

TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining
by: Li, Jeffrey, et al.
Published: (2025)

Mutual Reinforcement of LLM Dialogue Synthesis and Summarization Capabilities for Few-Shot Dialogue Summarization
by: Lu, Yen-Ju, et al.
Published: (2025)

Direct Large Language Model Alignment Through Self-Rewarding Contrastive Prompt Distillation
by: Liu, Aiwei, et al.
Published: (2024)

Learning to Reason for Hallucination Span Detection
by: Su, Hsuan, et al.
Published: (2025)

ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models
by: Zhu, Jingyuan, et al.
Published: (2024)

TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights
by: Liu, Aiwei, et al.
Published: (2024)

Natural Hypergradient Descent: Algorithm Design, Convergence Analysis, and Parallel Implementation
by: Kong, Deyi, et al.
Published: (2026)

Uni-3DAD: GAN-Inversion Aided Universal 3D Anomaly Detection on Model-free Products
by: Liu, Jiayu, et al.
Published: (2024)

CLIP with Quality Captions: A Strong Pretraining for Vision Tasks
by: Vasu, Pavan Kumar Anasosalu, et al.
Published: (2024)

LiTo: Surface Light Field Tokenization
by: Chang, Jen-Hao Rick, et al.
Published: (2026)

Pretraining with hierarchical memories: separating long-tail and common knowledge
by: Pouransari, Hadi, et al.
Published: (2025)

COMPASS: Benchmarking Constrained Optimization in LLM Agents
by: Qin, Tian, et al.
Published: (2025)

VeCLIP: Improving CLIP Training via Visual-enriched Captions
by: Lai, Zhengfeng, et al.
Published: (2023)

DeltaSeg: Tiered Attention and Deep Delta Learning for Multi-Class Structural Defect Segmentation
by: Noguera, Enrique Hernandez, et al.
Published: (2026)

SynthSeg-Agents: Multi-Agent Synthetic Data Generation for Zero-Shot Weakly Supervised Semantic Segmentation
by: Wu, Wangyu, et al.
Published: (2025)

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
by: Mirzadeh, Iman, et al.
Published: (2024)

RayRoPE: Projective Ray Positional Encoding for Multi-view Attention
by: Wu, Yu, et al.
Published: (2026)

LangDA: Building Context-Awareness via Language for Domain Adaptive Semantic Segmentation
by: Liu, Chang, et al.
Published: (2025)

Novel-View Acoustic Synthesis from 3D Reconstructed Rooms
by: Ahn, Byeongjoo, et al.
Published: (2023)

Efficient ConvBN Blocks for Transfer Learning and Beyond
by: You, Kaichao, et al.
Published: (2023)

MR. Judge: Multimodal Reasoner as a Judge
by: Pi, Renjie, et al.
Published: (2025)

BiSeg-SAM: Weakly-Supervised Post-Processing Framework for Boosting Binary Segmentation in Segment Anything Models
by: Su, Encheng, et al.
Published: (2025)

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
by: Mehta, Sachin, et al.
Published: (2024)

SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images
by: Li, Kaiyu, et al.
Published: (2024)

Coding for Synthesis Defects
by: Lu, Ziyang, et al.
Published: (2024)

VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models
by: Vasu, Pavan Kumar Anasosalu, et al.
Published: (2026)

CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning
by: Cao, Qingqing, et al.
Published: (2024)

Velox: Learning Representations of 4D Geometry and Appearance
by: Malik, Anagh, et al.
Published: (2026)

El uso de las redes sociales y la cultura popular para una mejor comprensión intercultural
by: Tuzel, Sait
Published: (2017)