Library Catalog

Bibliographic Details
Main Authors:	Wei, Xiyuan; Lin, Ming; Ye, Fanjiang; Song, Fengguang; Cao, Liangliang; Thai, My T.; Yang, Tianbao
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2505.06699

Similar Items

NeuCLIP: Efficient Large-Scale CLIP Training with Neural Normalizer Optimization
by: Wei, Xiyuan, et al.
Published: (2025)

FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources
by: Wei, Xiyuan, et al.
Published: (2024)

Breaking the Limits of Open-Weight CLIP: An Optimization Framework for Self-supervised Fine-tuning of CLIP
by: Mehta, Anant, et al.
Published: (2026)

Dissecting Bit-Level Scaling Laws in Quantizing Vision Generative Models
by: Ding, Xin, et al.
Published: (2025)

PAS: Prelim Attention Score for Detecting Object Hallucinations in Large Vision-Language Models
by: Hoang-Xuan, Nhat, et al.
Published: (2025)

LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions
by: Hoang-Xuan, Nhat, et al.
Published: (2024)

Steer Away From Mode Collisions: Improving Composition In Diffusion Models
by: Dutta, Debottam, et al.
Published: (2025)

TIDE: Text-Informed Dynamic Extrapolation with Step-Aware Temperature Control for Diffusion Transformers
by: Liu, Yihua, et al.
Published: (2026)

AdFair-CLIP: Adversarial Fair Contrastive Language-Image Pre-training for Chest X-rays
by: Yi, Chenlang, et al.
Published: (2025)

Steering Rectified Flow Models in the Vector Field for Controlled Image Generation
by: Patel, Maitreya, et al.
Published: (2024)

Scaling Laws for Deepfake Detection
by: Wang, Wenhao, et al.
Published: (2025)

AmPLe: Supporting Vision-Language Models via Adaptive-Debiased Ensemble Multi-Prompt Learning
by: Song, Fei, et al.
Published: (2025)

Referring Multiple Regions with Large Multimodal Models via Contextual Latent Steering
by: Xing, Yun, et al.
Published: (2026)

Diffusion As Self-Distillation: End-to-End Latent Diffusion In One Model
by: Wang, Xiyuan, et al.
Published: (2025)

NeurFlow: Interpreting Neural Networks through Neuron Groups and Functional Interactions
by: Cao, Tue M., et al.
Published: (2025)

Discovering Global False Negatives On the Fly for Self-supervised Contrastive Learning
by: Balmaseda, Vicente, et al.
Published: (2025)

ConstStyle: Robust Domain Generalization with Unified Style Transformation
by: Tran, Nam Duong, et al.
Published: (2025)

Test-Time Computing for Referring Multimodal Large Language Models
by: Wu, Mingrui, et al.
Published: (2026)

Provable Optimization for Adversarial Fair Self-supervised Contrastive Learning
by: Qi, Qi, et al.
Published: (2024)

Scaling Laws for Native Multimodal Models
by: Shukor, Mustafa, et al.
Published: (2025)

A Physical Model-Guided Framework for Underwater Image Enhancement and Depth Estimation
by: Du, Dazhao, et al.
Published: (2024)

A General Framework for Inference-time Scaling and Steering of Diffusion Models
by: Singhal, Raghav, et al.
Published: (2025)

Steering Visual Generation in Unified Multimodal Models with Understanding Supervision
by: Liu, Zeyu, et al.
Published: (2026)

End-To-End Underwater Video Enhancement: Dataset and Model
by: Du, Dazhao, et al.
Published: (2024)

ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model
by: Lu, Shunlin, et al.
Published: (2024)

Learning Novel View Synthesis from Heterogeneous Low-light Captures
by: Zheng, Quan, et al.
Published: (2024)

Energy-Aware Imitation Learning for Steering Prediction Using Events and Frames
by: Cao, Hu, et al.
Published: (2026)

CountSteer: Steering Attention for Object Counting in Diffusion Models
by: Boo, Hyemin, et al.
Published: (2025)

Controllable Generation with Text-to-Image Diffusion Models: A Survey
by: Cao, Pu, et al.
Published: (2024)

EdgeSpotter: Multi-Scale Dense Text Spotting for Industrial Panel Monitoring
by: Fu, Changhong, et al.
Published: (2025)

Knowledge Graph Enhanced Generative Multi-modal Models for Class-Incremental Learning
by: Cao, Xusheng, et al.
Published: (2025)

Neural Residual Diffusion Models for Deep Scalable Vision Generation
by: Ma, Zhiyuan, et al.
Published: (2024)

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
by: Zhang, Haotian, et al.
Published: (2024)

Kaleido: Open-Sourced Multi-Subject Reference Video Generation Model
by: Zhang, Zhenxing, et al.
Published: (2025)

RIS-LAD: A Benchmark and Model for Referring Low-Altitude Drone Image Segmentation
by: Ye, Kai, et al.
Published: (2025)

Steering to Say No: Configurable Refusal via Activation Steering in Vision Language Models
by: Yang, Jiaxi, et al.
Published: (2026)

Data Scaling Laws for Radiology Foundation Models
by: Ilse, Maximilian, et al.
Published: (2025)

Ramen: Robust Test-Time Adaptation of Vision-Language Models with Active Sample Selection
by: Bao, Wenxuan, et al.
Published: (2026)

Scaling Laws For Diffusion Transformers
by: Liang, Zhengyang, et al.
Published: (2024)

RepLDM: Reprogramming Pretrained Latent Diffusion Models for High-Quality, High-Efficiency, High-Resolution Image Generation
by: Cao, Boyuan, et al.
Published: (2024)