| Main Authors: | Wei, Xiyuan, Lin, Ming, Ye, Fanjiang, Song, Fengguang, Cao, Liangliang, Thai, My T., Yang, Tianbao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.06699 |
Similar Items
NeuCLIP: Efficient Large-Scale CLIP Training with Neural Normalizer Optimization
by: Wei, Xiyuan, et al.
Published: (2025)
FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources
by: Wei, Xiyuan, et al.
Published: (2024)
Breaking the Limits of Open-Weight CLIP: An Optimization Framework for Self-supervised Fine-tuning of CLIP
by: Mehta, Anant, et al.
Published: (2026)
Dissecting Bit-Level Scaling Laws in Quantizing Vision Generative Models
by: Ding, Xin, et al.
Published: (2025)
PAS: Prelim Attention Score for Detecting Object Hallucinations in Large Vision-Language Models
by: Hoang-Xuan, Nhat, et al.
Published: (2025)
LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions
by: Hoang-Xuan, Nhat, et al.
Published: (2024)
Steer Away From Mode Collisions: Improving Composition In Diffusion Models
by: Dutta, Debottam, et al.
Published: (2025)
TIDE: Text-Informed Dynamic Extrapolation with Step-Aware Temperature Control for Diffusion Transformers
by: Liu, Yihua, et al.
Published: (2026)
AdFair-CLIP: Adversarial Fair Contrastive Language-Image Pre-training for Chest X-rays
by: Yi, Chenlang, et al.
Published: (2025)
Steering Rectified Flow Models in the Vector Field for Controlled Image Generation
by: Patel, Maitreya, et al.
Published: (2024)
Scaling Laws for Deepfake Detection
by: Wang, Wenhao, et al.
Published: (2025)
AmPLe: Supporting Vision-Language Models via Adaptive-Debiased Ensemble Multi-Prompt Learning
by: Song, Fei, et al.
Published: (2025)
Referring Multiple Regions with Large Multimodal Models via Contextual Latent Steering
by: Xing, Yun, et al.
Published: (2026)
Diffusion As Self-Distillation: End-to-End Latent Diffusion In One Model
by: Wang, Xiyuan, et al.
Published: (2025)
NeurFlow: Interpreting Neural Networks through Neuron Groups and Functional Interactions
by: Cao, Tue M., et al.
Published: (2025)
Discovering Global False Negatives On the Fly for Self-supervised Contrastive Learning
by: Balmaseda, Vicente, et al.
Published: (2025)
ConstStyle: Robust Domain Generalization with Unified Style Transformation
by: Tran, Nam Duong, et al.
Published: (2025)
Test-Time Computing for Referring Multimodal Large Language Models
by: Wu, Mingrui, et al.
Published: (2026)
Provable Optimization for Adversarial Fair Self-supervised Contrastive Learning
by: Qi, Qi, et al.
Published: (2024)
Scaling Laws for Native Multimodal Models
by: Shukor, Mustafa, et al.
Published: (2025)
A Physical Model-Guided Framework for Underwater Image Enhancement and Depth Estimation
by: Du, Dazhao, et al.
Published: (2024)
A General Framework for Inference-time Scaling and Steering of Diffusion Models
by: Singhal, Raghav, et al.
Published: (2025)
Steering Visual Generation in Unified Multimodal Models with Understanding Supervision
by: Liu, Zeyu, et al.
Published: (2026)
End-To-End Underwater Video Enhancement: Dataset and Model
by: Du, Dazhao, et al.
Published: (2024)
ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model
by: Lu, Shunlin, et al.
Published: (2024)
Learning Novel View Synthesis from Heterogeneous Low-light Captures
by: Zheng, Quan, et al.
Published: (2024)
Energy-Aware Imitation Learning for Steering Prediction Using Events and Frames
by: Cao, Hu, et al.
Published: (2026)
CountSteer: Steering Attention for Object Counting in Diffusion Models
by: Boo, Hyemin, et al.
Published: (2025)
Controllable Generation with Text-to-Image Diffusion Models: A Survey
by: Cao, Pu, et al.
Published: (2024)
EdgeSpotter: Multi-Scale Dense Text Spotting for Industrial Panel Monitoring
by: Fu, Changhong, et al.
Published: (2025)
Knowledge Graph Enhanced Generative Multi-modal Models for Class-Incremental Learning
by: Cao, Xusheng, et al.
Published: (2025)
Neural Residual Diffusion Models for Deep Scalable Vision Generation
by: Ma, Zhiyuan, et al.
Published: (2024)
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
by: Zhang, Haotian, et al.
Published: (2024)
Kaleido: Open-Sourced Multi-Subject Reference Video Generation Model
by: Zhang, Zhenxing, et al.
Published: (2025)
RIS-LAD: A Benchmark and Model for Referring Low-Altitude Drone Image Segmentation
by: Ye, Kai, et al.
Published: (2025)
Steering to Say No: Configurable Refusal via Activation Steering in Vision Language Models
by: Yang, Jiaxi, et al.
Published: (2026)
Data Scaling Laws for Radiology Foundation Models
by: Ilse, Maximilian, et al.
Published: (2025)
Ramen: Robust Test-Time Adaptation of Vision-Language Models with Active Sample Selection
by: Bao, Wenxuan, et al.
Published: (2026)
Scaling Laws For Diffusion Transformers
by: Liang, Zhengyang, et al.
Published: (2024)
RepLDM: Reprogramming Pretrained Latent Diffusion Models for High-Quality, High-Efficiency, High-Resolution Image Generation
by: Cao, Boyuan, et al.
Published: (2024)