:: Library Catalog

Bibliographic Details
Main Authors:	Wen, Xin; Zhao, Bingchen; Elezi, Ismail; Deng, Jiankang; Qi, Xiaojuan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.08685

Similar Items

G3DR: Generative 3D Reconstruction in ImageNet
by: Reddy, Pradyumna, et al.
Published: (2024)

$V_kD:$ Improving Knowledge Distillation using Orthogonal Projections
by: Miles, Roy, et al.
Published: (2024)

Deep Active Learning: A Reality Check
by: Gashi, Edrina, et al.
Published: (2024)

RetouchLLM: Training-free Code-based Image Retouching with Vision Language Models
by: Ye-Bin, Moon, et al.
Published: (2025)

VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections
by: Miles, Roy, et al.
Published: (2024)

Three Heads Are Better Than One: Complementary Experts for Long-Tailed Semi-supervised Learning
by: Ma, Chengcheng, et al.
Published: (2023)

SATGround: A Spatially-Aware Approach for Visual Grounding in Remote Sensing
by: Toker, Aysim, et al.
Published: (2025)

Fractal Calibration for long-tailed object detection
by: Alexandridis, Konstantinos Panagiotis, et al.
Published: (2024)

Do You See What I Am Pointing At? Gesture-Based Egocentric Video Question Answering
by: Choi, Yura, et al.
Published: (2026)

A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
by: Wen, Xin, et al.
Published: (2025)

What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights
by: Wen, Xin, et al.
Published: (2024)

ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs
by: Xie, Yin, et al.
Published: (2024)

Region-based Cluster Discrimination for Visual Representation Learning
by: Xie, Yin, et al.
Published: (2025)

DreamCAD: Scaling Multi-modal CAD Generation using Differentiable Parametric Surfaces
by: Khan, Mohammad Sadil, et al.
Published: (2026)

Interpretable Text-Guided Image Clustering via Iterative Search
by: Zhao, Bingchen, et al.
Published: (2025)

Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection
by: Zhao, Shizhen, et al.
Published: (2025)

What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
by: Zhang, Letian, et al.
Published: (2023)

MaDiS: Taming Masked Diffusion Language Models for Sign Language Generation
by: Zuo, Ronglai, et al.
Published: (2026)

Vision Foundation Models as Generalist Tokenizers for Image Generation
by: Zheng, Anlin, et al.
Published: (2026)

LossAgent: Towards Any Optimization Objectives for Image Processing with LLM Agents
by: Li, Bingchen, et al.
Published: (2024)

Learning from Neighbors: Category Extrapolation for Long-Tail Learning
by: Zhao, Shizhen, et al.
Published: (2024)

Generalized Category Discovery under the Long-Tailed Distribution
by: Zhao, Bingchen, et al.
Published: (2025)

Can OOD Object Detectors Learn from Foundation Models?
by: Liu, Jiahui, et al.
Published: (2024)

MambaCSR: Dual-Interleaved Scanning for Compressed Image Super-Resolution With SSMs
by: Ren, Yulin, et al.
Published: (2024)

LiftVSR: Lifting Image Diffusion to Video Super-Resolution via Hybrid Temporal Modeling with Only 4×RTX 4090s
by: Wang, Xijun, et al.
Published: (2025)

Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
by: Cui, Jiequan, et al.
Published: (2024)

Hyperspectral Image Spectral-Spatial Feature Extraction via Tensor Principal Component Analysis
by: Ren, Yuemei, et al.
Published: (2024)

IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models
by: Cui, Siying, et al.
Published: (2024)

Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning
by: Zhao, Bingchen, et al.
Published: (2024)

Feature Aligning Few shot Learning Method Using Local Descriptors Weighted Rules
by: Yan, Bingchen
Published: (2024)

Unleashing Vision-Language Semantics for Deepfake Video Detection
by: Zhu, Jiawen, et al.
Published: (2026)

Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation
by: Du, Zhipeng, et al.
Published: (2023)

MoCoTalk: Multi-Conditional Diffusion with Adaptive Router for Controllable Talking Head Generation
by: Ye, Xinyan, et al.
Published: (2026)

WaveFace: Authentic Face Restoration with Efficient Frequency Recovery
by: Miao, Yunqi, et al.
Published: (2024)

RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO
by: Lu, Yanzuo, et al.
Published: (2026)

Eigenpatches -- Adversarial Patches from Principal Components
by: Bayer, Jens, et al.
Published: (2023)

Robust Principal Component Analysis via Discriminant Sample Weight Learning
by: Deng, Yingzhuo, et al.
Published: (2024)

LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving
by: Song, Nan, et al.
Published: (2025)

Robust Principal Component Completion
by: Wang, Yinjian, et al.
Published: (2026)

Can 3D Vision-Language Models Truly Understand Natural Language?
by: Deng, Weipeng, et al.
Published: (2024)