| Main Authors: | Wen, Xin; Zhao, Bingchen; Elezi, Ismail; Deng, Jiankang; Qi, Xiaojuan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.08685 |
Similar Items
G3DR: Generative 3D Reconstruction in ImageNet
by: Reddy, Pradyumna, et al.
Published: (2024)
$V_kD:$ Improving Knowledge Distillation using Orthogonal Projections
by: Miles, Roy, et al.
Published: (2024)
Deep Active Learning: A Reality Check
by: Gashi, Edrina, et al.
Published: (2024)
RetouchLLM: Training-free Code-based Image Retouching with Vision Language Models
by: Ye-Bin, Moon, et al.
Published: (2025)
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections
by: Miles, Roy, et al.
Published: (2024)
Three Heads Are Better Than One: Complementary Experts for Long-Tailed Semi-supervised Learning
by: Ma, Chengcheng, et al.
Published: (2023)
SATGround: A Spatially-Aware Approach for Visual Grounding in Remote Sensing
by: Toker, Aysim, et al.
Published: (2025)
Fractal Calibration for long-tailed object detection
by: Alexandridis, Konstantinos Panagiotis, et al.
Published: (2024)
Do You See What I Am Pointing At? Gesture-Based Egocentric Video Question Answering
by: Choi, Yura, et al.
Published: (2026)
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
by: Wen, Xin, et al.
Published: (2025)
What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights
by: Wen, Xin, et al.
Published: (2024)
ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs
by: Xie, Yin, et al.
Published: (2024)
Region-based Cluster Discrimination for Visual Representation Learning
by: Xie, Yin, et al.
Published: (2025)
DreamCAD: Scaling Multi-modal CAD Generation using Differentiable Parametric Surfaces
by: Khan, Mohammad Sadil, et al.
Published: (2026)
Interpretable Text-Guided Image Clustering via Iterative Search
by: Zhao, Bingchen, et al.
Published: (2025)
Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection
by: Zhao, Shizhen, et al.
Published: (2025)
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
by: Zhang, Letian, et al.
Published: (2023)
MaDiS: Taming Masked Diffusion Language Models for Sign Language Generation
by: Zuo, Ronglai, et al.
Published: (2026)
Vision Foundation Models as Generalist Tokenizers for Image Generation
by: Zheng, Anlin, et al.
Published: (2026)
LossAgent: Towards Any Optimization Objectives for Image Processing with LLM Agents
by: Li, Bingchen, et al.
Published: (2024)
Learning from Neighbors: Category Extrapolation for Long-Tail Learning
by: Zhao, Shizhen, et al.
Published: (2024)
Generalized Category Discovery under the Long-Tailed Distribution
by: Zhao, Bingchen, et al.
Published: (2025)
Can OOD Object Detectors Learn from Foundation Models?
by: Liu, Jiahui, et al.
Published: (2024)
MambaCSR: Dual-Interleaved Scanning for Compressed Image Super-Resolution With SSMs
by: Ren, Yulin, et al.
Published: (2024)
LiftVSR: Lifting Image Diffusion to Video Super-Resolution via Hybrid Temporal Modeling with Only 4$\times$RTX 4090s
by: Wang, Xijun, et al.
Published: (2025)
Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
by: Cui, Jiequan, et al.
Published: (2024)
Hyperspectral Image Spectral-Spatial Feature Extraction via Tensor Principal Component Analysis
by: Ren, Yuemei, et al.
Published: (2024)
IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models
by: Cui, Siying, et al.
Published: (2024)
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning
by: Zhao, Bingchen, et al.
Published: (2024)
Feature Aligning Few shot Learning Method Using Local Descriptors Weighted Rules
by: Yan, Bingchen
Published: (2024)
Unleashing Vision-Language Semantics for Deepfake Video Detection
by: Zhu, Jiawen, et al.
Published: (2026)
Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation
by: Du, Zhipeng, et al.
Published: (2023)
MoCoTalk: Multi-Conditional Diffusion with Adaptive Router for Controllable Talking Head Generation
by: Ye, Xinyan, et al.
Published: (2026)
WaveFace: Authentic Face Restoration with Efficient Frequency Recovery
by: Miao, Yunqi, et al.
Published: (2024)
RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO
by: Lu, Yanzuo, et al.
Published: (2026)
Eigenpatches -- Adversarial Patches from Principal Components
by: Bayer, Jens, et al.
Published: (2023)
Robust Principal Component Analysis via Discriminant Sample Weight Learning
by: Deng, Yingzhuo, et al.
Published: (2024)
LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving
by: Song, Nan, et al.
Published: (2025)
Robust Principal Component Completion
by: Wang, Yinjian, et al.
Published: (2026)
Can 3D Vision-Language Models Truly Understand Natural Language?
by: Deng, Weipeng, et al.
Published: (2024)