:: Library Catalog

Bibliographic Details
Main Authors:	Du, Ke; Peng, Yimin; Gao, Chao; Zhou, Fan; Xue, Siqiao
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.04394

Similar Items

LookBench: A Live and Holistic Open Benchmark for Fashion Image Retrieval
by: Gensmo.ai, et al.
Published: (2026)

Sora Detector: A Unified Hallucination Detection for Large Text-to-Video Models
by: Chu, Zhixuan, et al.
Published: (2024)

Unified-Width Adaptive Dynamic Network for All-In-One Image Restoration
by: Xu, Yimin, et al.
Published: (2024)

GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding
by: Zhou, Yue, et al.
Published: (2024)

Scaling Language-Free Visual Representation Learning
by: Fan, David, et al.
Published: (2025)

Unified Sequence-to-Sequence Learning for Single- and Multi-Modal Visual Object Tracking
by: Chen, Xin, et al.
Published: (2023)

TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
by: Liu, Zhiheng, et al.
Published: (2025)

WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM
by: Tang, Changli, et al.
Published: (2025)

Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling
by: Zhou, Chao, et al.
Published: (2025)

One Object, Multiple Lies: A Benchmark for Cross-task Adversarial Attack on Unified Vision-Language Models
by: Zhao, Jiale, et al.
Published: (2025)

OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning
by: Hong, Lingyi, et al.
Published: (2024)

Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
by: Du, Henghui, et al.
Published: (2025)

Cyclic Refiner: Object-Aware Temporal Representation Learning for Multi-View 3D Detection and Tracking
by: Guo, Mingzhe, et al.
Published: (2024)

VastTrack: Vast Category Visual Object Tracking
by: Peng, Liang, et al.
Published: (2024)

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models
by: Wang, Zehan, et al.
Published: (2024)

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
by: Jin, Peng, et al.
Published: (2023)

Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection
by: He, Liren, et al.
Published: (2024)

Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models
by: Zhong, Xinhao, et al.
Published: (2025)

Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
by: Wu, Yuanchen, et al.
Published: (2025)

Pose as Clinical Prior: Learning Dual Representations for Scoliosis Screening
by: Zhou, Zirui, et al.
Published: (2025)

Improving Viewpoint-Independent Object-Centric Representations through Active Viewpoint Selection
by: Huang, Yinxuan, et al.
Published: (2024)

Vid2World: Crafting Video Diffusion Models to Interactive World Models
by: Huang, Siqiao, et al.
Published: (2025)

Multi-Scale Fusion for Object Representation
by: Zhao, Rongzhen, et al.
Published: (2024)

DSFormer: A Dual-Scale Cross-Learning Transformer for Visual Place Recognition
by: Jiang, Haiyang, et al.
Published: (2025)

Images in Sentences: Scaling Interleaved Instructions for Unified Visual Generation
by: Zhang, Yabo, et al.
Published: (2026)

Attention Distillation: A Unified Approach to Visual Characteristics Transfer
by: Zhou, Yang, et al.
Published: (2025)

High-Precision Transformer-Based Visual Servoing for Humanoid Robots in Aligning Tiny Objects
by: Xue, Jialong, et al.
Published: (2025)

Spatial-Spectral Diffusion Contrastive Representation Network for Hyperspectral Image Classification
by: Zhu, Yimin, et al.
Published: (2025)

MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects
by: Fan, Lei, et al.
Published: (2024)

Conditional Representation Learning for Customized Tasks
by: Liu, Honglin, et al.
Published: (2025)

Progressive Scaling Visual Object Tracking
by: Hong, Jack, et al.
Published: (2025)

UNO: Unifying One-stage Video Scene Graph Generation via Object-Centric Visual Representation Learning
by: Le, Huy, et al.
Published: (2025)

Multi-Object Tracking by Hierarchical Visual Representations
by: Cao, Jinkun, et al.
Published: (2024)

Multi-Scale Representation Learning for Image Restoration with State-Space Model
by: He, Yuhong, et al.
Published: (2024)

Robust Domain Adaptive Object Detection with Unified Multi-Granularity Alignment
by: Zhang, Libo, et al.
Published: (2023)

Unifying Global-Local Representations in Salient Object Detection with Transformer
by: Ren, Sucheng, et al.
Published: (2021)

A Unified Structure for Efficient RGB and RGB-D Salient Object Detection
by: Peng, Peng, et al.
Published: (2020)

Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning
by: Shi, Fan, et al.
Published: (2025)

DSU-Net: An Improved U-Net Model Based on DINOv2 and SAM2 with Multi-scale Cross-model Feature Enhancement
by: Xu, Yimin, et al.
Published: (2025)

Learning Global Object-Centric Representations via Disentangled Slot Attention
by: Chen, Tonglin, et al.
Published: (2024)