| Main Authors: | Du, Ke, Peng, Yimin, Gao, Chao, Zhou, Fan, Xue, Siqiao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.04394 |
Similar Items
LookBench: A Live and Holistic Open Benchmark for Fashion Image Retrieval
by: Gensmo.ai, et al.
Published: (2026)
Sora Detector: A Unified Hallucination Detection for Large Text-to-Video Models
by: Chu, Zhixuan, et al.
Published: (2024)
Unified-Width Adaptive Dynamic Network for All-In-One Image Restoration
by: Xu, Yimin, et al.
Published: (2024)
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding
by: Zhou, Yue, et al.
Published: (2024)
Scaling Language-Free Visual Representation Learning
by: Fan, David, et al.
Published: (2025)
Unified Sequence-to-Sequence Learning for Single- and Multi-Modal Visual Object Tracking
by: Chen, Xin, et al.
Published: (2023)
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
by: Liu, Zhiheng, et al.
Published: (2025)
WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM
by: Tang, Changli, et al.
Published: (2025)
Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling
by: Zhou, Chao, et al.
Published: (2025)
One Object, Multiple Lies: A Benchmark for Cross-task Adversarial Attack on Unified Vision-Language Models
by: Zhao, Jiale, et al.
Published: (2025)
OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning
by: Hong, Lingyi, et al.
Published: (2024)
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
by: Du, Henghui, et al.
Published: (2025)
Cyclic Refiner: Object-Aware Temporal Representation Learning for Multi-View 3D Detection and Tracking
by: Guo, Mingzhe, et al.
Published: (2024)
VastTrack: Vast Category Visual Object Tracking
by: Peng, Liang, et al.
Published: (2024)
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models
by: Wang, Zehan, et al.
Published: (2024)
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
by: Jin, Peng, et al.
Published: (2023)
Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection
by: He, Liren, et al.
Published: (2024)
Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models
by: Zhong, Xinhao, et al.
Published: (2025)
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
by: Wu, Yuanchen, et al.
Published: (2025)
Pose as Clinical Prior: Learning Dual Representations for Scoliosis Screening
by: Zhou, Zirui, et al.
Published: (2025)
Improving Viewpoint-Independent Object-Centric Representations through Active Viewpoint Selection
by: Huang, Yinxuan, et al.
Published: (2024)
Vid2World: Crafting Video Diffusion Models to Interactive World Models
by: Huang, Siqiao, et al.
Published: (2025)
Multi-Scale Fusion for Object Representation
by: Zhao, Rongzhen, et al.
Published: (2024)
DSFormer: A Dual-Scale Cross-Learning Transformer for Visual Place Recognition
by: Jiang, Haiyang, et al.
Published: (2025)
Images in Sentences: Scaling Interleaved Instructions for Unified Visual Generation
by: Zhang, Yabo, et al.
Published: (2026)
Attention Distillation: A Unified Approach to Visual Characteristics Transfer
by: Zhou, Yang, et al.
Published: (2025)
High-Precision Transformer-Based Visual Servoing for Humanoid Robots in Aligning Tiny Objects
by: Xue, Jialong, et al.
Published: (2025)
Spatial-Spectral Diffusion Contrastive Representation Network for Hyperspectral Image Classification
by: Zhu, Yimin, et al.
Published: (2025)
MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects
by: Fan, Lei, et al.
Published: (2024)
Conditional Representation Learning for Customized Tasks
by: Liu, Honglin, et al.
Published: (2025)
Progressive Scaling Visual Object Tracking
by: Hong, Jack, et al.
Published: (2025)
UNO: Unifying One-stage Video Scene Graph Generation via Object-Centric Visual Representation Learning
by: Le, Huy, et al.
Published: (2025)
Multi-Object Tracking by Hierarchical Visual Representations
by: Cao, Jinkun, et al.
Published: (2024)
Multi-Scale Representation Learning for Image Restoration with State-Space Model
by: He, Yuhong, et al.
Published: (2024)
Robust Domain Adaptive Object Detection with Unified Multi-Granularity Alignment
by: Zhang, Libo, et al.
Published: (2023)
Unifying Global-Local Representations in Salient Object Detection with Transformer
by: Ren, Sucheng, et al.
Published: (2021)
A Unified Structure for Efficient RGB and RGB-D Salient Object Detection
by: Peng, Peng, et al.
Published: (2020)
Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning
by: Shi, Fan, et al.
Published: (2025)
DSU-Net: An Improved U-Net Model Based on DINOv2 and SAM2 with Multi-scale Cross-model Feature Enhancement
by: Xu, Yimin, et al.
Published: (2025)
Learning Global Object-Centric Representations via Disentangled Slot Attention
by: Chen, Tonglin, et al.
Published: (2024)