:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Jingru; Yu, Huan; Jingxin, Yang; Xu, Chentianye; Biao, Yin; Sun, Yu; He, Shengfeng
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2411.10252

Similar Items

Morpho-Aware Global Attention for Image Matting
by: Yang, Jingru, et al.
Published: (2024)

CryoMAE: Few-Shot Cryo-EM Particle Picking with Masked Autoencoders
by: Xu, Chentianye, et al.
Published: (2024)

Zero-shot Object Counting with Good Exemplars
by: Zhu, Huilin, et al.
Published: (2024)

Towards General Visual-Linguistic Face Forgery Detection
by: Sun, Ke, et al.
Published: (2023)

FocalCount: Towards Class-Count Imbalance in Class-Agnostic Counting
by: Zhu, Huilin, et al.
Published: (2025)

Hypergraph-State Collaborative Reasoning for Multi-Object Tracking
by: Song, Zikai, et al.
Published: (2026)

StableGuard: Towards Unified Copyright Protection and Tamper Localization in Latent Diffusion Models
by: Yang, Haoxin, et al.
Published: (2025)

Transparent Visual Reasoning via Object-Centric Agent Collaboration
by: Teoh, Benjamin, et al.
Published: (2025)

Expanding Zero-Shot Object Counting with Rich Prompts
by: Zhu, Huilin, et al.
Published: (2025)

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
by: Wei, Yana, et al.
Published: (2025)

Zero-Shot Video Translation via Token Warping
by: Zhu, Haiming, et al.
Published: (2024)

ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition
by: Huang, Ronggang, et al.
Published: (2025)

ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval from Linguistically Complex Descriptions
by: Lin, Honglin, et al.
Published: (2024)

Lagrangian Motion Fields for Long-term Motion Generation
by: Yang, Yifei, et al.
Published: (2024)

MixSA: Training-free Reference-based Sketch Extraction via Mixture-of-Self-Attention
by: Yang, Rui, et al.
Published: (2025)

Visual Document Understanding and Reasoning: A Multi-Agent Collaboration Framework with Agent-Wise Adaptive Test-Time Scaling
by: Yu, Xinlei, et al.
Published: (2025)

PanopticQuery: Unified Query-Time Reasoning for 4D Scenes
by: Tang, Ruilin, et al.
Published: (2026)

Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
by: Yuan, Haobo, et al.
Published: (2025)

Registration is a Powerful Rotation-Invariance Learner for 3D Anomaly Detection
by: Yu, Yuyang, et al.
Published: (2025)

CLIP-Gaze: Towards General Gaze Estimation via Visual-Linguistic Model
by: Yin, Pengwei, et al.
Published: (2024)

Instruct2See: Learning to Remove Any Obstructions Across Distributions
by: Li, Junhang, et al.
Published: (2025)

Neptune-X: Active X-to-Maritime Generation for Universal Maritime Object Detection
by: Guo, Yu, et al.
Published: (2025)

Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning
by: Jiang, Huajie, et al.
Published: (2025)

Unifying Global-Local Representations in Salient Object Detection with Transformer
by: Ren, Sucheng, et al.
Published: (2021)

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
by: Liu, Xiao, et al.
Published: (2024)

Towards General Visual-Linguistic Face Forgery Detection(V2)
by: Sun, Ke, et al.
Published: (2025)

SpatialImaginer: Towards Adaptive Visual Imagination for Spatial Reasoning
by: Li, Yian, et al.
Published: (2026)

Lang2Act: Fine-Grained Visual Reasoning through Self-Emergent Linguistic Toolchains
by: Xiong, Yuqi, et al.
Published: (2026)

CLiViS: Unleashing Cognitive Map through Linguistic-Visual Synergy for Embodied Visual Reasoning
by: Li, Kailing, et al.
Published: (2025)

Marmot: Object-Level Self-Correction via Multi-Agent Reasoning
by: Sun, Jiayang, et al.
Published: (2025)

Empowering Lightweight MLLMs with Reasoning via Long CoT SFT
by: Ou, Linyu, et al.
Published: (2025)

ORCA: Orchestrated Reasoning with Collaborative Agents for Document Visual Question Answering
by: Lassoued, Aymen, et al.
Published: (2026)

VLingNav: Embodied Navigation with Adaptive Reasoning and Visual-Assisted Linguistic Memory
by: Wang, Shaoan, et al.
Published: (2026)

StarPose: 3D Human Pose Estimation via Spatial-Temporal Autoregressive Diffusion
by: Yang, Haoxin, et al.
Published: (2025)

VrdONE: One-stage Video Visual Relation Detection
by: Jiang, Xinjie, et al.
Published: (2024)

One-for-All: Towards Universal Domain Translation with a Single StyleGAN
by: Du, Yong, et al.
Published: (2023)

Refer-Agent: A Collaborative Multi-Agent System with Reasoning and Reflection for Referring Video Object Segmentation
by: Jiang, Haichao, et al.
Published: (2026)

Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection
by: Tang, Hao, et al.
Published: (2024)

Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
by: Ranasinghe, Kanchana, et al.
Published: (2024)

DualCoT-VLA: Visual-Linguistic Chain of Thought via Parallel Reasoning for Vision-Language-Action Models
by: Zhong, Zhide, et al.
Published: (2026)