| Main Authors: | Yang, Jingru; Yu, Huan; Jingxin, Yang; Xu, Chentianye; Biao, Yin; Sun, Yu; He, Shengfeng |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.10252 |
Similar Items
Morpho-Aware Global Attention for Image Matting
by: Yang, Jingru, et al.
Published: (2024)
CryoMAE: Few-Shot Cryo-EM Particle Picking with Masked Autoencoders
by: Xu, Chentianye, et al.
Published: (2024)
Zero-shot Object Counting with Good Exemplars
by: Zhu, Huilin, et al.
Published: (2024)
Towards General Visual-Linguistic Face Forgery Detection
by: Sun, Ke, et al.
Published: (2023)
FocalCount: Towards Class-Count Imbalance in Class-Agnostic Counting
by: Zhu, Huilin, et al.
Published: (2025)
Hypergraph-State Collaborative Reasoning for Multi-Object Tracking
by: Song, Zikai, et al.
Published: (2026)
StableGuard: Towards Unified Copyright Protection and Tamper Localization in Latent Diffusion Models
by: Yang, Haoxin, et al.
Published: (2025)
Transparent Visual Reasoning via Object-Centric Agent Collaboration
by: Teoh, Benjamin, et al.
Published: (2025)
Expanding Zero-Shot Object Counting with Rich Prompts
by: Zhu, Huilin, et al.
Published: (2025)
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
by: Wei, Yana, et al.
Published: (2025)
Zero-Shot Video Translation via Token Warping
by: Zhu, Haiming, et al.
Published: (2024)
ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition
by: Huang, Ronggang, et al.
Published: (2025)
ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval from Linguistically Complex Descriptions
by: Lin, Honglin, et al.
Published: (2024)
Lagrangian Motion Fields for Long-term Motion Generation
by: Yang, Yifei, et al.
Published: (2024)
MixSA: Training-free Reference-based Sketch Extraction via Mixture-of-Self-Attention
by: Yang, Rui, et al.
Published: (2025)
Visual Document Understanding and Reasoning: A Multi-Agent Collaboration Framework with Agent-Wise Adaptive Test-Time Scaling
by: Yu, Xinlei, et al.
Published: (2025)
PanopticQuery: Unified Query-Time Reasoning for 4D Scenes
by: Tang, Ruilin, et al.
Published: (2026)
Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
by: Yuan, Haobo, et al.
Published: (2025)
Registration is a Powerful Rotation-Invariance Learner for 3D Anomaly Detection
by: Yu, Yuyang, et al.
Published: (2025)
CLIP-Gaze: Towards General Gaze Estimation via Visual-Linguistic Model
by: Yin, Pengwei, et al.
Published: (2024)
Instruct2See: Learning to Remove Any Obstructions Across Distributions
by: Li, Junhang, et al.
Published: (2025)
Neptune-X: Active X-to-Maritime Generation for Universal Maritime Object Detection
by: Guo, Yu, et al.
Published: (2025)
Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning
by: Jiang, Huajie, et al.
Published: (2025)
Unifying Global-Local Representations in Salient Object Detection with Transformer
by: Ren, Sucheng, et al.
Published: (2021)
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
by: Liu, Xiao, et al.
Published: (2024)
Towards General Visual-Linguistic Face Forgery Detection(V2)
by: Sun, Ke, et al.
Published: (2025)
SpatialImaginer: Towards Adaptive Visual Imagination for Spatial Reasoning
by: Li, Yian, et al.
Published: (2026)
Lang2Act: Fine-Grained Visual Reasoning through Self-Emergent Linguistic Toolchains
by: Xiong, Yuqi, et al.
Published: (2026)
CLiViS: Unleashing Cognitive Map through Linguistic-Visual Synergy for Embodied Visual Reasoning
by: Li, Kailing, et al.
Published: (2025)
Marmot: Object-Level Self-Correction via Multi-Agent Reasoning
by: Sun, Jiayang, et al.
Published: (2025)
Empowering Lightweight MLLMs with Reasoning via Long CoT SFT
by: Ou, Linyu, et al.
Published: (2025)
ORCA: Orchestrated Reasoning with Collaborative Agents for Document Visual Question Answering
by: Lassoued, Aymen, et al.
Published: (2026)
VLingNav: Embodied Navigation with Adaptive Reasoning and Visual-Assisted Linguistic Memory
by: Wang, Shaoan, et al.
Published: (2026)
StarPose: 3D Human Pose Estimation via Spatial-Temporal Autoregressive Diffusion
by: Yang, Haoxin, et al.
Published: (2025)
VrdONE: One-stage Video Visual Relation Detection
by: Jiang, Xinjie, et al.
Published: (2024)
One-for-All: Towards Universal Domain Translation with a Single StyleGAN
by: Du, Yong, et al.
Published: (2023)
Refer-Agent: A Collaborative Multi-Agent System with Reasoning and Reflection for Referring Video Object Segmentation
by: Jiang, Haichao, et al.
Published: (2026)
Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection
by: Tang, Hao, et al.
Published: (2024)
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
by: Ranasinghe, Kanchana, et al.
Published: (2024)
DualCoT-VLA: Visual-Linguistic Chain of Thought via Parallel Reasoning for Vision-Language-Action Models
by: Zhong, Zhide, et al.
Published: (2026)