Library Catalog

Bibliographic Details
Main Authors:	Gao, Junyu; Zhang, Da; Wang, Qiyu; Zhao, Zhiyuan; Li, Xuelong
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2404.13992

Similar Items

FGAseg: Fine-Grained Pixel-Text Alignment for Open-Vocabulary Semantic Segmentation
by: Li, Bingyu, et al.
Published: (2025)

U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation
by: Li, Bingyu, et al.
Published: (2024)

StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation
by: Li, Bingyu, et al.
Published: (2024)

MARIS: Marine Open-Vocabulary Instance Segmentation with Geometric Enhancement and Semantic Alignment
by: Li, Bingyu, et al.
Published: (2025)

Exploring Efficient Open-Vocabulary Segmentation in the Remote Sensing
by: Li, Bingyu, et al.
Published: (2025)

Exploring the Underwater World Segmentation without Extra Training
by: Li, Bingyu, et al.
Published: (2025)

An Open-Source Benchmark and Baseline for Multi-temporal Referring Segmentation
by: Li, Bingyu, et al.
Published: (2026)

Prototype-Based Low Altitude UAV Semantic Segmentation
by: Zhang, Da, et al.
Published: (2026)

Towards Realistic Open-Vocabulary Remote Sensing Segmentation: Benchmark and Baseline
by: Li, Bingyu, et al.
Published: (2026)

SVGen: Interpretable Vector Graphics Generation with Large Language Models
by: Wang, Feiyu, et al.
Published: (2025)

UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding
by: Zhang, Da, et al.
Published: (2025)

NWPU-MOC: A Benchmark for Fine-grained Multi-category Object Counting in Aerial Images
by: Gao, Junyu, et al.
Published: (2024)

Boosting Quantitive and Spatial Awareness for Zero-Shot Object Counting
by: Zhang, Da, et al.
Published: (2026)

Do MLLMs Really See It: Reinforcing Visual Attention in Multimodal LLMs
by: Ou, Siqu, et al.
Published: (2026)

IntroSVG: Learning from Rendering Feedback for Text-to-SVG Generation via an Introspective Generator-Critic Framework
by: Wang, Feiyu, et al.
Published: (2026)

Real-Time Crowd Counting for Embedded Systems with Lightweight Architecture
by: Zhao, Zhiyuan, et al.
Published: (2025)

Exploring Scale Shift in Crowd Localization under the Context of Domain Generalization
by: Wang, Juncheng, et al.
Published: (2025)

From Captions to Rewards (CAREVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models
by: Dai, Muzhi, et al.
Published: (2025)

One-Shot Crowd Counting With Density Guidance For Scene Adaptation
by: Chen, Jiwei, et al.
Published: (2026)

Quantum-inspired Interpretable Deep Learning Architecture for Text Sentiment Analysis
by: Li, Bingyu, et al.
Published: (2024)

Single Domain Generalization for Crowd Counting
by: Peng, Zhuoxuan, et al.
Published: (2024)

ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
by: Lan, Mengcheng, et al.
Published: (2024)

Embedding Generalized Semantic Knowledge into Few-Shot Remote Sensing Segmentation
by: Jia, Yuyu, et al.
Published: (2024)

Granular Ball Guided Stable Latent Domain Discovery for Domain-General Crowd Counting
by: Chen, Fan, et al.
Published: (2026)

DiffProxy: Multi-View Human Mesh Recovery via Diffusion-Generated Dense Proxies
by: Wang, Renke, et al.
Published: (2026)

Regressor-Segmenter Mutual Prompt Learning for Crowd Counting
by: Guo, Mingyue, et al.
Published: (2023)

Boosting SAM for Cross-Domain Few-Shot Segmentation via Conditional Point Sparsification
by: Nie, Jiahao, et al.
Published: (2026)

Proxy Denoising for Source-Free Domain Adaptation
by: Tang, Song, et al.
Published: (2024)

Frequency Domain Nuances Mining for Visible-Infrared Person Re-identification
by: Zhang, Yukang, et al.
Published: (2024)

Domain Game: Disentangle Anatomical Feature for Single Domain Generalized Segmentation
by: Chen, Hao, et al.
Published: (2024)

AHAP: Reconstructing Arbitrary Humans from Arbitrary Perspectives with Geometric Priors
by: Qiao, Xiaozhen, et al.
Published: (2026)

SamLP: A Customized Segment Anything Model for License Plate Detection
by: Ding, Haoxuan, et al.
Published: (2024)

Referring Video Object Segmentation with Cross-Modality Proxy Queries
by: Sun, Baoli, et al.
Published: (2025)

HuPrior3R: Incorporating Human Priors for Better 3D Dynamic Reconstruction from Monocular Videos
by: Xiong, Weitao, et al.
Published: (2025)

Towards Diverse Binary Segmentation via A Simple yet General Gated Network
by: Zhao, Xiaoqi, et al.
Published: (2023)

M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis
by: Li, Junyu, et al.
Published: (2024)

Ranking-based Adaptive Query Generation for DETRs in Crowded Pedestrian Detection
by: Gao, Feng, et al.
Published: (2023)

HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation
by: Jing, Linglin, et al.
Published: (2024)

DyCrowd: Towards Dynamic Crowd Reconstruction from a Large-scene Video
by: Wen, Hao, et al.
Published: (2025)

Open-Vocabulary Domain Generalization in Urban-Scene Segmentation
by: Zhao, Dong, et al.
Published: (2026)