:: Library Catalog

Bibliographic Details
Main Authors:	Deng, Xueqing; Yu, Qihang; Athar, Ali; Yang, Chenglin; Yang, Linjie; Jin, Xiaojie; Shen, Xiaohui; Chen, Liang-Chieh
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2502.02589

Similar Items

COCONut: Modernizing COCO Segmentation
by: Deng, Xueqing, et al.
Published: (2024)

ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation
by: Athar, Ali, et al.
Published: (2024)

Randomized Autoregressive Visual Generation
by: Yu, Qihang, et al.
Published: (2024)

A Simple Video Segmenter by Tracking Objects Along Axial Trajectories
by: He, Ju, et al.
Published: (2023)

PanDepth: Joint Panoptic Segmentation and Depth Completion
by: Lagos, Juan, et al.
Published: (2022)

1.58-bit FLUX
by: Yang, Chenglin, et al.
Published: (2024)

An Image is Worth 32 Tokens for Reconstruction and Generation
by: Yu, Qihang, et al.
Published: (2024)

MaskBit: Embedding-free Image Generation via Bit Tokens
by: Weber, Mark, et al.
Published: (2024)

Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens
by: Kim, Dongwon, et al.
Published: (2025)

FingerCap: Fine-grained Finger-level Hand Motion Captioning
by: Shen, Xin, et al.
Published: (2025)

PanORama: Multiview Consistent Panoptic Segmentation in Operating Rooms
by: Gürbüz, Tuna, et al.
Published: (2026)

PanSR: An Object-Centric Mask Transformer for Panoptic Segmentation
by: Žust, Lojze, et al.
Published: (2024)

PanDA: Unsupervised Domain Adaptation for Multimodal 3D Panoptic Segmentation in Autonomous Driving
by: Pan, Yining, et al.
Published: (2026)

GroundCap: A Visually Grounded Image Captioning Dataset
by: Oliveira, Daniel A. P., et al.
Published: (2025)

PanSt3R: Multi-view Consistent Panoptic Segmentation
by: Žust, Lojze, et al.
Published: (2025)

CompCap: Improving Multimodal Large Language Models with Composite Captions
by: Chen, Xiaohui, et al.
Published: (2024)

SPORTS: Simultaneous Panoptic Odometry, Rendering, Tracking and Segmentation for Urban Scenes Understanding
by: Yang, Zhiliu, et al.
Published: (2025)

FSAR-Cap: A Fine-Grained Two-Stage Annotated Dataset for SAR Image Captioning
by: Zhang, Jinqi, et al.
Published: (2025)

CGC: Compositional Grounded Contrast for Fine-Grained Multi-Image Understanding
by: Zheng, Lihao, et al.
Published: (2026)

ViTamin: Designing Scalable Vision Models in the Vision-Language Era
by: Chen, Jieneng, et al.
Published: (2024)

PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation
by: Li, Xiangtai, et al.
Published: (2023)

Panoptic Captioning: An Equivalence Bridge for Image and Text
by: Lin, Kun-Yu, et al.
Published: (2025)

Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting
by: Shin, Inkyu, et al.
Published: (2024)

Scene Graph-guided SegCaptioning Transformer with Fine-grained Alignment for Controllable Video Segmentation and Captioning
by: Zhang, Xu, et al.
Published: (2026)

VITRIX-CLIPIN: Enhancing Fine-Grained Visual Understanding in CLIP via Instruction Editing Data and Long Captions
by: Wang, Ziteng, et al.
Published: (2025)

VoCap: Video Object Captioning and Segmentation from Any Prompt
by: Uijlings, Jasper, et al.
Published: (2025)

ProCap: Projection-Aware Captioning for Spatial Augmented Reality
by: Cao, Zimo, et al.
Published: (2026)

MC-PanDA: Mask Confidence for Panoptic Domain Adaptation
by: Martinović, Ivan, et al.
Published: (2024)

Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
by: Ren, Sucheng, et al.
Published: (2025)

Frequency-Aware Flow Matching for High-Quality Image Generation
by: Ren, Sucheng, et al.
Published: (2026)

Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models and Time-Dependent Layer Normalization
by: Liu, Qihao, et al.
Published: (2024)

FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching
by: Ren, Sucheng, et al.
Published: (2024)

Deeply Supervised Flow-Based Generative Models
by: Shin, Inkyu, et al.
Published: (2025)

IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers
by: Yang, Chenglin, et al.
Published: (2023)

Video ReCap: Recursive Captioning of Hour-Long Videos
by: Islam, Md Mohaiminul, et al.
Published: (2024)

Rational Design Strategies in DNA‐Encoded Libraries for Drug Discovery
by: Wang, Xudong, et al.
Published: (2025)

COCO-OLAC: A Benchmark for Occluded Panoptic Segmentation and Image Understanding
by: Wei, Wenbo, et al.
Published: (2024)

ArtiMuse: Fine-Grained Image Aesthetics Assessment with Joint Scoring and Expert-Level Understanding
by: Cao, Shuo, et al.
Published: (2025)

CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval
by: Xu, Yifan, et al.
Published: (2024)

Open-World Panoptic Segmentation
by: Sodano, Matteo, et al.
Published: (2024)