Library Catalog

Bibliographic Details
Main Authors:	Yang, Yuchen; Yan, Haoran; Chen, Yanhao; Wu, Qingqiang; Hong, Qingqi
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2412.18327

Similar Items

An Intra- and Cross-frame Topological Consistency Scheme for Semi-supervised Atherosclerotic Coronary Plaque Segmentation
by: Zhang, Ziheng, et al.
Published: (2025)

STPNet: Scale-aware Text Prompt Network for Medical Image Segmentation
by: Shan, Dandan, et al.
Published: (2025)

Instruction-Guided Scene Text Recognition
by: Du, Yongkun, et al.
Published: (2024)

Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation
by: Wu, Xun, et al.
Published: (2024)

TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding
by: Luan, Bozhi, et al.
Published: (2024)

Scale-aware Adaptive Supervised Network with Limited Medical Annotations
by: Li, Zihan, et al.
Published: (2026)

Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification
by: Jiang, Jiayu, et al.
Published: (2025)

Pura: An Efficient Privacy-Preserving Solution for Face Recognition
by: Xu, Guotao, et al.
Published: (2025)

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
by: Zhu, Muzhi, et al.
Published: (2025)

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
by: Chen, Wei, et al.
Published: (2024)

Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality
by: Zhang, Tianle, et al.
Published: (2024)

Understanding Reward Hacking in Text-to-Image Reinforcement Learning
by: Hong, Yunqi, et al.
Published: (2026)

Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding
by: Rao, Yuchen, et al.
Published: (2025)

MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation
by: Tudosiu, Petru-Daniel, et al.
Published: (2024)

Learning Multi-dimensional Human Preference for Text-to-Image Generation
by: Zhang, Sixian, et al.
Published: (2024)

GeneVA: A Dataset of Human Annotations for Generative Text to Video Artifacts
by: Kang, Jenna, et al.
Published: (2025)

Click-to-Ask: An AI Live Streaming Assistant with Offline Copywriting and Online Interactive QA
by: Yu, Ruizhi, et al.
Published: (2026)

Out-of-Distribution Detection with Prototypical Outlier Proxy
by: Gong, Mingrong, et al.
Published: (2024)

Learning Mutual Excitation for Hand-to-Hand and Human-to-Human Interaction Recognition
by: Liu, Mengyuan, et al.
Published: (2024)

VidText: Towards Comprehensive Evaluation for Video Text Understanding
by: Yang, Zhoufaran, et al.
Published: (2025)

InstructUDrag: Joint Text Instructions and Object Dragging for Interactive Image Editing
by: Yu, Haoran, et al.
Published: (2025)

Self-Supervised Learning of Deviation in Latent Representation for Co-speech Gesture Video Generation
by: Yang, Huan, et al.
Published: (2024)

SIGMA: Semantic-Difference Instruction-Grounding Mask Annotator for Text-Driven Image Manipulation Localization
by: Zhuang, Peiyu, et al.
Published: (2026)

Progressive Video Condensation with MLLM Agent for Long-form Video Understanding
by: Yin, Yufei, et al.
Published: (2026)

LongT2IBench: A Benchmark for Evaluating Long Text-to-Image Generation with Graph-structured Annotations
by: Yang, Zhichao, et al.
Published: (2025)

Skill-Aligned Annotation for Reliable Evaluation in Text-to-Image Generation
by: Eldesokey, Abdelrahman, et al.
Published: (2026)

SUDO: Enhancing Text-to-Image Diffusion Models with Self-Supervised Direct Preference Optimization
by: Peng, Liang, et al.
Published: (2025)

UNIT: Unifying Image and Text Recognition in One Vision Encoder
by: Zhu, Yi, et al.
Published: (2024)

Large-scale Remote Sensing Image Target Recognition and Automatic Annotation
by: Dong, Wuzheng
Published: (2024)

Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition
by: Lin, Tiancheng, et al.
Published: (2024)

Learning from Observer Gaze: Zero-Shot Attention Prediction Oriented by Human-Object Interaction Recognition
by: Zhou, Yuchen, et al.
Published: (2024)

OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition
by: Pan, Yuchen, et al.
Published: (2024)

LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations
by: Li, Zejian, et al.
Published: (2024)

ECNet: Effective Controllable Text-to-Image Diffusion Models
by: Li, Sicheng, et al.
Published: (2024)

Rich Human Feedback for Text-to-Image Generation
by: Liang, Youwei, et al.
Published: (2023)

Towards Reliable Verification of Unauthorized Data Usage in Personalized Text-to-Image Diffusion Models
by: Li, Boheng, et al.
Published: (2024)

PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing
by: Tian, Feng, et al.
Published: (2024)

SNN-Driven Multimodal Human Action Recognition via Sparse Spatial-Temporal Data Fusion
by: Zheng, Naichuan, et al.
Published: (2025)

CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models
by: Zhang, Gaoyang, et al.
Published: (2024)

LRSAA: Large-scale Remote Sensing Image Target Recognition and Automatic Annotation
by: Dong, Wuzheng, et al.
Published: (2024)