| Main Authors: | Yang, Yuchen, Yan, Haoran, Chen, Yanhao, Wu, Qingqiang, Hong, Qingqi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.18327 |
Similar Items
An Intra- and Cross-frame Topological Consistency Scheme for Semi-supervised Atherosclerotic Coronary Plaque Segmentation
by: Zhang, Ziheng, et al.
Published: (2025)
STPNet: Scale-aware Text Prompt Network for Medical Image Segmentation
by: Shan, Dandan, et al.
Published: (2025)
Instruction-Guided Scene Text Recognition
by: Du, Yongkun, et al.
Published: (2024)
Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation
by: Wu, Xun, et al.
Published: (2024)
TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding
by: Luan, Bozhi, et al.
Published: (2024)
Scale-aware Adaptive Supervised Network with Limited Medical Annotations
by: Li, Zihan, et al.
Published: (2026)
Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification
by: Jiang, Jiayu, et al.
Published: (2025)
Pura: An Efficient Privacy-Preserving Solution for Face Recognition
by: Xu, Guotao, et al.
Published: (2025)
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
by: Zhu, Muzhi, et al.
Published: (2025)
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
by: Chen, Wei, et al.
Published: (2024)
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality
by: Zhang, Tianle, et al.
Published: (2024)
Understanding Reward Hacking in Text-to-Image Reinforcement Learning
by: Hong, Yunqi, et al.
Published: (2026)
Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding
by: Rao, Yuchen, et al.
Published: (2025)
MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation
by: Tudosiu, Petru-Daniel, et al.
Published: (2024)
Learning Multi-dimensional Human Preference for Text-to-Image Generation
by: Zhang, Sixian, et al.
Published: (2024)
GeneVA: A Dataset of Human Annotations for Generative Text to Video Artifacts
by: Kang, Jenna, et al.
Published: (2025)
Click-to-Ask: An AI Live Streaming Assistant with Offline Copywriting and Online Interactive QA
by: Yu, Ruizhi, et al.
Published: (2026)
Out-of-Distribution Detection with Prototypical Outlier Proxy
by: Gong, Mingrong, et al.
Published: (2024)
Learning Mutual Excitation for Hand-to-Hand and Human-to-Human Interaction Recognition
by: Liu, Mengyuan, et al.
Published: (2024)
VidText: Towards Comprehensive Evaluation for Video Text Understanding
by: Yang, Zhoufaran, et al.
Published: (2025)
InstructUDrag: Joint Text Instructions and Object Dragging for Interactive Image Editing
by: Yu, Haoran, et al.
Published: (2025)
Self-Supervised Learning of Deviation in Latent Representation for Co-speech Gesture Video Generation
by: Yang, Huan, et al.
Published: (2024)
SIGMA: Semantic-Difference Instruction-Grounding Mask Annotator for Text-Driven Image Manipulation Localization
by: Zhuang, Peiyu, et al.
Published: (2026)
Progressive Video Condensation with MLLM Agent for Long-form Video Understanding
by: Yin, Yufei, et al.
Published: (2026)
LongT2IBench: A Benchmark for Evaluating Long Text-to-Image Generation with Graph-structured Annotations
by: Yang, Zhichao, et al.
Published: (2025)
Skill-Aligned Annotation for Reliable Evaluation in Text-to-Image Generation
by: Eldesokey, Abdelrahman, et al.
Published: (2026)
SUDO: Enhancing Text-to-Image Diffusion Models with Self-Supervised Direct Preference Optimization
by: Peng, Liang, et al.
Published: (2025)
UNIT: Unifying Image and Text Recognition in One Vision Encoder
by: Zhu, Yi, et al.
Published: (2024)
Large-scale Remote Sensing Image Target Recognition and Automatic Annotation
by: Dong, Wuzheng
Published: (2024)
Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition
by: Lin, Tiancheng, et al.
Published: (2024)
Learning from Observer Gaze: Zero-Shot Attention Prediction Oriented by Human-Object Interaction Recognition
by: Zhou, Yuchen, et al.
Published: (2024)
OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition
by: Pan, Yuchen, et al.
Published: (2024)
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations
by: Li, Zejian, et al.
Published: (2024)
ECNet: Effective Controllable Text-to-Image Diffusion Models
by: Li, Sicheng, et al.
Published: (2024)
Rich Human Feedback for Text-to-Image Generation
by: Liang, Youwei, et al.
Published: (2023)
Towards Reliable Verification of Unauthorized Data Usage in Personalized Text-to-Image Diffusion Models
by: Li, Boheng, et al.
Published: (2024)
PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing
by: Tian, Feng, et al.
Published: (2024)
SNN-Driven Multimodal Human Action Recognition via Sparse Spatial-Temporal Data Fusion
by: Zheng, Naichuan, et al.
Published: (2025)
CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models
by: Zhang, Gaoyang, et al.
Published: (2024)
LRSAA: Large-scale Remote Sensing Image Target Recognition and Automatic Annotation
by: Dong, Wuzheng, et al.
Published: (2024)