| Main Authors: | Li, Daiqing, Kamko, Aleks, Akhgari, Ehsan, Sabet, Ali, Xu, Linmiao, Doshi, Suhail |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.17245 |
Similar Items
Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models
by: Liu, Bingchen, et al.
Published: (2024)
Image Aesthetics Assessment using Multi Channel Convolutional Neural Networks
by: Doshi, Nishi, et al.
Published: (2019)
The Photographer Eye: Teaching Multimodal Large Language Models to Understand Image Aesthetics like Photographers
by: Qi, Daiqing, et al.
Published: (2025)
Optimizing Negative Prompts for Enhanced Aesthetics and Fidelity in Text-To-Image Generation
by: Ogezi, Michael, et al.
Published: (2024)
Visual IRL for Human-Like Robotic Manipulation
by: Asali, Ehsan, et al.
Published: (2024)
Bridging Visual Affective Gap: Borrowing Textual Knowledge by Learning from Noisy Image-Text Pairs
by: Wu, Daiqing, et al.
Published: (2025)
MVSA-Net: Multi-View State-Action Recognition for Robust and Deployable Trajectory Generation
by: Asali, Ehsan, et al.
Published: (2023)
Aesthetic Image Captioning with Saliency Enhanced MLLMs
by: Tao, Yilin, et al.
Published: (2025)
Track the Answer: Extending TextVQA from Image to Video with Spatio-Temporal Clues
by: Zhang, Yan, et al.
Published: (2024)
Pareto-Enhanced Portrait Generation: Vision-Aligned Text Supervision for Alignment, Realism, and Aesthetics
by: Wang, Yunlong, et al.
Published: (2026)
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering
by: Liu, Zeyu, et al.
Published: (2024)
VRMDiff: Text-Guided Video Referring Matting Generation of Diffusion
by: Yang, Lehan, et al.
Published: (2025)
AccelAes: Accelerating Diffusion Transformers for Training-Free Aesthetic-Enhanced Image Generation
by: Yin, Xuanhua, et al.
Published: (2026)
Beyond Aesthetics: Cultural Competence in Text-to-Image Models
by: Kannen, Nithish, et al.
Published: (2024)
Aesthetics as Structural Harm: Algorithmic Lookism Across Text-to-Image Generation and Classification
by: Doh, Miriam, et al.
Published: (2026)
Char-SAM: Turning Segment Anything Model into Scene Text Segmentation Annotator with Character-level Visual Prompts
by: Xie, Enze, et al.
Published: (2024)
PEO: Training-Free Aesthetic Quality Enhancement in Pre-Trained Text-to-Image Diffusion Models with Prompt Embedding Optimization
by: Margaryan, Hovhannes, et al.
Published: (2025)
Assessing UHD Image Quality from Aesthetics, Distortions, and Saliency
by: Sun, Wei, et al.
Published: (2024)
Beyond Detection: A Structure-Aware Framework for Scene Text Tracking
by: Yu, Chenmin, et al.
Published: (2026)
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
by: Wang, Haofan, et al.
Published: (2024)
Compose Your Aesthetics: Empowering Text-to-Image Models with the Principles of Art
by: Jin, Zhe, et al.
Published: (2025)
Anti-Aesthetics: Protecting Facial Privacy against Customized Text-to-Image Synthesis
by: Wang, Songping, et al.
Published: (2025)
Advancing Aesthetic Image Generation via Composition Transfer
by: Zou, Kai, et al.
Published: (2026)
Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning
by: Liu, Yuti, et al.
Published: (2024)
UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment
by: Zhou, Hantao, et al.
Published: (2024)
GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts
by: He, Junwen, et al.
Published: (2024)
Aesthetic Camera Viewpoint Suggestion with 3D Aesthetic Field
by: Tang, Sheyang, et al.
Published: (2026)
Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation
by: Wu, Guangyang, et al.
Published: (2024)
SA-IQA: Redefining Image Quality Assessment for Spatial Aesthetics with Multi-Dimensional Rewards
by: Gao, Yuan, et al.
Published: (2025)
Spectral Image Tokenizer
by: Esteves, Carlos, et al.
Published: (2024)
PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
by: Chen, SiXiang, et al.
Published: (2025)
One Model, Two Minds: Task-Conditioned Reasoning for Unified Image Quality and Aesthetic Assessment
by: Yin, Wen, et al.
Published: (2026)
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model
by: Li, Mingxing, et al.
Published: (2025)
Enhancing Image Aesthetics with Dual-Conditioned Diffusion Models Guided by Multimodal Perception
by: Nan, Xinyu, et al.
Published: (2026)
EmoCaliber: Advancing Reliable Visual Emotion Comprehension via Confidence Verbalization and Calibration
by: Wu, Daiqing, et al.
Published: (2025)
G-Refine: A General Quality Refiner for Text-to-Image Generation
by: Li, Chunyi, et al.
Published: (2024)
A Survey on Quality Metrics for Text-to-Image Generation
by: Hartwig, Sebastian, et al.
Published: (2024)
UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture
by: Cao, Shuo, et al.
Published: (2025)
Rethinking Position Embedding as a Context Controller for Multi-Reference and Multi-Shot Video Generation
by: Huang, Binyuan, et al.
Published: (2026)
An Order-Complexity Aesthetic Assessment Model for Aesthetic-aware Music Recommendation
by: Jin, Xin, et al.
Published: (2024)