| Main Authors: | Du, Tianxiang; He, Hulingxiao; Peng, Yuxin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.22126 |
Similar Items
Venus: Benchmarking and Empowering Multimodal Large Language Models for Aesthetic Guidance and Cropping
by: Du, Tianxiang, et al.
Published: (2026)
Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning
by: He, Hulingxiao, et al.
Published: (2026)
Taxonomy-Aware Representation Alignment for Hierarchical Visual Recognition with Large Multimodal Models
by: He, Hulingxiao, et al.
Published: (2026)
CountMamba: Exploring Multi-directional Selective State-Space Models for Plant Counting
by: He, Hulingxiao, et al.
Published: (2024)
Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models
by: He, Hulingxiao, et al.
Published: (2025)
AesRM: Improving Video Aesthetics with Expert-Level Feedback
by: Han, Yujin, et al.
Published: (2026)
AccelAes: Accelerating Diffusion Transformers for Training-Free Aesthetic-Enhanced Image Generation
by: Yin, Xuanhua, et al.
Published: (2026)
TCI-Former: Thermal Conduction-Inspired Transformer for Infrared Small Target Detection
by: Chen, Tianxiang, et al.
Published: (2024)
AesCrop: Aesthetic-driven Cropping Guided by Composition
by: Wong, Yen-Hong, et al.
Published: (2025)
AesTest: Measuring Aesthetic Intelligence from Perception to Production
by: Wang, Guolong, et al.
Published: (2025)
TATTOO: Training-free AesTheTic-aware Outfit recOmmendation
by: Wu, Yuntian, et al.
Published: (2025)
AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception
by: Huang, Yipo, et al.
Published: (2024)
VM-BeautyNet: A Synergistic Ensemble of Vision Transformer and Mamba for Facial Beauty Prediction
by: Boukhari, Djamel Eddine
Published: (2025)
VideoAesBench: Benchmarking the Video Aesthetics Perception Capabilities of Large Multimodal Models
by: Li, Yunhao, et al.
Published: (2026)
Aes3D: Aesthetic Assessment in 3D Gaussian Splatting
by: Xu, Chuanzhi, et al.
Published: (2026)
AesFA: An Aesthetic Feature-Aware Arbitrary Neural Style Transfer
by: Kwon, Joonwoo, et al.
Published: (2023)
ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention
by: He, Chenhang, et al.
Published: (2024)
AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception
by: Huang, Yipo, et al.
Published: (2024)
HyPCV-Former: Hyperbolic Spatio-Temporal Transformer for 3D Point Cloud Video Anomaly Detection
by: Cao, Jiaping, et al.
Published: (2025)
CascadeFormer: A Family of Two-stage Cascading Transformers for Skeleton-based Human Action Recognition
by: Peng, Yusen, et al.
Published: (2025)
STeInFormer: Spatial-Temporal Interaction Transformer Architecture for Remote Sensing Change Detection
by: Ma, Xiaowen, et al.
Published: (2024)
Deceptive Beauty: Evaluating the Impact of Beauty Filters on Deepfake and Morphing Attack Detection
by: Concas, Sara, et al.
Published: (2025)
HiMemFormer: Hierarchical Memory-Aware Transformer for Multi-Agent Action Anticipation
by: Wang, Zirui, et al.
Published: (2024)
WidthFormer: Toward Efficient Transformer-based BEV View Transformation
by: Yang, Chenhongyi, et al.
Published: (2024)
OrientedFormer: An End-to-End Transformer-Based Oriented Object Detector in Remote Sensing Images
by: Zhao, Jiaqi, et al.
Published: (2024)
LoFormer: Local Frequency Transformer for Image Deblurring
by: Mao, Xintian, et al.
Published: (2024)
CountFormer: Multi-View Crowd Counting Transformer
by: Mo, Hong, et al.
Published: (2024)
GridFormer: Point-Grid Transformer for Surface Reconstruction
by: Li, Shengtao, et al.
Published: (2024)
GeoFormer: A Multi-Polygon Segmentation Transformer
by: Khomiakov, Maxim, et al.
Published: (2024)
MonoFormer: One Transformer for Both Diffusion and Autoregression
by: Zhao, Chuyang, et al.
Published: (2024)
RoadFormer: Duplex Transformer for RGB-Normal Semantic Road Scene Parsing
by: Li, Jiahang, et al.
Published: (2023)
HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment
by: Liao, Zhichao, et al.
Published: (2025)
SLAM-Former: Putting SLAM into One Transformer
by: Yuan, Yijun, et al.
Published: (2025)
HexFormer: Hyperbolic Vision Transformer with Exponential Map Aggregation
by: Alyoussef, Haya, et al.
Published: (2026)
ModalFormer: Multimodal Transformer for Low-Light Image Enhancement
by: Brateanu, Alexandru, et al.
Published: (2025)
CompetitorFormer: Competitor Transformer for 3D Instance Segmentation
by: Wang, Duanchu, et al.
Published: (2024)
SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition
by: Do, Jeonghyeok, et al.
Published: (2024)
Proto-Former: Unified Facial Landmark Detection by Prototype Transformer
by: Hu, Shengkai, et al.
Published: (2025)
MixFormerV2: Efficient Fully Transformer Tracking
by: Cui, Yutao, et al.
Published: (2023)
FuseFormer: A Transformer for Visual and Thermal Image Fusion
by: Erdogan, Aytekin, et al.
Published: (2024)