Library Catalog

Bibliographic Details
Main Authors:	Han, Tianyang; Su, Junhao; Hu, Junjie; Yang, Peizhen; Shi, Hengyu; Luo, Junfeng; Gao, Jialin
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2511.18271

Similar Items

LayoutCoT: Unleashing the Deep Reasoning Potential of Large Language Models for Layout Generation
by: Shi, Hengyu, et al.
Published: (2025)

MAN++: Scaling Momentum Auxiliary Network for Supervised Local Learning in Vision Tasks
by: Su, Junhao, et al.
Published: (2025)

Advancing Supervised Local Learning Beyond Classification with Long-term Feature Bank
by: Zhu, Feiyu, et al.
Published: (2024)

Failure Makes the Agent Stronger: Enhancing Accuracy through Structured Reflection for Reliable Tool Interactions
by: Su, Junhao, et al.
Published: (2025)

Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards
by: Han, Tianyang, et al.
Published: (2026)

PositionIC: Unified Position and Identity Consistency for Image Customization
by: Hu, Junjie, et al.
Published: (2025)

Replacement Learning: Training Neural Networks with Fewer Parameters
by: Zhang, Yuming, et al.
Published: (2026)

PosterOmni: Generalized Artistic Poster Creation via Task Distillation and Unified Reward Feedback
by: Chen, Sixiang, et al.
Published: (2026)

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
by: Zhao, Xiangyu, et al.
Published: (2025)

Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training
by: Shi, Hengyu, et al.
Published: (2026)

WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation
by: Zhang, Daoan, et al.
Published: (2025)

Beyond Pixels: Text Enhances Generalization in Real-World Image Restoration
by: Sun, Haoze, et al.
Published: (2024)

From Reasoning to Pixels: Benchmarking the Alignment Gap in Unified Multimodal Models
by: Yang, Cheng, et al.
Published: (2026)

HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
by: HunyuanWorld Team, et al.
Published: (2025)

PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
by: Chen, SiXiang, et al.
Published: (2025)

RiskCueBench: Benchmarking Anticipatory Reasoning from Early Risk Cues in Video-Language Models
by: Luo, Sha, et al.
Published: (2026)

Beyond Pixels: Medical Image Quality Assessment with Implicit Neural Representations
by: Özer, Caner, et al.
Published: (2025)

PixelFlow: Pixel-Space Generative Models with Flow
by: Chen, Shoufa, et al.
Published: (2025)

Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
by: Li, Hongyu, et al.
Published: (2025)

Towards Alignment-Centric Paradigm: A Survey of Instruction Tuning in Large Language Models
by: Han, Xudong, et al.
Published: (2025)

Stabilizing Quantization-Aware Training by Implicit-Regularization on Hessian Matrix
by: Pang, Junbiao, et al.
Published: (2025)

Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
by: Wang, Haozhe, et al.
Published: (2025)

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
by: Zhang, David Junhao, et al.
Published: (2023)

Beyond Pixel-Wise Supervision for Medical Image Segmentation: From Traditional Models to Foundation Models
by: Shi, Yuyan, et al.
Published: (2024)

Reinforcing Diffusion Models by Direct Group Preference Optimization
by: Luo, Yihong, et al.
Published: (2025)

Beyond Pixel Histories: World Models with Persistent 3D State
by: Garcin, Samuel, et al.
Published: (2026)

PixelLM: Pixel Reasoning with Large Multimodal Model
by: Ren, Zhongwei, et al.
Published: (2023)

Beyond Pixels: Visual Metaphor Transfer via Schema-Driven Agentic Reasoning
by: Xu, Yu, et al.
Published: (2026)

From Pixels to Words -- Towards Native One-Vision Models at Scale
by: Diao, Haiwen, et al.
Published: (2026)

FysicsWorld: A Unified Full-Modality Benchmark for Any-to-Any Understanding, Generation, and Reasoning
by: Jiang, Yue, et al.
Published: (2025)

Do Text Edits Generalize to Visual Generation? Benchmarking Cross-Modal Knowledge Editing in UMMs
by: Gao, Xin, et al.
Published: (2026)

Look Less, Reason More: Rollout-Guided Adaptive Pixel-Space Reasoning
by: Li, Xuchen, et al.
Published: (2025)

Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models
by: Luo, Weijian, et al.
Published: (2023)

Align Beyond Prompts: Evaluating World Knowledge Alignment in Text-to-Image Generation
by: Zhang, Wenchao, et al.
Published: (2025)

TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward
by: Luo, Yihong, et al.
Published: (2026)

Compression Beyond Pixels: Semantic Compression with Multimodal Foundation Models
by: Shen, Ruiqi, et al.
Published: (2025)

T2VWorldBench: A Benchmark for Evaluating World Knowledge in Text-to-Video Generation
by: Chen, Yubin, et al.
Published: (2025)

Beyond Emotion Recognition: A Multi-Turn Multimodal Emotion Understanding and Reasoning Benchmark
by: Hu, Jinpeng, et al.
Published: (2025)

Reward-Instruct: A Reward-Centric Approach to Fast Photo-Realistic Image Generation
by: Luo, Yihong, et al.
Published: (2025)

Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control
by: Han, Minghao, et al.
Published: (2025)