| Main Authors: | Han, Tianyang, Su, Junhao, Hu, Junjie, Yang, Peizhen, Shi, Hengyu, Luo, Junfeng, Gao, Jialin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.18271 |
Similar Items
LayoutCoT: Unleashing the Deep Reasoning Potential of Large Language Models for Layout Generation
by: Shi, Hengyu, et al.
Published: (2025)
MAN++: Scaling Momentum Auxiliary Network for Supervised Local Learning in Vision Tasks
by: Su, Junhao, et al.
Published: (2025)
Advancing Supervised Local Learning Beyond Classification with Long-term Feature Bank
by: Zhu, Feiyu, et al.
Published: (2024)
Failure Makes the Agent Stronger: Enhancing Accuracy through Structured Reflection for Reliable Tool Interactions
by: Su, Junhao, et al.
Published: (2025)
Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards
by: Han, Tianyang, et al.
Published: (2026)
PositionIC: Unified Position and Identity Consistency for Image Customization
by: Hu, Junjie, et al.
Published: (2025)
Replacement Learning: Training Neural Networks with Fewer Parameters
by: Zhang, Yuming, et al.
Published: (2026)
PosterOmni: Generalized Artistic Poster Creation via Task Distillation and Unified Reward Feedback
by: Chen, Sixiang, et al.
Published: (2026)
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
by: Zhao, Xiangyu, et al.
Published: (2025)
Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training
by: Shi, Hengyu, et al.
Published: (2026)
WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation
by: Zhang, Daoan, et al.
Published: (2025)
Beyond Pixels: Text Enhances Generalization in Real-World Image Restoration
by: Sun, Haoze, et al.
Published: (2024)
From Reasoning to Pixels: Benchmarking the Alignment Gap in Unified Multimodal Models
by: Yang, Cheng, et al.
Published: (2026)
HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
by: HunyuanWorld Team, et al.
Published: (2025)
PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
by: Chen, SiXiang, et al.
Published: (2025)
RiskCueBench: Benchmarking Anticipatory Reasoning from Early Risk Cues in Video-Language Models
by: Luo, Sha, et al.
Published: (2026)
Beyond Pixels: Medical Image Quality Assessment with Implicit Neural Representations
by: Özer, Caner, et al.
Published: (2025)
PixelFlow: Pixel-Space Generative Models with Flow
by: Chen, Shoufa, et al.
Published: (2025)
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
by: Li, Hongyu, et al.
Published: (2025)
Towards Alignment-Centric Paradigm: A Survey of Instruction Tuning in Large Language Models
by: Han, Xudong, et al.
Published: (2025)
Stabilizing Quantization-Aware Training by Implicit-Regularization on Hessian Matrix
by: Pang, Junbiao, et al.
Published: (2025)
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
by: Wang, Haozhe, et al.
Published: (2025)
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
by: Zhang, David Junhao, et al.
Published: (2023)
Beyond Pixel-Wise Supervision for Medical Image Segmentation: From Traditional Models to Foundation Models
by: Shi, Yuyan, et al.
Published: (2024)
Reinforcing Diffusion Models by Direct Group Preference Optimization
by: Luo, Yihong, et al.
Published: (2025)
Beyond Pixel Histories: World Models with Persistent 3D State
by: Garcin, Samuel, et al.
Published: (2026)
PixelLM: Pixel Reasoning with Large Multimodal Model
by: Ren, Zhongwei, et al.
Published: (2023)
Beyond Pixels: Visual Metaphor Transfer via Schema-Driven Agentic Reasoning
by: Xu, Yu, et al.
Published: (2026)
From Pixels to Words -- Towards Native One-Vision Models at Scale
by: Diao, Haiwen, et al.
Published: (2026)
FysicsWorld: A Unified Full-Modality Benchmark for Any-to-Any Understanding, Generation, and Reasoning
by: Jiang, Yue, et al.
Published: (2025)
Do Text Edits Generalize to Visual Generation? Benchmarking Cross-Modal Knowledge Editing in UMMs
by: Gao, Xin, et al.
Published: (2026)
Look Less, Reason More: Rollout-Guided Adaptive Pixel-Space Reasoning
by: Li, Xuchen, et al.
Published: (2025)
Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models
by: Luo, Weijian, et al.
Published: (2023)
Align Beyond Prompts: Evaluating World Knowledge Alignment in Text-to-Image Generation
by: Zhang, Wenchao, et al.
Published: (2025)
TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward
by: Luo, Yihong, et al.
Published: (2026)
Compression Beyond Pixels: Semantic Compression with Multimodal Foundation Models
by: Shen, Ruiqi, et al.
Published: (2025)
T2VWorldBench: A Benchmark for Evaluating World Knowledge in Text-to-Video Generation
by: Chen, Yubin, et al.
Published: (2025)
Beyond Emotion Recognition: A Multi-Turn Multimodal Emotion Understanding and Reasoning Benchmark
by: Hu, Jinpeng, et al.
Published: (2025)
Reward-Instruct: A Reward-Centric Approach to Fast Photo-Realistic Image Generation
by: Luo, Yihong, et al.
Published: (2025)
Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control
by: Han, Minghao, et al.
Published: (2025)