Bibliographic Details
Main Authors:	Wang, Peiyu; Peng, Yi; Gan, Yimeng; Hu, Liang; Xie, Tianyidan; Wang, Xiaokun; Wei, Yichen; Tang, Chuanxin; Zhu, Bo; Li, Changshi; Wei, Hongyang; Li, Eric; Song, Xuchen; Liu, Yang; Zhou, Yahui
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2508.03320

Similar Items

Skywork UniPic 2.0: Building Kontext Model with Online RL for Unified Multimodal Model
by: Wei, Hongyang, et al.
Published: (2025)

Skywork UniPic 3.0: Unified Multi-Image Composition via Sequence Modeling
by: Wei, Hongyang, et al.
Published: (2026)

Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
by: Wang, Xiaokun, et al.
Published: (2025)

Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
by: Wang, Peiyu, et al.
Published: (2025)

Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
by: Peng, Yi, et al.
Published: (2025)

Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch
by: Zhang, Yifan, et al.
Published: (2025)

Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
by: Zeng, Liang, et al.
Published: (2025)

Skywork-R1V3 Technical Report
by: Shen, Wei, et al.
Published: (2025)

LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models
by: Zhao, Liang, et al.
Published: (2024)

CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs
by: Jian, Ai, et al.
Published: (2025)

Skywork Open Reasoner 1 Technical Report
by: He, Jujie, et al.
Published: (2025)

Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
by: Wei, Tianwen, et al.
Published: (2024)

UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation
by: Zhang, Ruiheng, et al.
Published: (2026)

UniShield: Unified Face Attack Detection via KG-Informed Multimodal Reasoning
by: Li, Hongrui, et al.
Published: (2026)

UniVL: Unified Vision-Language Embedding for Spatially Grounded Contextual Image Generation
by: Wang, Jiayun, et al.
Published: (2026)

Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
by: Liu, Chris Yuhao, et al.
Published: (2024)

UniVBench: Towards Unified Evaluation for Video Foundation Models
by: Wei, Jianhui, et al.
Published: (2026)

UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation
by: Zhang, Chi, et al.
Published: (2025)

Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better
by: Wang, Dianyi, et al.
Published: (2025)

UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation
by: Li, Yi, et al.
Published: (2025)

UniVideo: Unified Understanding, Generation, and Editing for Videos
by: Wei, Cong, et al.
Published: (2025)

Unified Medical Image Tokenizer for Autoregressive Synthesis and Understanding
by: Ma, Chenglong, et al.
Published: (2025)

UniT: Unified Geometry Learning with Group Autoregressive Transformer
by: Wang, Haotian, et al.
Published: (2026)

UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models
by: Guan, Wenhao, et al.
Published: (2025)

Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy
by: Liu, Chris Yuhao, et al.
Published: (2025)

Unified Autoregressive Visual Generation and Understanding with Continuous Tokens
by: Fan, Lijie, et al.
Published: (2025)

UniARM: Towards a Unified Autoregressive Reward Model for Multi-Objective Test-Time Alignment
by: Xie, Hongyan, et al.
Published: (2026)

UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
by: Yue, Zhengrong, et al.
Published: (2025)

Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On
by: Zeng, Liang, et al.
Published: (2024)

OpenUni: A Simple Baseline for Unified Multimodal Understanding and Generation
by: Wu, Size, et al.
Published: (2025)

UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing
by: Tang, Hao, et al.
Published: (2025)

UniECG: Understanding and Generating ECG in One Unified Model
by: Jin, Jiarui, et al.
Published: (2025)

Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models
by: Wei, Hongyang, et al.
Published: (2025)

PhysCodeBench: Benchmarking Physics-Aware Symbolic Simulation of 3D Scenes via Self-Corrective Multi-Agent Refinement
by: Xie, Tianyidan, et al.
Published: (2026)

UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
by: Lin, Bin, et al.
Published: (2025)

UniQueR: Unified Query-based Feedforward 3D Reconstruction
by: Peng, Chensheng, et al.
Published: (2026)

UniMo: Unified Motion Generation and Understanding with Chain of Thought
by: Wang, Guocun, et al.
Published: (2026)

UniCompress: Token Compression for Unified Vision-Language Understanding and Generation
by: Wang, Ziyao, et al.
Published: (2026)

UniTok: A Unified Tokenizer for Visual Generation and Understanding
by: Ma, Chuofan, et al.
Published: (2025)

UniFormer: Unifying Convolution and Self-attention for Visual Recognition
by: Li, Kunchang, et al.
Published: (2022)