:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Marco; Qi, Xianbiao; He, Yelin; Ye, Jiaquan; Xiao, Rong
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.01212

Similar Items

DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD
by: Qi, Xianbiao, et al.
Published: (2025)

Taming Transformer Without Using Learning Rate Warmup
by: Qi, Xianbiao, et al.
Published: (2025)

Delving into Muon and Beyond: Deep Analysis and Extensions
by: Qi, Xianbiao, et al.
Published: (2026)

BodyShapeGPT: SMPL Body Shape Manipulation with LLMs
by: Árbol, Baldomero R., et al.
Published: (2024)

Examining the Robustness of Homogeneity Bias to Hyperparameter Adjustments in GPT-4
by: Lee, Messi H. J.
Published: (2025)

Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering
by: Beliaev, Mark, et al.
Published: (2025)

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
by: Li, Yanghao, et al.
Published: (2025)

Simple ReFlow: Improved Techniques for Fast Flow Models
by: Kim, Beomsu, et al.
Published: (2024)

Exploring a Principled Framework for Deep Subspace Clustering
by: Meng, Xianghan, et al.
Published: (2025)

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
by: Zhan, Jun, et al.
Published: (2024)

ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine
by: Chen, Junying, et al.
Published: (2025)

DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation
by: Huang, Yukun, et al.
Published: (2023)

DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning
by: Zala, Abhay, et al.
Published: (2023)

VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?
by: Gado, Mohamed, et al.
Published: (2025)

VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
by: Lin, Han, et al.
Published: (2023)

HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
by: Chen, Junying, et al.
Published: (2024)

Vision-Language Model Fine-Tuning via Simple Parameter-Efficient Modification
by: Li, Ming, et al.
Published: (2024)

Harnessing GPT-4V(ision) for Insurance: A Preliminary Exploration
by: Lin, Chenwei, et al.
Published: (2024)

Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities
by: Saporta, Adriel, et al.
Published: (2024)

RetinalGPT: A Retinal Clinical Preference Conversational Assistant Powered by Large Vision-Language Models
by: Zhu, Wenhui, et al.
Published: (2025)

Transformers without Normalization
by: Zhu, Jiachen, et al.
Published: (2025)

Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models
by: Han, Zongbo, et al.
Published: (2024)

MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors
by: Tang, Yuan, et al.
Published: (2024)

Exploiting GPT-4 Vision for Zero-shot Point Cloud Understanding
by: Sun, Qi, et al.
Published: (2024)

GPT-4o System Card
by: OpenAI, et al.
Published: (2024)

TrackletGPT: A Language-like GPT Framework for White Matter Tract Segmentation
by: Goel, Anoushkrit, et al.
Published: (2026)

Global Pre-fixing, Local Adjusting: A Simple yet Effective Contrastive Strategy for Continual Learning
by: Tang, Jia, et al.
Published: (2025)

UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
by: Xiao, Han, et al.
Published: (2025)

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5
by: Ma, Xingjun, et al.
Published: (2026)

Stronger Normalization-Free Transformers
by: Chen, Mingzhi, et al.
Published: (2025)

Simple Vision-Language Math Reasoning via Rendered Text
by: Skripkin, Matvey, et al.
Published: (2025)

Set-CLIP: Exploring Aligned Semantic From Low-Alignment Multimodal Data Through A Distribution View
by: Song, Zijia, et al.
Published: (2024)

Sparsity Hurts: Simple Linear Adapter Can Boost Generalized Category Discovery
by: Ye, Bo, et al.
Published: (2026)

Improving Language Understanding from Screenshots
by: Gao, Tianyu, et al.
Published: (2024)

LiteGPT: Large Vision-Language Model for Joint Chest X-ray Localization and Classification Task
by: Le-Duc, Khai, et al.
Published: (2024)

SUTrack: Towards Simple and Unified Single Object Tracking
by: Chen, Xin, et al.
Published: (2024)

Till the Layers Collapse: Compressing a Deep Neural Network through the Lenses of Batch Normalization Layers
by: Liao, Zhu, et al.
Published: (2024)

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
by: Huang, Wei, et al.
Published: (2025)

TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
by: Yuan, Zhengqing, et al.
Published: (2023)

Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
by: Chen, Yangyi, et al.
Published: (2023)