| Main Authors: | Chen, Marco, Qi, Xianbiao, He, Yelin, Ye, Jiaquan, Xiao, Rong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.01212 |
Similar Items
DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD
by: Qi, Xianbiao, et al.
Published: (2025)
Taming Transformer Without Using Learning Rate Warmup
by: Qi, Xianbiao, et al.
Published: (2025)
Delving into Muon and Beyond: Deep Analysis and Extensions
by: Qi, Xianbiao, et al.
Published: (2026)
BodyShapeGPT: SMPL Body Shape Manipulation with LLMs
by: Árbol, Baldomero R., et al.
Published: (2024)
Examining the Robustness of Homogeneity Bias to Hyperparameter Adjustments in GPT-4
by: Lee, Messi H. J.
Published: (2025)
Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering
by: Beliaev, Mark, et al.
Published: (2025)
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
by: Li, Yanghao, et al.
Published: (2025)
Simple ReFlow: Improved Techniques for Fast Flow Models
by: Kim, Beomsu, et al.
Published: (2024)
Exploring a Principled Framework for Deep Subspace Clustering
by: Meng, Xianghan, et al.
Published: (2025)
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
by: Zhan, Jun, et al.
Published: (2024)
ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine
by: Chen, Junying, et al.
Published: (2025)
DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation
by: Huang, Yukun, et al.
Published: (2023)
DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning
by: Zala, Abhay, et al.
Published: (2023)
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?
by: Gado, Mohamed, et al.
Published: (2025)
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
by: Lin, Han, et al.
Published: (2023)
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
by: Chen, Junying, et al.
Published: (2024)
Vision-Language Model Fine-Tuning via Simple Parameter-Efficient Modification
by: Li, Ming, et al.
Published: (2024)
Harnessing GPT-4V(ision) for Insurance: A Preliminary Exploration
by: Lin, Chenwei, et al.
Published: (2024)
Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities
by: Saporta, Adriel, et al.
Published: (2024)
RetinalGPT: A Retinal Clinical Preference Conversational Assistant Powered by Large Vision-Language Models
by: Zhu, Wenhui, et al.
Published: (2025)
Transformers without Normalization
by: Zhu, Jiachen, et al.
Published: (2025)
Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models
by: Han, Zongbo, et al.
Published: (2024)
MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors
by: Tang, Yuan, et al.
Published: (2024)
Exploiting GPT-4 Vision for Zero-shot Point Cloud Understanding
by: Sun, Qi, et al.
Published: (2024)
GPT-4o System Card
by: OpenAI, et al.
Published: (2024)
TrackletGPT: A Language-like GPT Framework for White Matter Tract Segmentation
by: Goel, Anoushkrit, et al.
Published: (2026)
Global Pre-fixing, Local Adjusting: A Simple yet Effective Contrastive Strategy for Continual Learning
by: Tang, Jia, et al.
Published: (2025)
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
by: Xiao, Han, et al.
Published: (2025)
A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5
by: Ma, Xingjun, et al.
Published: (2026)
Stronger Normalization-Free Transformers
by: Chen, Mingzhi, et al.
Published: (2025)
Simple Vision-Language Math Reasoning via Rendered Text
by: Skripkin, Matvey, et al.
Published: (2025)
Set-CLIP: Exploring Aligned Semantic From Low-Alignment Multimodal Data Through A Distribution View
by: Song, Zijia, et al.
Published: (2024)
Sparsity Hurts: Simple Linear Adapter Can Boost Generalized Category Discovery
by: Ye, Bo, et al.
Published: (2026)
Improving Language Understanding from Screenshots
by: Gao, Tianyu, et al.
Published: (2024)
LiteGPT: Large Vision-Language Model for Joint Chest X-ray Localization and Classification Task
by: Le-Duc, Khai, et al.
Published: (2024)
SUTrack: Towards Simple and Unified Single Object Tracking
by: Chen, Xin, et al.
Published: (2024)
Till the Layers Collapse: Compressing a Deep Neural Network through the Lenses of Batch Normalization Layers
by: Liao, Zhu, et al.
Published: (2024)
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
by: Huang, Wei, et al.
Published: (2025)
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
by: Yuan, Zhengqing, et al.
Published: (2023)
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
by: Chen, Yangyi, et al.
Published: (2023)