| Main Authors: | Haris, Themistoklis; Zhang, Zihan; Yoshida, Yuichi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.08287 |
Similar Items
Compression Barriers for Autoregressive Transformers
by: Haris, Themistoklis, et al.
Published: (2025)
Symmetry Reveals Layerwise Dynamics: How Transformers Perform In-Context Classification
by: Lutz, Patrick, et al.
Published: (2026)
$k$NN Attention Demystified: A Theoretical Exploration for Scalable Transformers
by: Haris, Themistoklis
Published: (2024)
Is Monotonic Sampling Necessary in Diffusion Models?
by: Khan, Muhammad Haris
Published: (2026)
NoiseFormer -- Noise Diffused Symmetric Attention Transformer
by: Kumar, Phani, et al.
Published: (2026)
NoiseAR: AutoRegressing Initial Noise Prior for Diffusion Models
by: Li, Zeming, et al.
Published: (2025)
From Noise to Narrative: Tracing the Origins of Hallucinations in Transformers
by: Suresh, Praneet, et al.
Published: (2025)
Stability and Generalization in Looped Transformers
by: Labovich, Asher
Published: (2026)
SafeBench-Seq: A Homology-Clustered, CPU-Only Baseline for Protein Hazard Screening with Physicochemical/Composition Features and Cluster-Aware Confidence Intervals
by: Khan, Muhammad Haris
Published: (2025)
The Kinetics of Reasoning: How Chain-of-Thought Shapes Learning in Transformers?
by: Pengmei, Zihan, et al.
Published: (2025)
Cut Less, Fold More: Model Compression through the Lens of Projection Geometry
by: Saukh, Olga, et al.
Published: (2026)
Diffusion World Model: Future Modeling Beyond Step-by-Step Rollout for Offline Reinforcement Learning
by: Ding, Zihan, et al.
Published: (2024)
Robust Federated Learning Over the Air: Combating Heavy-Tailed Noise with Median Anchored Clipping
by: Li, Jiaxing, et al.
Published: (2024)
Policy Filtration for RLHF to Mitigate Noise in Reward Models
by: Zhang, Chuheng, et al.
Published: (2024)
Precision Tracked Transformer via Kalman Filtering, Kriging and Process Noise
by: Long, Bo, et al.
Published: (2026)
Modular Delta Merging with Orthogonal Constraints: A Scalable Framework for Continual and Reversible Model Composition
by: Khan, Haris, et al.
Published: (2025)
Exact Attention Sensitivity and the Geometry of Transformer Stability
by: Emadi, Seyed Morteza
Published: (2026)
When Is Rank-1 Steering Cheap? Geometry, Granularity, and Budgeted Search
by: Robertson, John T., et al.
Published: (2026)
Forget the Data and Fine-Tuning! Just Fold the Network to Compress
by: Wang, Dong, et al.
Published: (2025)
Self-Discovered Intention-aware Transformer for Multi-modal Vehicle Trajectory Prediction
by: Liu, Diyi, et al.
Published: (2026)
Stability of Transformers under Layer Normalization
by: Kan, Kelvin, et al.
Published: (2025)
Unlocking Emergent Modularity in Large Language Models
by: Qiu, Zihan, et al.
Published: (2023)
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning
by: Yoshihara, Hiroshi, et al.
Published: (2025)
Handling Label Noise via Instance-Level Difficulty Modeling and Dynamic Optimization
by: Zhang, Kuan, et al.
Published: (2025)
Traj-Transformer: Diffusion Models with Transformer for GPS Trajectory Generation
by: Zhang, Zhiyang, et al.
Published: (2025)
Hide and Seek in Noise Labels: Noise-Robust Collaborative Active Learning with LLM-Powered Assistance
by: Yuan, Bo, et al.
Published: (2025)
From Static Analysis to Audience Dissemination: A Training-Free Multimodal Controversy Detection Multi-Agent Framework
by: Ding, Zihan, et al.
Published: (2026)
PrismAgent: Illuminating Harm in Memes via a Zero-Shot Interpretable Multi-Agent Framework
by: Ding, Zihan, et al.
Published: (2026)
On Some Tunable Multi-fidelity Bayesian Optimization Frameworks
by: Manoj, Arjun, et al.
Published: (2025)
Say Less, Mean More: Leveraging Pragmatics in Retrieval-Augmented Generation
by: Riaz, Haris, et al.
Published: (2025)
Federated Self-Supervised Learning for Automatic Modulation Classification under Non-IID and Class-Imbalanced Data
by: Akram, Usman, et al.
Published: (2025)
MambaCSP: Hybrid-Attention State Space Models for Hardware-Efficient Channel State Prediction
by: Djuhera, Aladin, et al.
Published: (2026)
An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration
by: Naganuma, Hiroki, et al.
Published: (2023)
A Comprehensive Review on Noise Control of Diffusion Model
by: Guo, Zhehao, et al.
Published: (2025)
CCS: Controllable and Constrained Sampling with Diffusion Models via Initial Noise Perturbation
by: Song, Bowen, et al.
Published: (2025)
Adaptive Policy Selection and Fine-Tuning under Interaction Budgets for Offline-to-Online Reinforcement Learning
by: Bozkurt, Alper Kamil, et al.
Published: (2026)
Online Learning for Multi-Layer Hierarchical Inference under Partial and Policy-Dependent Feedback
by: Zhang, Haoran, et al.
Published: (2026)
Foundation-Preserving Adaptation via Generalized Rayleigh-Quotient Optimization
by: Kim, Dongjun, et al.
Published: (2026)
Single-stream Policy Optimization
by: Xu, Zhongwen, et al.
Published: (2025)
Dynamic Influence Tracker: Measuring Time-Varying Sample Influence During Training
by: Xu, Jie, et al.
Published: (2025)