| Main Authors: | Kong, Chaerin, Jang, Jiho, Kwak, Nojun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.16333 |
Similar Items
ConcatPlexer: Additional Dim1 Batching for Faster ViTs
by: Han, Donghoon, et al.
Published: (2023)
Conservative Generator, Progressive Discriminator: Coordination of Adversaries in Few-shot Incremental Image Synthesis
by: Kong, Chaerin, et al.
Published: (2022)
Toward Structural Multimodal Representations: Specialization, Selection, and Sparsification via Mixture-of-Experts
by: Choi, Hahyeon, et al.
Published: (2026)
Practical Dataset Distillation Based on Deep Support Vectors
by: Lee, Hyunho, et al.
Published: (2024)
Deep Edge Filter: Return of the Human-Crafted Layer in Deep Learning
by: Lee, Dongkwan, et al.
Published: (2025)
Multi-dimensional Preference Alignment by Conditioning Reward Itself
by: Jang, Jiho, et al.
Published: (2025)
Deep Support Vectors
by: Lee, Junhoo, et al.
Published: (2024)
Any-Way Meta Learning
by: Lee, Junhoo, et al.
Published: (2024)
Mitigating the Bias in the Model for Continual Test-Time Adaptation
by: Chung, Inseop, et al.
Published: (2024)
The Role of Teacher Calibration in Knowledge Distillation
by: Kim, Suyoung, et al.
Published: (2025)
Coreset Selection for Object Detection
by: Lee, Hojun, et al.
Published: (2024)
Self-supervised Pretraining for Partial Differential Equations
by: Madhavan, Varun, et al.
Published: (2024)
Towards Understanding Self-Pretraining for Sequence Classification
by: Coser, Omar, et al.
Published: (2026)
Adaptive Pruning of Pretrained Transformer via Differential Inclusions
by: Ding, Yizhuo, et al.
Published: (2025)
Differential Gated Self-Attention
by: Lygizou, Elpiniki Maria, et al.
Published: (2025)
Bootstrapping Top-down Information for Self-modulating Slot Attention
by: Kim, Dongwon, et al.
Published: (2024)
Gauge-Equivariant Graph Networks via Self-Interference Cancellation
by: Choi, Yoonhyuk, et al.
Published: (2025)
DPO Unchained: Your Training Algorithm is Secretly Disentangled in Human Choice Theory
by: Zhou, Wenxuan, et al.
Published: (2025)
Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data
by: Kwak, Minseo, et al.
Published: (2026)
Understanding the Training and Generalization of Pretrained Transformer for Sequential Decision Making
by: Wang, Hanzhao, et al.
Published: (2024)
Bi-ICE: An Inner Interpretable Framework for Image Classification via Bi-directional Interactions between Concept and Input Embeddings
by: Hong, Jinyung, et al.
Published: (2024)
Pooling Attention: Evaluating Pretrained Transformer Embeddings for Deception Classification
by: Mamtani, Sumit, et al.
Published: (2025)
An Equivariant Pretrained Transformer for Unified 3D Molecular Representation Learning
by: Jiao, Rui, et al.
Published: (2024)
Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness
by: Takahashi, Tsubasa, et al.
Published: (2025)
First Attentions Last: Better Exploiting First Attentions for Efficient Transformer Training
by: Kim, Gyudong, et al.
Published: (2025)
Better Not to Propagate: Understanding Edge Uncertainty and Over-smoothing in Signed Graph Neural Networks
by: Choi, Yoonhyuk, et al.
Published: (2024)
Graph Convolutions Enrich the Self-Attention in Transformers!
by: Choi, Jeongwhan, et al.
Published: (2023)
Understanding Contextual Recall in Transformers: How Finetuning Enables In-Context Reasoning over Pretraining Knowledge
by: Vasudeva, Bhavya, et al.
Published: (2026)
Multistability of Self-Attention Dynamics in Transformers
by: Altafini, Claudio
Published: (2025)
Don't Pay Attention, PLANT It: Pretraining Attention via Learning-to-Rank
by: Roy, Debjyoti Saha, et al.
Published: (2024)
Cross-Attention Message-Passing Transformers for Code-Agnostic Decoding in 6G Networks
by: Park, Seong-Joon, et al.
Published: (2025)
Spectral Probe-Circuits: A Three-Step Recipe for Identifying Attention-Head Circuits in Pretrained Transformers
by: Xu, Yongzhong
Published: (2026)
AttnDiff: Attention-based Differential Fingerprinting for Large Language Models
by: Zhang, Haobo, et al.
Published: (2026)
Pretraining Codomain Attention Neural Operators for Solving Multiphysics PDEs
by: Rahman, Md Ashiqur, et al.
Published: (2024)
TPTT: Transforming Pretrained Transformers into Titans
by: Furfaro, Fabien
Published: (2025)
Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer
by: Hsu, Alexander, et al.
Published: (2026)
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
by: Yu, Sihyun, et al.
Published: (2024)
Multi-User Contextual Cascading Bandits for Personalized Recommendation
by: Park, Jiho, et al.
Published: (2025)
Differentially Private Conformal Prediction
by: Wu, Jiamei, et al.
Published: (2026)
Quantum Adaptive Self-Attention for Quantum Transformer Models
by: Chen, Chi-Sheng, et al.
Published: (2025)