Library Catalog

Bibliographic Details
Main Authors:	Kong, Chaerin; Jang, Jiho; Kwak, Nojun
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2505.16333

Similar Items

ConcatPlexer: Additional Dim1 Batching for Faster ViTs
by: Han, Donghoon, et al.
Published: (2023)

Conservative Generator, Progressive Discriminator: Coordination of Adversaries in Few-shot Incremental Image Synthesis
by: Kong, Chaerin, et al.
Published: (2022)

Toward Structural Multimodal Representations: Specialization, Selection, and Sparsification via Mixture-of-Experts
by: Choi, Hahyeon, et al.
Published: (2026)

Practical Dataset Distillation Based on Deep Support Vectors
by: Lee, Hyunho, et al.
Published: (2024)

Deep Edge Filter: Return of the Human-Crafted Layer in Deep Learning
by: Lee, Dongkwan, et al.
Published: (2025)

Multi-dimensional Preference Alignment by Conditioning Reward Itself
by: Jang, Jiho, et al.
Published: (2025)

Deep Support Vectors
by: Lee, Junhoo, et al.
Published: (2024)

Any-Way Meta Learning
by: Lee, Junhoo, et al.
Published: (2024)

Mitigating the Bias in the Model for Continual Test-Time Adaptation
by: Chung, Inseop, et al.
Published: (2024)

The Role of Teacher Calibration in Knowledge Distillation
by: Kim, Suyoung, et al.
Published: (2025)

Coreset Selection for Object Detection
by: Lee, Hojun, et al.
Published: (2024)

Self-supervised Pretraining for Partial Differential Equations
by: Madhavan, Varun, et al.
Published: (2024)

Towards Understanding Self-Pretraining for Sequence Classification
by: Coser, Omar, et al.
Published: (2026)

Adaptive Pruning of Pretrained Transformer via Differential Inclusions
by: Ding, Yizhuo, et al.
Published: (2025)

Differential Gated Self-Attention
by: Lygizou, Elpiniki Maria, et al.
Published: (2025)

Bootstrapping Top-down Information for Self-modulating Slot Attention
by: Kim, Dongwon, et al.
Published: (2024)

Gauge-Equivariant Graph Networks via Self-Interference Cancellation
by: Choi, Yoonhyuk, et al.
Published: (2025)

DPO Unchained: Your Training Algorithm is Secretly Disentangled in Human Choice Theory
by: Zhou, Wenxuan, et al.
Published: (2025)

Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data
by: Kwak, Minseo, et al.
Published: (2026)

Understanding the Training and Generalization of Pretrained Transformer for Sequential Decision Making
by: Wang, Hanzhao, et al.
Published: (2024)

Bi-ICE: An Inner Interpretable Framework for Image Classification via Bi-directional Interactions between Concept and Input Embeddings
by: Hong, Jinyung, et al.
Published: (2024)

Pooling Attention: Evaluating Pretrained Transformer Embeddings for Deception Classification
by: Mamtani, Sumit, et al.
Published: (2025)

An Equivariant Pretrained Transformer for Unified 3D Molecular Representation Learning
by: Jiao, Rui, et al.
Published: (2024)

Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness
by: Takahashi, Tsubasa, et al.
Published: (2025)

First Attentions Last: Better Exploiting First Attentions for Efficient Transformer Training
by: Kim, Gyudong, et al.
Published: (2025)

Better Not to Propagate: Understanding Edge Uncertainty and Over-smoothing in Signed Graph Neural Networks
by: Choi, Yoonhyuk, et al.
Published: (2024)

Graph Convolutions Enrich the Self-Attention in Transformers!
by: Choi, Jeongwhan, et al.
Published: (2023)

Understanding Contextual Recall in Transformers: How Finetuning Enables In-Context Reasoning over Pretraining Knowledge
by: Vasudeva, Bhavya, et al.
Published: (2026)

Multistability of Self-Attention Dynamics in Transformers
by: Altafini, Claudio
Published: (2025)

Don't Pay Attention, PLANT It: Pretraining Attention via Learning-to-Rank
by: Roy, Debjyoti Saha, et al.
Published: (2024)

Cross-Attention Message-Passing Transformers for Code-Agnostic Decoding in 6G Networks
by: Park, Seong-Joon, et al.
Published: (2025)

Spectral Probe-Circuits: A Three-Step Recipe for Identifying Attention-Head Circuits in Pretrained Transformers
by: Xu, Yongzhong
Published: (2026)

AttnDiff: Attention-based Differential Fingerprinting for Large Language Models
by: Zhang, Haobo, et al.
Published: (2026)

Pretraining Codomain Attention Neural Operators for Solving Multiphysics PDEs
by: Rahman, Md Ashiqur, et al.
Published: (2024)

TPTT: Transforming Pretrained Transformers into Titans
by: Furfaro, Fabien
Published: (2025)

Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer
by: Hsu, Alexander, et al.
Published: (2026)

Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
by: Yu, Sihyun, et al.
Published: (2024)

Multi-User Contextual Cascading Bandits for Personalized Recommendation
by: Park, Jiho, et al.
Published: (2025)

Differentially Private Conformal Prediction
by: Wu, Jiamei, et al.
Published: (2026)

Quantum Adaptive Self-Attention for Quantum Transformer Models
by: Chen, Chi-Sheng, et al.
Published: (2025)