:: Library Catalog

Bibliographic Details
Main Authors:	Opper, Mattia; Fernandez, Roland; Smolensky, Paul; Gao, Jianfeng
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2503.23174

Similar Items

Compositional Generalization Across Distributional Shifts with Sparse Tree Operations
by: Soulos, Paul, et al.
Published: (2024)

Mechanisms of Symbol Processing for In-Context Learning in Transformer Networks
by: Smolensky, Paul, et al.
Published: (2024)

StrAE: Autoencoding for Pre-Trained Embeddings using Explicit Structure
by: Opper, Mattia, et al.
Published: (2023)

Do Generalisation Results Generalise?
by: Boglioni, Matteo, et al.
Published: (2025)

You Need Better Attention Priors
by: Litman, Elon, et al.
Published: (2026)

Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models
by: Gao, Bo, et al.
Published: (2025)

Disentangling Length Bias In Preference Learning Via Response-Conditioned Modeling
by: Cai, Jianfeng, et al.
Published: (2025)

Banyan: Improved Representation Learning with Explicit Structure
by: Opper, Mattia, et al.
Published: (2024)

Self-StrAE at SemEval-2024 Task 1: Making Self-Structuring AutoEncoders Learn More With Less
by: Opper, Mattia, et al.
Published: (2024)

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
by: Zhang, Qingru, et al.
Published: (2023)

Hierarchical Attention Generates Better Proofs
by: Chen, Jianlong, et al.
Published: (2025)

Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation
by: He, Zhenyu, et al.
Published: (2024)

Towards Generalising Neural Topical Representations
by: Yang, Xiaohao, et al.
Published: (2023)

Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs
by: Dong, Jiancheng, et al.
Published: (2024)

Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
by: Leng, Jiaqi, et al.
Published: (2025)

Does RoBERTa Perform Better than BERT in Continual Learning: An Attention Sink Perspective
by: Bai, Xueying, et al.
Published: (2024)

Diffusion Language Models Are Natively Length-Aware
by: Rossi, Vittorio, et al.
Published: (2026)

$p1$: Better Prompt Optimization with Fewer Prompts
by: Gao, Zhaolin, et al.
Published: (2026)

Tackling Length Inflation Without Trade-offs: Group Relative Reward Rescaling for Reinforcement Learning
by: Li, Zichao, et al.
Published: (2026)

Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection
by: Jung, Minseok, et al.
Published: (2025)

Rethinking the Evaluation of Alignment Methods: Insights into Diversity, Generalisation, and Safety
by: Janiak, Denis, et al.
Published: (2025)

A Symbolic Framework for Evaluating Mathematical Reasoning and Generalisation with Transformers
by: Meadows, Jordan, et al.
Published: (2023)

Bootstrapping Embeddings for Low Resource Languages
by: Basoz, Merve, et al.
Published: (2026)

Rethinking Perplexity: Revealing the Impact of Input Length on Perplexity Evaluation in LLMs
by: Cheng, Letian, et al.
Published: (2026)

Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling
by: Ruan, Jie, et al.
Published: (2024)

Towards Better Multi-head Attention via Channel-wise Sample Permutation
by: Yuan, Shen, et al.
Published: (2024)

Smaller Language Models are Better Black-box Machine-Generated Text Detectors
by: Mireshghallah, Niloofar, et al.
Published: (2023)

Improving Variable-Length Generation in Diffusion Language Models via Length Regularization
by: Cheng, Zicong, et al.
Published: (2026)

Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts
by: Zhang, Zeliang, et al.
Published: (2024)

On Provable Length and Compositional Generalization
by: Ahuja, Kartik, et al.
Published: (2024)

Understanding the Effects of RLHF on LLM Generalisation and Diversity
by: Kirk, Robert, et al.
Published: (2023)

Length Desensitization in Direct Preference Optimization
by: Liu, Wei, et al.
Published: (2024)

Training Large Reasoning Models Efficiently via Progressive Thought Encoding
by: Zhang, Zeliang, et al.
Published: (2026)

Synthetic Computers at Scale for Long-Horizon Productivity Simulation
by: Ge, Tao, et al.
Published: (2026)

Information Structure in Mappings: An Approach to Learning, Representation, and Generalisation
by: Conklin, Henry
Published: (2025)

Intrinsic Entropy of Context Length Scaling in LLMs
by: Shi, Jingzhe, et al.
Published: (2025)

Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models
by: Yao, Kai, et al.
Published: (2024)

SPOT: Text Source Prediction from Originality Score Thresholding
by: Yvinec, Edouard, et al.
Published: (2024)

CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
by: Lee, Donghyun, et al.
Published: (2024)

Less Finetuning, Better Retrieval: Rethinking LLM Adaptation for Biomedical Retrievers via Synthetic Data and Model Merging
by: Khattab, Sameh, et al.
Published: (2026)