| Main Authors: | Opper, Mattia; Fernandez, Roland; Smolensky, Paul; Gao, Jianfeng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.23174 |
Similar Items
Compositional Generalization Across Distributional Shifts with Sparse Tree Operations
by: Soulos, Paul, et al.
Published: (2024)
Mechanisms of Symbol Processing for In-Context Learning in Transformer Networks
by: Smolensky, Paul, et al.
Published: (2024)
StrAE: Autoencoding for Pre-Trained Embeddings using Explicit Structure
by: Opper, Mattia, et al.
Published: (2023)
Do Generalisation Results Generalise?
by: Boglioni, Matteo, et al.
Published: (2025)
You Need Better Attention Priors
by: Litman, Elon, et al.
Published: (2026)
Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models
by: Gao, Bo, et al.
Published: (2025)
Disentangling Length Bias In Preference Learning Via Response-Conditioned Modeling
by: Cai, Jianfeng, et al.
Published: (2025)
Banyan: Improved Representation Learning with Explicit Structure
by: Opper, Mattia, et al.
Published: (2024)
Self-StrAE at SemEval-2024 Task 1: Making Self-Structuring AutoEncoders Learn More With Less
by: Opper, Mattia, et al.
Published: (2024)
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
by: Zhang, Qingru, et al.
Published: (2023)
Hierarchical Attention Generates Better Proofs
by: Chen, Jianlong, et al.
Published: (2025)
Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation
by: He, Zhenyu, et al.
Published: (2024)
Towards Generalising Neural Topical Representations
by: Yang, Xiaohao, et al.
Published: (2023)
Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs
by: Dong, Jiancheng, et al.
Published: (2024)
Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
by: Leng, Jiaqi, et al.
Published: (2025)
Does RoBERTa Perform Better than BERT in Continual Learning: An Attention Sink Perspective
by: Bai, Xueying, et al.
Published: (2024)
Diffusion Language Models Are Natively Length-Aware
by: Rossi, Vittorio, et al.
Published: (2026)
$p1$: Better Prompt Optimization with Fewer Prompts
by: Gao, Zhaolin, et al.
Published: (2026)
Tackling Length Inflation Without Trade-offs: Group Relative Reward Rescaling for Reinforcement Learning
by: Li, Zichao, et al.
Published: (2026)
Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection
by: Jung, Minseok, et al.
Published: (2025)
Rethinking the Evaluation of Alignment Methods: Insights into Diversity, Generalisation, and Safety
by: Janiak, Denis, et al.
Published: (2025)
A Symbolic Framework for Evaluating Mathematical Reasoning and Generalisation with Transformers
by: Meadows, Jordan, et al.
Published: (2023)
Bootstrapping Embeddings for Low Resource Languages
by: Basoz, Merve, et al.
Published: (2026)
Rethinking Perplexity: Revealing the Impact of Input Length on Perplexity Evaluation in LLMs
by: Cheng, Letian, et al.
Published: (2026)
Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling
by: Ruan, Jie, et al.
Published: (2024)
Towards Better Multi-head Attention via Channel-wise Sample Permutation
by: Yuan, Shen, et al.
Published: (2024)
Smaller Language Models are Better Black-box Machine-Generated Text Detectors
by: Mireshghallah, Niloofar, et al.
Published: (2023)
Improving Variable-Length Generation in Diffusion Language Models via Length Regularization
by: Cheng, Zicong, et al.
Published: (2026)
Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts
by: Zhang, Zeliang, et al.
Published: (2024)
On Provable Length and Compositional Generalization
by: Ahuja, Kartik, et al.
Published: (2024)
Understanding the Effects of RLHF on LLM Generalisation and Diversity
by: Kirk, Robert, et al.
Published: (2023)
Length Desensitization in Direct Preference Optimization
by: Liu, Wei, et al.
Published: (2024)
Training Large Reasoning Models Efficiently via Progressive Thought Encoding
by: Zhang, Zeliang, et al.
Published: (2026)
Synthetic Computers at Scale for Long-Horizon Productivity Simulation
by: Ge, Tao, et al.
Published: (2026)
Information Structure in Mappings: An Approach to Learning, Representation, and Generalisation
by: Conklin, Henry
Published: (2025)
Intrinsic Entropy of Context Length Scaling in LLMs
by: Shi, Jingzhe, et al.
Published: (2025)
Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models
by: Yao, Kai, et al.
Published: (2024)
SPOT: Text Source Prediction from Originality Score Thresholding
by: Yvinec, Edouard, et al.
Published: (2024)
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
by: Lee, Donghyun, et al.
Published: (2024)
Less Finetuning, Better Retrieval: Rethinking LLM Adaptation for Biomedical Retrievers via Synthetic Data and Model Merging
by: Khattab, Sameh, et al.
Published: (2026)