| Main Authors: | Ruscio, Valeria, Nanni, Umberto, Silvestri, Fabrizio |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2508.02546 |
Similar Items
Beyond Position: the emergence of wavelet-like properties in Transformers
by: Ruscio, Valeria, et al.
Published: (2024)
The Phenomenology of Hallucinations
by: Ruscio, Valeria, et al.
Published: (2026)
Where Pretraining writes and Alignment reads: the asymmetry of Transformer weight space
by: Ruscio, Valeria, et al.
Published: (2026)
$\nabla τ$: Gradient-based and Task-Agnostic machine Unlearning
by: Trippa, Daniel, et al.
Published: (2024)
HOP to the Next Tasks and Domains for Continual Learning in NLP
by: Michieli, Umberto, et al.
Published: (2024)
TransformerFAM: Feedback attention is working memory
by: Hwang, Dongseong, et al.
Published: (2024)
Large Language Models aren't all that you need
by: Holla, Kiran Voderhobli, et al.
Published: (2024)
Your thoughts tell who you are: Characterize the reasoning patterns of LRMs
by: Chen, Yida, et al.
Published: (2025)
Think before you speak: Training Language Models With Pause Tokens
by: Goyal, Sachin, et al.
Published: (2023)
2SSP: A Two-Stage Framework for Structured Pruning of LLMs
by: Sandri, Fabrizio, et al.
Published: (2025)
Clustering-driven Memory Compression for On-device Large Language Models
by: Bohdal, Ondrej, et al.
Published: (2026)
Deep sequence models tend to memorize geometrically; it is unclear why
by: Noroozizadeh, Shahriar, et al.
Published: (2025)
Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
by: Nagarajan, Vaishnavh, et al.
Published: (2025)
K-Merge: Online Continual Merging of Adapters for On-device Large Language Models
by: Shenaj, Donald, et al.
Published: (2025)
Efficient Compositional Multi-tasking for On-device Large Language Models
by: Bohdal, Ondrej, et al.
Published: (2025)
100 instances is all you need: predicting the success of a new LLM on unseen data by testing on a few instances
by: Pacchiardi, Lorenzo, et al.
Published: (2024)
Data-driven Clustering and Merging of Adapters for On-device Large Language Models
by: Bohdal, Ondrej, et al.
Published: (2026)
Detecting mental disorder on social media: a ChatGPT-augmented explainable approach
by: Belcastro, Loris, et al.
Published: (2024)
How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not
by: Verdini, Francesco, et al.
Published: (2024)
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
by: Munkhdalai, Tsendsuren, et al.
Published: (2024)
Think Before you Write: QA-Guided Reasoning for Character Descriptions in Books
by: Papoudakis, Argyrios, et al.
Published: (2026)
More Compute Is What You Need
by: Guo, Zhen
Published: (2024)
What Matters for Model Merging at Scale?
by: Yadav, Prateek, et al.
Published: (2024)
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch
by: Hammoud, Hasan Abed Al Kader, et al.
Published: (2024)
What Scales in Cross-Entropy Scaling Law?
by: Yan, Junxi, et al.
Published: (2025)
What Matters in Transformers? Not All Attention is Needed
by: He, Shwai, et al.
Published: (2024)
What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal
by: Cheng, Stephen, et al.
Published: (2026)
Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have)
by: Miyashita, Hisashi
Published: (2026)
Reasoning Models Don't Always Say What They Think
by: Chen, Yanda, et al.
Published: (2025)
Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings
by: Gopalakrishnan, Anand, et al.
Published: (2025)
Translution: Unifying Self-attention and Convolution for Adaptive and Relative Modeling
by: Fan, Hehe, et al.
Published: (2025)
What Differentiates Educational Literature? A Multimodal Fusion Approach of Transformers and Computational Linguistics
by: Bird, Jordan J.
Published: (2024)
Concurrent Linguistic Error Detection (CLED): a New Methodology for Error Detection in Large Language Models
by: Zhu, Jinhua, et al.
Published: (2024)
What is it for a Machine Learning Model to Have a Capability?
by: Harding, Jacqueline, et al.
Published: (2024)
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
by: Li, Ming, et al.
Published: (2024)
What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study
by: Lv, Keyu, et al.
Published: (2026)
What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning
by: Liu, Wei, et al.
Published: (2023)
Gate-level boolean evolutionary geometric attention neural networks
by: Shi, Xianshuai, et al.
Published: (2025)
Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
by: Qin, Tian, et al.
Published: (2025)
PairBench: Are Vision-Language Models Reliable at Comparing What They See?
by: Feizi, Aarash, et al.
Published: (2025)