:: Library Catalog

Bibliographic Details
Main Authors:	Ruscio, Valeria; Nanni, Umberto; Silvestri, Fabrizio
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2508.02546

Similar Items

Beyond Position: the emergence of wavelet-like properties in Transformers
by: Ruscio, Valeria, et al.
Published: (2024)

The Phenomenology of Hallucinations
by: Ruscio, Valeria, et al.
Published: (2026)

Where Pretraining writes and Alignment reads: the asymmetry of Transformer weight space
by: Ruscio, Valeria, et al.
Published: (2026)

$\nabla τ$: Gradient-based and Task-Agnostic machine Unlearning
by: Trippa, Daniel, et al.
Published: (2024)

HOP to the Next Tasks and Domains for Continual Learning in NLP
by: Michieli, Umberto, et al.
Published: (2024)

TransformerFAM: Feedback attention is working memory
by: Hwang, Dongseong, et al.
Published: (2024)

Large Language Models aren't all that you need
by: Holla, Kiran Voderhobli, et al.
Published: (2024)

Your thoughts tell who you are: Characterize the reasoning patterns of LRMs
by: Chen, Yida, et al.
Published: (2025)

Think before you speak: Training Language Models With Pause Tokens
by: Goyal, Sachin, et al.
Published: (2023)

2SSP: A Two-Stage Framework for Structured Pruning of LLMs
by: Sandri, Fabrizio, et al.
Published: (2025)

Clustering-driven Memory Compression for On-device Large Language Models
by: Bohdal, Ondrej, et al.
Published: (2026)

Deep sequence models tend to memorize geometrically; it is unclear why
by: Noroozizadeh, Shahriar, et al.
Published: (2025)

Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
by: Nagarajan, Vaishnavh, et al.
Published: (2025)

K-Merge: Online Continual Merging of Adapters for On-device Large Language Models
by: Shenaj, Donald, et al.
Published: (2025)

Efficient Compositional Multi-tasking for On-device Large Language Models
by: Bohdal, Ondrej, et al.
Published: (2025)

100 instances is all you need: predicting the success of a new LLM on unseen data by testing on a few instances
by: Pacchiardi, Lorenzo, et al.
Published: (2024)

Data-driven Clustering and Merging of Adapters for On-device Large Language Models
by: Bohdal, Ondrej, et al.
Published: (2026)

Detecting mental disorder on social media: a ChatGPT-augmented explainable approach
by: Belcastro, Loris, et al.
Published: (2024)

How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not
by: Verdini, Francesco, et al.
Published: (2024)

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
by: Munkhdalai, Tsendsuren, et al.
Published: (2024)

Think Before you Write: QA-Guided Reasoning for Character Descriptions in Books
by: Papoudakis, Argyrios, et al.
Published: (2026)

More Compute Is What You Need
by: Guo, Zhen
Published: (2024)

What Matters for Model Merging at Scale?
by: Yadav, Prateek, et al.
Published: (2024)

Model Merging and Safety Alignment: One Bad Model Spoils the Bunch
by: Hammoud, Hasan Abed Al Kader, et al.
Published: (2024)

What Scales in Cross-Entropy Scaling Law?
by: Yan, Junxi, et al.
Published: (2025)

What Matters in Transformers? Not All Attention is Needed
by: He, Shwai, et al.
Published: (2024)

What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal
by: Cheng, Stephen, et al.
Published: (2026)

Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have)
by: Miyashita, Hisashi
Published: (2026)

Reasoning Models Don't Always Say What They Think
by: Chen, Yanda, et al.
Published: (2025)

Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings
by: Gopalakrishnan, Anand, et al.
Published: (2025)

Translution: Unifying Self-attention and Convolution for Adaptive and Relative Modeling
by: Fan, Hehe, et al.
Published: (2025)

What Differentiates Educational Literature? A Multimodal Fusion Approach of Transformers and Computational Linguistics
by: Bird, Jordan J.
Published: (2024)

Concurrent Linguistic Error Detection (CLED): a New Methodology for Error Detection in Large Language Models
by: Zhu, Jinhua, et al.
Published: (2024)

What is it for a Machine Learning Model to Have a Capability?
by: Harding, Jacqueline, et al.
Published: (2024)

What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
by: Li, Ming, et al.
Published: (2024)

What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study
by: Lv, Keyu, et al.
Published: (2026)

What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning
by: Liu, Wei, et al.
Published: (2023)

Gate-level boolean evolutionary geometric attention neural networks
by: Shi, Xianshuai, et al.
Published: (2025)

Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
by: Qin, Tian, et al.
Published: (2025)

PairBench: Are Vision-Language Models Reliable at Comparing What They See?
by: Feizi, Aarash, et al.
Published: (2025)