Library Catalog

Bibliographic Details
Main Authors:	Merad, Ibrahim; Wolf, Amos; Mazzawi, Ziad; Léo, Yannick
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2412.13924

Similar Items

Learning without training: The implicit dynamics of in-context learning
by: Dherin, Benoit, et al.
Published: (2025)

Robust Stochastic Optimization via Gradient Quantile Clipping
by: Merad, Ibrahim, et al.
Published: (2023)

Convergence and concentration properties of constant step-size SGD through Markov chains
by: Merad, Ibrahim, et al.
Published: (2023)

Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
by: Amos, Ido, et al.
Published: (2023)

Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles
by: Phan, Buu, et al.
Published: (2024)

Decoding Rarity: Large Language Models in the Diagnosis of Rare Diseases
by: Carbonari, Valentina, et al.
Published: (2025)

Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models
by: Tyukin, Georgy, et al.
Published: (2024)

Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models
by: Ali, Ameen, et al.
Published: (2025)

ANUBHUTI: A Comprehensive Corpus For Sentiment Analysis In Bangla Regional Languages
by: Kundu, Swastika, et al.
Published: (2025)

MuonAll: Muon Variant for Efficient Finetuning of Large Language Models
by: Page, Saurabh, et al.
Published: (2025)

Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback
by: Zheng, Qinqing, et al.
Published: (2024)

All Language Models Large and Small
by: Chen, Zhixun, et al.
Published: (2024)

GOFA: A Generative One-For-All Model for Joint Graph Language Modeling
by: Kong, Lecheng, et al.
Published: (2024)

Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
by: Wang, Hongyu, et al.
Published: (2024)

Open Implementation and Study of BEST-RQ for Speech Processing
by: Whetten, Ryan, et al.
Published: (2024)

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
by: Ma, Shuming, et al.
Published: (2024)

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
by: Li, Pengyi, et al.
Published: (2025)

Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model
by: Liu, Zirui, et al.
Published: (2023)

Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models
by: Sedykh, Ivan, et al.
Published: (2026)

Cross-Entropy Attacks to Language Models via Rare Event Simulation
by: Ni, Mingze, et al.
Published: (2025)

Industrializing Prediction-Powered Inference: The GLIDE Library for Reliable GenAI and Agentic Systems Evaluation
by: Martinon, Grégoire, et al.
Published: (2026)

Institutional-Level Monitoring of Immune Checkpoint Inhibitor IrAEs Using a Novel Natural Language Processing Algorithmic Pipeline
by: Shapiro, Michael, et al.
Published: (2024)

Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
by: Katz, Shahar, et al.
Published: (2024)

Improving Rare Word Translation With Dictionaries and Attention Masking
by: Sible, Kenneth J., et al.
Published: (2024)

Compressing LLMs: The Truth is Rarely Pure and Never Simple
by: Jaiswal, Ajay, et al.
Published: (2023)

Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
by: Lu, Keming, et al.
Published: (2024)

Not All Features Deserve Attention: Graph-Guided Dependency Learning for Tabular Data Generation with Language Models
by: Zhang, Zheyu, et al.
Published: (2025)

All Code, No Thought: Current Language Models Struggle to Reason in Ciphered Language
by: Guo, Shiyuan, et al.
Published: (2025)

Adjoint sharding for very long context training of state space models
by: Xu, Xingzi, et al.
Published: (2025)

One Model for All: Multi-Objective Controllable Language Models
by: He, Qiang, et al.
Published: (2026)

Mitigating Copy Bias in In-Context Learning through Neuron Pruning
by: Ali, Ameen, et al.
Published: (2024)

Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
by: Hu, Zhiyuan, et al.
Published: (2026)

ABCD: All Biases Come Disguised
by: Nowak, Mateusz, et al.
Published: (2026)

Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering
by: Gupta, Akash, et al.
Published: (2025)

Language is All a Graph Needs
by: Ye, Ruosong, et al.
Published: (2023)

Leveraging Large Language Models for Solving Rare MIP Challenges
by: Wang, Teng, et al.
Published: (2024)

FaithfulSAE: Towards Capturing Faithful Features with Sparse Autoencoders without External Dataset Dependencies
by: Cho, Seonglae, et al.
Published: (2025)

Evidence Is All You Need: Ordering Imaging Studies via Language Model Alignment with the ACR Appropriateness Criteria
by: Yao, Michael S., et al.
Published: (2024)

Proofread: Fixes All Errors with One Tap
by: Liu, Renjie, et al.
Published: (2024)

Putting It All into Context: Simplifying Agents with LCLMs
by: Jiang, Mingjian, et al.
Published: (2025)