| Main Authors: | Merad, Ibrahim; Wolf, Amos; Mazzawi, Ziad; Léo, Yannick |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.13924 |
Similar Items
Learning without training: The implicit dynamics of in-context learning
by: Dherin, Benoit, et al.
Published: (2025)
Robust Stochastic Optimization via Gradient Quantile Clipping
by: Merad, Ibrahim, et al.
Published: (2023)
Convergence and concentration properties of constant step-size SGD through Markov chains
by: Merad, Ibrahim, et al.
Published: (2023)
Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
by: Amos, Ido, et al.
Published: (2023)
Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles
by: Phan, Buu, et al.
Published: (2024)
Decoding Rarity: Large Language Models in the Diagnosis of Rare Diseases
by: Carbonari, Valentina, et al.
Published: (2025)
Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models
by: Tyukin, Georgy, et al.
Published: (2024)
Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models
by: Ali, Ameen, et al.
Published: (2025)
ANUBHUTI: A Comprehensive Corpus For Sentiment Analysis In Bangla Regional Languages
by: Kundu, Swastika, et al.
Published: (2025)
MuonAll: Muon Variant for Efficient Finetuning of Large Language Models
by: Page, Saurabh, et al.
Published: (2025)
Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback
by: Zheng, Qinqing, et al.
Published: (2024)
All Language Models Large and Small
by: Chen, Zhixun, et al.
Published: (2024)
GOFA: A Generative One-For-All Model for Joint Graph Language Modeling
by: Kong, Lecheng, et al.
Published: (2024)
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
by: Wang, Hongyu, et al.
Published: (2024)
Open Implementation and Study of BEST-RQ for Speech Processing
by: Whetten, Ryan, et al.
Published: (2024)
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
by: Ma, Shuming, et al.
Published: (2024)
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
by: Li, Pengyi, et al.
Published: (2025)
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model
by: Liu, Zirui, et al.
Published: (2023)
Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models
by: Sedykh, Ivan, et al.
Published: (2026)
Cross-Entropy Attacks to Language Models via Rare Event Simulation
by: Ni, Mingze, et al.
Published: (2025)
Industrializing Prediction-Powered Inference: The GLIDE Library for Reliable GenAI and Agentic Systems Evaluation
by: Martinon, Grégoire, et al.
Published: (2026)
Institutional-Level Monitoring of Immune Checkpoint Inhibitor IrAEs Using a Novel Natural Language Processing Algorithmic Pipeline
by: Shapiro, Michael, et al.
Published: (2024)
Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
by: Katz, Shahar, et al.
Published: (2024)
Improving Rare Word Translation With Dictionaries and Attention Masking
by: Sible, Kenneth J., et al.
Published: (2024)
Compressing LLMs: The Truth is Rarely Pure and Never Simple
by: Jaiswal, Ajay, et al.
Published: (2023)
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
by: Lu, Keming, et al.
Published: (2024)
Not All Features Deserve Attention: Graph-Guided Dependency Learning for Tabular Data Generation with Language Models
by: Zhang, Zheyu, et al.
Published: (2025)
All Code, No Thought: Current Language Models Struggle to Reason in Ciphered Language
by: Guo, Shiyuan, et al.
Published: (2025)
Adjoint sharding for very long context training of state space models
by: Xu, Xingzi, et al.
Published: (2025)
One Model for All: Multi-Objective Controllable Language Models
by: He, Qiang, et al.
Published: (2026)
Mitigating Copy Bias in In-Context Learning through Neuron Pruning
by: Ali, Ameen, et al.
Published: (2024)
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
by: Hu, Zhiyuan, et al.
Published: (2026)
ABCD: All Biases Come Disguised
by: Nowak, Mateusz, et al.
Published: (2026)
Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering
by: Gupta, Akash, et al.
Published: (2025)
Language is All a Graph Needs
by: Ye, Ruosong, et al.
Published: (2023)
Leveraging Large Language Models for Solving Rare MIP Challenges
by: Wang, Teng, et al.
Published: (2024)
FaithfulSAE: Towards Capturing Faithful Features with Sparse Autoencoders without External Dataset Dependencies
by: Cho, Seonglae, et al.
Published: (2025)
Evidence Is All You Need: Ordering Imaging Studies via Language Model Alignment with the ACR Appropriateness Criteria
by: Yao, Michael S., et al.
Published: (2024)
Proofread: Fixes All Errors with One Tap
by: Liu, Renjie, et al.
Published: (2024)
Putting It All into Context: Simplifying Agents with LCLMs
by: Jiang, Mingjian, et al.
Published: (2025)