| Main Authors: | Kumar, Abhay, Owen, Louis, Chowdhury, Nilabhra Roy, Güra, Fabian |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.02507 |
Similar Items
Variance Control via Weight Rescaling in LLM Pre-training
by: Owen, Louis, et al.
Published: (2025)
A Refined Analysis of Massive Activations in LLMs
by: Owen, Louis, et al.
Published: (2025)
Domain-Adaptive Continued Pre-Training of Small Language Models
by: Faroz, Salman
Published: (2025)
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
by: Zhu, Rui-Jie, et al.
Published: (2023)
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
by: Huang, Tianjin, et al.
Published: (2025)
Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs
by: Yang, Jaewoo, et al.
Published: (2024)
On Limitations of LLM as Annotator for Low Resource Languages
by: Jadhav, Suramya, et al.
Published: (2024)
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
by: Xing, Xingrun, et al.
Published: (2024)
CURE: Controlled Unlearning for Robust Embeddings -- Mitigating Conceptual Shortcuts in Pre-Trained Language Models
by: Kocak, Aysenur, et al.
Published: (2025)
ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training
by: Zhuo, Le, et al.
Published: (2024)
On Mitigating Code LLM Hallucinations with API Documentation
by: Jain, Nihal, et al.
Published: (2024)
Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
by: Thangarasa, Vithursan, et al.
Published: (2023)
Mitigating Training Imbalance in LLM Fine-Tuning via Selective Parameter Merging
by: Ju, Yiming, et al.
Published: (2024)
BED: Bi-Encoder-Based Detectors for Out-of-Distribution Detection
by: Owen, Louis, et al.
Published: (2023)
Interpreting and Mitigating Unwanted Uncertainty in LLMs
by: Roy, Tiasa Singha, et al.
Published: (2025)
EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents
by: Zala, Abhay, et al.
Published: (2024)
DACP: Domain-Adaptive Continual Pre-Training of Large Language Models for Phone Conversation Summarization
by: Fu, Xue-Yong, et al.
Published: (2025)
PARSE: LLM Driven Schema Optimization for Reliable Entity Extraction
by: Shrimal, Anubhav, et al.
Published: (2025)
Pre-Trained Policy Discriminators are General Reward Models
by: Dou, Shihan, et al.
Published: (2025)
GRAD: Graph-Retrieved Adaptive Decoding for Hallucination Mitigation
by: Nguyen, Manh, et al.
Published: (2025)
HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving
by: Kumar, Avinash, et al.
Published: (2025)
$\textbf{AGT$^{AO}$}$: Robust and Stabilized LLM Unlearning via Adversarial Gating Training with Adaptive Orthogonality
by: Li, Pengyu, et al.
Published: (2026)
Mitigating Heterogeneous Token Overfitting in LLM Knowledge Editing
by: Liu, Tianci, et al.
Published: (2025)
Heterogeneous Self-Supervised Acoustic Pre-Training with Local Constraints
by: Cui, Xiaodong, et al.
Published: (2025)
FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale
by: Patel, Ajay, et al.
Published: (2026)
Development of Pre-Trained Transformer-based Models for the Nepali Language
by: Thapa, Prajwal, et al.
Published: (2024)
MultiGPrompt for Multi-Task Pre-Training and Prompting on Graphs
by: Yu, Xingtong, et al.
Published: (2023)
When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
by: Chen, Sanxing, et al.
Published: (2025)
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
by: Pouransari, Hadi, et al.
Published: (2024)
Evaluating Fine-Tuned LLM Model For Medical Transcription With Small Low-Resource Languages Validated Dataset
by: Chowdhury, Mohammed Nowshad Ruhani, et al.
Published: (2026)
Reinforcement Learning on Pre-Training Data
by: Li, Siheng, et al.
Published: (2025)
Mitigating Catastrophic Forgetting in Mathematical Reasoning Finetuning through Mixed Training
by: Reynolds, John Graham
Published: (2025)
Optimizing Pre-Training Data Mixtures with Mixtures of Data Expert Models
by: Belenki, Lior, et al.
Published: (2025)
StrAE: Autoencoding for Pre-Trained Embeddings using Explicit Structure
by: Opper, Mattia, et al.
Published: (2023)
The HalluRAG Dataset: Detecting Closed-Domain Hallucinations in RAG Applications Using an LLM's Internal States
by: Ridder, Fabian, et al.
Published: (2024)
Leveraging User-Generated Reviews for Recommender Systems with Dynamic Headers
by: Vashishtha, Shanu, et al.
Published: (2024)
DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning
by: Zala, Abhay, et al.
Published: (2023)
EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training
by: Pan, Chengjun, et al.
Published: (2026)
Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation
by: Nguyen, Hieu, et al.
Published: (2025)
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
by: Lin, Han, et al.
Published: (2023)