| Main Authors: | Zhang, Jifan; Luo, Ziyue; Liu, Jia; Shroff, Ness; Nowak, Robert |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2410.02755 |
Similar Items
Prediction-Assisted Online Distributed Deep Learning Workload Scheduling in GPU Clusters
by: Luo, Ziyue, et al.
Published: (2025)
Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models
by: Ali, Mehdi, et al.
Published: (2025)
Constraint-Rectified Training for Efficient Chain-of-Thought
by: Wu, Qinhang, et al.
Published: (2026)
FIRM: Federated In-client Regularized Multi-objective Alignment for Large Language Models
by: Nourzad, Fatemeh, et al.
Published: (2025)
Can We Theoretically Quantify the Impacts of Local Updates on the Generalization Performance of Federated Learning?
by: Ju, Peizhong, et al.
Published: (2024)
Large Language Models Achieve Gold Medal Performance at the International Olympiad on Astronomy & Astrophysics (IOAA)
by: Pinheiro, Lucas Carrit Delgado, et al.
Published: (2025)
From Scores to Gibbs Correctors: Accelerating Uniform-Rate Discrete Diffusion Models
by: Liang, Yuchen, et al.
Published: (2026)
Broadening Target Distributions for Accelerated Diffusion Models via a Novel Analysis Approach
by: Liang, Yuchen, et al.
Published: (2024)
AHA: Human-Assisted Out-of-Distribution Generalization and Detection
by: Bai, Haoyue, et al.
Published: (2024)
Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations
by: Matton, Katie, et al.
Published: (2025)
Benchmarking LLMs' Judgments with No Gold Standard
by: Xu, Shengwei, et al.
Published: (2024)
Detecting Pretraining Data from Large Language Models
by: Shi, Weijia, et al.
Published: (2023)
LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training
by: Wang, Yiming, et al.
Published: (2025)
MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining
by: Chen, Zhixun, et al.
Published: (2025)
The Data-Quality Illusion: Rethinking Classifier-Based Quality Filtering for LLM Pretraining
by: Saada, Thiziri Nait, et al.
Published: (2025)
Provable Last-Iterate Convergence for Multi-Objective Safe LLM Alignment via Optimistic Primal-Dual
by: Li, Yining, et al.
Published: (2026)
Sharp Convergence Rates for Masked Diffusion Models
by: Liang, Yuchen, et al.
Published: (2026)
Pretraining Large Language Models with NVFP4
by: NVIDIA, et al.
Published: (2025)
Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data
by: Wang, Xinyi, et al.
Published: (2024)
SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
by: Lyu, Bohan, et al.
Published: (2025)
Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers
by: Liang, Yuchen, et al.
Published: (2024)
Procedural Pretraining: Warming Up Language Models with Abstract Data
by: Jiang, Liangze, et al.
Published: (2026)
Analyzing Similarity Metrics for Data Selection for Language Model Pretraining
by: Sam, Dylan, et al.
Published: (2025)
An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models
by: Bhatt, Gantavya, et al.
Published: (2024)
Using Hallucinations to Bypass GPT4's Filter
by: Lemkin, Benjamin
Published: (2024)
On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey
by: Zhang, Meishan, et al.
Published: (2025)
Revisiting Multilingual Data Mixtures in Language Model Pretraining
by: Foroutan, Negar, et al.
Published: (2025)
Facts in Stats: Impacts of Pretraining Diversity on Language Model Generalization
by: Behnia, Tina, et al.
Published: (2025)
The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models
by: Bhaskar, Adithya, et al.
Published: (2024)
Discrete Diffusion Models: Novel Analysis and New Sampler Guarantees
by: Liang, Yuchen, et al.
Published: (2025)
How to Find the Exact Pareto Front for Multi-Objective MDPs?
by: Li, Yining, et al.
Published: (2024)
Provably Efficient Multi-Objective Bandit Algorithms under Preference-Centric Customization
by: Cao, Linfeng, et al.
Published: (2025)
Data Mixing for Large Language Models Pretraining: A Survey and Outlook
by: Chen, Zhuo, et al.
Published: (2026)
Language Models Improve When Pretraining Data Matches Target Tasks
by: Mizrahi, David, et al.
Published: (2025)
Temporal Entailment Pretraining for Clinical Language Models over EHR Data
by: Tanaka, Tatsunori, et al.
Published: (2025)
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
by: Kıcıman, Emre, et al.
Published: (2023)
Measuring and Reducing LLM Hallucination without Gold-Standard Answers
by: Wei, Jiaheng, et al.
Published: (2024)
Can General-Purpose Large Language Models Generalize to English-Thai Machine Translation ?
by: Chiaranaipanich, Jirat, et al.
Published: (2024)
On Training Data Influence of GPT Models
by: Chai, Yekun, et al.
Published: (2024)
Monitoring State Transitions in Markovian Systems with Sampling Cost
by: Saurav, Kumar, et al.
Published: (2025)