| Main Authors: | Ju, Yiming, Ma, Huanhuan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.07715 |
Similar Items
DataDignity: Training Data Attribution for Large Language Models
by: Li, Xiaomin, et al.
Published: (2026)
KLoB: a Benchmark for Assessing Knowledge Locating Methods in Language Models
by: Ju, Yiming, et al.
Published: (2023)
Integrating Large Language Model for Improved Causal Discovery
by: Ban, Taiyu, et al.
Published: (2023)
Provable Training Data Identification for Large Language Models
by: Liu, Zhenlong, et al.
Published: (2025)
Conda: Column-Normalized Adam for Training Large Language Models Faster
by: Wang, Junjie, et al.
Published: (2025)
Evolving Subnetwork Training for Large Language Models
by: Li, Hanqi, et al.
Published: (2024)
Detecting Training Data of Large Language Models via Expectation Maximization
by: Kim, Gyuwan, et al.
Published: (2024)
Data Management For Training Large Language Models: A Survey
by: Wang, Zige, et al.
Published: (2023)
Scalable In-Context Learning on Tabular Data via Retrieval-Augmented Large Language Models
by: Wen, Xumeng, et al.
Published: (2025)
Regurgitative Training: The Value of Real Data in Training Large Language Models
by: Zhang, Jinghui, et al.
Published: (2024)
DeepAnalyze: Agentic Large Language Models for Autonomous Data Science
by: Zhang, Shaolei, et al.
Published: (2025)
MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models
by: Wang, Lionel Z., et al.
Published: (2024)
Online Training of Large Language Models: Learn while chatting
by: Liang, Juhao, et al.
Published: (2024)
Harnessing Diversity for Important Data Selection in Pretraining Large Language Models
by: Zhang, Chi, et al.
Published: (2024)
Actor-Critic based Online Data Mixing For Language Model Pre-Training
by: Ma, Jing, et al.
Published: (2025)
Imperceptible Jailbreaking against Large Language Models
by: Gao, Kuofeng, et al.
Published: (2025)
Model Merging Scaling Laws in Large Language Models
by: Wang, Yuanyi, et al.
Published: (2025)
The First Place Solution of WSDM Cup 2024: Leveraging Large Language Models for Conversational Multi-Doc QA
by: Li, Yiming, et al.
Published: (2024)
Special Characters Attack: Toward Scalable Training Data Extraction From Large Language Models
by: Bai, Yang, et al.
Published: (2024)
ImF: Implicit Fingerprint for Large Language Models
by: Wu, Jiaxuan, et al.
Published: (2025)
Eurekaverse: Environment Curriculum Generation via Large Language Models
by: Liang, William, et al.
Published: (2024)
Escaping Collapse: The Strength of Weak Data for Large Language Model Training
by: Amin, Kareem, et al.
Published: (2025)
Temporal Relational Reasoning of Large Language Models for Detecting Stock Portfolio Crashes
by: Koa, Kelvin J. L., et al.
Published: (2024)
Measuring the Impact of Lexical Training Data Coverage on Hallucination Detection in Large Language Models
by: Zhang, Shuo, et al.
Published: (2025)
Extracting Training Dialogue Data from Large Language Model based Task Bots
by: Zhang, Shuo, et al.
Published: (2026)
On the Effectiveness of Incremental Training of Large Language Models
by: Li, Miles Q., et al.
Published: (2024)
Exploring the Feasibility of Automated Data Standardization using Large Language Models for Seamless Positioning
by: Lee, Max J. L., et al.
Published: (2024)
Benchmarking Large Language Models on Homework Assessment in Circuit Analysis
by: Chen, Liangliang, et al.
Published: (2025)
Polymetis: Large Language Modeling for Multiple Material Domains
by: Huang, Chao, et al.
Published: (2024)
Evaluating Retrieval-Augmented Generation Strategies for Large Language Models in Travel Mode Choice Prediction
by: Xu, Yiming, et al.
Published: (2025)
DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models
by: Huang, Yiming, et al.
Published: (2024)
Training Overhead Ratio: A Practical Reliability Metric for Large Language Model Training Systems
by: Lu, Ning, et al.
Published: (2024)
Training-Free Unsupervised Prompt for Vision-Language Models
by: Long, Sifan, et al.
Published: (2024)
Sparse Training of Discrete Diffusion Models for Graph Generation
by: Qin, Yiming, et al.
Published: (2023)
Reconstructing Training Data from Adapter-based Federated Large Language Models
by: Chen, Silong, et al.
Published: (2026)
DIDS: Domain Impact-aware Data Sampling for Large Language Model Training
by: Shi, Weijie, et al.
Published: (2025)
Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning
by: Zhao, Haiteng, et al.
Published: (2025)
Evolve Cost-aware Acquisition Functions Using Large Language Models
by: Yao, Yiming, et al.
Published: (2024)
Hyperbolic Large Language Models
by: Patil, Sarang, et al.
Published: (2025)
Does Knowledge Localization Hold True? Surprising Differences Between Entity and Relation Perspectives in Language Models
by: Wei, Yifan, et al.
Published: (2024)