| Main Authors: | Yam, Hong Meng; Paek, Nathan J |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.06672 |
Similar Items
Mini Minds: Exploring Bebeshka and Zlata Baby Models
by: Proskurina, Irina, et al.
Published: (2023)
Constrained Sampling for Language Models Should Be Easy: An MCMC Perspective
by: Gonzalez, Emmanuel Anaya, et al.
Published: (2025)
BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
by: Wang, Shengao, et al.
Published: (2025)
Should LLMs be WEIRD? Exploring WEIRDness and Human Rights in Large Language Models
by: Zhou, Ke, et al.
Published: (2025)
Should You Use Your Large Language Model to Explore or Exploit?
by: Harris, Keegan, et al.
Published: (2025)
When Babies Teach Babies: Can student knowledge sharing outperform Teacher-Guided Distillation on small datasets?
by: Iyer, Srikrishna
Published: (2024)
Baby Scale: Investigating Models Trained on Individual Children's Language Input
by: Feng, Steven Y., et al.
Published: (2026)
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning
by: Zhao, Jun, et al.
Published: (2024)
Bias Dynamics in BabyLMs: Towards a Compute-Efficient Sandbox for Democratising Pre-Training Debiasing
by: Trhlik, Filip, et al.
Published: (2026)
Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling
by: Aynetdinov, Ansar, et al.
Published: (2026)
Learning to Plan for Language Modeling from Unlabeled Data
by: Cornille, Nathan, et al.
Published: (2024)
Thinking in Many Modes: How Composite Reasoning Elevates Large Language Model Performance with Limited Data
by: Ahmad, Zishan, et al.
Published: (2025)
Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs
by: Smit, Andries, et al.
Published: (2023)
Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models
by: Karov, Bar, et al.
Published: (2025)
Exploring Data-Efficient Adaptation of Large Language Models for Code Generation
by: Jiang, Xue, et al.
Published: (2024)
Models Can and Should Embrace the Communicative Nature of Human-Generated Math
by: Boguraev, Sasha, et al.
Published: (2024)
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling
by: Xie, Xudong, et al.
Published: (2024)
Preference Curriculum: LLMs Should Always Be Pretrained on Their Preferred Data
by: Zhang, Xuemiao, et al.
Published: (2025)
Balanced Data Sampling for Language Model Training with Clustering
by: Shao, Yunfan, et al.
Published: (2024)
You Are What You Train: Effects of Data Composition on Training Context-aware Machine Translation Models
by: Mąka, Paweł, et al.
Published: (2025)
What are Models Thinking about? Understanding Large Language Model Hallucinations "Psychology" through Model Inner State Analysis
by: Wang, Peiran, et al.
Published: (2025)
Sample-Efficient Language Modeling with Linear Attention and Lightweight Enhancements
by: Haller, Patrick, et al.
Published: (2025)
Dense X Retrieval: What Retrieval Granularity Should We Use?
by: Chen, Tong, et al.
Published: (2023)
Express Your Doubts -- Probabilistic World Modeling Should not be Based on Token logprobs
by: Wagner, Eitan, et al.
Published: (2025)
WebDS: An End-to-End Benchmark for Web-based Data Science
by: Hsu, Ethan, et al.
Published: (2025)
Exploring the Performance of Large Language Models on Subjective Span Identification Tasks
by: Dmonte, Alphaeus, et al.
Published: (2026)
What Should Embeddings Embed? Autoregressive Models Represent Latent Generating Distributions
by: Zhang, Liyi, et al.
Published: (2024)
Target-Aware Language Modeling via Granular Data Sampling
by: Chang, Ernie, et al.
Published: (2024)
Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?
by: Hase, Peter, et al.
Published: (2024)
BabyReasoningBench: Generating Developmentally-Inspired Reasoning Tasks for Evaluating Baby Language Models
by: Dhole, Kaustubh D.
Published: (2026)
Exploring Data and Parameter Efficient Strategies for Arabic Dialect Identifications
by: Kanjirangat, Vani, et al.
Published: (2025)
Beyond Random Sampling: Efficient Language Model Pretraining via Curriculum Learning
by: Zhang, Yang, et al.
Published: (2025)
Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs
by: von Recum, Alexander, et al.
Published: (2024)
Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement
by: Kong, Injin, et al.
Published: (2026)
PreCog: Exploring the Relation between Memorization and Performance in Pre-trained Language Models
by: Ranaldi, Leonardo, et al.
Published: (2023)
Bringing Up a Bilingual BabyLM: Investigating Multilingual Language Acquisition Using Small-Scale Models
by: Zeng, Linda, et al.
Published: (2026)
BabyHGRN: Exploring RNNs for Sample-Efficient Training of Language Models
by: Haller, Patrick, et al.
Published: (2024)
When Should Models Change Their Minds? Contextual Belief Management in Large Language Models
by: Xu, Haoming, et al.
Published: (2026)
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
by: Tang, Zhenheng, et al.
Published: (2025)
Where Should I Study? Biased Language Models Decide! Evaluating Fairness in LMs for Academic Recommendations
by: Shailya, Krithi, et al.
Published: (2025)