| Main Author: | Iyer, Srikrishna |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2411.16487 |
Similar Items
Mini Minds: Exploring Bebeshka and Zlata Baby Models
by: Proskurina, Irina, et al.
Published: (2023)
BabyReasoningBench: Generating Developmentally-Inspired Reasoning Tasks for Evaluating Baby Language Models
by: Dhole, Kaustubh D.
Published: (2026)
Baby Scale: Investigating Models Trained on Individual Children's Language Input
by: Feng, Steven Y., et al.
Published: (2026)
BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
by: Wang, Shengao, et al.
Published: (2025)
What Should Baby Models Read? Exploring Sample-Efficient Data Composition on Model Performance
by: Yam, Hong Meng, et al.
Published: (2024)
Bias Dynamics in BabyLMs: Towards a Compute-Efficient Sandbox for Democratising Pre-Training Debiasing
by: Trhlik, Filip, et al.
Published: (2026)
Bringing Up a Bilingual BabyLM: Investigating Multilingual Language Acquisition Using Small-Scale Models
by: Zeng, Linda, et al.
Published: (2026)
EgoBabyVLM: Benchmarking Cross-Modal Learning from Naturalistic Egocentric Video Data
by: Lin, Dongyan, et al.
Published: (2026)
Auditing Google's AI Overviews and Featured Snippets: A Case Study on Baby Care and Pregnancy
by: Hu, Desheng, et al.
Published: (2025)
Teaching-Assistant-in-the-Loop: Improving Knowledge Distillation from Imperfect Teacher Models in Low-Budget Scenarios
by: Zhou, Yuhang, et al.
Published: (2024)
Your Teacher Can't Help You Here: Combating Supervision Fidelity Decay in On-Policy Distillation
by: Liu, Yanjiang, et al.
Published: (2026)
Don't Kill the Baby: The Case for AI in Arbitration
by: Broyde, Michael, et al.
Published: (2024)
Multi-agent AI systems outperform human teams in creativity
by: Hu, Tiancheng, et al.
Published: (2026)
Decoding with Limited Teacher Supervision Requires Understanding When to Trust the Teacher
by: Ok, Hyunjong, et al.
Published: (2024)
BabyLlama-2: Ensemble-Distilled Models Consistently Outperform Teachers With Limited Data
by: Tastet, Jean-Loup, et al.
Published: (2024)
BabyLM Turns 3: Call for papers for the 2025 BabyLM workshop
by: Charpentier, Lucas, et al.
Published: (2025)
ASTRO: Teaching Language Models to Reason by Reflecting and Backtracking In-Context
by: Kim, Joongwon, et al.
Published: (2025)
Code-enabled language models can outperform reasoning models on diverse tasks
by: Zhang, Cedegao E., et al.
Published: (2025)
MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?
by: Che, Xinyu, et al.
Published: (2026)
On Teacher Hacking in Language Model Distillation
by: Tiapkin, Daniil, et al.
Published: (2025)
LLMs Can Teach Themselves to Better Predict the Future
by: Turtel, Benjamin, et al.
Published: (2025)
Reliability Gated Multi-Teacher Distillation for Low Resource Abstractive Summarization
by: Sumit, Dipto, et al.
Published: (2026)
BabyLM Turns 4 and Goes Multilingual: Call for Papers for the 2026 BabyLM Workshop
by: Choshen, Leshem, et al.
Published: (2026)
Can postgraduate translation students identify machine-generated text?
by: Farrell, Michael
Published: (2025)
When Perplexity Lies: Generation-Focused Distillation of Hybrid Sequence Models
by: Kostelec, Juan Gabriel, et al.
Published: (2026)
CLARity: Reasoning Consistency Alone Can Teach Reinforced Experts
by: Lin, Jiuheng, et al.
Published: (2025)
Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling
by: Xu, Wenda, et al.
Published: (2024)
Can Large Models Teach Student Models to Solve Mathematical Problems Like Human Beings? A Reasoning Distillation Method via Multi-LoRA Interaction
by: Li, Xinhe, et al.
Published: (2025)
Steering LLMs? Actually, Sparse Autoencoders can outperform simple baselines
by: Jørgensen, Mikkel Godsk, et al.
Published: (2026)
Backtracking When It Strays: Mitigating Dual Exposure Biases in LLM Reasoning Distillation
by: Wang, Bing, et al.
Published: (2026)
When Can Transformers Count to n?
by: Yehudai, Gilad, et al.
Published: (2024)
Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study
by: Ning, Xuefei, et al.
Published: (2024)
Learning from Committee: Reasoning Distillation from a Mixture of Teachers with Peer-Review
by: Li, Zhuochun, et al.
Published: (2024)
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key
by: Wang, Tianle, et al.
Published: (2026)
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
by: He, Nan, et al.
Published: (2023)
SafaRi:Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation
by: Nag, Sayan, et al.
Published: (2024)
Are BabyLMs Second Language Learners?
by: Edman, Lukas, et al.
Published: (2024)
ELAD: Explanation-Guided Large Language Models Active Distillation
by: Zhang, Yifei, et al.
Published: (2024)
ASKD-Whisper: Adaptive Self-knowledge Distillation for Efficient and Low-Latency Automatic Speech Recognition
by: Lee, Junseok, et al.
Published: (2026)
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
by: Yang, Wenkai, et al.
Published: (2026)