| Main Authors: | Zhmoginov, Andrey; Lee, Jihwan; Sandler, Mark |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2506.05641 |
Similar Items
Contextually Guided Transformers via Low-Rank Adaptation
by: Zhmoginov, Andrey, et al.
Published: (2025)
Continual HyperTransformer: A Meta-Learner for Continual Few-Shot Learning
by: Vladymyrov, Max, et al.
Published: (2023)
Narrowing the Focus: Learned Optimizers for Pretrained Models
by: Kristiansen, Gus, et al.
Published: (2024)
Learning and Unlearning of Fabricated Knowledge in Language Models
by: Sun, Chen, et al.
Published: (2024)
Long Context In-Context Compression by Getting to the Gist of Gisting
by: Petrov, Aleksandar, et al.
Published: (2025)
Small or Large? Zero-Shot or Finetuned? Guiding Language Model Choice for Specialized Applications in Healthcare
by: Gondara, Lovedeep, et al.
Published: (2025)
ROSE: Reordered SparseGPT for More Accurate One-Shot Large Language Models Pruning
by: Su, Mingluo, et al.
Published: (2026)
Need a Small Specialized Language Model? Plan Early!
by: Grangier, David, et al.
Published: (2024)
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
by: Li, Pengyi, et al.
Published: (2025)
LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models
by: Su, Yupeng, et al.
Published: (2024)
How new data permeates LLM knowledge and how to dilute it
by: Sun, Chen, et al.
Published: (2025)
CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation
by: Hsu, I-Hung, et al.
Published: (2024)
Two Heads Are Better than One: Simulating Large Transformers with Small Ones
by: Yu, Hantao, et al.
Published: (2025)
One-Shot Safety Alignment for Large Language Models via Optimal Dualization
by: Huang, Xinmeng, et al.
Published: (2024)
References Indeed Matter? Reference-Free Preference Optimization for Conversational Query Reformulation
by: Kim, Doyoung, et al.
Published: (2025)
Agent Instructs Large Language Models to be General Zero-Shot Reasoners
by: Crispino, Nicholas, et al.
Published: (2023)
Many Minds from One Model: Bayesian-Inspired Transformers for Population Diversity
by: Yang, Diji, et al.
Published: (2025)
From Belief Entrenchment to Robust Reasoning in LLM Agents
by: Oh, Jihwan, et al.
Published: (2025)
DLM-One: Diffusion Language Models for One-Step Sequence Generation
by: Chen, Tianqi, et al.
Published: (2025)
Demystifying Embedding Spaces using Large Language Models
by: Tennenholtz, Guy, et al.
Published: (2023)
Flextron: Many-in-One Flexible Large Language Model
by: Cai, Ruisi, et al.
Published: (2024)
Zero-Shot Decision Tree Construction via Large Language Models
by: Carrasco, Lucas, et al.
Published: (2025)
Meta-Tool: Efficient Few-Shot Tool Adaptation for Small Language Models
by: Kumar, Sachin
Published: (2026)
HierRouter: Coordinated Routing of Specialized Large Language Models via Reinforcement Learning
by: Gupta, Nikunj, et al.
Published: (2025)
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
by: Kang, Junmo, et al.
Published: (2024)
Rethinking Attention Output Projection: Structured Hadamard Transforms for Efficient Transformers
by: Aggarwal, Shubham, et al.
Published: (2026)
From Small to Large Language Models: Revisiting the Federalist Papers
by: Jeong, So Won, et al.
Published: (2025)
Large Language Models are Null-Shot Learners
by: Taveekitworachai, Pittawat, et al.
Published: (2024)
Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents
by: Kim, Suji, et al.
Published: (2026)
Pre-training a Transformer-Based Generative Model Using a Small Sepedi Dataset
by: Ramalepe, Simon P., et al.
Published: (2025)
All Language Models Large and Small
by: Chen, Zhixun, et al.
Published: (2024)
Linguacodus: A Synergistic Framework for Transformative Code Generation in Machine Learning Pipelines
by: Trofimova, Ekaterina, et al.
Published: (2024)
Are Human Conversations Special? A Large Language Model Perspective
by: Jawale, Toshish, et al.
Published: (2024)
Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic
by: Zhao, Xufeng, et al.
Published: (2023)
Projected Compression: Trainable Projection for Efficient Transformer Compression
by: Stefaniak, Maciej, et al.
Published: (2025)
Provable Knowledge Acquisition and Extraction in One-Layer Transformers
by: Xu, Ruichen, et al.
Published: (2025)
Contextual Graph Transformer: A Small Language Model for Enhanced Engineering Document Information Extraction
by: Reddy, Karan, et al.
Published: (2025)
Uncertainty-Aware Collaborative System of Large and Small Models for Multimodal Sentiment Analysis
by: Han, Shiqin, et al.
Published: (2025)
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
by: Zhou, Baichuan, et al.
Published: (2024)
GOFA: A Generative One-For-All Model for Joint Graph Language Modeling
by: Kong, Lecheng, et al.
Published: (2024)