| Main Authors: | Patel, Arkil; Reddy, Siva; Mosbach, Marius; Bahdanau, Dzmitry |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.18607 |
Similar Items
How to Get Your LLM to Generate Challenging Problems for Evaluation
by: Patel, Arkil, et al.
Published: (2025)
Evaluating In-Context Learning of Libraries for Code Generation
by: Patel, Arkil, et al.
Published: (2023)
BRIDGE: Predicting Human Task Completion Time From Model Performance
by: Liu, Fengyuan, et al.
Published: (2026)
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
by: BehnamGhader, Parishad, et al.
Published: (2024)
Build the web for agents, not agents for the web
by: Lù, Xing Han, et al.
Published: (2025)
LLMs can learn self-restraint through iterative self-reflection
by: Piché, Alexandre, et al.
Published: (2024)
Investigating Adversarial Trigger Transfer in Large Language Models
by: Meade, Nicholas, et al.
Published: (2024)
Value Drifts: Tracing Value Alignment During LLM Post-Training
by: Bhatia, Mehar, et al.
Published: (2025)
Not All Data Are Unlearned Equally
by: Krishnan, Aravind, et al.
Published: (2025)
The Illusion of Superposition? A Principled Analysis of Latent Thinking in Language Models
by: Rizvi-Martel, Michael, et al.
Published: (2026)
SafeArena: Evaluating the Safety of Autonomous Web Agents
by: Tur, Ada Defne, et al.
Published: (2025)
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
by: Lù, Xing Han, et al.
Published: (2025)
Scaling Laws for Predicting Downstream Performance in LLMs
by: Chen, Yangyi, et al.
Published: (2024)
Do Generalisation Results Generalise?
by: Boglioni, Matteo, et al.
Published: (2025)
Understanding the Influence of Synthetic Data for Text Embedders
by: Springer, Jacob Mitchell, et al.
Published: (2025)
Faithfulness Measurable Masked Language Models
by: Madsen, Andreas, et al.
Published: (2023)
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
by: Xu, Chengyin, et al.
Published: (2025)
VinePPO: Refining Credit Assignment in RL Training of LLMs
by: Kazemnejad, Amirhossein, et al.
Published: (2024)
Are self-explanations from Large Language Models faithful?
by: Madsen, Andreas, et al.
Published: (2024)
Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training
by: Krajewski, Jakub, et al.
Published: (2025)
Scaling Laws for Downstream Task Performance of Large Language Models
by: Isik, Berivan, et al.
Published: (2024)
What explains the success of cross-modal fine-tuning with ORCA?
by: García-de-Herreros, Paloma, et al.
Published: (2024)
Quantifying the Importance of Data Alignment in Downstream Model Performance
by: Chawla, Krrish, et al.
Published: (2025)
Operationalising the Superficial Alignment Hypothesis via Task Complexity
by: Vergara-Browne, Tomás, et al.
Published: (2026)
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
by: Lù, Xing Han, et al.
Published: (2024)
LLM2Vec-Gen: Generative Embeddings from Large Language Models
by: BehnamGhader, Parishad, et al.
Published: (2026)
Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance
by: Marbut, Anna C., et al.
Published: (2024)
Self-Refinement of Language Models from External Proxy Metrics Feedback
by: Ramji, Keshav, et al.
Published: (2024)
Why LLMs Cannot Think and How to Fix It
by: Jahrens, Marius, et al.
Published: (2025)
Interpretability Needs a New Paradigm
by: Madsen, Andreas, et al.
Published: (2024)
NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild
by: Murty, Shikhar, et al.
Published: (2024)
Task-Informed Anti-Curriculum by Masking Improves Downstream Performance on Text
by: Jarca, Andrei, et al.
Published: (2025)
Learning to Rewrite Prompts for Bootstrapping LLMs on Downstream Tasks
by: Zhou, Qinhao, et al.
Published: (2025)
ROSA: Random Subspace Adaptation for Efficient Fine-Tuning
by: Hameed, Marawan Gamal Abdel, et al.
Published: (2024)
Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?
by: Riabi, Arij, et al.
Published: (2021)
Proxy Compression for Language Modeling
by: Zheng, Lin, et al.
Published: (2026)
Do LLMs Understand Romanian Driving Laws? A Study on Multimodal and Fine-Tuned Question Answering
by: Barbu, Eduard, et al.
Published: (2025)
ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs
by: Butler, Landon, et al.
Published: (2025)
Scaling Laws Are Unreliable for Downstream Tasks: A Reality Check
by: Lourie, Nicholas, et al.
Published: (2025)
Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy
by: Zhu, Yu, et al.
Published: (2024)