| Main Authors: | Schwartz, Eli, Choshen, Leshem, Shtok, Joseph, Doveh, Sivan, Karlinsky, Leonid, Arbelle, Assaf |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.00459 |
Similar Items
Augmenting In-Context-Learning in LLMs via Automatic Data Labeling and Refinement
by: Shtok, Joseph, et al.
Published: (2024)
MAEDAY: MAE for few and zero shot AnomalY-Detection
by: Schwartz, Eli, et al.
Published: (2022)
Towards Multimodal In-Context Learning for Vision & Language Models
by: Doveh, Sivan, et al.
Published: (2024)
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
by: Huang, Brandon, et al.
Published: (2024)
LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content
by: Shabtay, Nimrod, et al.
Published: (2024)
Teaching VLMs to Localize Specific Objects from In-context Examples
by: Doveh, Sivan, et al.
Published: (2024)
Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features
by: Mitra, Chancharik, et al.
Published: (2024)
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
by: Huang, Irene, et al.
Published: (2024)
Can Gradient Descent Simulate Prompting?
by: Zhang, Eric, et al.
Published: (2025)
The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community
by: Don-Yehiya, Shachar, et al.
Published: (2024)
Naturally Occurring Feedback is Common, Extractable and Useful
by: Don-Yehiya, Shachar, et al.
Published: (2024)
A Hitchhiker's Guide to Scaling Law Estimation
by: Choshen, Leshem, et al.
Published: (2024)
Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion
by: Zaman, Kerem, et al.
Published: (2023)
Beneath the Surface of Consistency: Exploring Cross-lingual Knowledge Representation Sharing in LLMs
by: Ifergan, Maxim, et al.
Published: (2024)
Do LLMs Benefit From Their Own Words?
by: Huang, Jenny Y., et al.
Published: (2026)
Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
by: Mirza, M. Jehanzeb, et al.
Published: (2024)
Instructions Shape Production of Language, not Processing
by: Waldis, Andreas, et al.
Published: (2026)
Jump to Conclusions: Short-Cutting Transformers With Linear Transformations
by: Din, Alexander Yom, et al.
Published: (2023)
Mediocrity is the key for LLM as a Judge Anchor Selection
by: Don-Yehiya, Shachar, et al.
Published: (2026)
Comparison Visual Instruction Tuning
by: Lin, Wei, et al.
Published: (2024)
CRISP: Complex Reasoning with Interpretable Step-based Plans
by: Vetzler, Matan, et al.
Published: (2025)
ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization
by: Yadav, Prateek, et al.
Published: (2023)
tinyBenchmarks: evaluating LLMs with fewer examples
by: Polo, Felipe Maia, et al.
Published: (2024)
Holmes: A Benchmark to Assess the Linguistic Competence of Language Models
by: Waldis, Andreas, et al.
Published: (2024)
Pretraining Language Models for Diachronic Linguistic Change Discovery
by: Fittschen, Elisabeth, et al.
Published: (2025)
Genie: Achieving Human Parity in Content-Grounded Datasets Generation
by: Yehudai, Asaf, et al.
Published: (2024)
Deductive Closure Training of Language Models for Coherence, Accuracy, and Updatability
by: Akyürek, Afra Feyza, et al.
Published: (2024)
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
by: Damani, Mehul, et al.
Published: (2025)
LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users
by: Hilel, Almog, et al.
Published: (2025)
Will it Merge? On The Causes of Model Mergeability
by: Rahamim, Adir, et al.
Published: (2026)
Resolving Interference (RI): Disentangling Models for Improved Model Merging
by: Ramesh, Pratik, et al.
Published: (2026)
ErrorMap and ErrorAtlas: Charting the Failure Landscape of Large Language Models
by: Ashury-Tahan, Shir, et al.
Published: (2026)
Robustness as an Emergent Property of Task Performance
by: Ashury-Tahan, Shir, et al.
Published: (2026)
Label-Efficient Model Selection for Text Generation
by: Ashury-Tahan, Shir, et al.
Published: (2024)
TextArena
by: Guertler, Leon, et al.
Published: (2025)
The Mighty ToRR: A Benchmark for Table Reasoning and Robustness
by: Ashury-Tahan, Shir, et al.
Published: (2025)
Efficient multi-prompt evaluation of LLMs
by: Polo, Felipe Maia, et al.
Published: (2024)
ChartGen: Scaling Chart Understanding Via Code-Guided Synthetic Chart Generation
by: Kondic, Jovana, et al.
Published: (2025)
Do These LLM Benchmarks Agree? Fixing Benchmark Evaluation with BenchBench
by: Perlitz, Yotam, et al.
Published: (2024)
DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation
by: Habba, Eliya, et al.
Published: (2025)