| Main Authors: | Vyas, Kaustubh, Graux, Damien, Montella, Sébastien, Vougiouklis, Pavlos, Lai, Ruofei, Li, Keshuang, Ren, Yang, Pan, Jeff Z. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.20175 |
Similar Items
From An LLM Swarm To A PDDL-Empowered HIVE: Planning Self-Executed Instructions In A Multi-Modal Jungle
by: Vyas, Kaustubh, et al.
Published: (2024)
Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning
by: Shen, Zhili, et al.
Published: (2024)
Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts
by: Huang, Wenyu, et al.
Published: (2024)
GeAR: Graph-enhanced Agent for Retrieval-augmented Generation
by: Shen, Zhili, et al.
Published: (2024)
A Usage-centric Take on Intent Understanding in E-Commerce
by: Zhou, Wendi, et al.
Published: (2024)
Millions of $\text{GeAR}$-s: Extending GraphRAG to Millions of Documents
by: Shen, Zhili, et al.
Published: (2025)
Masking in Multi-hop QA: An Analysis of How Language Models Perform with Context Permutation
by: Huang, Wenyu, et al.
Published: (2025)
Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA
by: Huang, Wenyu, et al.
Published: (2024)
OpenSIR: Open-Ended Self-Improving Reasoner
by: Kwan, Wai-Chung, et al.
Published: (2025)
PDDL-Mind: Large Language Models are Capable on Belief Reasoning with Reliable State Tracking
by: Zhu, Wang Bill, et al.
Published: (2026)
How Reliable are LLMs as Knowledge Bases? Re-thinking Facutality and Consistency
by: Zheng, Danna, et al.
Published: (2024)
Long-Form Information Alignment Evaluation Beyond Atomic Facts
by: Zheng, Danna, et al.
Published: (2025)
Evaluating and Safeguarding the Adversarial Robustness of Retrieval-Based In-Context Learning
by: Yu, Simon, et al.
Published: (2024)
Funny or Persuasive, but Not Both: Evaluating Fine-Grained Multi-Concept Control in LLMs
by: Labroo, Arya, et al.
Published: (2026)
Automating the Generation of Prompts for LLM-based Action Choice in PDDL Planning
by: Stein, Katharina, et al.
Published: (2023)
Can Language Models Analyze Data? Evaluating Large Language Models for Question Answering over Datasets
by: Xenofontos, Andreas, et al.
Published: (2026)
Rethinking Memory in LLM based Agents: Representations, Operations, and Emerging Topics
by: Du, Yiming, et al.
Published: (2025)
Adversarial Lens: Exploiting Attention Layers to Generate Adversarial Examples for Evaluation
by: Dhole, Kaustubh
Published: (2025)
Are LLMs Effective Negotiators? Systematic Evaluation of the Multifaceted Capabilities of LLMs in Negotiation Dialogues
by: Kwon, Deuksin, et al.
Published: (2024)
Evaluating LLMs' Divergent Thinking Capabilities for Scientific Idea Generation with Minimal Context
by: Ruan, Kai, et al.
Published: (2024)
Evaluating the Capabilities of LLMs for Supporting Anticipatory Impact Assessment
by: Allaham, Mowafak, et al.
Published: (2024)
Spectral Attention Steering for Prompt Highlighting
by: Li, Weixian Waylon, et al.
Published: (2026)
PlanGenLLMs: A Modern Survey of LLM Planning Capabilities
by: Wei, Hui, et al.
Published: (2025)
How Does Alignment Enhance LLMs' Multilingual Capabilities? A Language Neurons Perspective
by: Zhang, Shimao, et al.
Published: (2025)
EarthSE: A Benchmark for Evaluating Earth Scientific Exploration Capability of LLMs
by: Xu, Wanghan, et al.
Published: (2025)
VisScience: An Extensive Benchmark for Evaluating K12 Educational Multi-modal Scientific Reasoning
by: Jiang, Zhihuan, et al.
Published: (2024)
Automated Capability Discovery via Foundation Model Self-Exploration
by: Lu, Cong, et al.
Published: (2025)
CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds
by: Wang, Lei, et al.
Published: (2024)
Assessing the Capabilities of LLMs in Humor: A Multi-dimensional Analysis of Oogiri Generation and Evaluation
by: Sakabe, Ritsu, et al.
Published: (2025)
Counterfactual Evaluation Reveals Hidden Capability Profiles in Clinical LLMs and Agents
by: Turk, Matt
Published: (2026)
Are Your LLMs Capable of Stable Reasoning?
by: Liu, Junnan, et al.
Published: (2024)
MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs
by: Sirdeshmukh, Ved, et al.
Published: (2025)
Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples
by: Vougiouklis, Pavlos, et al.
Published: (2017)
BabyReasoningBench: Generating Developmentally-Inspired Reasoning Tasks for Evaluating Baby Language Models
by: Dhole, Kaustubh D.
Published: (2026)
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
by: Chen, Mingyang, et al.
Published: (2025)
PLANET: A Collection of Benchmarks for Evaluating LLMs' Planning Capabilities
by: Li, Haoming, et al.
Published: (2025)
Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data
by: Liu, Xiao, et al.
Published: (2024)
Resolving Intent Ambiguities by Retrieving Discriminative Clarifying Questions
by: Dhole, Kaustubh D.
Published: (2020)
Evaluation of Multilingual LLMs Personalized Text Generation Capabilities Targeting Groups and Social-Media Platforms
by: Macko, Dominik
Published: (2026)
Satisfiability Solving with LLMs: A Matched-Pair Evaluation of Reasoning Capability
by: Zhang, Leizhen, et al.
Published: (2026)