| Main Authors: | Łajewska, Weronika, Missault, Paul, Davidson, George, Mansour, Saab |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.19002 |
Similar Items
GINGER: Grounded Information Nugget-Based Generation of Responses
by: Łajewska, Weronika, et al.
Published: (2025)
Towards Reliable and Factual Response Generation: Detecting Unanswerable Questions in Information-Seeking Conversations
by: Łajewska, Weronika, et al.
Published: (2024)
Estimating the Usefulness of Clarifying Questions and Answers for Conversational Search
by: Sekulić, Ivan, et al.
Published: (2024)
PAARS: Persona Aligned Agentic Retail Shoppers
by: Mansour, Saab, et al.
Published: (2025)
Grounded and Transparent Response Generation for Conversational Information-Seeking Systems
by: Łajewska, Weronika
Published: (2024)
Using Optimal Transport as Alignment Objective for fine-tuning Multilingual Contextualized Embeddings
by: Alqahtani, Sawsan, et al.
Published: (2021)
PKG API: A Tool for Personal Knowledge Graph Management
by: Bernard, Nolwenn, et al.
Published: (2024)
Cross-Lingual LLM-Judge Transfer via Evaluation Decomposition
by: Sheth, Ivaxi, et al.
Published: (2026)
DFlow: Diverse Dialogue Flow Simulation with Large Language Models
by: Du, Wanyu, et al.
Published: (2024)
Trust Me on This: A User Study of Trustworthiness for RAG Responses
by: Łajewska, Weronika, et al.
Published: (2026)
FLAP: Flow-Adhering Planning with Constrained Decoding in LLMs
by: Roy, Shamik, et al.
Published: (2024)
AnesSuite: A Comprehensive Benchmark and Dataset Suite for Anesthesiology Reasoning in LLMs
by: Feng, Xiang, et al.
Published: (2025)
Understanding and Improving Information Preservation in Prompt Compression for LLMs
by: Łajewska, Weronika, et al.
Published: (2025)
DeAL: Decoding-time Alignment for Large Language Models
by: Huang, James Y., et al.
Published: (2024)
Eliciting Better Multilingual Structured Reasoning from LLMs through Code
by: Li, Bryan, et al.
Published: (2024)
FineSurE: Fine-grained Summarization Evaluation using LLMs
by: Song, Hwanjun, et al.
Published: (2024)
Structured List-Grounded Question Answering
by: Sung, Mujeen, et al.
Published: (2024)
CERET: Cost-Effective Extrinsic Refinement for Text Generation
by: Cai, Jason, et al.
Published: (2024)
Explainability for Transparent Conversational Information-Seeking
by: Łajewska, Weronika, et al.
Published: (2024)
AlignSurvey: A Comprehensive Benchmark for Human Preferences Alignment in Social Surveys
by: Lin, Chenxi, et al.
Published: (2025)
Semi-Supervised Dialogue Abstractive Summarization via High-Quality Pseudolabel Selection
by: He, Jianfeng, et al.
Published: (2024)
Can Your Model Tell a Negation from an Implicature? Unravelling Challenges With Intent Encoders
by: Zhang, Yuwei, et al.
Published: (2024)
MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation
by: Blandón, María Andrea Cruz, et al.
Published: (2025)
Multilingual Self-Taught Faithfulness Evaluators
by: Alfano, Carlo, et al.
Published: (2025)
ConceptPsy: A Benchmark Suite with Conceptual Comprehensiveness in Psychology
by: Zhang, Junlei, et al.
Published: (2023)
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
by: Huang, Xu, et al.
Published: (2025)
Medmarks: A Comprehensive Open-Source LLM Benchmark Suite for Medical Tasks
by: Warner, Benjamin, et al.
Published: (2026)
Controllable Conversational Theme Detection Track at DSTC 12
by: Shalyminov, Igor, et al.
Published: (2025)
Understanding Layer Significance in LLM Alignment
by: Shi, Guangyuan, et al.
Published: (2024)
Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations
by: Cao, Yong, et al.
Published: (2025)
GLEAN: Active Generalized Category Discovery with Diverse LLM Feedback
by: Zou, Henry Peng, et al.
Published: (2025)
MDSEval: A Meta-Evaluation Benchmark for Multimodal Dialogue Summarization
by: Liu, Yinhong, et al.
Published: (2025)
LogEval: A Comprehensive Benchmark Suite for Large Language Models In Log Analysis
by: Cui, Tianyu, et al.
Published: (2024)
ViLLM-Eval: A Comprehensive Evaluation Suite for Vietnamese Large Language Models
by: Nguyen, Trong-Hieu, et al.
Published: (2024)
MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation
by: Yao, Jihan, et al.
Published: (2025)
MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets
by: Aboutalebi, Hossein, et al.
Published: (2024)
An Ecosystem for Personal Knowledge Graphs: A Survey and Research Roadmap
by: Skjæveland, Martin G., et al.
Published: (2023)
KFinEval-Pilot: A Comprehensive Benchmark Suite for Korean Financial Language Understanding
by: Hwang, Bokwang, et al.
Published: (2025)
Preference Ranking Optimization for Human Alignment
by: Song, Feifan, et al.
Published: (2023)
A Comprehensive Survey of Text Classification Techniques and Their Research Applications: Observational and Experimental Insights
by: Taha, Kamal, et al.
Published: (2024)