| Main Authors: | Tian, Yufei, Sun, Jiao, Peng, Nanyun, Zhang, Zizhao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.00319 |
Similar Items
Detecting Machine-Generated Long-Form Content with Latent-Space Variables
by: Tian, Yufei, et al.
Published: (2024)
Rethinking Creativity Evaluation: A Critical Analysis of Existing Creativity Evaluations
by: Lu, Li-Chun, et al.
Published: (2025)
Evaluating Human Alignment and Model Faithfulness of LLM Rationale
by: Fayyaz, Mohsen, et al.
Published: (2024)
REFFLY: Melody-Constrained Lyrics Editing Model
by: Zhao, Songyan, et al.
Published: (2024)
Multimodal Cultural Safety: Evaluation Framework and Alignment Strategies
by: Qiu, Haoyi, et al.
Published: (2025)
PhonologyBench: Evaluating Phonological Skills of Large Language Models
by: Suvarna, Ashima, et al.
Published: (2024)
Are Akpans Trick or Treat: Unveiling Helpful Biases in Assistant Systems
by: Sun, Jiao, et al.
Published: (2022)
AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation
by: Qiu, Haoyi, et al.
Published: (2023)
The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs
by: Bandarkar, Lucas, et al.
Published: (2025)
Are Large Language Models Capable of Generating Human-Level Narratives?
by: Tian, Yufei, et al.
Published: (2024)
Extracting Small Translation Specialists from LLMs by Aggressively Pruning Experts
by: Martin, Liu O., et al.
Published: (2026)
Open-Domain Text Evaluation via Contrastive Distribution Methods
by: Lu, Sidi, et al.
Published: (2023)
Scientific Discourse Tagging for Evidence Extraction
by: Li, Xiangci, et al.
Published: (2019)
A Paragraph-level Multi-task Learning Model for Scientific Fact-Verification
by: Li, Xiangci, et al.
Published: (2020)
VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models
by: Qiu, Haoyi, et al.
Published: (2024)
Structured Outputs Enable General-Purpose LLMs to be Medical Experts
by: Guo, Guangfu, et al.
Published: (2025)
Vulnerability of LLMs to Vertically Aligned Text Manipulations
by: Li, Zhecheng, et al.
Published: (2024)
Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding
by: Wang, Cheng, et al.
Published: (2024)
CoKe: Customizable Fine-Grained Story Evaluation via Chain-of-Keyword Rationalization
by: Joshi, Brihi, et al.
Published: (2025)
Mind the Gesture: Evaluating AI Sensitivity to Culturally Offensive Non-Verbal Gestures
by: Yerukola, Akhila, et al.
Published: (2025)
RLCD: Reinforcement Learning from Contrastive Distillation for Language Model Alignment
by: Yang, Kevin, et al.
Published: (2023)
Decoupling Task-Solving and Output Formatting in LLM Generation
by: Deng, Haikang, et al.
Published: (2025)
Guiding Through Complexity: What Makes Good Supervision for Hard Math Reasoning Tasks?
by: He, Xuan, et al.
Published: (2024)
IQ Test for LLMs: An Evaluation Framework for Uncovering Core Skills in LLMs
by: Maimon, Aviya, et al.
Published: (2025)
TemMed-Bench: Evaluating Temporal Medical Image Reasoning in Vision-Language Models
by: Zhang, Junyi, et al.
Published: (2025)
MacGyver: Are Large Language Models Creative Problem Solvers?
by: Tian, Yufei, et al.
Published: (2023)
Enhancing LLM Character-Level Manipulation via Divide and Conquer
by: Xiong, Zhen, et al.
Published: (2025)
MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion
by: Qiu, Haoyi, et al.
Published: (2025)
Steering MoE LLMs via Expert (De)Activation
by: Fayyaz, Mohsen, et al.
Published: (2025)
Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs
by: Doddapaneni, Sumanth, et al.
Published: (2024)
ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
by: Zheng, Jingnan, et al.
Published: (2024)
Learning Action Conditions from Instructional Manuals for Instruction Understanding
by: Wu, Te-Lin, et al.
Published: (2022)
OptiVerse: A Comprehensive Benchmark towards Optimization Problem Solving
by: Zhang, Xinyu, et al.
Published: (2026)
LiveCLKTBench: Towards Reliable Evaluation of Cross-Lingual Knowledge Transfer in Multilingual LLMs
by: Guo, Pei-Fu, et al.
Published: (2025)
Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence
by: Fayyaz, Mohsen, et al.
Published: (2025)
LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning
by: Meng, Silin, et al.
Published: (2024)
Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization
by: Li, Zhecheng, et al.
Published: (2024)
Grading the Unspoken: Evaluating Tacit Reasoning in Quantum Field Theory and String Theory with LLMs
by: Yu, Xingyang, et al.
Published: (2026)
DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning
by: Parekh, Tanmay, et al.
Published: (2025)
Evaluating Cultural and Social Awareness of LLM Web Agents
by: Qiu, Haoyi, et al.
Published: (2024)