Bibliographic Details
Main Authors:	Xu, Weijie, Cui, Shixian, Fang, Xi, Xue, Chi, Eckman, Stephanie, Reddy, Chandan K.
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence; 68T01; I.2.7
Online Access:	https://arxiv.org/abs/2506.00643

Similar Items

Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense
by: Zhang, Zhehao, et al.
Published: (2025)

The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs
by: Fang, Xi, et al.
Published: (2025)

Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective
by: Xu, Weijie, et al.
Published: (2025)

A Comparative Study of Feature Selection in Tsetlin Machines
by: Halenka, Vojtech, et al.
Published: (2025)

Benchmarking Energy Efficiency of Large Language Models Using vLLM
by: Pronk, K., et al.
Published: (2025)

ContractEval: A Benchmark for Evaluating Contract-Satisfying Assertions in Code Generation
by: Lim, Soohan, et al.
Published: (2025)

An Epidemiological Knowledge Graph extracted from the World Health Organization's Disease Outbreak News
by: Consoli, Sergio, et al.
Published: (2025)

TSDS: Data Selection for Task-Specific Model Finetuning
by: Liu, Zifan, et al.
Published: (2024)

AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems
by: Yagoubi, Faouzi El, et al.
Published: (2026)

Solving Zebra Puzzles Using Constraint-Guided Multi-Agent Systems
by: Berman, Shmuel, et al.
Published: (2024)

Low-Resource Neural Machine Translation Using Recurrent Neural Networks and Transfer Learning: A Case Study on English-to-Igbo
by: Ekle, Ocheme Anthony, et al.
Published: (2025)

From MTEB to MTOB: Retrieval-Augmented Classification for Descriptive Grammars
by: Kornilov, Albert, et al.
Published: (2024)

Project Synapse: A Hierarchical Multi-Agent Framework with Hybrid Memory for Autonomous Resolution of Last-Mile Delivery Disruptions
by: Yadav, Arin Gopalan, et al.
Published: (2026)

Reasoning aligns language models to human cognition
by: Guiomar, Gonçalo, et al.
Published: (2026)

Benchmarking Deception Probes via Black-to-White Performance Boosts
by: Parrack, Avi, et al.
Published: (2025)

Controlling Long-Horizon Behavior in Language Model Agents with Explicit State Dynamics
by: Subaharan, Sukesh
Published: (2026)

Vibe-Creation: The Epistemology of Human-AI Emergent Cognition
by: Levin, Ilya
Published: (2026)

Navigational Thinking as an Emerging Paradigm of Computer Science in the Age of Generative AI
by: Levin, Ilya
Published: (2026)

Large Language Models are Inconsistent and Biased Evaluators
by: Stureborg, Rickard, et al.
Published: (2024)

A transfer learning approach for automatic conflicts detection in software requirement sentence pairs based on dual encoders
by: Wang, Yizheng, et al.
Published: (2025)

An Automatic Text Classification Method Based on Hierarchical Taxonomies, Neural Networks and Document Embedding: The NETHIC Tool
by: Lomasto, Luigi, et al.
Published: (2026)

Mind the Metrics: Patterns for Telemetry-Aware In-IDE AI Application Development using the Model Context Protocol (MCP)
by: Koc, Vincent, et al.
Published: (2025)

Data and AI governance: Promoting equity, ethics, and fairness in large language models
by: Abhishek, Alok, et al.
Published: (2025)

SHARP: Social Harm Analysis via Risk Profiles for Measuring Inequities in Large Language Models
by: Abhishek, Alok, et al.
Published: (2026)

BEATS: Bias Evaluation and Assessment Test Suite for Large Language Models
by: Abhishek, Alok, et al.
Published: (2025)

Reasoning Promotes Robustness in Theory of Mind Tasks
by: de Haan, Ian B., et al.
Published: (2026)

Epidemic Information Extraction for Event-Based Surveillance using Large Language Models
by: Consoli, Sergio, et al.
Published: (2024)

Tailoring Vaccine Messaging with Common-Ground Opinions
by: Stureborg, Rickard, et al.
Published: (2024)

CUBO: Self-Contained Retrieval-Augmented Generation on Consumer Laptops 10 GB Corpora, 16 GB RAM, Single-Device Deployment
by: Astrino, Paolo
Published: (2026)

HumanMCP: A Human-Like Query Dataset for Evaluating MCP Tool Retrieval Performance
by: Laddha, Shubh, et al.
Published: (2025)

RHealthTwin: Towards Responsible and Multimodal Digital Twins for Personalized Well-being
by: Ferdousi, Rahatara, et al.
Published: (2025)

Triplètoile: Extraction of Knowledge from Microblogging Text
by: Zavarella, Vanni, et al.
Published: (2024)

On Adversarial Examples for Text Classification by Perturbing Latent Representations
by: Sooksatra, Korn, et al.
Published: (2024)

Morphological Synthesizer for Ge'ez Language: Addressing Morphological Complexity and Resource Limitations
by: Gebremariam, Gebrearegawi, et al.
Published: (2025)

KNIGHT: Knowledge Graph-Driven Multiple-Choice Question Generation with Adaptive Hardness Calibration
by: Amanlou, Mohammad, et al.
Published: (2026)

Large Language Models(LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey
by: Fang, Xi, et al.
Published: (2024)

AI Model for Predicting Binding Affinity of Antidiabetic Compounds Targeting PPAR
by: Aman, La Ode, et al.
Published: (2024)

MORQA: Benchmarking Evaluation Metrics for Medical Open-Ended Question Answering
by: Yim, Wen-wai, et al.
Published: (2025)

Murphys Laws of AI Alignment: Why the Gap Always Wins
by: Gaikwad, Madhava
Published: (2025)

PaperAudit-Bench: Benchmarking Error Detection in Research Papers for Critical Automated Peer Review
by: Tu, Songjun, et al.
Published: (2026)