Library Catalog

Bibliographic Details
Main Authors:	Xiao, Yunze; He, Tingyu; Wang, Lionel Z.; Ma, Yiming; Song, Xingyu; Xu, Xiaohang; Diab, Mona; Li, Irene; Ng, Ka Chung
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Computers and Society
Online Access:	https://arxiv.org/abs/2503.21679

Similar Items

Can Large Language Models Resolve Semantic Discrepancy in Self-Destructive Subcultures? Evidence from Jirai Kei
by: Wang, Peng, et al.
Published: (2026)

Humanizing Machines: Rethinking LLM Anthropomorphism Through a Multi-Level Framework of Design
by: Xiao, Yunze, et al.
Published: (2025)

Hire Your Anthropologist! Rethinking Culture Benchmarks Through an Anthropological Lens
by: AlKhamissi, Mai, et al.
Published: (2025)

MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models
by: Wang, Lionel Z., et al.
Published: (2024)

Towards Valid Student Simulation with Large Language Models
by: Yuan, Zhihao, et al.
Published: (2026)

Sentipolis: Emotion-Aware Agents for Social Simulations
by: Fu, Chiyuan, et al.
Published: (2026)

SimBA: Simplifying Benchmark Analysis Using Performance Matrices Alone
by: Subramani, Nishant, et al.
Published: (2025)

A Note on Bias to Complete
by: Xu, Jia, et al.
Published: (2024)

RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models
by: Muhamed, Aashiq, et al.
Published: (2025)

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
by: Bai, Yushi, et al.
Published: (2023)

Combining Discrete Wavelet and Cosine Transforms for Efficient Sentence Embedding
by: Salama, Rana, et al.
Published: (2025)

Evaluating Large Language Model Biases in Persona-Steered Generation
by: Liu, Andy, et al.
Published: (2024)

Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics
by: Liu, Jiarui, et al.
Published: (2025)

DentalBench: Benchmarking and Advancing LLMs Capability for Bilingual Dentistry Understanding
by: Zhu, Hengchuan, et al.
Published: (2025)

StressRoBERTa: Cross-Condition Transfer Learning from Depression, Anxiety, and PTSD to Stress Detection
by: Alqahtani, Amal, et al.
Published: (2025)

DWTSumm: Discrete Wavelet Transform for Document Summarization
by: Salama, Rana, et al.
Published: (2026)

Taming Object Hallucinations with Verified Atomic Confidence Estimation
by: Liu, Jiarui, et al.
Published: (2025)

Semantic Compression for Word and Sentence Embeddings using Discrete Wavelet Transform
by: Salama, Rana Aref, et al.
Published: (2025)

Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
by: Muhamed, Aashiq, et al.
Published: (2024)

Emotion Classification in Low and Moderate Resource Languages
by: Tafreshi, Shabnam, et al.
Published: (2024)

SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
by: Hu, Tiancheng, et al.
Published: (2025)

ScholarBench: A Bilingual Benchmark for Abstraction, Comprehension, and Reasoning Evaluation in Academic Contexts
by: Noh, Dongwon, et al.
Published: (2025)

BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data
by: Li, Wenkai, et al.
Published: (2024)

CoRAG: Collaborative Retrieval-Augmented Generation
by: Muhamed, Aashiq, et al.
Published: (2025)

Personal Information Parroting in Language Models
by: Subramani, Nishant, et al.
Published: (2026)

Biases Propagate in Encoder-based Vision-Language Models: A Systematic Analysis From Intrinsic Measures to Zero-shot Retrieval Outcomes
by: Ghate, Kshitish, et al.
Published: (2025)

Automatic Generation of Model and Data Cards: A Step Towards Responsible AI
by: Liu, Jiarui, et al.
Published: (2024)

LLM Microscope: What Model Internals Reveal About Answer Correctness and Context Utilization
by: Liu, Jiarui, et al.
Published: (2025)

EigenBench: A Comparative Behavioral Measure of Value Alignment
by: Chang, Jonathn, et al.
Published: (2025)

Depth-Wise Attention (DWAtt): A Layer Fusion Method for Data-Efficient Classification
by: ElNokrashy, Muhammad, et al.
Published: (2022)

OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
by: He, Chaoqun, et al.
Published: (2024)

LongBench Pro: A More Realistic and Comprehensive Bilingual Long-Context Evaluation Benchmark
by: Chen, Ziyang, et al.
Published: (2026)

Investigating Cultural Alignment of Large Language Models
by: AlKhamissi, Badr, et al.
Published: (2024)

ARCH2S: Dataset, Benchmark and Challenges for Learning Exterior Architectural Structures from Point Clouds
by: Cheung, Ka Lung, et al.
Published: (2024)

ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Perturbations
by: Xiao, Yunze, et al.
Published: (2024)

LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs
by: Wu, Yuhao, et al.
Published: (2024)

DSPA: Dynamic SAE Steering for Data-Efficient Preference Alignment
by: Wedgwood, James, et al.
Published: (2026)

AgMMU: A Comprehensive Agricultural Multimodal Understanding Benchmark
by: Gauba, Aruna, et al.
Published: (2025)

MixSD: Mixed Contextual Self-Distillation for Knowledge Injection
by: Liu, Jiarui, et al.
Published: (2026)

TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains
by: Kim, Yoonsik, et al.
Published: (2024)