| Main Authors: | Xiao, Yunze, He, Tingyu, Wang, Lionel Z., Ma, Yiming, Song, Xingyu, Xu, Xiaohang, Diab, Mona, Li, Irene, Ng, Ka Chung |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.21679 |
Similar Items
Can Large Language Models Resolve Semantic Discrepancy in Self-Destructive Subcultures? Evidence from Jirai Kei
by: Wang, Peng, et al.
Published: (2026)
Humanizing Machines: Rethinking LLM Anthropomorphism Through a Multi-Level Framework of Design
by: Xiao, Yunze, et al.
Published: (2025)
Hire Your Anthropologist! Rethinking Culture Benchmarks Through an Anthropological Lens
by: AlKhamissi, Mai, et al.
Published: (2025)
MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models
by: Wang, Lionel Z., et al.
Published: (2024)
Towards Valid Student Simulation with Large Language Models
by: Yuan, Zhihao, et al.
Published: (2026)
Sentipolis: Emotion-Aware Agents for Social Simulations
by: Fu, Chiyuan, et al.
Published: (2026)
SimBA: Simplifying Benchmark Analysis Using Performance Matrices Alone
by: Subramani, Nishant, et al.
Published: (2025)
A Note on Bias to Complete
by: Xu, Jia, et al.
Published: (2024)
RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models
by: Muhamed, Aashiq, et al.
Published: (2025)
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
by: Bai, Yushi, et al.
Published: (2023)
Combining Discrete Wavelet and Cosine Transforms for Efficient Sentence Embedding
by: Salama, Rana, et al.
Published: (2025)
Evaluating Large Language Model Biases in Persona-Steered Generation
by: Liu, Andy, et al.
Published: (2024)
Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics
by: Liu, Jiarui, et al.
Published: (2025)
DentalBench: Benchmarking and Advancing LLMs Capability for Bilingual Dentistry Understanding
by: Zhu, Hengchuan, et al.
Published: (2025)
StressRoBERTa: Cross-Condition Transfer Learning from Depression, Anxiety, and PTSD to Stress Detection
by: Alqahtani, Amal, et al.
Published: (2025)
DWTSumm: Discrete Wavelet Transform for Document Summarization
by: Salama, Rana, et al.
Published: (2026)
Taming Object Hallucinations with Verified Atomic Confidence Estimation
by: Liu, Jiarui, et al.
Published: (2025)
Semantic Compression for Word and Sentence Embeddings using Discrete Wavelet Transform
by: Salama, Rana Aref, et al.
Published: (2025)
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
by: Muhamed, Aashiq, et al.
Published: (2024)
Emotion Classification in Low and Moderate Resource Languages
by: Tafreshi, Shabnam, et al.
Published: (2024)
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
by: Hu, Tiancheng, et al.
Published: (2025)
ScholarBench: A Bilingual Benchmark for Abstraction, Comprehension, and Reasoning Evaluation in Academic Contexts
by: Noh, Dongwon, et al.
Published: (2025)
BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data
by: Li, Wenkai, et al.
Published: (2024)
CoRAG: Collaborative Retrieval-Augmented Generation
by: Muhamed, Aashiq, et al.
Published: (2025)
Personal Information Parroting in Language Models
by: Subramani, Nishant, et al.
Published: (2026)
Biases Propagate in Encoder-based Vision-Language Models: A Systematic Analysis From Intrinsic Measures to Zero-shot Retrieval Outcomes
by: Ghate, Kshitish, et al.
Published: (2025)
Automatic Generation of Model and Data Cards: A Step Towards Responsible AI
by: Liu, Jiarui, et al.
Published: (2024)
LLM Microscope: What Model Internals Reveal About Answer Correctness and Context Utilization
by: Liu, Jiarui, et al.
Published: (2025)
EigenBench: A Comparative Behavioral Measure of Value Alignment
by: Chang, Jonathn, et al.
Published: (2025)
Depth-Wise Attention (DWAtt): A Layer Fusion Method for Data-Efficient Classification
by: ElNokrashy, Muhammad, et al.
Published: (2022)
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
by: He, Chaoqun, et al.
Published: (2024)
LongBench Pro: A More Realistic and Comprehensive Bilingual Long-Context Evaluation Benchmark
by: Chen, Ziyang, et al.
Published: (2026)
Investigating Cultural Alignment of Large Language Models
by: AlKhamissi, Badr, et al.
Published: (2024)
ARCH2S: Dataset, Benchmark and Challenges for Learning Exterior Architectural Structures from Point Clouds
by: Cheung, Ka Lung, et al.
Published: (2024)
ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Perturbations
by: Xiao, Yunze, et al.
Published: (2024)
LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs
by: Wu, Yuhao, et al.
Published: (2024)
DSPA: Dynamic SAE Steering for Data-Efficient Preference Alignment
by: Wedgwood, James, et al.
Published: (2026)
AgMMU: A Comprehensive Agricultural Multimodal Understanding Benchmark
by: Gauba, Aruna, et al.
Published: (2025)
MixSD: Mixed Contextual Self-Distillation for Knowledge Injection
by: Liu, Jiarui, et al.
Published: (2026)
TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains
by: Kim, Yoonsik, et al.
Published: (2024)