| Main Authors: | Zhang, Hengxiang, Gao, Hongfu, Hu, Qiang, Chen, Guanhua, Yang, Lili, Jing, Bingyi, Wei, Hongxin, Wang, Bing, Bai, Haifeng, Yang, Lei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.18491 |
Similar Items
Fine-tuning can Help Detect Pretraining Data from Large Language Models
by: Zhang, Hengxiang, et al.
Published: (2024)
Exploring Imbalanced Annotations for Effective In-Context Learning
by: Gao, Hongfu, et al.
Published: (2025)
CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models
by: Zhang, Wenjing, et al.
Published: (2024)
Large Language Models for Classical Chinese Poetry Translation: Benchmarking, Evaluating, and Improving
by: Chen, Andong, et al.
Published: (2024)
CFBenchmark: Chinese Financial Assistant Benchmark for Large Language Model
by: Lei, Yang, et al.
Published: (2023)
Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models
by: Tan, Yingshui, et al.
Published: (2024)
Defending Membership Inference Attacks via Privacy-aware Sparsity Tuning
by: Hu, Qiang, et al.
Published: (2024)
FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models
by: Guo, Xin, et al.
Published: (2023)
SafeDialBench: A Fine-Grained Safety Evaluation Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks
by: Cao, Hongye, et al.
Published: (2025)
Detecting Distillation Data from Reasoning Models
by: Zhang, Hengxiang, et al.
Published: (2025)
AgriEval: A Comprehensive Chinese Agricultural Benchmark for Large Language Models
by: Yan, Lian, et al.
Published: (2025)
AC-EVAL: Evaluating Ancient Chinese Language Understanding in Large Language Models
by: Wei, Yuting, et al.
Published: (2024)
INSEva: A Comprehensive Chinese Benchmark for Large Language Models in Insurance
by: Chen, Shisong, et al.
Published: (2025)
McBE: A Multi-task Chinese Bias Evaluation Benchmark for Large Language Models
by: Lan, Tian, et al.
Published: (2025)
E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models
by: Hou, Jinchang, et al.
Published: (2024)
Safety Evaluation of DeepSeek Models in Chinese Contexts
by: Zhang, Wenjing, et al.
Published: (2025)
MTCMB: A Multi-Task Benchmark Framework for Evaluating LLMs on Knowledge, Reasoning, and Safety in Traditional Chinese Medicine
by: Kong, Shufeng, et al.
Published: (2025)
Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
by: He, Yancheng, et al.
Published: (2024)
LiveSecBench: A Dynamic and Event-Driven Safety Benchmark for Chinese Language Model Applications
by: Li, Yudong, et al.
Published: (2025)
CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models
by: Nie, Ying, et al.
Published: (2024)
MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media
by: Zhai, Wei, et al.
Published: (2024)
Chengyu-Bench: Benchmarking Large Language Models for Chinese Idiom Understanding and Use
by: Fu, Yicheng, et al.
Published: (2025)
AlignBench: Benchmarking Chinese Alignment of Large Language Models
by: Liu, Xiao, et al.
Published: (2023)
InsQABench: Benchmarking Chinese Insurance Domain Question Answering with Large Language Models
by: Ding, Jing, et al.
Published: (2025)
Towards Fair and Comprehensive Evaluation of Routers in Collaborative LLM Systems
by: Wu, Wanxing, et al.
Published: (2026)
TCMBench: A Comprehensive Benchmark for Evaluating Large Language Models in Traditional Chinese Medicine
by: Yue, Wenjing, et al.
Published: (2024)
ClinConsensus: A Physician-Calibrated Benchmark for Evaluating Clinical Rubric Coverage in Chinese Medical LLMs
by: Zheng, Xiang, et al.
Published: (2026)
CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations
by: Zhao, Jiahao, et al.
Published: (2024)
Supervised Learning and Large Language Model Benchmarks on Mental Health Datasets: Cognitive Distortions and Suicidal Risks in Chinese Social Media
by: Qi, Hongzhi, et al.
Published: (2023)
ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction
by: Wei, Victor Junqiu, et al.
Published: (2024)
CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models
by: LI, Yizhi, et al.
Published: (2024)
CoSafe: Evaluating Large Language Model Safety in Multi-Turn Dialogue Coreference
by: Yu, Erxin, et al.
Published: (2024)
CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models
by: Yu, Linhao, et al.
Published: (2024)
LAiW: A Chinese Legal Large Language Models Benchmark
by: Dai, Yongfu, et al.
Published: (2023)
ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models
by: Chen, Haibin, et al.
Published: (2025)
LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models
by: Li, Haitao, et al.
Published: (2024)
CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models
by: Qiu, Zexuan, et al.
Published: (2024)
Safety Evaluation and Enhancement of DeepSeek Models in Chinese Contexts
by: Zhang, Wenjing, et al.
Published: (2025)
MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models
by: Liu, Mianxin, et al.
Published: (2024)
Evaluating LLMs on Chinese Idiom Translation
by: Yang, Cai, et al.
Published: (2025)