| Main Authors: | Tong, Xin; Jin, Bo; Lin, Zhi; Wang, Binjun; Yu, Ting; Cheng, Qiang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2402.07234 |
Similar Items
HAVE: Head-Adaptive Gating and ValuE Calibration for Hallucination Mitigation in Large Language Models
by: Tong, Xin, et al.
Published: (2025)
MEUV: Achieving Fine-Grained Capability Activation in Large Language Models via Mutually Exclusive Unlock Vectors
by: Tong, Xin, et al.
Published: (2025)
InsQABench: Benchmarking Chinese Insurance Domain Question Answering with Large Language Models
by: Ding, Jing, et al.
Published: (2025)
TCMBench: A Comprehensive Benchmark for Evaluating Large Language Models in Traditional Chinese Medicine
by: Yue, Wenjing, et al.
Published: (2024)
CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models
by: Li, Zhong-Zhi, et al.
Published: (2024)
Chinese Labor Law Large Language Model Benchmark
by: Lan, Zixun, et al.
Published: (2026)
CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models
by: Yu, Linhao, et al.
Published: (2024)
Benchmarking Reasoning Robustness in Large Language Models
by: Yu, Tong, et al.
Published: (2025)
ANGO: A Next-Level Evaluation Benchmark For Generation-Oriented Language Models In Chinese Domain
by: Wang, Bingchao
Published: (2024)
MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models
by: Liu, Mianxin, et al.
Published: (2024)
FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models
by: Liu, Yan, et al.
Published: (2024)
JailBench: A Comprehensive Chinese Security Assessment Benchmark for Large Language Models
by: Liu, Shuyi, et al.
Published: (2025)
Dynamic Guided and Domain Applicable Safeguards for Enhanced Security in Large Language Models
by: Luo, Weidi, et al.
Published: (2024)
OphthBench: A Comprehensive Benchmark for Evaluating Large Language Models in Chinese Ophthalmology
by: Zhou, Chengfeng, et al.
Published: (2025)
LawGPT: A Chinese Legal Knowledge-Enhanced Large Language Model
by: Zhou, Zhi, et al.
Published: (2024)
CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models
by: Li, Yizhi, et al.
Published: (2024)
Large Language Models for Classical Chinese Poetry Translation: Benchmarking, Evaluating, and Improving
by: Chen, Andong, et al.
Published: (2024)
Revolutionizing Database Q&A with Large Language Models: Comprehensive Benchmark and Evaluation
by: Zheng, Yihang, et al.
Published: (2024)
DVD: A Robust Method for Detecting Variant Contamination in Large Language Model Evaluation
by: Liang, Renzhao, et al.
Published: (2026)
Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset
by: Zhu, Jie, et al.
Published: (2024)
AlignBench: Benchmarking Chinese Alignment of Large Language Models
by: Liu, Xiao, et al.
Published: (2023)
MSDiagnosis: A Benchmark for Evaluating Large Language Models in Multi-Step Clinical Diagnosis
by: Hou, Ruihui, et al.
Published: (2024)
UrbanPlanBench: A Comprehensive Urban Planning Benchmark for Evaluating Large Language Models
by: Zheng, Yu, et al.
Published: (2025)
Enterprise Large Language Model Evaluation Benchmark
by: Wang, Liya, et al.
Published: (2025)
DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models
by: Chen, Xiaoyang, et al.
Published: (2025)
MedGo: A Chinese Medical Large Language Model
by: Zhang, Haitao, et al.
Published: (2024)
SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model
by: Liang, Xun, et al.
Published: (2025)
EduEval: A Hierarchical Cognitive Benchmark for Evaluating Large Language Models in Chinese Education
by: Ma, Guoqing, et al.
Published: (2025)
Benchmarking Multi-Step Legal Reasoning and Analyzing Chain-of-Thought Effects in Large Language Models
by: Yu, Wenhan, et al.
Published: (2025)
Large Language Models for Cyber Security: A Systematic Literature Review
by: Xu, Hanxiang, et al.
Published: (2024)
NewsBench: A Systematic Evaluation Framework for Assessing Editorial Capabilities of Large Language Models in Chinese Journalism
by: Li, Miao, et al.
Published: (2024)
AdaptEval: A Benchmark for Evaluating Large Language Models on Code Snippet Adaptation
by: Zhang, Tanghaoran, et al.
Published: (2026)
A Survey on Data Security in Large Language Models
by: Chen, Kang, et al.
Published: (2025)
Designing Domain-Specific Large Language Models: The Critical Role of Fine-Tuning in Public Opinion Simulation
by: Lin, Haocheng
Published: (2024)
DivLogicEval: A Framework for Benchmarking Logical Reasoning Evaluation in Large Language Models
by: Chung, Tsz Ting, et al.
Published: (2025)
HSKBenchmark: Modeling and Benchmarking Chinese Second Language Acquisition in Large Language Models through Curriculum Tuning
by: Yang, Qihao, et al.
Published: (2025)
Evaluating and Enhancing Large Language Models Performance in Domain-specific Medicine: Osteoarthritis Management with DocOA
by: Chen, Xi, et al.
Published: (2024)
Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models
by: Tan, Yingshui, et al.
Published: (2024)
KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs
by: Pei, Aihua, et al.
Published: (2024)
Chinese Metaphor Recognition Using a Multi-stage Prompting Large Language Model
by: Wang, Jie, et al.
Published: (2024)