Staff View: IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property :: Library Catalog

Bibliographic Details
Main Authors:	Wang, Qiyao; Chen, Guhong; Wang, Hongbo; Liu, Huaren; Zhu, Minghui; Qin, Zhifei; Li, Linwei; Yue, Yilin; Wang, Shiqiang; Li, Jiayan; Wu, Yihang; Liu, Ziqiang; Chen, Longze; Luo, Run; Fan, Liyang; Li, Jiaming; Zhang, Lei; Xu, Kan; Li, Chengming; Alinejad-Rokny, Hamid; Ni, Shiwen; Lin, Yuan; Yang, Min
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2504.15524

_version_	1866912610580955136
author	Wang, Qiyao; Chen, Guhong; Wang, Hongbo; Liu, Huaren; Zhu, Minghui; Qin, Zhifei; Li, Linwei; Yue, Yilin; Wang, Shiqiang; Li, Jiayan; Wu, Yihang; Liu, Ziqiang; Chen, Longze; Luo, Run; Fan, Liyang; Li, Jiaming; Zhang, Lei; Xu, Kan; Li, Chengming; Alinejad-Rokny, Hamid; Ni, Shiwen; Lin, Yuan; Yang, Min
author_facet	Wang, Qiyao; Chen, Guhong; Wang, Hongbo; Liu, Huaren; Zhu, Minghui; Qin, Zhifei; Li, Linwei; Yue, Yilin; Wang, Shiqiang; Li, Jiayan; Wu, Yihang; Liu, Ziqiang; Chen, Longze; Luo, Run; Fan, Liyang; Li, Jiaming; Zhang, Lei; Xu, Kan; Li, Chengming; Alinejad-Rokny, Hamid; Ni, Shiwen; Lin, Yuan; Yang, Min
contents	Intellectual Property (IP) is a highly specialized domain that integrates technical and legal knowledge, making it inherently complex and knowledge-intensive. Recent advancements in LLMs have demonstrated their potential to handle IP-related tasks, enabling more efficient analysis, understanding, and generation of IP-related content. However, existing datasets and benchmarks focus narrowly on patents or cover limited aspects of the IP field, lacking alignment with real-world scenarios. To bridge this gap, we introduce IPBench, the first comprehensive IP task taxonomy and a large-scale bilingual benchmark encompassing 8 IP mechanisms and 20 distinct tasks, designed to evaluate LLMs in real-world IP scenarios. We benchmark 17 main LLMs, ranging from general-purpose to domain-specific, including chat-oriented and reasoning-focused models, under zero-shot, few-shot, and chain-of-thought settings. Our results show that even the top-performing model, DeepSeek-V3, achieves only 75.8% accuracy, indicating significant room for improvement. Notably, open-source IP and law-oriented models lag behind closed-source general-purpose models. To foster future research, we publicly release IPBench, and will expand it with additional tasks to better reflect real-world complexities and support model advancements in the IP domain. We provide the data and code in the supplementary URLs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_15524
institution	arXiv
publishDate	2025
record_format	arxiv
title	IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2504.15524

Similar Items