Bibliographic Details
Main Authors: Wang, Qiyao, Chen, Guhong, Wang, Hongbo, Liu, Huaren, Zhu, Minghui, Qin, Zhifei, Li, Linwei, Yue, Yilin, Wang, Shiqiang, Li, Jiayan, Wu, Yihang, Liu, Ziqiang, Chen, Longze, Luo, Run, Fan, Liyang, Li, Jiaming, Zhang, Lei, Xu, Kan, Li, Chengming, Alinejad-Rokny, Hamid, Ni, Shiwen, Lin, Yuan, Yang, Min
Format: Preprint
Published: 2025
Subjects: Computation and Language, Artificial Intelligence
Online Access: https://arxiv.org/abs/2504.15524
author Wang, Qiyao
Chen, Guhong
Wang, Hongbo
Liu, Huaren
Zhu, Minghui
Qin, Zhifei
Li, Linwei
Yue, Yilin
Wang, Shiqiang
Li, Jiayan
Wu, Yihang
Liu, Ziqiang
Chen, Longze
Luo, Run
Fan, Liyang
Li, Jiaming
Zhang, Lei
Xu, Kan
Li, Chengming
Alinejad-Rokny, Hamid
Ni, Shiwen
Lin, Yuan
Yang, Min
contents Intellectual Property (IP) is a highly specialized domain that integrates technical and legal knowledge, making it inherently complex and knowledge-intensive. Recent advancements in LLMs have demonstrated their potential to handle IP-related tasks, enabling more efficient analysis, understanding, and generation of IP-related content. However, existing datasets and benchmarks focus narrowly on patents or cover limited aspects of the IP field, lacking alignment with real-world scenarios. To bridge this gap, we introduce IPBench, the first comprehensive IP task taxonomy and a large-scale bilingual benchmark encompassing 8 IP mechanisms and 20 distinct tasks, designed to evaluate LLMs in real-world IP scenarios. We benchmark 17 main LLMs, ranging from general purpose to domain-specific, including chat-oriented and reasoning-focused models, under zero-shot, few-shot, and chain-of-thought settings. Our results show that even the top-performing model, DeepSeek-V3, achieves only 75.8% accuracy, indicating significant room for improvement. Notably, open-source IP and law-oriented models lag behind closed-source general-purpose models. To foster future research, we publicly release IPBench, and will expand it with additional tasks to better reflect real-world complexities and support model advancements in the IP domain. We provide the data and code in the supplementary URLs.
format Preprint
id arxiv_https___arxiv_org_abs_2504_15524
institution arXiv
publishDate 2025
record_format arxiv
title IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2504.15524