| Main Authors: | Tang, Xiangru, Liu, Yuliang, Cai, Zefan, Shao, Yanjun, Lu, Junjie, Zhang, Yichi, Deng, Zexuan, Hu, Helan, An, Kaikai, Huang, Ruijun, Si, Shuzheng, Chen, Sheng, Zhao, Haozhe, Chen, Liang, Wang, Yan, Liu, Tianyu, Jiang, Zhiwei, Chang, Baobao, Fang, Yin, Qin, Yujia, Zhou, Wangchunshu, Zhao, Yilun, Cohan, Arman, Gerstein, Mark |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2311.09835 |
Similar Items
Improving the Robustness of Distantly-Supervised Named Entity Recognition via Uncertainty-Aware Teacher Learning and Student-Student Collaborative Learning
by: Si, Shuzheng, et al.
Published: (2023)
Rethinking Semantic Parsing for Large Language Models: Enhancing LLM Performance with Semantic Hints
by: An, Kaikai, et al.
Published: (2024)
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?
by: Tang, Xiangru, et al.
Published: (2023)
Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation
by: Zhao, Haozhe, et al.
Published: (2024)
Investigating Data Contamination in Modern Benchmarks for Large Language Models
by: Deng, Chunyuan, et al.
Published: (2023)
MIMIR: A Streamlined Platform for Personalized Agent Tuning in Domain Expertise
by: Deng, Chunyuan, et al.
Published: (2024)
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning
by: Tang, Xiangru, et al.
Published: (2025)
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
by: Zhao, Haozhe, et al.
Published: (2023)
Step-Back Profiling: Distilling User History for Personalized Scientific Writing
by: Tang, Xiangru, et al.
Published: (2024)
Risks of AI Scientists: Prioritizing Safeguarding Over Autonomy
by: Tang, Xiangru, et al.
Published: (2024)
Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance
by: Zhao, Haozhe, et al.
Published: (2024)
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning
by: Tang, Xiangru, et al.
Published: (2023)
SUCEA: Reasoning-Intensive Retrieval for Adversarial Fact-checking through Claim Decomposition and Editing
by: Liu, Hongjun, et al.
Published: (2025)
PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain
by: Chen, Liang, et al.
Published: (2024)
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation
by: Chen, Liang, et al.
Published: (2024)
MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search
by: Hu, Yunhai, et al.
Published: (2025)
PHYSICS: Benchmarking Foundation Models on University-Level Physics Problem Solving
by: Feng, Kaiyue, et al.
Published: (2025)
Table-R1: Inference-Time Scaling for Table Reasoning
by: Yang, Zheyuan, et al.
Published: (2025)
GATEAU: Selecting Influential Samples for Long Context Alignment
by: Si, Shuzheng, et al.
Published: (2024)
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
by: Zhao, Haozhe, et al.
Published: (2024)
ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain
by: Zhao, Haochen, et al.
Published: (2024)
LimRank: Less is More for Reasoning-Intensive Information Reranking
by: Song, Tingyu, et al.
Published: (2025)
FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering
by: Long, Yitao, et al.
Published: (2025)
SAGE: Benchmarking and Improving Retrieval for Deep Research Agents
by: Hu, Tiansheng, et al.
Published: (2026)
FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains
by: Zhao, Yilun, et al.
Published: (2023)
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
by: Tang, Xiangru, et al.
Published: (2025)
FinDVer: Explainable Claim Verification over Long and Hybrid-Content Financial Documents
by: Zhao, Yilun, et al.
Published: (2024)
Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation
by: Deng, Chunyuan, et al.
Published: (2024)
ANCHOR: Branch-Point Data Generation for GUI Agents
by: Wei, Jinbiao, et al.
Published: (2026)
Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers
by: Zhao, Yilun, et al.
Published: (2025)
SANTA: Separate Strategies for Inaccurate and Incomplete Annotation Noise in Distantly-Supervised Named Entity Recognition
by: Si, Shuzheng, et al.
Published: (2023)
Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering
by: Si, Shuzheng, et al.
Published: (2025)
DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents
by: Zhao, Yilun, et al.
Published: (2023)
Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems
by: Zhao, Yilun, et al.
Published: (2026)
FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain
by: Hu, Tiansheng, et al.
Published: (2025)
A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks
by: Si, Shuzheng, et al.
Published: (2025)
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
by: Yu, Zhaojian, et al.
Published: (2024)
UltraIF: Advancing Instruction Following from the Wild
by: An, Kaikai, et al.
Published: (2025)
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
by: Chen, Liang, et al.
Published: (2024)
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
by: Zhang, Siyue, et al.
Published: (2025)