Human–Machine Benchmarking

Overview

We are studying how AI tools and lawyers perform side by side in order to help define quality standards in legal work. Through blind evaluations by senior attorneys, we compare output from leading legal AI tools, off-the-shelf large language models (LLMs), and mid-level associates to better understand how technology can strengthen the quality of legal services.

We are investigating the quantitative and qualitative metrics that underlie notions of quality in legal work product. In a blind evaluation, senior attorneys at large firms rank, according to their preferences, output produced by enterprise legal AI tools (e.g., CoCounsel, Lexis+, Harvey), off-the-shelf LLMs (e.g., GPT-5, Claude-4), and human mid-level associates (3–5 years of experience).
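
To make the ranking protocol concrete, the minimal Python sketch below aggregates blind preference orderings into a mean rank per submission. It rests on assumed conditions: the submission labels, the sample rankings, and the Borda-style mean-rank aggregation are hypothetical illustrations, not the project's actual data or scoring method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical blind rankings: each evaluator orders anonymized submissions
# from best (first) to worst (last); source labels are revealed only after
# ranking. Submission names here are placeholders.
rankings = [
    ["associate_A", "tool_X", "llm_Y"],
    ["tool_X", "associate_A", "llm_Y"],
    ["tool_X", "llm_Y", "associate_A"],
]

def mean_ranks(orderings):
    """Aggregate preference orderings into a mean rank per submission.

    A Borda-style mean rank is one simple aggregation choice; lower is better.
    """
    positions = defaultdict(list)
    for ordering in orderings:
        for position, submission in enumerate(ordering, start=1):
            positions[submission].append(position)
    return {submission: mean(pos) for submission, pos in positions.items()}

if __name__ == "__main__":
    # Print submissions from most to least preferred under this aggregation.
    for submission, avg in sorted(mean_ranks(rankings).items(), key=lambda kv: kv[1]):
        print(f"{submission}: mean rank {avg:.2f}")
```

Mean rank is easy to interpret but ignores the margin between preferences; pairwise models such as Bradley-Terry are a common alternative when evaluators compare outputs head to head.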