Patent Text and Long-Run Innovation Dynamics: The Critical Role of Model Selection
Just as distorted maps may mislead, Natural Language Processing (NLP) models may misrepresent. How do we know which NLP model to trust? We provide comprehensive guidance for selecting and applying NLP representations of patent text. We develop novel validation tasks to evaluate several leading NLP models, assessing how well candidate models align with both expert and non-expert judgments of patent similarity. On these tasks, state-of-the-art language models significantly outperform traditional approaches such as TF-IDF. Using our validated representations, we measure a secular decline in contemporaneous patent similarity: inventors are “spreading out” over an expanding knowledge frontier. This finding is corroborated by declining rates of multiple invention in newly digitized historical patent interference records. In contrast, selecting a single representation without validating it against alternatives yields an ambiguous or even opposite trend. Our framework thus addresses a fundamental challenge: black-box NLP models produce varying economic measurements, and researchers need principled grounds for selecting among them. To facilitate future research, we plan to provide our validation task data and embeddings for all US patents granted from 1836 to 2023.
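To make the headline measurement concrete, the sketch below computes mean pairwise similarity among contemporaneous patent texts under two representations: sparse TF-IDF vectors and dense embeddings from a pretrained language model. This is a minimal illustration, not the paper's actual pipeline; the model name ("all-MiniLM-L6-v2") and the toy abstracts are placeholder assumptions.

```python
# Minimal sketch: contemporaneous patent similarity under two
# representations. The model and data here are illustrative
# assumptions, not the paper's validated choices.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Toy stand-ins for patent abstracts granted in the same year.
abstracts = [
    "A method for wireless transmission of electrical signals.",
    "An apparatus for transmitting electric signals without wires.",
    "A chemical process for vulcanizing rubber compounds.",
]

def mean_pairwise_similarity(matrix):
    """Average cosine similarity over all distinct document pairs."""
    sims = cosine_similarity(matrix)
    iu = np.triu_indices_from(sims, k=1)  # upper triangle, excluding diagonal
    return sims[iu].mean()

# Representation 1: sparse TF-IDF vectors (the traditional baseline).
tfidf = TfidfVectorizer().fit_transform(abstracts)

# Representation 2: dense embeddings from a pretrained language model
# (an assumed, publicly available placeholder model).
model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(abstracts)

print(f"TF-IDF mean similarity:    {mean_pairwise_similarity(tfidf):.3f}")
print(f"Embedding mean similarity: {mean_pairwise_similarity(emb):.3f}")
```

Tracking a statistic like this cohort by cohort is what produces the trend the abstract describes; because different representations can assign very different similarity levels to the same patent pairs, the choice of model can flip the trend's sign, which is why the validation tasks matter.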