We develop new methods for linking the output of an AI model to its input, a task known as “data attribution”. Our methods focus on making such attribution efficient enough to be performed at scale. Data attribution is particularly important for LLMs because, among other things, it would enable creators to be compensated when their work is used to generate new text, images, music, and so on.