Can LLMs take on the role of human experts in data analysis?


Can we use large language models as a mechanism for quantitative knowledge retrieval to aid data analysis tasks? A guest post by Kai Spriestersbach.

In data science, researchers often face the challenge of working with incomplete data sets. Many established algorithms simply cannot process data with missing values. Traditionally, data scientists have turned to domain experts to fill in the gaps, a process that is time-consuming and not always practical.

But what if a machine could take over this expert role?

Our research group has focused on this question and investigated whether large language models (LLMs) can act as digital experts. These models, trained on a huge amount of text, potentially have a deep understanding of diverse topics, from medical data to social science issues.



We compared the LLMs’ answers with real data and with established statistical methods for dealing with data gaps. Our results show that in many cases, LLMs can provide estimates about as accurate as those of traditional methods, without relying on human experts.

Two methods in data analysis

When analyzing data, whether in medicine, economics, or environmental research, one often encounters the problem of incomplete information. Two key techniques help here: prior elicitation (the determination of prior knowledge) and data imputation (the supplementation of missing data).

Prior elicitation refers to the systematic collection of existing expert knowledge to make assumptions about certain parameters in our models.
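To make this concrete, here is a minimal sketch of one common way to use elicited knowledge, not the procedure from our study. It assumes the expert states a range they are 90% sure contains the parameter, that a Normal distribution is an acceptable prior, and that the observation noise is known. The function names, the heart-rate example, and all numbers are hypothetical.

```python
from statistics import NormalDist


def normal_prior_from_interval(lower, upper, coverage=0.90):
    """Turn an elicited central interval into a Normal prior (mean, sd)."""
    # z-score that bounds the central `coverage` mass of a standard Normal
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    mean = (lower + upper) / 2
    sd = (upper - lower) / (2 * z)
    return mean, sd


def update_normal_mean(prior_mean, prior_sd, data, noise_sd):
    """Conjugate update for a Normal mean with known observation noise."""
    prior_precision = 1 / prior_sd**2
    data_precision = len(data) / noise_sd**2
    post_var = 1 / (prior_precision + data_precision)
    sample_mean = sum(data) / len(data)
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * sample_mean)
    return post_mean, post_var**0.5


# Hypothetical statement: "resting heart rate lies between 60 and 80 bpm
# with 90% certainty", followed by three made-up measurements.
prior_mean, prior_sd = normal_prior_from_interval(60, 80)
post_mean, post_sd = update_normal_mean(prior_mean, prior_sd,
                                        [72, 75, 71], noise_sd=5)
```

The posterior mean lands between the expert's best guess and the sample mean, and the posterior is narrower than the prior. The more data arrives, the less the elicited prior matters.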

Data imputation, on the other hand, comes into play when information is missing from our data sets. Rather than discarding valuable data sets because of a few gaps, scientists use statistical methods to fill those gaps with plausible values.
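As an illustration, the simplest such method is mean imputation: every gap in a column is filled with the average of the observed values. The sketch below is only a baseline example under that assumption, not one of the specific methods evaluated in our study. The helper name `impute_mean` and the sample records are made up.

```python
from statistics import mean


def impute_mean(rows, column):
    """Return copies of `rows` with missing values in `column`
    replaced by the mean of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [
        {**r, column: fill if r[column] is None else r[column]}
        for r in rows
    ]


# Made-up records; None marks a missing entry.
rows = [
    {"age": 34, "income": 52000},
    {"age": 51, "income": None},
    {"age": 29, "income": 48000},
]
completed = impute_mean(rows, "income")
```

The incomplete record is kept rather than discarded, and the original rows stay unchanged. More refined methods use the other columns to predict the missing value, but the idea is the same.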

Data imputation with LLMs

In the first part of the research project, we asked whether LLMs can replace human experts in practice, and how the information they provide compares to traditional data imputation methods.

