LLM Pipeline for Mapping Heterogeneous Data: A Case Study in Food Classification

Authors

Kevin Nils Röhl
HTW Berlin University of Applied Sciences
Jan Wirsam
HTW Berlin University of Applied Sciences
https://orcid.org/0009-0004-7083-178X (unauthenticated)

Abstract

Accurate food classification is essential for ensuring compliance with dietary regulations, nutritional standards, and sustainability guidelines, but it remains challenging due to fragmented data and semantic complexity. This study presents a pipeline leveraging large language model (LLM) embeddings, ontology mapping, and human-in-the-loop validation to enhance food classification in institutional food services. The pipeline achieves high accuracy in dietary-group mapping (precision 0.94, recall 0.91, F1-score 0.92), though precise FoodEx2 code matching remains challenging. A confidence-based validation strategy effectively balances automated processes with expert oversight to manage ambiguity. The proposed approach enables digital transformation of traditionally fragmented food service systems, enhancing transparency, operational efficiency, and alignment with dietary and public health guidelines. Future research should deploy this pipeline in operational canteen settings to refine embedding techniques, enhance accuracy, and support sustainable nutrition management.

Author Biographies

Kevin Nils Röhl, HTW Berlin University of Applied Sciences

Berlin, Germany. E-mail: roehl@htw-berlin.de

Rainer Alt, Leipzig University

Leipzig, Germany. E-mail: rainer.alt@uni-leipzig.de

Jan Wirsam, HTW Berlin University of Applied Sciences

Berlin, Germany. E-mail: wirsam@htw-berlin.de

Forthcoming

09.06.2025