Methodology · 5 min read

Why we build our own nutrient dataset

Why Zutato leans on BLS, manufacturer labels, and its own calculations – instead of trusting OpenFoodFacts or an AI.


We don't ask the AI what's in your carrot.

Sounds odd for 2026 – we're building an app that uses AI in plenty of places. But the moment it's about the concrete nutrient value that ends up in your logbook, we deliberately don't ask a language model.

Plausible, but wrong.

Ask a language model today how much iron is in 100 g of whole-grain oats. Ask it again tomorrow. Ask it the day after with slightly different wording. You'll get three answers that all sound convincing – and differ by several milligrams.

That's not just a feeling; it's measurable. A 2025 study (O'Hara et al.) fed ChatGPT meal photographs and compared its estimates against reference values. The findings:

  • Calcium: 27.8 % below the reference value
  • Potassium: 49.5 % below the reference value
  • Folate: 38.6 % below the reference value
  • Vitamin D: estimated at zero at the median
  • Portion weight: underestimated in 76.3 % of cases – and every micronutrient estimate builds on top of that

A separate 2025 study (Fridolfsson et al.) compared three language models. ChatGPT and Claude came in at a mean absolute error of around 36 % for weight and energy, but at 40 to 73 % on the macronutrients themselves; Gemini landed between 64 and 110 % depending on the nutrient. The authors see real potential for rough tracking use cases, but explicitly call the models unsuitable for precise values.
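To put such percentages in perspective: an error figure of this kind is usually a mean absolute percentage error, meaning the average of |estimate − reference| / reference across all meals. The following is a minimal sketch under that assumption (the study's exact formula may differ). The function name and the sample numbers are invented for illustration and are not taken from either paper.

```python
def mean_absolute_percentage_error(estimates, references):
    """Average of |estimate - reference| / |reference|, expressed in percent."""
    if len(estimates) != len(references) or not references:
        raise ValueError("need two non-empty lists of equal length")
    errors = [
        abs(est - ref) * 100 / abs(ref)  # relative error of one meal, in percent
        for est, ref in zip(estimates, references)
    ]
    return sum(errors) / len(errors)


# Made-up example: two meals with a reference energy of 100 kcal each.
# One is estimated 20 % too low, the other 50 % too high.
example_error = mean_absolute_percentage_error([80, 150], [100, 100])
```

Because the errors are taken as absolute values, an underestimate and an overestimate don't cancel each other out. A model can be right "on average" and still be far off on every single meal.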

The underlying issue isn't "AI isn't good enough yet" – it's structural. Language models generate plausible-sounding values without a verifiable source behind them. There's no dataset the answer traces back to, and the next roll of the dice gives you a different number.

For a tracking app that's a non-starter. A logged 4.6 mg of iron can't turn into 6.7 mg tomorrow just because the model rolled differently. If your iron numbers are supposed to mean something, they have to be reproducible.

Crowdsourcing is great – just not enough as a sole source.

The obvious alternative would be OpenFoodFacts (OFF). More than 100,000 volunteers have contributed over 4 million products from 150 countries. The database is open and free to use, and the project follows a mission we respect. Without OFF, the whole conversation around open food data would be poorer.

What OFF does well: reach, openness, and a huge pool of products that aren't digitally captured anywhere else. What OFF structurally can't do: guarantee that individual values were verified before publication. The platform's own terms of use say as much: in essence, it offers no assurance that the data is accurate, complete, or reliable. Review there means community moderation: other contributors can correct entries, and automated pipelines extract values from photos and flag anomalies. But there is no binding, formal review process before publication. In the end, the API returns the same unverified values that were entered.

That's not a complaint about OFF – it's the honest consequence of the model. Crowdsourcing works for reach. For an app you trust with your own nutrition tracking, it isn't enough as a sole source.

What we're left with: math.

So we take the inconvenient path. For base ingredients – oats, carrots, lentils, tofu, olive oil, the few hundred building blocks most products are made of – we rely on the Bundeslebensmittelschlüssel (BLS), the German nutrient database. It's the standard reference maintained by the Max Rubner-Institut, with values whose origin is traceable.

For specific branded products, the manufacturer label is the primary source. What's printed on the pack goes in. When a manufacturer doesn't declare micronutrients – which is the rule, not the exception – things get interesting.

Where a typical app would ask an AI, we calculate. From a product's ingredient list and the known BLS values for each ingredient, the total nutrient profile can be derived deterministically. It's work – we have to estimate proportions, account for processing steps, sanity-check the result – but it's work that yields the same answer every time. If someone provides the same input tomorrow, the same output comes out. If a BLS update lands, it's clear which values shifted and why. Every value we calculated is labeled as calculated in the app – not presented as magic truth, but as what it is.

We're deliberately not getting into the specific algorithms here. What matters isn't how clever the calculation is – what matters is that it's deterministic and traceable. An AI isn't, by design.
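To make "deterministic" concrete, here is a toy sketch of the principle only. It is not the calculation Zutato actually runs: it ignores processing steps and sanity checks, the function `derive_profile` is a hypothetical name, and the per-100 g numbers are placeholders rather than BLS values. It assumes the ingredient proportions are already known and sum to 1.

```python
# Hypothetical per-100 g reference values (placeholder numbers, NOT BLS data).
REFERENCE_PER_100G = {
    "oats": {"iron_mg": 4.0, "calcium_mg": 50.0},
    "almonds": {"iron_mg": 3.0, "calcium_mg": 250.0},
}


def derive_profile(proportions):
    """Weighted sum of per-100 g ingredient values.

    `proportions` maps ingredient name -> mass fraction of the product.
    The fractions must sum to 1, and every ingredient must have a
    reference entry (otherwise a KeyError is raised).
    """
    total = sum(proportions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"proportions sum to {total}, expected 1.0")

    profile = {}
    # Sorted iteration keeps the summation order fixed, so the floating-point
    # result does not depend on how the input dict was built.
    for ingredient in sorted(proportions):
        share = proportions[ingredient]
        for nutrient, value in REFERENCE_PER_100G[ingredient].items():
            profile[nutrient] = profile.get(nutrient, 0.0) + share * value

    # Round to two decimals for display.
    return {nutrient: round(value, 2) for nutrient, value in profile.items()}


# A made-up product: 80 % oats, 20 % almonds.
muesli = derive_profile({"oats": 0.8, "almonds": 0.2})
```

There is no sampling and no hidden state in this function. The same proportions and the same reference table always produce the same profile, and when a reference value changes, the difference in the output can be traced back to that one entry.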

More work for us, less magic for you.

The result is a curated dataset that we own and are responsible for. It's smaller than OFF, grows more slowly, and is considerably less spectacular. In exchange, a 4.6 is still a 4.6 two weeks from now – and if it isn't, we can see exactly why.

That's the deal: we take on the work, you get a number you can trust.

Sources

  • O'Hara C. et al. (2025): An Evaluation of ChatGPT for Nutrient Content Estimation from Meal Photographs. Nutrients 17(4):607. doi.org/10.3390/nu17040607
  • Fridolfsson J. et al. (2025): Performance Evaluation of 3 Large Language Models for Nutritional Content Estimation from Food Images. Curr. Dev. Nutr. 9(10):107556. doi.org/10.1016/j.cdnut.2025.107556