The rapidly growing body of publicly available data on food chemistry and food usage can be analysed using data mining and network analysis methods. Here we discuss how these approaches can yield new insights into both the sensory perception of food and the anthropology of culinary practice. We also show that this development is part of a larger trend. Over the past two decades, large-scale data analysis has revolutionized the biological sciences, which have experienced an explosion of experimental data as a result of the advent of high-throughput technology. Large datasets are also changing research methodologies in the social sciences, owing to the data generated by mobile communication technology and online social networks. Even the arts and humanities are seeing the establishment of ‘digital humanities’ research centres in order to cope with the increasing digitization of literary and historical sources. We argue that food science is likely to be one of the next beneficiaries of large-scale data analysis, perhaps resulting in fields such as ‘computational gastronomy’.
Keywords: Networks; Data mining; Sensory science; Computational gastronomy; Flavour compounds
Large-scale data analysis
The past two decades have seen the advent of high-throughput technologies in biology, making it possible to sequence genomes cheaply and quickly, to measure gene expression for thousands of genes in parallel, and to test large numbers of potential regulatory interactions between genes in a single experiment. The large amounts of data created by these technologies have given rise to entire new research areas in biology, such as computational biology and systems biology. The latter, which attempts to understand biological processes at a ‘systems’ level, is particularly indicative of the potential advantage that large datasets and their analysis can offer to biology, and to other fields of research. This advantage is a ‘bird’s-eye’ perspective, which, with the right kind of analysis, can complement the more established research methods that take place ‘on the ground’ and investigate the system in much more detail. An example would be the analysis of high-throughput gene expression data of tumour tissues in order to highlight a set of candidate genes that may play a role in causing a particular cancer. These candidates would then be investigated one by one, for instance by creating mutant organisms in which one of these genes is deactivated.
Similar large-scale data analysis methods have more recently arrived in the social sciences as a result of rapidly growing mobile communications networks and online social networking sites. Here too, data analysis offers a bird’s-eye perspective of large social networks and the opportunity to study social dynamics and human mobility on an unprecedented scale. The most recent research areas to be transformed by information technology are the arts and humanities, which have witnessed the emergence of ‘digital humanities’. As more and more literary and historical documents are digitized, it becomes possible to uncover fundamental relationships that underlie large corpora of literary texts, or long-term historical and political developments. A striking example is the discovery by Lieberman et al. that the regularisation of verbs across 12 centuries of English is governed by a simple quantitative relationship between the frequency with which a verb is used and the speed at which it is regularised.
Network analysis of flavour compounds
The growing availability of network data in a wide variety of research disciplines has made complex network analysis a rapidly growing research area ever since two seminal publications in the late 1990s uncovered fundamental principles that underlie many real-world networks such as social networks, power grids, neural networks and genetic regulatory networks [2,3]. In recent work we constructed a bipartite network of chemical flavour compounds and food ingredients in which a link signifies the natural occurrence of a compound in an ingredient. These data were derived from Fenaroli’s Handbook of Flavour Ingredients. Using a one-mode projection, the bipartite network is converted into a weighted network of ingredients only, in which the weight of a link between two ingredients is given by the number of flavour compounds they share. This weighted network shows a modular organization, with modules corresponding to food types such as fruits, vegetables and meats. While this might be expected, it is particularly interesting to see the location of these modules with respect to each other. Meats, for instance, lie between fruits and vegetables, and closer to spices and herbs than seafood does. The backbone of this network, extracted using the method of Serrano et al., is shown in Figure 1.
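The one-mode projection described above can be illustrated with a minimal sketch. The ingredient names and compound labels below are invented toy data, not entries from the handbook, and the function name `project_to_ingredients` is hypothetical.

```python
from itertools import combinations

# Toy bipartite data: ingredient -> set of flavour compounds.
# These sets are invented for illustration only.
ingredient_compounds = {
    "strawberry": {"c1", "c2", "c3"},
    "tomato": {"c2", "c3", "c4"},
    "beef": {"c4", "c5"},
}


def project_to_ingredients(ingredient_compounds):
    """One-mode projection onto ingredients.

    Returns a dict mapping an (ingredient, ingredient) pair to the
    number of flavour compounds the two ingredients share. Pairs
    with no shared compound are left unlinked.
    """
    weights = {}
    for a, b in combinations(sorted(ingredient_compounds), 2):
        shared = len(ingredient_compounds[a] & ingredient_compounds[b])
        if shared > 0:
            weights[(a, b)] = shared
    return weights


weights = project_to_ingredients(ingredient_compounds)
```

Comparing every pair of ingredients is quadratic in the number of ingredients, which is acceptable for a few thousand ingredients; for much larger datasets a sparse matrix product would be a more natural choice.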
Figure 1. Flavour network. Culinary ingredients (circles) and their chemical relationships are illustrated. The colour of each ingredient represents the food category that the ingredient belongs to, and the size of an ingredient is proportional to its usage frequency (collected from online recipe databases: epicurious.com, allrecipes.com, menupan.com). Two culinary ingredients are connected if they share many flavour compounds. We extracted the list of flavour compounds in each ingredient from Fenaroli’s Handbook of Flavour Ingredients and then applied a backbone extraction method by Serrano et al. to pick statistically significant links between ingredients. The thickness of an edge represents the number of shared flavour compounds. To reduce clutter, edges are bundled based on the algorithm by Danny Holten (http://www.win.tue.nl/~dholten/). © Yong-Yeol Ahn, Sebastian E Ahnert, James P. Bagrow, and Albert-László Barabási.
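A backbone extraction of this kind can be sketched as follows, assuming that the method by Serrano et al. is the disparity filter, in which an edge of weight w at a node of degree k and total strength s receives the p-value (1 - w/s)^(k - 1) and is kept if it is significant for at least one of its endpoints. The function name, the threshold `alpha` and the toy weights are assumptions made for illustration; this is not the code used to produce Figure 1.

```python
from collections import defaultdict


def disparity_backbone(weights, alpha=0.05):
    """Keep edges that are significant under a disparity-filter null model.

    weights: dict mapping (node_a, node_b) -> positive edge weight.
    alpha: significance threshold (an illustrative default).
    """
    strength = defaultdict(float)
    degree = defaultdict(int)
    for (a, b), w in weights.items():
        for node in (a, b):
            strength[node] += w
            degree[node] += 1

    def p_value(node, w):
        k = degree[node]
        if k <= 1:
            # A single edge cannot be judged against the node's other edges.
            return 1.0
        return (1.0 - w / strength[node]) ** (k - 1)

    return {
        edge: w
        for edge, w in weights.items()
        if min(p_value(edge[0], w), p_value(edge[1], w)) < alpha
    }


# Toy weighted network: one dominant link and four weak ones around a hub.
toy_weights = {
    ("hub", "a"): 100,
    ("hub", "b"): 1,
    ("hub", "c"): 1,
    ("hub", "d"): 1,
    ("hub", "e"): 1,
}
backbone = disparity_backbone(toy_weights)
```

In this toy example only the dominant link survives, because it carries almost all of the hub's strength, while the weak links are consistent with the null model.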
The chef Heston Blumenthal, together with flavour scientists, has suggested that two foods that share chemical flavour compounds are more likely to taste good in combination. By comparing the network of ingredients to a body of 56,498 online recipes, downloaded from epicurious.com, allrecipes.com, and menupan.com, we were able to show that this hypothesis is confirmed in most Western cuisines, but not in Eastern ones. This result indicates that shared compounds may offer one of several possible mechanisms that can make two ingredients compatible.
Our network of ingredients and flavour compounds is just a first step towards a true network of shared flavour compound perception, which would have to include compound concentrations and detection thresholds in order to further investigate the shared compound hypothesis. Its most important purpose is to open up a new way in which data analysis can aid sensory science and the study of culinary practice.
In a broader development, the increasing availability of data on food usage, food chemistry and sensory biology is likely to result in the establishment of new research disciplines, such as ‘computational gastronomy’.
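One possible way to test the shared compound hypothesis against a recipe collection is sketched below. It assumes a simple statistic, the mean number of shared compounds per ingredient pair in a recipe, and a null model of randomly drawn ingredient sets of matching size; both are illustrative choices rather than a description of the analysis actually performed, and all names and data are hypothetical.

```python
import random
from itertools import combinations

# Toy data, invented for illustration only.
ingredient_compounds = {
    "strawberry": {"c1", "c2", "c3"},
    "tomato": {"c2", "c3", "c4"},
    "beef": {"c4", "c5"},
}
recipes = [["strawberry", "tomato"]]


def mean_shared_compounds(recipe, ingredient_compounds):
    """Average number of shared compounds over all ingredient pairs."""
    pairs = list(combinations(recipe, 2))
    if not pairs:
        return 0.0
    total = sum(
        len(ingredient_compounds[a] & ingredient_compounds[b])
        for a, b in pairs
    )
    return total / len(pairs)


def sharing_excess(recipes, ingredient_compounds, n_random=1000, seed=0):
    """Mean sharing in real recipes minus mean sharing in random ones.

    Random recipes reuse the sizes of real recipes but draw their
    ingredients uniformly from the full ingredient list.
    """
    rng = random.Random(seed)
    pool = sorted(ingredient_compounds)
    real = sum(
        mean_shared_compounds(r, ingredient_compounds) for r in recipes
    ) / len(recipes)
    null_total = 0.0
    for _ in range(n_random):
        size = len(rng.choice(recipes))
        random_recipe = rng.sample(pool, size)
        null_total += mean_shared_compounds(random_recipe, ingredient_compounds)
    return real - null_total / n_random


excess = sharing_excess(recipes, ingredient_compounds)
```

A positive excess would indicate that real recipes pair ingredients with more shared compounds than chance, and a negative excess the opposite. A uniform null model ignores how often each ingredient is actually used, so a more careful comparison would probably need to preserve ingredient frequencies as well.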
The author declares that he has no competing interests.
SEA is supported by the Royal Society, UK.