An Evolutionary Perspective on Human Cross-sensitivity to Tree Nut and Seed Allergens

Tree nut allergies are some of the most common and serious allergies in the United States. Patients who are sensitive to nuts or to seeds commonly called nuts are advised to avoid consuming a variety of different species, even though these may be distantly related in terms of their evolutionary history. This is because studies in the literature report that patients often display sensitivity to multiple nut species (cross-sensitivity) if they have an existing nut allergy. These reports suggest that cross-sensitivity in patients with nut allergies may be caused by an IgE antibody reacting with epitopes present in the seed proteins of different species (cross-reactivity), for example, if IgE isolated from the serum of a patient were able to bind to both almond and peanut allergens. We hypothesize that allergenic proteins in seeds may have similar amino acid sequences that cause the observed cross-sensitivity. Here, we test the hypothesis that similarity in the protein sequences of allergenic nuts drives cross-sensitivity and cross-reactivity by reconstructing the gene trees of three allergenic seed-storage proteins (vicilin, legumin, and 2S albumin) from species sampled across vascular plants. We generate estimates of their phylogenetic relationships and compare these to the allergen cross-sensitivity and cross-reactivity data reported in the literature. In general, evolutionary relationships of the three proteins are congruent with the current understanding of plant species relationships. However, we find little evidence that distantly related nut species reported to be cross-reactive share similar vicilin, legumin, or 2S albumin amino acid sequences. Our data thus suggest that features of the proteins other than their amino acid sequences may be driving the cross-reactivity observed during in vitro tests and skin tests. Our results support current treatment guidelines to limit nut and seed consumption if allergies are present in a patient. More studies are necessary to better understand the characteristics of allergenic proteins and patterns of cross-sensitivity in patients who suffer from nut allergies.


INTRODUCTION
Nuts are a major agricultural commodity in California (USDA 2014) and nut consumption has been shown to lower cholesterol (Morgan and Clayshulte 2000; Garg et al. 2003), reduce the incidence of coronary heart disease (Fraser et al. 1992), and lessen the impact of age-related brain dysfunction (Carey et al. 2012). Despite the potential health benefits of nuts, nut allergies are among the most common allergies in the United States (Bock et al. 2001), and consuming nuts may elicit serious and life-threatening immunological responses in people with food allergies (Teuber et al. 2003; Cianferoni and Muraro 2012).
Food allergies are a major topic of research in immunology and there are a number of immunological terms used throughout the paper that we have defined in a glossary (Appendix 1). Research on food allergies is complicated by the observation that proteins that have been identified as allergens in some species are not necessarily allergens in all species in which they are found. For example, vicilin protein has been identified as an allergen in peanut and it is also found in kiwi seeds, but patients with peanut allergies do not typically also have a kiwi allergy. However, a patient who exhibits an allergy to a protein in one plant species may also demonstrate cross-sensitivity to another species (Appendix 1), presumably due to the similarity of proteins the two plants contain. For example, patients with tree nut allergies may also show cross-sensitivity to peanuts (Ewan 1996; Teuber et al. 2003). Patients who exhibit sensitivity to one species of nut are thus advised to avoid all tree nuts and peanuts (Ewan 1996), but it is unknown if this level of caution is warranted, as nuts and edible seeds have evolved multiple times in the plant tree of life (Fig. 1).
Nut is a botanical term that describes a fruit containing a single seed, with a hard, dry, outer layer, and a special covering called a cupule (the "cap" of an acorn) (Brouk 1975; Harris and Harris 2001). Fruit type is not generally conserved within plant lineages, but true nuts are only found in the plant order Fagales, which contains commonly consumed tree nuts such as pecans, walnuts, and hazelnuts (Fig. 1). Many fruits colloquially referred to as nuts are actually other types of fruits (e.g., coconuts) or seeds that have a hard covering (e.g., almonds, Brazil nuts, cashews, macadamia nuts, peanuts, pine nuts, pistachios). We use the term nut in the colloquial sense throughout this paper. Even when we consume a botanically true nut, we are only ingesting the seed contained within the nut. Edible seeds develop within many different types of fruits (Table 1) and may be only distantly related to true nut-bearing plants found in the order Fagales (Fig. 1). Edible seeds are found in all of the major lineages of seed plants, including gymnosperms (pine nuts), monocots (e.g., coconuts and grasses), and eudicots (e.g., almonds, Brazil nuts, peanuts). These three plant lineages last shared a recent common ancestor at least 300 million years ago (Stein et al. 2012; Magallon et al. 2013), approximately the same amount of time since the common ancestor of amphibians and mammals (Hedges 2009). The evolutionary distance between species of plants that produce edible seeds may decrease the likelihood that their seeds contain similar proteins, or that proteins shared between these species have retained sequence similarity.
Fig. 1. Evolutionary relationships of nuts and edible seeds on a summary tree of vascular plant evolution. Not all plant orders are shown; notably missing are most of the early-diverging eudicots and core eudicots. The true nuts are restricted to the Fagales order (acorns, chestnuts, hazelnuts, pecans, walnuts) and are indicated with an acorn. Plant orders with seeds that are colloquially called nuts are labeled with a gray oval. For example, peanuts are considered nuts (and one of their common names is "ground nuts"), but botanically speaking they are the seed of a legume. Adapted from the Angiosperm Phylogeny Website and others (Li et al. 2004; Stevens 2001 onwards).
The nuts most commonly consumed in the United States and Europe are almonds, cashews, hazelnuts, peanuts, pecans, and walnuts. Less frequently eaten are Brazil nuts, chestnuts, macadamia nuts, pine nuts, pistachios, and others (Table 1). These nuts contain thousands of different proteins (Clarke et al. 2000) and several of the proteins are potential allergens that are either plant defense proteins to protect against fungi, bacteria, viruses, and invertebrates, or seed storage proteins that provide nutrition for the germinating embryo (Table 2; Radauer and Breiteneder 2007). Proteins are chains of amino acids that fold into three-dimensional structures. Antibodies in the human immune system can interact with short sections of the amino acid sequence called epitopes. A majority of the proteins in seeds can be grouped into four protein families: prolamins, cupins, profilins, and the Bet v 1-like family. The latter is a protein family named after the birch pollen allergen recovered from the white birch, Betula verrucosa Ehrh. Members of these protein families have different structural features that are described below (and reviewed in Breiteneder and Radauer 2004), but they are all highly resistant to water stress and to thermal and proteolytic denaturation, and these characteristics may contribute to their allergenicity.
Table 1. Common nuts and edible seeds according to their classification in the plant tree of life. Fruit structure is not highly conserved in major plant lineages, but true nuts are only found in the Fagales. The table is not a comprehensive list of nuts and edible seeds and it is decidedly biased towards those eaten in the US and Europe. An asterisk next to the common name indicates there is a published genome or other genetic resource, such as a transcriptome, for this species. (Willison et al. 2008)
Members of several protein families have been implicated in allergic reactions to nuts; these include prolamins, seed storage proteins in the cupin superfamily (including legumins [11S globulins] and vicilins [7S or 8S globulins]), and profilins (Witke 2004). Prolamins are the major proteins found in grains such as wheat and corn, and the family includes the lipid-transfer and 2S albumin proteins. The amino acid sequences of these proteins are highly variable, but they share a pattern of 6-8 conserved cysteine residues and a conserved three-dimensional structure (Mills et al. 2004). The prolamin family is only found in the land plants (flowering plants, gymnosperms, ferns, and mosses) and most plant proteins that are allergenic when ingested are members of the prolamin family (Allfam 2011). A number of the prolamin proteins in nuts are allergenic, including 2S albumin in Brazil nuts, cashews, hazelnuts, peanuts, pistachios, soybeans, and walnuts (also see other references listed in Table 2).
Cupins are a diverse group of proteins found in bacteria, fungi, animals, and plants; these share one or more double-stranded cupin domains that have been described as "barrel-like" or resembling a "jelly-roll". Two distinct groups of cupin seed storage proteins are the legumins (11S globulins) and vicilins (7S or 8S globulins) (Allfam 2011) and these make up as much as 70% of the protein of some seeds (Bewley and Black 1994). Allergenic cupins have been identified in almonds, Brazil nuts, cashews, hazelnuts, peanuts, pistachios, soybeans, and walnuts (Allergen.org 2014 and other references listed in Table 2).
Profilins are a conserved group of proteins found in every eukaryote and in some viruses (Radauer and Breiteneder 2007) and they play key roles in cell movement and signaling (Witke 2004). Plant profilins are abundant in pollen and present in smaller amounts in other plant structures such as fruits and seeds (Radauer et al. 2006). Allergenic profilins have been identified in almonds, hazelnuts, and peanuts (Table 2).
Lehrer et al. (2006) reviewed the properties of allergenic proteins to explore whether allergenicity is predictable based on sequence similarity. These authors and others have suggested that evolutionary relatedness of some edible seeds may play a part in allergen cross-sensitivity, although preliminary experiments and case studies suggest that IgE cross-reactivity can occur between distantly related species. We sought to systematically test whether distantly related edible seeds have similar amino acid sequences in allergenic seed storage proteins. If this were the case, it could explain the IgE cross-reactivity observed between some nuts and would allow physicians to predict which species a patient is most likely to show cross-sensitivity to, given a known allergy.

Plant Phylogeny
We reviewed evolutionary relationships of plant species that produce nuts and edible seeds from published studies and mapped them onto a summary phylogenetic tree of vascular plants (Soltis et al. 2000; Stevens 2001 onwards; Li et al. 2004).

Literature Review of Allergen Cross-Reactivity
A literature review was conducted to identify published evidence of serum and skin cross-reactivity between allergenic proteins derived from nuts and seeds. We first searched for allergen cross-reactivity studies in the Allergen.org database (2014), which led us to a number of key studies (Ewan 1996; Tariq et al. 1996; Moneret-Vautrin et al. 1998; Willison et al. 2008; Stutius et al. 2010).

Sequence Selection and Alignment
We focused on two of the major allergenic seed protein families, the cupin superfamily (vicilin and legumin) and prolamins (2S albumin), to test for convergent evolution by comparing gene trees of the protein sequences with estimates of organismal phylogeny. Three separate proteins were chosen for phylogenetic analysis: legumin (11S globulin), vicilin (7S globulin), and 2S albumin. Sequences were downloaded from Allergen.org, UniProt, and GenBank by searching protein databases for "legumin," "vicilin," or "2S albumin," and by conducting BLAST searches. Our preliminary dataset included all available sequences from database searches. We removed sequences that did not appear to be orthologous in preliminary trees (they did not resolve within the angiosperms), and redundant sequences for clades that were over-represented in the sampling. Protein alignments of the trimmed dataset were generated with the program MUltiple Sequence Comparison by Log-Expectation (MUSCLE v3.8.31, Edgar 2004; Jenkins et al. 2005). The alignments were then modified using the program Gblocks as implemented in SEAVIEW (Gouy et al. 2010) to remove hypervariable regions, with the following stringency settings: "Protein," "Allow small final blocks," "Allow gaps within blocks," "Allow less strict flanking regions." Then, duplicate sequences were removed by examining the alignment using the following criteria: sequences with the same genus/species identifier and the exact same protein sequence were culled so that only one remained; longer sequences were preferentially retained.
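The duplicate-culling criteria above can be illustrated with a minimal Python sketch. It is not the procedure actually used, which was carried out by examining the alignment. The `Record` type and its field names are hypothetical, and treating a shorter sequence as a duplicate when it is an exact fragment of a longer one from the same species is an assumption made here to express "longer sequences were preferentially retained."

```python
from collections import namedtuple

# Hypothetical record type; field names are illustrative only.
Record = namedtuple("Record", ["species", "accession", "sequence"])


def ungapped(seq):
    """Strip alignment gaps and normalise case before comparing sequences."""
    return seq.replace("-", "").upper()


def cull_duplicates(records):
    """Keep one record per species for each distinct protein sequence.

    Assumption: a sequence that is an exact fragment of a longer retained
    sequence from the same species is also treated as a duplicate.
    """
    kept = []
    # Longest first, so the longer copy is always the one retained.
    for rec in sorted(records, key=lambda r: len(ungapped(r.sequence)), reverse=True):
        seq = ungapped(rec.sequence)
        is_duplicate = any(
            k.species == rec.species and seq in ungapped(k.sequence) for k in kept
        )
        if not is_duplicate:
            kept.append(rec)
    return kept


records = [
    Record("Arachis hypogaea", "A1", "MAKLT"),
    Record("Arachis hypogaea", "A2", "MAKLT"),  # exact duplicate of A1
    Record("Arachis hypogaea", "A3", "AKL"),    # fragment of A1
    Record("Juglans regia", "J1", "MAKLT"),     # same sequence, different species
]
kept_accessions = [r.accession for r in cull_duplicates(records)]
```

Identical sequences from different species are deliberately retained, since the criteria require the same genus/species identifier.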

Model Selection and Gene Tree Analyses
Model selection for phylogenetic analyses was conducted separately on each protein alignment using the program ProtTest (Abascal et al. 2005), with the following settings: "Build BioNJ Tree," "LnL as model selection criteria," "Optimize tree topology, no." The best model available for maximum likelihood (ML) analysis in RAxML was selected based on the LnL score (Table 3). ML phylogenetic analyses were conducted separately on each protein alignment in RAxML (Miller et al. 2010; Stamatakis et al. 2012) using the following settings: "protein," "estimate proportion of invariable sites GTRGAMMA+I, yes," "use empirical base frequencies, no." A protein substitution matrix was chosen for each analysis based on ProtTest results (see above). Node support was estimated with 1000 bootstrap (BS) replicates. Rooting with fern and gymnosperm outgroups was done in the software package FigTree (Bouckaert et al. 2014). Final trees were visualized and edited in TreeGraph2 (Stöver and Miller 2010) and Adobe Illustrator (Adobe Systems Incorporated, San Jose, California).
Fig. 4 (caption, partial). We rooted the tree on the branch leading to the fern. ML bootstrap (BS) values (1000 replicates) are printed above branches except when branches had 100% ML BS. Nodes are collapsed if ML BS support was < 80%. A = vicilin protein allergenic in this species. The box outlines an unexpectedly close relationship (80% ML BS) between non-grass monocots (date palm, coconut, and oil palm [Elaeis]), kiwi, grape, sesame, citrus, pistachio, cashew (Sapindales), hazelnut (Fagales), and Rosales (almond and strawberry; 98% ML BS).
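Collapsing weakly supported nodes, as described for the figured trees, can be sketched as follows. This is an illustration only: the trees in this study were edited in TreeGraph2, and the nested-tuple tree representation, the function name `collapse`, and the toy support values are assumptions introduced here.

```python
def collapse(node, threshold=80):
    """Collapse internal nodes whose bootstrap support is below `threshold`.

    A leaf is a string. An internal node is a (support, children) tuple.
    Children of a collapsed node are attached directly to its parent,
    which produces a polytomy.
    """
    if isinstance(node, str):
        return node
    support, children = node
    new_children = []
    for child in children:
        child = collapse(child, threshold)
        if not isinstance(child, str) and child[0] < threshold:
            new_children.extend(child[1])  # splice grandchildren into parent
        else:
            new_children.append(child)
    return (support, new_children)


# Toy tree with made-up support values (root support set to 100 by convention).
tree = (100, [
    (95, ["pecan", "walnut"]),
    (60, ["almond", (85, ["pistachio", "cashew"])]),
    "fern",
])
collapsed = collapse(tree)
```

In the toy tree, the node with 60% support is dissolved, while the nested pistachio + cashew clade (85%) survives as a child of the root.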
Gene trees and the best estimate of organismal phylogeny were compared to detect conflicting relationships that were supported by BS values greater than 80%. Incongruence between the gene tree and phylogeny was considered potential evidence for protein sequence convergence. Sequences involved in the conflicting relationships were checked to ensure they were correctly aligned and did not contain ambiguous bases.
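One way to express this comparison programmatically is a clade-compatibility check: two clades can occur on the same tree only if their taxon sets are disjoint or one is nested within the other. The Python sketch below is a hypothetical illustration, not the procedure used here. The function names and the toy clades are assumptions, and reference clades must first be pruned to the taxa sampled in the gene tree.

```python
def is_compatible(clade_a, clade_b):
    """Two clades fit on one tree if they are disjoint or nested."""
    a, b = set(clade_a), set(clade_b)
    return a.isdisjoint(b) or a <= b or b <= a


def supported_conflicts(gene_clades, reference_clades, min_support=80):
    """Return gene-tree clades with bootstrap > min_support that are
    incompatible with at least one clade of the reference phylogeny.

    gene_clades: iterable of (taxa, bootstrap) pairs.
    reference_clades: iterable of taxon collections.
    """
    conflicts = []
    for taxa, support in gene_clades:
        if support <= min_support:
            continue  # unsupported relationships are ignored
        if any(not is_compatible(taxa, ref) for ref in reference_clades):
            conflicts.append((tuple(sorted(taxa)), support))
    return conflicts


# Toy example; reference clades are restricted to the sampled taxa.
reference = [
    {"hazelnut", "pecan", "walnut"},  # Fagales
    {"almond", "strawberry"},         # Rosales
]
gene_tree = [
    ({"pecan", "walnut"}, 100),
    ({"hazelnut", "almond", "strawberry"}, 84),
    ({"pecan", "walnut", "castor oil plant"}, 80),
]
conflicts = supported_conflicts(gene_tree, reference)
```

With the strict "greater than 80%" rule, the clade supported at exactly 80 is not flagged; changing the comparison to `>=` would be a different, equally defensible choice.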

Epitope Mapping
The MUSCLE alignment for each protein was annotated with experimentally verified epitopes. The vicilin annotation followed the mapping of IgE binding sites in peanut (Arachis) and used the numbering system of Shin et al. (1998). The legumin annotation followed the epitopes of pecan (Carya illinoinensis; Sharma et al. 2011) and homologous epitopes documented in the Structural Database of Allergenic Proteins (SDAP; Ivanciuc et al. 2003). The 2S albumin alignment was annotated with the epitope identified in walnut (Juglans regia; Robotham et al. 2002).
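Annotating an alignment with an epitope requires translating residue positions in the reference protein into alignment columns, because gaps shift the coordinates. The following Python sketch shows one way to do this under stated assumptions: the function names and the toy sequences are hypothetical, and epitope positions are taken to be 1-based and inclusive on the ungapped reference sequence.

```python
def epitope_alignment_columns(aligned_ref, start, end):
    """Map residues start..end (1-based, inclusive) of the ungapped
    reference sequence to 0-based column indices in the alignment."""
    columns = []
    residue = 0
    for col, char in enumerate(aligned_ref):
        if char == "-":
            continue  # gaps do not advance the residue counter
        residue += 1
        if residue > end:
            break
        if residue >= start:
            columns.append(col)
    return columns


def homologous_region(aligned_seq, columns):
    """Extract the stretch of another aligned sequence spanning the epitope."""
    return aligned_seq[columns[0]:columns[-1] + 1]


# Toy alignment (not real allergen sequences).
reference = "MK--TLAV"   # ungapped: M K T L A V
other = "MRPPSL-V"
cols = epitope_alignment_columns(reference, 2, 4)  # residues K, T, L
region = homologous_region(other, cols)
```

The extracted region includes any columns that are gaps in the reference, so insertions in other species within the epitope remain visible.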

Plant Phylogeny
Fruit types are not generally conserved within plant orders, although there are some notable exceptions, including the true nuts. Many trees in the order Fagales produce nuts that can be described as indehiscent fruits with a hard covering and a single seed. The most familiar nut in temperate zones may be the acorn that is the fruit of oak trees (Quercus), which was a common food in North America only a few hundred years ago. Close relatives to the oak are chestnut (Castanea), hazelnut (Corylus), pecan (Carya), and walnut (Juglans). Evolutionary relationships among these species have been difficult to estimate (Manos and Steele 1997; Soltis et al. 2000; Ruiqi et al. 2002; Cook and Crisp 2005; Sauquet et al. 2012), but analyses of DNA sequence data are converging on the topology shown in Fig. 1 (Soltis et al. 2000; Stevens 2001 onwards; Li et al. 2004; Herbert et al. 2006).

Taxa and Sequences Sampled and Final Alignments
The complete listing of sampled taxa and protein GenBank accession numbers can be found in

Gene Trees
The 2S albumin maximum likelihood (ML) estimate was poorly resolved, but exhibits relationships congruent with our current knowledge of plant orders if we consider branches with bootstrap support (BS) > 80 (Fig. 2). Sampled species appear to have 2S albumin sequences most closely related to the 2S albumin sequences of their close relatives or their placement is unresolved in our analysis.
Similar to the 2S albumin tree, relationships with BS > 80 in the legumin tree (Fig. 3) reflect our current understanding of relationships of plant orders, with a minor exception. Poplar (Populus trichocarpa) is recovered outside the Malpighiales clade, and in our tree it is sister to pistachio (Sapindales) legumin with 99 ML BS support (Fig. 3, box).
The majority of the relationships in the vicilin tree (Fig. 4) conform to our current understanding of plant species phylogeny, with some notable exceptions. The main Fagales clade contains a strongly supported (100 ML BS) clade of pecan + walnut with an unexpected sister relationship with castor oil plant (Ricinus communis), albeit with only moderate support (80 ML BS). Tomato (Solanum, Solanales) is unresolved in the vicilin tree, although we expected it would be sister to sesame (Sesamum, Lamiales) and closely related to kiwi (Actinidia, Ericales) based on current estimates of plant phylogeny. Instead, kiwi and sesame unexpectedly form a group with non-grass monocots (date palm [Phoenix], coconut [Cocos] and oil palm [Elaeis]), grape (Vitis), a well-supported (100 ML BS) clade of citrus, pistachio, and cashew (Sapindales), almond and strawberry (Rosales; 98 ML BS), and hazelnut (Corylus, Fagales; 80 ML BS; Fig. 4, box). Interestingly, hazelnut is sister to almond and strawberry with moderate support (84 ML BS), instead of forming a relationship with other Fagales in the dataset.

Cross-Reactivity is not Associated with Sequence Similarity
Here, we sought to systematically test whether similarity in the amino acid sequences of seed storage proteins is congruent with the cross-reactivity reported in the literature between distantly related nuts.We conducted independent phylogenetic analyses of three allergenic proteins and found little evidence for sequence convergence.In other words, we did not recover evidence that cross-reactive species that are distantly related have 2S albumin, vicilin, or legumin proteins with similar sequences.The trees we recovered generally lacked strong statistical support (BS values), but the majority of supported relationships do not contradict our current best estimate of evolutionary history and therefore do not suggest convergence between distantly-related allergenic proteins at the level of amino acid sequences.
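Sequence similarity between two aligned proteins is often summarised as percent identity. The sketch below is a generic illustration in Python and was not part of the analyses reported here, which relied on gene trees. The function name and toy sequences are assumptions, and using only columns where neither sequence has a gap as the denominator is one of several conventions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned sequences (not real allergen sequences).
identity = percent_identity("MKT-LAV", "MRTPL-V")
```

Here five columns are compared and four match. Restricting the input to the columns of a mapped epitope would give an epitope-level identity instead of a whole-protein value.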
Our results suggest that cross-reactivity reported in the literature may be difficult to predict and may be the result of factors unrelated to amino acid sequences, as other authors have suggested. Wang et al. (2002) aligned a newly identified vicilin gene in cashews (Ana o 1) with the peanut vicilin gene (Ara h 1). They found few conserved amino acids, even within IgE-binding epitopes, and hypothesized that protein sequence similarity is not necessarily predictive of allergenicity. We also found that the amino acid sequence was highly variable in areas of the alignment that have been identified as IgE-binding sites in some species (Fig. 5). Barre et al. (2005) showed that there is little structural homology between Brazil nut, peanut, pecan, or walnut 2S albumin proteins, but the authors did find a similar three-dimensional protein structure in walnut and pecan and these species are close relatives. Thus, factors such as structural homology may play a role in cross-reactivity, though additional comparisons are needed to investigate this hypothesis.

Table 5. Summary of nut cross-reactivity studies. This table illustrates how little is known about nut and seed cross-reactivity. The rectangles are shaded according to a scale so that a light rectangle indicates no or low evidence for some cross-reactivity and a dark rectangle indicates strong evidence for some cross-reactivity. Many combinations have never been tested, especially for infrequently eaten nuts and several studies have a sample size of N = 1. Ewan (1996) tested 62 patients for cross-reactivity to six nuts: peanut, Brazil nut, almond, hazelnut, walnut, and cashew nut, but did not report the species involved in each treatment. Table based on references (Altenbach et al. 1987; Arshad et al. 1991; Fernandez et al. 1995; Tariq et al. 1996; Marinas et al. 1998; Moneret-Vautrin et al. 1998; Sutherland et al. 1999; Teuber and Peterson 1999; Teuber et al. 1999, 2003; Diaz-Perales et al. 2000; Bannon et al. 2001; Poltronieri et al. 2002; Wang et al. 2002; de Leon et al. 2003; Roux et al. 2003; Asero et al. 2004; Lerch et al. 2005; Crespo et al. 2006; Benito et al. 2007; Willison et al. 2008; Ahn et al. 2009; Breiteneder 2009; Garino et al. 2010; Allergen.org 2014).

VOLUME 33(2)
Tree Nut Allergens

Pine Nuts and Ginkgo Nuts
Pine nuts and Ginkgo nuts are both gymnosperms, but their exact relationship to one another is not known (Mathews 2009; Ran et al. 2010; Yang et al. 2012). Pine nuts are the seeds of approximately 12 Pinus species that are eaten raw, roasted or in cooked dishes (Rosengarten 1984). The seed of the Ginkgo tree (Ginkgo biloba) is boiled, roasted, and salted before it is eaten. A single near-fatal reaction (Beyer et al. 1998) and four systemic reactions (Nielsen 1990; Roux et al. 1998) have been reported after pine nut ingestion. In one patient, pine nut showed cross-reactivity to almonds (Marinas et al. 1998; Table 5), but cross-reactivity to Ginkgo was not tested. As far as we are aware, Ginkgo nuts have only been tested for contact dermatitis (Lepoittevin et al. 1989). We sampled Ginkgo only for the legumin dataset. This species is currently not an important food source, but is commonly used as an herbal remedy. It would be interesting to study whether the same seed-storage proteins are present in these gymnosperms, which proteins are allergenic, and whether they are cross-reactive.

Palm Nuts
A number of seeds from palm species (monocots in the Arecaceae family) are roasted and eaten as nuts. These include the date palm (Phoenix dactylifera) and the ivory nut palm (Phytelephas aequatorialis) (Rosengarten 1984). Coconuts (Cocos nucifera, Arecaceae) are also considered nuts by some, and the coconut milk or meat that is eaten is the seed nutritive tissue. In two patients, cross-reactivity to tree nuts has been observed (Teuber and Peterson 1999). One patient who showed anaphylaxis to coconut also demonstrated cross-reactivity to hazelnut (Nguyen et al. 2004). A retrospective chart review of 231 patients who underwent skin prick tests to determine sensitization demonstrated that coconut allergy was not more common in patients with sensitization or allergy to peanuts or tree nuts than in those without peanut or tree nut allergies (Stutius et al. 2010).

Peanuts, Soy, Almonds, Hemp Nuts
These nuts are in the rosid I clade of eudicot plants. We recovered a bean and pea (Fabales) clade in the legumin and vicilin trees, and the 2S albumin tree is congruent with a Fabales clade. Peanuts are the most widely tested species for cross-reactivity with other seeds (Table 5): peanuts exhibited 51% cross-reactivity with almonds, a close relative in the Rosales (Tariq et al. 1996; Moneret-Vautrin et al. 1998), 20.8% cross-reactivity with hazelnuts (Tariq et al. 1996; Moneret-Vautrin et al. 1998), and 17.6% cross-reactivity with English walnuts (Tariq et al. 1996; de Leon et al. 2003). They also exhibit similar cross-reactivity with more distant relatives such as Brazil nuts (28.4%; Moneret-Vautrin et al. 1998; de Leon et al. 2007), cashews (39.5%; Moneret-Vautrin et al. 1998; de Leon et al. 2003), pistachios (31%; Moneret-Vautrin et al. 1998), and sesame seeds (13%; Stutius et al. 2010). de Leon et al. (2007) tested sera from three subjects for IgE cross-reactivity between a 2S albumin peanut extract and almond, Brazil nut, cashew, and hazelnut extracts and demonstrated a strong cross-reaction between peanut 2S albumin and roasted almond and a weaker interaction between peanut 2S albumin and raw Brazil nut. The allergens that have been identified in peanuts are vicilin, 2S albumin, and lipid-transfer proteins (Allergen.org 2014). We sampled the vicilin and 2S albumin genes for peanut and found no evidence of amino acid sequence convergence between species, including closely related species like mung bean and lentil, which are also allergenic for these proteins. The edible Tahitian chestnut or mape (Inocarpus fagifer) is also a close relative to peanuts, but we found no information about allergies to this species.
Almonds (Prunus dulcis, Rosaceae), hemp nuts (Cannabis sativa, Cannabaceae), and breadnuts (the common name for two distinct species, Artocarpus camansi in Polynesia and Brosimum alicastrum in the neotropics) are in the Rosales order. We recovered a small Rosales clade in all of the trees, but in the vicilin tree, almond and strawberry (both in the Rosaceae) were unexpectedly sister to hazelnut (Fagales) on a branch with moderate support (84 ML BS; Fig. 4). Vicilin has been identified as an allergen in hazelnut (Roux et al. 2003), but not in almond or strawberry. Hazelnut and almond exhibited cross-reactivity in 6 out of 21 patients (28.6%; Tariq et al. 1996; Poltronieri et al. 2002) and the same level of reactivity was found in those studies between almond and walnut, yet walnut is sister to pecan in our vicilin tree. Almonds are the most commonly consumed tree nut in the United States (Roux et al. 2003) and their identified allergens include 2S albumin (Poltronieri et al. 2002), legumin (Willison et al. 2011), lipid-transfer protein, and profilin (Allergen.org 2014). Hemp nuts (Cannabis sativa, Rosales) were not sampled in our study, but lipid transfer proteins have been identified from the nuts and allergies to hemp nuts have been reported (Ebo et al. 2013). To our knowledge, breadnuts have not been tested for allergens.
In the vicilin tree (Fig. 4) the castor oil plant, Ricinus (Euphorbiaceae, Malpighiales), is sister to pecan and walnut (Fagales). Vicilin has been identified as an allergen in walnut, but not in pecan or castor oil plant. Salcedo et al. (2001) found cross-reactivity between rubber latex allergens (natural latex is derived from the rubber tree Hevea and both Hevea and Ricinus are in the Euphorbiaceae), but this was thought to have been due to the presence of profilin proteins. We did not include profilins in this study because their amino acid sequences are highly conserved across plants (Vieths et al. 2002; Radauer et al. 2006).
The tropical candlenut (or kukui, Aleurites moluccana) and kluwak nut (Pangium edule) are sister to the clade containing the Fagales (true nuts), the Rosales, and the Fabales. We found no sequence data or information about allergic reactions to these species.

Fagales
Acorns, beechnuts, butternuts, chestnuts, hazelnuts, hickory nuts, pecans, and walnuts are closely related fruits in the Fagales order. We recovered tree topologies that are congruent with the current best estimate of evolutionary history in the Fagales, except for the placement of hazelnut in the vicilin tree. Walnut and pecan are close relatives in the Juglandaceae family. Hazelnut is in the Betulaceae family, while chestnut, oak, and beechnut are close relatives in the Fagaceae family (Stevens 2001 onwards; Li et al. 2004). Cross-reactivity of 100% has been reported between walnut and pecan (Teuber et al. 2003), walnut and hazelnut (Asero et al. 2004), and other nuts in this order (Bock and Atkins 1989; Ewan 1996; Sicherer et al. 1998; Teuber et al. 2003). These reports are usually restricted to a single patient with multiple allergies and it is unclear if the sensitivity is caused by cross-reactivity to a protein found in both nuts or to co-sensitivity. Additionally, Barre et al. (2005) found that the epitope sequences and overall protein structures are highly similar in walnut and pecan 2S albumin. Given how closely related these species are, it seems reasonable to avoid eating any nuts or seeds from plants in the Fagales order if there are signs of allergy to one species.

Pistachios and Cashews
These are closely related species in the Anacardiaceae family (Sapindales, rosid II) and are also closely related to pilinut (Canarium ovatum, Burseraceae), lychee (Litchi chinensis, Sapindaceae), whose seed is sometimes eaten as a nut, and kola nut (Cola acuminata, Malvales). Poison ivy (Toxicodendron, Anacardiaceae) and mangos (Anacardiaceae) are also close relatives that commonly cause allergic reactions. A more distant relative, but also in the rosid II clade, is the water chestnut (Trapa natans, Myrtales). Water chestnuts were once a major source of starch for central Europeans (Rosengarten 1984). Species in the genus Trapa are now primarily eaten in China, Japan, and Korea (Rosengarten 1984), but we found no information about them as allergens.
Pistachio and cashew exhibit 76% cross-reactivity in serum analyses (Fernandez et al. 1995; de Leon et al. 2003), but they have not been tested for cross-reactivity with nuts other than peanuts. Vicilin, legumin, and 2S albumin have been identified as allergens in pistachio and cashew (Wang et al. 2002; Allergen.org 2014) and we found a close relationship between their 2S albumin and vicilin protein sequences, but in the legumin tree pistachio is sister to poplar (Malpighiales, rosid I) and cashew is unresolved. It is not clear to us why the pistachio legumin sequence would be similar to poplar legumin, a common pollen allergen, but this relationship deserves further study.

Distant Relatives to All Other Nuts: Macadamia Nuts, Brazil Nuts, Sesame and Sunflower Seeds
Macadamia nuts (M. integrifolia or M. tetraphylla) are in an early-diverging lineage of the eudicots (Proteales) and are only distantly related to other commonly eaten seeds. We sampled three forms of vicilin from macadamia and they formed a clade. There have been reported cases of allergic reaction after ingestion of macadamia nuts, but the allergenic protein has yet to be identified (Roux et al. 2003 and references therein). Studies involving a total of two patients found that both patients were cross-reactive to hazelnut and macadamia, but cross-reactivity did not extend to almond, Brazil nut, peanut or walnut (de Leon et al. 2003; Lerch et al. 2005). We found no relationship between hazelnut and macadamia vicilin sequences, but one of the other allergenic proteins in hazelnut may be responsible for the cross-reactivity and this relationship deserves more study.
Brazil nut (Bertholletia excelsa) is classified in the Ericales order of the asterids, along with paradise nuts (Lecythis zabucajo) and shea nuts (Vitellaria paradoxa), two species rarely eaten in the US or Europe. We sampled the allergenic Brazil nut 2S albumin and legumin sequences and Brazil nut was unresolved in both trees. Brazil nut proteins elicited a cross-reaction with peanuts in 28.4% of patients (Moneret-Vautrin et al. 1998; de Leon et al. 2007), with hazelnuts in 12.5% of patients (Tariq et al. 1996), with walnuts in a single patient (Arshad et al. 1991), and with almonds in 6.7% of patients (Tariq et al. 1996), and did not cross-react with macadamia in the single patient tested (Lerch et al. 2005) (Table 5). In most of these tests for cross-reactivity the patient sample size was quite low and we can offer no explanation for the observed pattern of cross-reactivity given the evolutionary distance between Ericales and these other nuts.
Sesame seeds (Sesamum indicum, Pedaliaceae) are one of the few edible seeds besides chia (Salvia spp.) in the mint order (Lamiales) of plants. Sesame seeds are mostly pressed for oil, but are also used in breads, sauces (mole), and pastes (tahini). Sesame allergens include vicilin, legumin, 2S albumin, and oleosin (Pastorello et al. 2001; Breiteneder 2009). We sampled several accessions of sesame 2S albumin, but its relationship to other species was unresolved (Fig. 2). Sesame is a major cause of food allergy in some countries (Dalal et al. 2002; Osborne et al. 2011) and may be increasingly common in the US (Zuidmeer et al. 2008). Stutius et al. (2010) found that only 9 out of 69 patients with peanut allergies were also sensitive to sesame (Table 5).
Sunflowers are in the Asterales order of plants, and their seeds are commonly eaten in the United States. Several reports have been published on sensitization and allergy to sunflower seeds (Noyes et al. 1979; Axelsson et al. 1994), and sensitivity seems to be provoked by birch or mugwort pollen (Vieths et al. 2002). Sunflower seed lipid transfer protein has been identified as an allergen (Yagami 2010), while profilin in this species has been shown to be non-allergenic (Asturias et al. 1998). In one case, a patient with sunflower seed allergy was not reactive to other plants in the Asterales (mugwort, ragweed, dandelion) or to nuts (almonds, Brazil nuts, peanuts; Yagami 2010). There are no commonly eaten seeds that are closely related to sunflowers.

Study Limitations
Our study sampled three of the five known, common allergenic proteins (Table 2), and it is likely that researchers have not yet identified all of the allergenic proteins in nuts and seeds. Many proteins are present in small amounts and may be difficult to isolate and identify, but next-generation sequencing may quickly increase the pace of discovery of allergenic proteins. In particular, transcriptomes of nuts and seeds may yield protein sequences of potential allergens, giving researchers a much larger database from which they can compare allergenic and non-allergenic proteins (Radauer et al. 2008). This will allow for more robust tests of proteins and protein characteristics (such as tertiary structure) that are shared by allergens.
We have referred to cross-reactivity assays such as skin prick tests and IgE binding tests as evidence for the immune system's reaction to a protein after sensitization by exposure to the protein from another source. However, these assays are problematic, in that their results "do not always correlate with clinical reactivity" (Burks et al. 2012). There are many possible explanations for why cross-reactivity assays are not robust, including that a person may be genetically predisposed to multiple allergies (Sicherer 2002; Morafo et al. 2003; Kulis et al. 2011) or sensitive to many foods (Latcham et al. 2003), without proteins from those foods exhibiting cross-reactivity. Food allergies seem to be complicated responses that are partly genetic, partly caused by exposure, and partly caused by the allergenic potential of certain proteins (reviewed in Berin and Sampson 2013). To our knowledge, there have not been nut and seed cross-reactivity studies such as skin prick tests conducted on patient populations large enough (hundreds of participants) to allow for statistical tests of correlation. Moneret-Vautrin et al.'s (1998) work on peanut cross-reactivity with tree nuts is the exception, with 74 participants tested for cross-reactivity between peanut and five tree nut species, but cross-reactivity between the tree nut species was not tested. Additional cross-reactivity studies are necessary in order to better understand this phenomenon and its underlying causes, if these causes are indeed unrelated to amino acid similarities.
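The effect of small patient samples on the reported cross-reactivity rates can be illustrated with a confidence interval. The sketch below is not part of our analyses; it assumes a Wilson score interval is an acceptable summary for a binomial proportion, and the function name `wilson_interval` is hypothetical. The only data used are the 9 of 69 peanut-allergic patients reported as sesame-sensitive by Stutius et al. (2010).

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% interval (an assumption
    made for illustration; other levels need a different z).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half), min(1.0, centre + half)


# 9 of 69 peanut-allergic patients were sensitive to sesame (Stutius et al. 2010).
low, high = wilson_interval(9, 69)
print(f"observed {9 / 69:.1%}, approximate 95% interval {low:.1%} to {high:.1%}")
```

Under these assumptions the interval for the sesame result spans roughly 7% to 23%, and an interval computed from a single patient covers most of the range from 0 to 1, which is why single-patient reports cannot establish a cross-reactivity rate.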

EVOLUTIONARY MEDICINE: USING KNOWLEDGE OF PLANT RELATIONSHIPS TO INFORM ALLERGY STUDIES
In this study, we used phylogenetic analyses to test for similarity in allergenic proteins from edible nuts and seeds across seed-bearing plants. We demonstrated what a number of smaller, non-evolutionary studies have already suggested: that sequence data alone do not explain the multiple nut allergies experienced by some patients. Our data are consistent with the idea that there are structural or chemical features of allergenic proteins that sensitize the immune system and can cause cross-reactions; those features, however, are currently unidentified. Importantly, our method of testing for sequence similarity by comparing new evidence to plant species relationships can be extended to test structural, chemical, and other characters of the proteins.
With this in mind, future research on nut and seed allergens may want to consider an evolutionary framework to generate null hypotheses about cross-reactivity between edible seeds, as such an approach allows for hypothesis testing. For example, we can hypothesize that cross-reactivity may be more common between closely related species in the Fagales order, based on their close evolutionary relationship, and this hypothesis can be tested with in vitro tests between proteins extracted from Fagales nuts listed in Table 1 and other common nuts. If walnuts show higher cross-reactivity rates with distantly related Brazil nuts than with their close relatives (as reported from a single patient, Table 3), then the structural and chemical characteristics of proteins in walnuts and Brazil nuts can be studied to identify shared characteristics. Additionally, plants of no culinary value that are closely related to species with common allergens can be studied to understand whether there are characteristics that predispose these proteins to be allergenic. Such an approach would require additional data on the prevalence of cross-reactivity to adequately test which nut species and proteins are cross-reactive at a statistically significant rate. Ultimately, a better understanding of the evolutionary history of plant seed proteins may shed light on the numerous unanswered scientific questions regarding tree nut allergen cross-reactivity. We advocate that an evolutionary approach to such unanswered questions could be both novel and fruitful.
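One way such a null hypothesis could be tested, once adequate patient data exist, is a comparison of cross-reactivity counts between closely and distantly related species pairs. The following is a minimal sketch only: it assumes a two-sided Fisher exact test on a 2 x 2 table is appropriate, the function name `fisher_exact_two_sided` is hypothetical, and the patient counts are invented for illustration and do not come from any study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)  # small tolerance for float ties
    )


# HYPOTHETICAL counts, invented for illustration only:
# rows = close relatives, distant relatives;
# columns = cross-reactive, not cross-reactive.
p_value = fisher_exact_two_sided(12, 8, 4, 16)
print(f"two-sided p = {p_value:.4f}")
```

A small p-value in such a comparison would indicate that cross-reactivity differs between close and distant relatives; the test says nothing about which protein features are responsible.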

Fig. 2. Estimate of phylogenetic relationships of 2S albumin protein sequences based on a maximum likelihood (ML) analysis in the program RAxML under a JTT + I + gamma model. One fern and five gymnosperm sequences serve as outgroups. ML bootstrap (BS) values (1000 replicates) are printed above branches except when branches had 100% ML BS. Nodes are collapsed if ML BS support was < 80%. A = 2S albumin protein allergenic in this species.

Fig. 3. Estimate of phylogenetic relationships of legumin protein sequences based on a maximum likelihood (ML) analysis in the program RAxML under an LG + I + gamma model. We rooted the tree on the branch leading to (gymnosperms + ferns) + angiosperms. ML bootstrap (BS) values (1000 replicates) are printed above branches except when branches had 100% ML BS. Nodes are collapsed if ML BS support was < 80%. A = legumin protein allergenic in this species. The box outlines an unexpectedly close relationship (99% ML BS) between pistachio and poplar (Populus trichocarpa).

Fig. 4. Estimate of phylogenetic relationships of vicilin protein sequences based on a maximum likelihood (ML) analysis in the program RAxML under an LG + I + gamma model. The vicilin dataset contains one fern (Matteuccia) and four gymnosperms (three Pinales and one cycad). We rooted the tree on the branch leading to the fern. ML bootstrap (BS) values (1000 replicates) are printed above branches except when branches had 100% ML BS. Nodes are collapsed if ML BS support was < 80%. A = vicilin protein allergenic in this species. The box outlines an unexpectedly close relationship (80% ML BS) between non-grass monocots (date palm, coconut, and oil palm [Elaeis]), kiwi, grape, sesame, citrus, pistachio, cashew (Sapindales), hazelnut (Fagales), and Rosales (almond and strawberry; 98% ML BS).

Fig. 5. High levels of amino acid variation in sequences homologous to known IgE-binding epitopes in A) vicilin, B) legumin, and C) 2S albumin peptides. Black boxes outline sequences that have been experimentally verified as epitopes. The colored graph above the alignment and the shading of each amino acid reflect similarity across the alignment.

Table 2. Proteins that have been identified as allergens in edible nuts and seeds.

Table 3. Model selection for gene tree analyses. The model reported and used here represents the best-fit model for each protein that was available as a model choice in RAxML.

Table 4. Sequences included in the study were downloaded from GenBank (GB), the Protein Data Bank (PDB), and UniProt (UP).

Table 4. The 2S albumin analysis contained 53 taxa and 115 characters. The legumin analysis contained 67 taxa and 337 characters. The vicilin analysis contained 50 taxa and 316 characters.