Robust Inference of Monocot Deep Phylogeny Using an Expanded Multigene Plastid Data Set

We use multiple photosynthetic, chlororespiratory, and plastid translation apparatus loci and their associated noncoding regions (ca. 16 kb per taxon, prior to alignment) to make strongly supported inferences of the deep internal branches of monocot phylogeny. Most monocot relationships are robust (an average of ca. 91% bootstrap support per branch examined), including those poorly supported or unresolved in other studies. Our data strongly support a sister-group relationship between Asparagales and the commelinid monocots, the inclusion of the orchids in Asparagales, and the status of Petrosaviaceae as the sister group of all monocots except Acorus and Alismatales. The latter finding supports recognition of the order Petrosaviales. Also strongly supported is a placement of Petermannia disjunct from Colchicaceae (Liliales) and a sister-group relationship between Commelinales and Zingiberales. We highlight the remaining weak areas of monocot phylogeny, including the positions of Dioscoreales, Liliales, and Pandanales. Despite substantial variation in the overall rate of molecular evolution among lineages, inferred amounts of change among codon-position data partitions are correlated with each other across the monocot tree, consistent with low incongruence between these partitions. Ceratophyllum and Chloranthaceae appear to have a destabilizing effect on the position of the monocots among other angiosperms; the issue of monocot placement in broader angiosperm phylogeny remains problematic.


INTRODUCTION
Available sources of phylogenetic data do not allow for well-supported inference of all of the deep branches of monocot phylogeny. The internal branches that have proven most resistant to well-supported resolution are often relatively short (e.g., Chase et al. 2000). These include portions of the backbone linking the major clades defined as orders in the classification of the Angiosperm Phylogeny Group II [APG II] (2003), and a number of poorly resolved branches deep within each order. To address these and other problematic areas of higher-order monocot phylogeny, we collected new data for a large number of exemplar monocot taxa from an expanded range of regions in the plastid genome, including many that have not been examined intensively in the monocots. The plastid regions surveyed consist of portions of ten photosystem II genes, two NADH dehydrogenase subunit genes, three ribosomal protein genes, atpB, rbcL, and a diverse collection of noncoding regions that span these and other genes from this genome. Collectively, these represent ca. 16 kb of DNA sequence data per taxon, or about a ninth of the nonduplicated information in the plastid genome. Most of the regions were sequenced using primers developed for the inference of deep nodes of angiosperm phylogeny.
Discussions of strategies for large-scale phylogenetic inference are usually framed in terms of a trade-off between taxon sampling and the amount of data collected per taxon. Aiming for relatively dense taxon sampling is generally a beneficial strategy for accurate inference of phylogenetic relationships among and within large groups of organisms (e.g., Hillis 1998; Pollock et al. 2002; Zwickl and Hillis 2002), and examining more characters per taxon can also be very useful (e.g., Graham et al. 1998; Soltis et al. 1998; Poe and Swofford 1999; Hillis et al. 2003). Both strategies have demonstrated their effectiveness in ongoing studies of monocot phylogeny (e.g., Chase et al. 1993, 1995a, b, 2000, 2006; Stevenson et al. 2000; Givnish et al. 2006). While it is relatively clear which areas of monocot phylogeny are more (or less) robust, the level of data sampling appropriate for tackling the remaining areas of uncertainty is not clear in advance (e.g., Hillis et al. 2003). Fortunately, the economic trade-off between taxa and characters is becoming less limiting. With the development and application of methods for efficiently examining multiple genomic regions among distantly related taxa, we can collect and analyze substantially more data per taxon (and per research dollar) than was possible at the time of the first two monocot conferences.
We therefore sampled a large number of characters for a broad sampling of monocot taxa. Our preliminary monocot study constitutes the largest to date, in terms of the amount of data examined per taxon. It also has a sufficient taxon density to represent all major branches of monocot phylogeny (excluding several achlorophyllous lineages), with several major clades examined at a taxon density comparable to other recent large studies. We report on the strong inferences that can be drawn from the current taxon sampling, highlight several remaining areas of uncertainty, and briefly review extensive among-lineage rate shifts in the context of the molecular evolutionary dynamics of different codon-position data partitions.

Taxonomic and Genomic Sampling
The taxa surveyed are an expansion of the set examined in Graham et al. (2000) and McPherson et al. (submitted). Our sampling includes representatives of all ten clades recognized as monocot orders and 60 of 93 clades recognized at the rank of family in the classification scheme of APG II (2003), with Petermanniaceae accepted as a distinct family (discussed below). This sampling includes Petrosaviaceae and Dasypogonaceae, two families unplaced to order in the APG II (2003) classification scheme. We use 25 "dicots" as outgroup taxa (24 from Graham et al. [2000] and Trimenia moorei (Oliv.) W. R. Philipson [Austrobaileyales: Trimeniaceae; NSW 433770, GenBank nos. AY116652-AY116659]). The exemplars were chosen to represent a broad phylogenetic diversity of families, guided by the classification of APG II (2003) and other large-scale studies. Details of the species used, sample provenance, and GenBank accession information are presented in McPherson et al. (submitted) and in the Appendix. Ninety-four exemplar taxa are represented, 69 of them monocots; this is 23 more monocots than in McPherson et al. (submitted).
Details of DNA extraction, amplification, and sequencing protocols are provided in Graham and Olmstead (2000) and McPherson et al. (submitted). The data set considered for phylogenetic analysis here spans 17 protein-coding plastid genes (atpB, ndhB, ndhF, psbB, psbC, psbD, psbE, psbF, psbH, psbJ, psbL, psbN, psbT, rbcL, rpl2, rps7, 3'-rps12), and includes six intergenic regions in two of the photosystem II gene clusters (psbE-psbF-psbL-psbJ and psbB-psbT-psbN-psbH), two intergenic regions in a cluster of genes spanning 3'-rps12, rps7 (two ribosomal small subunit genes), and ndhB, and three introns (one each in ndhB, rpl2, and 3'-rps12). Most monocots were sampled for a larger portion of ndhF (ca. 2.08 kb, representing most of the locus) than the outgroup taxa (ca. 1.29 kb from the relatively slowly evolving 5'-end of the gene). Three additional regions that are largely noncoding were also sampled for Asparagales and Liliales (and several other taxa; Appendix): an intergenic region between ndhB and trnL (CAA), an intergenic region between atpB and rbcL, and a contiguous region spanning trnL (UAA) and trnF (GAA) that includes an intron in the former gene and an intergenic spacer region. In a few cases closely related alternative taxa were sampled for the latter region (see Appendix). With these and a few other minor exceptions, all taxa were completely represented for all regions. Further details, including methods of data compilation, are provided in Graham and Olmstead (2000) and McPherson et al. (submitted).
The genes examined include a mix of those involved in photosynthesis (atpB, rbcL, and the photosystem II genes), chlororespiration (the ndh genes), and plastid translation (the trn, rps, and rpl loci). Those situated in the plastid inverted repeat regions (rpl2, rps7, 3'-rps12, ndhB, and associated noncoding regions) are exceptionally slowly evolving (e.g., Graham et al. 2000), and the single-copy regions include both slowly and rapidly evolving protein-coding and noncoding regions. The ndhF locus, for example, includes slowly and rapidly evolving portions (Olmstead and Sweere 1994; Kim and Jansen 1995); the 3'-end of ndhF evolves substantially more rapidly (per nucleotide) than some of the single-copy noncoding regions examined here, such as the trnL-trnF region (Saarela et al. in press). The characters examined therefore span most of the spectrum of plastid evolutionary rates.

Alignment and Phylogenetic Analysis
We added taxa to previously published alignments (Rai et al. 2003) using criteria set out in Graham et al. (2000). The unaligned total DNA sequence length obtained in monocot exemplars ranges from ca. 12.7 kb in Burmannia L. to ca. 17.9 kb in Coelogyne Lindl., with a mean unaligned length in monocots of ca. 15.9 kb. Some monocots (such as Burmannia) lack sequence data for one or more major regions (see Appendix). The combined alignment is nearly 32 kb long (31,900 base pairs [bp]), approximately twice the unaligned length of any individual taxon. This length results from fairly extensive gaps and/or unalignable regions, including some in nonangiosperm taxa that were part of the overall alignment but were not considered in the current study. Noncoding regions that were too difficult to align were set aside in staggered gapped regions, with the staggered elements largely restricted to single taxa. Because each such single-taxon element is unique, it is parsimony uninformative and has no influence on the analysis. This allowed us to avoid defining character exclusion sets in hard-to-align regions, a substantial undertaking for a matrix of this size. In a subset of cases, however, we were able to align small blocks of taxa within otherwise unalignable regions; variable characters within these blocks can therefore contribute to tree searching (for a comparable example, see Steane et al. 1999). All gap cells are treated as missing data, and we did not attempt to score insertion/deletion (indel) events. Graham et al. (2000) provide an overview of the (relatively limited) utility of indels inferred from several plastid inverted-repeat noncoding regions for inference of deep angiosperm phylogeny.
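The claim that single-taxon elements cannot influence the analysis follows from the definition of a parsimony-informative site: at least two character states must each occur in at least two taxa. A minimal sketch, assuming gaps and ambiguity codes ('-', '?', 'N') are treated as missing data as described above; the function names and the toy alignment are hypothetical and do not reproduce the study's matrix.

```python
from collections import Counter

# Assumption for illustration: these symbols are all scored as missing data.
MISSING = set("-?N")


def is_parsimony_informative(column: str) -> bool:
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(c for c in column.upper() if c not in MISSING)
    shared_states = sum(1 for n in counts.values() if n >= 2)
    return shared_states >= 2


def informative_sites(alignment: list[str]) -> list[int]:
    """Return 0-based indices of informative columns (one row per taxon)."""
    n_sites = len(alignment[0])
    if any(len(row) != n_sites for row in alignment):
        raise ValueError("all sequences must have the same aligned length")
    return [i for i in range(n_sites)
            if is_parsimony_informative("".join(row[i] for row in alignment))]


# Toy staggered block: columns 4-5 hold an element present in one taxon only.
aln = ["AACG--",
       "AACG--",
       "AGTG--",
       "GGTGTT"]
print(informative_sites(aln))  # [1, 2]
```

The single-taxon element in the last two columns is never counted, whatever its sequence, which is why staggering unalignable elements into unique columns is a safe alternative to explicit exclusion sets.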
In total, 5617 aligned sites are potentially parsimony informative in the full 94-taxon data set, or 4798 sites in the monocots alone. We analyzed the matrix using heuristic maximum-parsimony (MP) searches with PAUP* version 4.0b10 (Swofford 2002), using tree-bisection-reconnection (TBR) branch swapping and 100 random-addition replicates. Branch support was estimated by bootstrapping (Felsenstein 1985) with 100 bootstrap replicates. No tree limits were set, and all other settings were left at their defaults. Ceratophyllum L. and Chloranthaceae have variable and poorly supported placements in basal angiosperm phylogeny (e.g., Qiu et al. 1999; Graham and Olmstead 2000; Graham et al. 2000; Soltis et al. 2000; Hilu et al. 2003; Davis et al. 2004), possibly because of a long, undivided subtending branch (Ceratophyllum) and short "basal" branches in angiosperm phylogeny (Chloranthaceae; e.g., Fig. 3 in Graham et al. 2000). We therefore performed separate bootstrap analyses with these two taxa excluded, to examine their influence on phylogenetic analysis. McPherson et al. (submitted) used simulated data sets based on several hypotheses of relationship for Aphyllanthes monspeliensis L., the sole member of Aphyllanthaceae (Asparagales), to demonstrate that there is substantial scope for systematic error in analyses that include this taxon, at least for the plastid regions examined here. Their analyses showed that some hypotheses are rarely recovered even when they are used as model trees, while others may be recovered frequently when they are not used to simulate data (i.e., high potential for both type I and type II errors). We therefore performed two basic sets of analyses, with Aphyllanthes L. included or excluded (for more on this problematic taxon, see Fay et al. 2000; Chase et al. 2006; Pires et al. 2006); Ceratophyllum and Chloranthaceae were included in both cases.
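Bootstrapping (Felsenstein 1985) resamples alignment columns with replacement to build pseudoreplicate matrices of the original size, reanalyzes each one, and reports the percentage of replicate trees that contain a given clade. The sketch below shows only the resampling and tallying steps under stated assumptions: the tree search for each pseudoreplicate (done in PAUP* in this study) is not reproduced, each replicate tree is represented simply as a set of clades, and all names and example values are hypothetical.

```python
import random


def pseudoreplicate(alignment: list[str], rng: random.Random) -> list[str]:
    """One bootstrap pseudoreplicate: resample whole columns with replacement.

    Every taxon receives the same resampled column indices, so sites stay
    aligned across rows.
    """
    n_sites = len(alignment[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(row[i] for i in cols) for row in alignment]


def clade_support(replicate_clades: list[set[frozenset[str]]],
                  clade: frozenset[str]) -> float:
    """Percentage of replicate trees (each given as its set of clades) containing `clade`."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


rng = random.Random(42)
aln = ["ACGTAC", "ACGTTC", "AGGTTA"]
rep = pseudoreplicate(aln, rng)  # would be passed to a tree search in practice

# Hypothetical replicate results: the clade {A, B} appears in 3 of 4 trees.
ab = frozenset({"A", "B"})
trees = [{ab}, {ab}, {frozenset({"B", "C"})}, {ab}]
print(clade_support(trees, ab))  # 75.0
```

Resampling whole columns, rather than individual cells, keeps each site's pattern across taxa intact, which is what makes the pseudoreplicates behave like fresh draws of characters.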
To examine whether the rate of molecular evolution shifts in parallel among different codon-position classes across the monocot backbone, we set up character sets (CHARSETs) in PAUP* that classify each nucleotide in the protein-coding regions into one of the three codon positions (reading frames were defined with respect to Nicotiana tabacum L. and Ginkgo biloba L. sequences). The short overlap between psbD and psbC was ignored, as individual sites in this region have different codon positions in the two genes. Parsimony-inferred changes for each codon position were recorded for each branch in monocot phylogeny, based on one of the most-parsimonious trees (see Fig. 2). We then tested whether length estimates for the first two codon positions were correlated across branches with those for the third codon position, using JMP version 4.0.1 (SAS Institute 2000).
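The CHARSET construction described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the reading frame is tracked on one aligned reference row (standing in for the Nicotiana or Ginkgo sequence), that each gene span begins at the first base of a codon, and that columns where the reference has a gap are left unassigned. The function name `codon_charsets` and the coordinates are hypothetical; the printed lines follow the NEXUS `charset name = columns;` form.

```python
def codon_charsets(ref_row: str,
                   cds_spans: list[tuple[int, int]],
                   exclude: frozenset[int] = frozenset()) -> dict[int, list[int]]:
    """Assign alignment columns in coding regions to codon positions 1-3.

    ref_row   -- aligned reference sequence ('-' marks gaps)
    cds_spans -- (start, end) 1-based inclusive alignment columns per gene,
                 each assumed to begin at the first base of a codon
    exclude   -- columns to drop, e.g. an overlap where two genes' frames disagree
    """
    sets: dict[int, list[int]] = {1: [], 2: [], 3: []}
    for start, end in cds_spans:
        k = 0  # reference bases seen so far in this gene (tracks the frame)
        for col in range(start, end + 1):
            if ref_row[col - 1] == "-":
                continue  # frame undefined where the reference has a gap
            pos = k % 3 + 1
            k += 1  # advance the frame even for excluded columns
            if col not in exclude:
                sets[pos].append(col)
    return sets


# Toy row: a 3-column insertion (gaps in the reference) after the first codon;
# column 12 stands in for a hypothetical overlap region to be ignored.
ref = "ATG---GCTTAA"
sets = codon_charsets(ref, [(1, 12)], exclude=frozenset({12}))
for p in (1, 2, 3):
    print(f"charset pos{p} = {' '.join(map(str, sets[p]))};")
```

Advancing the frame counter before checking the exclusion set matters: dropping an overlap column must not shift the codon positions of the columns that follow it.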

RESULTS
Most aspects of outgroup relationships are discussed elsewhere (Graham et al. 2000). Amborella trichopoda Baill. (Amborellaceae) was considered to be the sister group of all remaining angiosperms, following the current consensus of recent large-scale analyses that included seed-plant outgroups (Mathews and Donoghue 1999; Parkinson et al. 1999; Qiu et al. 1999; Soltis et al. 1999; Soltis et al. 2000; Barkman et al. 2000; Graham and Olmstead 2000; Graham et al. 2000; Zanis et al. 2002; Hilu et al. 2003; Stefanovic et al. 2004). A recent result placing Amborella Baill. apart from the basal angiosperm split (Goremykin et al. 2003, 2004) is likely to be an artifact of extremely low taxon density, combined with exceptionally long branches in the exemplar taxa used by these authors for monocots, all of which are members of Poaceae (Stefanovic et al. 2004; S. W. Graham unpubl. data).
We typically use family (or higher taxon) names to describe terminal taxa (for a rationale, see Chase et al. 2006). A detailed consideration of relationships recovered in Asparagales is provided in McPherson et al. (submitted); the results presented here for this order focus primarily on two taxa not considered previously for these plastid regions (Agapanthus africanus [Agapanthaceae] and Doryanthes palmeri [Doryanthaceae]). With Aphyllanthes excluded, the same pair of trees (Fig. 1, 2) was found in 96 of 100 random-addition replicates, with a tree length of 35,826 steps, a consistency index (CI, including all sites) of 0.362, and a retention index (RI) of 0.474. With Aphyllanthes included, a single tree (a portion is shown in Fig. 3) was found in 95 of 100 random-addition replicates, with a length of 36,276 steps (CI = 0.360; RI = 0.473). When Aphyllanthes is pruned from the latter tree, it is topologically identical to one of the former trees. We considered well-supported ("strongly supported" or "robust") branches to be those with bootstrap support of ca. 95% or more, and poorly supported branches to be those with ca. 75% or less. By this criterion, most branches of monocot phylogeny are well supported by the current data (Fig. 1). To simplify the presentation of the results, we first present the branches that are poorly to moderately supported, and then address the well-supported relationships. All bootstrap values below refer to analyses with Aphyllanthes excluded, and Ceratophyllum and Chloranthaceae included, unless otherwise stated.
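Tree length, CI, and RI are standard parsimony statistics. As a hedged illustration (not PAUP*'s implementation), the sketch below scores unordered characters with Fitch's algorithm on a rooted binary tree written as nested tuples, then computes the ensemble CI as M/S and RI as (G - S)/(G - M), where S is the observed number of steps summed over characters, M the minimum possible, and G the maximum possible (the steps needed on a star tree). It assumes no missing data or polymorphisms, and the tree and matrix are hypothetical toy inputs.

```python
from collections import Counter


def fitch_steps(tree, states):
    """Fitch small-parsimony step count for one unordered character.

    tree   -- rooted binary tree as nested 2-tuples of taxon names
    states -- mapping taxon -> character state at this site
    """
    def visit(node):
        if isinstance(node, str):  # leaf
            return {states[node]}, 0
        (left, c1), (right, c2) = visit(node[0]), visit(node[1])
        shared = left & right
        if shared:
            return shared, c1 + c2
        return left | right, c1 + c2 + 1  # taking the union costs one step
    return visit(tree)[1]


def length_ci_ri(tree, matrix):
    """Tree length S, ensemble CI = M/S, and RI = (G - S)/(G - M) over all sites.

    matrix maps taxon -> aligned sequence; missing data are not handled here.
    """
    taxa = list(matrix)
    S = M = G = 0
    for i in range(len(matrix[taxa[0]])):
        column = {t: matrix[t][i] for t in taxa}
        counts = Counter(column.values())
        S += fitch_steps(tree, column)         # observed steps on this tree
        M += len(counts) - 1                   # fewest steps any tree could need
        G += len(taxa) - max(counts.values())  # steps needed on a star tree
    return S, M / S, (G - S) / (G - M)


tree = (("A", "B"), ("C", "D"))
matrix = {"A": "AAA", "B": "AAC", "C": "GTA", "D": "GTC"}
length, ci, ri = length_ci_ri(tree, matrix)
print(length, round(ci, 3), round(ri, 3))  # 4 0.75 0.667
```

Because uninformative sites add equally to S and M, including them (as in the CI reported above, "including all sites") raises CI but leaves G - S and G - M, and hence RI, less affected.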

Poorly to Moderately Supported Relationships
We infer the eudicots plus Ceratophyllum to be the sister group of the monocots, and Chloranthaceae to be the sister group of the magnoliids (see also Graham et al. 2000). Both placements have weak bootstrap support (67% and 60% bootstrap percentage [BP], respectively). A few deep branches of monocot phylogeny are not strongly supported. These include the precise order of splits at the base of Alismatales: Araceae are found to be the sister group of a clade of three sampled alismatid families (86% BP). The arrangement of Poales, Commelinales-Zingiberales, Arecales, and Dasypogonaceae at the base of the commelinid monocots is also unclear; the two branches supporting the relationship observed here each have <50% BP.

Fig. 1A.-Relationships among the major angiosperm lineages. One of two most-parsimonious trees inferred from a large plastid data set (atpB, ndhB, ndhF, ten photosystem genes, rbcL, rpl2, rps7, 3'-rps12, and various introns and noncoding regions; see text). Numbers above branches are results of bootstrap analyses using all taxa; those below branches are results when Ceratophyllum and Chloranthaceae are excluded from consideration. Aphyllanthes monspeliensis is excluded from both analyses (but see Fig. 3).
Melanthiaceae are found to be the sister group of the other families of Liliales sampled here, but with only 39% BP. Of the two members of Asparagales new to this study, Agapanthaceae are clearly a member of a clade of three closely related families (Agapanthaceae, Alliaceae s.s., Amaryllidaceae). However, inferred interrelationships among these three taxa are poorly supported, with only 37% support for the arrangement shown here (Alliaceae-Amaryllidaceae; Fig. 1B). An alternative arrangement, Alliaceae-Agapanthaceae, is found on the other most-parsimonious tree, with 56% BP (Fig. 3). We find Sparganiaceae-Typhaceae and Bromeliaceae to be successive sister groups of the remaining families of Poales, but with only 45% BP for this arrangement. However, these two lineages are strongly supported as emerging from the base of Poales (Fig. 1B). An arrangement of Flagellariaceae and Restionaceae as successive sister groups of the remaining sampled graminid taxa (Ecdeiocoleaceae and Poaceae) has moderately strong support (87% BP).

Fig. 1B.-Phylogenetic relationships in the monocots. One of two most-parsimonious trees inferred from a large plastid data set (atpB, ndhB, ndhF, ten photosystem genes, rbcL, rpl2, rps7, 3'-rps12, and various introns and noncoding regions; see text). Numbers above branches are results of bootstrap analyses using all taxa. Aphyllanthes monspeliensis is excluded from the analyses (but see Fig. 3). An arrowhead indicates a branch not seen on both shortest trees (the alternative arrangement, with bootstrap values, is shown in Fig. 3). Order and family placements and designations follow APG II (2003), using the optional "bracketed" system in Asparagales ("1" = Asparagaceae s.l.; "2" = Alliaceae s.l.; "3" = Xanthorrhoeaceae s.l.), except that Petermanniaceae are also recognized as a family (see text). Petrosaviaceae and Dasypogonaceae are currently unplaced to order. Tree length and fit statistics are provided in the main text.

Strongly Supported Relationships
The remaining relationships are nearly all strongly supported by bootstrap analysis (ca. 95% BP and higher; Fig. 1B). Ignoring the depression in support for monocot monophyly when Ceratophyllum and Chloranthaceae are included (84% vs. 100%; Fig. 1A), Acorus L. is robustly inferred to be the sister group of all other monocots. Alismatales and Petrosaviaceae are (respectively) the next successive sister groups of the remaining monocots. Asparagales and the commelinid monocots are well supported as sister taxa. All orders consisting of more than one family (sensu APG II 2003) are well supported at the taxon samplings used here. Several multiordinal clades are also strongly supported, including Commelinales-Zingiberales and the commelinid monocots as a whole (Arecales, Commelinales-Zingiberales, Poales, Dasypogonaceae).
Most relationships within orders are also robustly supported. The (partly) mycoheterotrophic family Burmanniaceae is strongly supported as the sister group of Dioscoreaceae (only these two members of Dioscoreales were sampled). Apart from the basal split in Liliales, all relationships in this order are well supported. Petermanniaceae are distinct from Colchicaceae, whose sister group among the taxa sampled is Alstroemeriaceae. Within Alismatales, the inferred relationships of Alismataceae, Butomaceae, and Scheuchzeriaceae are all well supported (the former two are sister groups with respect to the taxa sampled). Within Pandanales, both internal branches are well supported; Velloziaceae are the sister group of the remaining Pandanales, and Pandanaceae and Cyclanthaceae are sister taxa. Among the families sampled in Commelinales, Pontederiaceae are the sister group of Haemodoraceae. Apart from the base of Poales, relationships among the remaining taxa of Poales are almost all well supported. Cyperaceae-Xyridaceae are the sister group of Mayacaceae. This cyperid clade is the sister group of the graminid families. Ecdeiocoleaceae and Poaceae are sister groups with respect to the taxa included.

Fig. 3.-[...] (Aphyllanthaceae) is included in analysis ("1" = Asparagaceae s.l.; "2" = Alliaceae s.l.). Numbers above branches are bootstrap values with Aphyllanthes excluded; those below branches are bootstrap values with it included. The tree shown is a portion of the most-parsimonious tree found with Aphyllanthes included (see text); this tree is otherwise identical to one of the two trees found with Aphyllanthes excluded (the other is shown in Figs. 1, 2), with comparable bootstrap values for the rest of the tree.
The position of Doryanthaceae (a member of Asparagales not sampled by McPherson et al. submitted) is quite well supported (92% BP). It is inferred to be the sister group of a large clade consisting of Iridaceae, Xeronemataceae, Alliaceae s.l., Asparagaceae s.l., and Xanthorrhoeaceae s.l. The latter three taxa are indicated with numerals in Fig. 1B and represent the more inclusive circumscriptions of these families in APG II (2003). Iridaceae are sister to a clade consisting of Xeronemataceae, Alliaceae s.l., Asparagaceae s.l., and Xanthorrhoeaceae s.l. A well-supported clade composed of Ixioliriaceae and Tecophilaeaceae is the sister group of all of these taxa. This major part of the backbone of Asparagales [Asparagales base, ((Ixioliriaceae-Tecophilaeaceae), (Doryanthaceae, (Iridaceae, (Xeronemataceae, (Xanthorrhoeaceae s.l., (Asparagaceae s.l.-Alliaceae s.l.))))))] is thus robustly supported by our data.

Effect of Inclusion of Aphyllanthes monspeliensis (Aphyllanthaceae)
When Aphyllanthes is included in the analysis, it is resolved as the sister group of Agavaceae, but with poor support (50% BP; Fig. 3; see also McPherson et al. submitted). Its inclusion does not alter other inferred relationships, but it does moderately depress bootstrap values for five of ten branches (by 10-25%) in a local cluster of families corresponding to Asparagaceae s.l. plus Alliaceae s.l. (Fig. 3). There is strong support (100% BP) for the inclusion of Aphyllanthes within the major clade consisting of Alliaceae s.l. and Asparagaceae s.l., and bootstrap support outside this local cluster does not appear to be adversely affected by its inclusion (data not shown).

DISCUSSION
The plastid gene set considered here has been used for phylogenetic inference of other deep and difficult phylogenetic problems (Rai et al. 2003), and it provides strong support for a number of deep monocot relationships that were poorly supported or unresolved in earlier studies. Average bootstrap support for the monocot portion of the tree from our data is ca. 91% per branch, based on 65 internal branches involving 68 monocot taxa (Aphyllanthes excluded). Three-quarters of the internal monocot branches have at least 95% bootstrap support, and 90% have at least 70% bootstrap support. Only six monocot branches on either shortest tree have less than 50% bootstrap support (Fig. 1B). Some of these improvements in support compared to earlier studies are paralleled in other recent studies, including the two-gene sampling of Tamura et al. (2004) and the multigene studies of Chase et al. (2006) and Pires et al. (2006). A few clades are poorly or moderately supported here but are nonetheless congruent with other studies (e.g., the poorly supported position of Melanthiaceae relative to other sampled members of Liliales, which is consistent with that in Tamura et al. [2004], Givnish et al. [2006], and one of the analyses in Chase et al. [2006]).
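The summary figures quoted above (mean support per branch, the fractions of branches at or above 95% and 70%, and the count below 50%) are simple tallies over per-branch bootstrap values. The sketch below is illustrative only: the support values are hypothetical rather than those in Fig. 1B, the function names are invented for this example, and the fixed cutoffs of 95% and 75% are an assumption standing in for the approximate ("ca.") thresholds defined in the Results.

```python
def classify(bp: float, strong: float = 95.0, poor: float = 75.0) -> str:
    """Label one bootstrap percentage (cutoffs treated as exact for illustration)."""
    if bp >= strong:
        return "strong"
    if bp <= poor:
        return "poor"
    return "moderate"


def support_summary(bps: list[float]) -> dict[str, float]:
    """Mean support and threshold tallies over a list of branch bootstrap values."""
    n = len(bps)
    return {
        "mean": sum(bps) / n,
        "frac_ge_95": sum(b >= 95 for b in bps) / n,
        "frac_ge_70": sum(b >= 70 for b in bps) / n,
        "n_below_50": sum(b < 50 for b in bps),
    }


values = [100, 100, 98, 96, 87, 67, 45]  # hypothetical branch supports
print(support_summary(values))
print([classify(b) for b in values])
```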

Assessing the Strength of Support of Deep Monocot Relationships
Sampling error will tend to reduce estimated branch support when too few data are collected per taxon ("not enough characters examined") or when too few characters define individual branches ("rapid radiations"). However, even when deep branches are short, simulation results using available models of DNA sequence evolution (e.g., Hillis 1998) indicate that maximum parsimony can yield very accurate reconstructions from relatively small amounts of data per taxon (a few thousand base pairs of DNA sequence), provided that taxon sampling is sufficiently dense. Nonetheless, empirical analyses of real DNA sequence data sets that are densely sampled and of this order of size (several kb long) are incompletely congruent with each other on major and minor points of relationship, and have numerous areas of weak statistical support (e.g., Davis et al. 2004; Tamura et al. 2004; Chase et al. 2006; Pires et al. 2006) or exclude many problematic taxa. The existence of these uncertain areas (e.g., the composition or relative arrangements of Asparagales, Dioscoreales, Liliales, Pandanales, and the major commelinid lineages) provides a continuing impetus for expanding the amount of data collected per taxon.
Empiricists do not have access to the true tree of monocot phylogeny, but instead use a variety of statistical methods, such as bootstrap analysis, to assess confidence in phylogenetic inferences (e.g., Hillis and Bull 1993; Felsenstein 2004). There is no clear consensus on the cutoff for considering a clade strongly supported ("robust"), and there is some disagreement about why different methods provide different pictures of clade support. Bayesian phylogenetic inference, for example, is thought to suffer from inflated clade posterior probability estimates (e.g., Suzuki et al. 2002; Erixon et al. 2003; Lemmon and Moriarty 2004), and empirical and theoretical studies have demonstrated that bootstrap analysis can provide biased measures of support (e.g., Felsenstein and Kishino 1993; Hillis and Bull 1993). Small differences in clade support are also demonstrable for different resampling methods (jackknife, bootstrap) or different implementation strategies for these methods. However, the threshold we adopted to indicate well-supported clades in bootstrap analysis (ca. 95%) may often be on the conservative side (Felsenstein and Kishino 1993; Hillis and Bull 1993).
A caveat for bootstrap analysis (and other methods for statistical inference of branch support) is that there are conditions, such as the oft-cited phenomenon of "long-branch attraction," under which phylogenetic inferences can be misleading (Felsenstein 1978; Hendy and Penny 1989). Long-branch attraction has been invoked for the placement of several problematic monocot taxa (e.g., the positions of Ixioliriaceae and Aphyllanthaceae in Asparagales; Fay et al. 2000). Parametric phylogenetic methods such as maximum likelihood and Bayesian analysis have been found to be less prone to the distorting effects of long-branch attraction (e.g., Chang 1996; Swofford et al. 2001), unless evolutionary rates change discordantly among characters across lineages ("heterotachy"), in which case maximum parsimony may be more reliable (Kolaczkowski and Thornton 2004).
The overall rate of molecular evolution in monocot plastid genomes observed here is quite variable among lineages (Fig. 2), in line with previous studies (Wilson et al. 1990; Bousquet et al. 1992; Gaut et al. 1992, 1996). The effect of this rate variation on phylogenetic analysis is unclear. However, we can at least rule out substantial heterotachy between two codon-position partitions in our protein-coding regions: the first two vs. the third codon positions. The former should predominantly reflect nonsynonymous substitutions, and the latter synonymous ones (e.g., Sanderson et al. 2000). On theoretical grounds, we might therefore expect these partitions to have different substitution dynamics among lineages (Kimura 1968, 1983; Ohta 1992). In practice, however, rates of synonymous and nonsynonymous substitution are correlated in a broad variety of organisms (e.g., Sharp 1991; Wolfe and Sharp 1993; Akashi 1994; Gaut et al. 1996). We examined the amount of change in the first two vs. third codon positions, using this information as a rough proxy for nonsynonymous vs. synonymous change. Despite inconstancy in the overall rate of molecular evolution (Fig. 2), changes in these two functionally defined codon-position classes are strongly correlated across the sampled branches of the monocot tree (r = 0.9036; P < 0.0001; Fig. 4).
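The correlation reported above is a Pearson product-moment coefficient between two vectors of per-branch change counts (first two vs. third codon positions). The sketch below computes r directly; it is not a reproduction of the JMP analysis, it omits the P-value, and the branch values are hypothetical. In Python 3.10 and later, `statistics.correlation` returns the same coefficient.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-branch step counts (one entry per branch; not the paper's data).
pos12 = [3, 10, 25, 7, 40, 12]  # changes at first + second codon positions
pos3 = [8, 22, 60, 15, 95, 30]  # changes at third codon positions
print(round(pearson_r(pos12, pos3), 4))
```

A high r across branches means that lineages with many third-position changes also tend to have many first- and second-position changes, which is the pattern expected in the absence of strong heterotachy between these partitions.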
We will not enter further into the debate about different measures of statistical support. However, a different framework for documenting the reliability of phylogenetic results is to demonstrate that analyses involving the same taxa for different genomic regions depict similar relationships (e.g., Penny et al. 1982). If there is instead well-supported incongruence among different genetic linkage groups, this may reflect deviations of particular gene trees from a consensus organismal pattern (e.g., Maddison 1997) or various phenomena that lead to strong systematic biases in the data (Naylor and Brown 1998). In general, any misleading effects of these on inferences of higher-order relationships using plastid data alone are expected to be rather small (Savolainen et al. 2002; see also Chase et al. 2006). However, determining whether this is indeed the case will at least require the availability of trees from multiple linkage groups (the plastid genome is a single linkage group), with each tree as well supported as possible.
The correlated change we observe among codon positions (Fig. 4) [...] Chase et al. 2006; Pires et al. 2006). Where incongruent, the other studies generally have poor support for the conflicting relationship (see below for several exceptions). Using the phylogenetic data presented here, we have therefore come close to, but not yet fully attained, a phylogenetic backbone of monocot relationships that is well supported across all deep nodes. Other work (in progress) will address more of the details of this plastid framework by adding currently unsampled families.

Fig. 4.-[...] Data points for each codon set represent individual branch-length estimates, summed across relevant sites in protein-coding regions. All terminal and internal branches in monocot phylogeny were considered, including the branch immediately subtending the monocots (Fig. 2). Changes along each branch were computed using ACCTRAN optimization for the relevant character set.

Remaining Problematic Relationships in Monocot Phylogeny
A broad circumscription of the magnoliids ("eumagnoliids") in Soltis et al. (1999) and Soltis et al. (2000) includes the monocots, based on a weakly supported clade found in their analyses. In contrast, APG II (2003) was more cautious, accepting only Canellales, Laurales, Magnoliales, and Piperales as magnoliids. Although a sister-group relationship between the magnoliids (in a narrow sense) and monocots cannot be ruled out from our bootstrap analysis, classifications that depend on this or other placements should be treated very cautiously. Duvall et al. (2006) use various optimality criteria to analyze four concatenated genes and find several placements of the monocots in angiosperm phylogeny; for example, their maximum parsimony analysis depicts the eudicots as the sister group of monocots plus Ceratophyllum (with weak support for these relationships), while their Bayesian analysis places the monocots as the sister group of the magnoliids, with posterior probabilities of 0.97-1.00 for all relevant branches. Given the potential for Bayesian support values to be inflated or misleading (discussed above), the relatively uncertain position of the monocots in our analysis, and their variable position in other studies (e.g., Hilu et al. 2003; Davis et al. 2004), the jury is still out on where the monocots belong in flowering-plant phylogeny. We also observed a moderately depressive effect on support for monocot monophyly when Ceratophyllaceae and Chloranthaceae were included in the analysis. Although this effect is straightforward to demonstrate, its cause is not clear; it is conceivably associated with whatever causes the uncertain placement of these two families in all current angiosperm-wide studies.
The monophyly of the monocots and the position of Acorus as the sister group of all other extant monocots are both strongly supported here. Acorus has not been supported as the sister group of all other monocots in every recent study (e.g., Qiu et al. 1999), but where it is not, this is generally a consequence of poor statistical support in the individual study. However, Duvall and Ervin (2004) and Duvall et al. (2006) documented problems with the nuclear 18S rDNA locus for this taxon. A further exception to this otherwise uniform picture is a recent study by Davis et al. (2004), who examined monocot deep phylogeny using rbcL and the mitochondrial locus atpA. In combined analyses of these two genes, they found a strongly supported sister-group relationship between Acoraceae and a major clade of Alismatales consisting of all the taxa they sampled in this order except Araceae and Tofieldiaceae. Their analysis of rbcL alone depicted Acorus as the sister group of all other monocots, but their analysis of atpA alone placed Acoraceae and associated alismatid taxa on very long branches, nested deep in monocot phylogeny as part of a small clade that included several taxa from Asparagales (Ixioliriaceae, Iridaceae, and a member of Agavaceae; see Fig. 4 in Davis et al. 2004, one of their most-parsimonious trees). This result, and the discordant placement of Acorus in their combined analyses, may be an artifact (see also …).
Very short branches are more prone to the effects of sampling error, which may contribute to poor support for some deep internal branches in monocot phylogeny (e.g., Chase et al. 2000, 2006). However, some of the "short" branches referred to in earlier studies are still weakly supported in our expanded data sampling and in comparable recent studies (e.g., Chase et al. 2006), despite not being clearly different in inferred length from neighboring branches that have strong support (compare, for example, the lengths and support for the branches subtending Asparagales vs. Dioscoreales-Pandanales; Figs. 1B, 2). Relative unevenness of branch lengths can contribute to erroneous or unstable phylogenetic inference through long-branch attraction (Felsenstein 1978; Hendy and Penny 1989). However, if any problematic long branches remain in the current data set, their main effect here may be to destabilize local estimates of bootstrap support rather than to cause erroneous placement of the affected clades. Where feasible, additional taxon sampling may often (although not always; e.g., Rannala et al. 1998; Poe and Swofford 1999) help ameliorate the effects of long branches by breaking them up.
Nonetheless, disparities in rates of evolution may contribute to difficulties in inferring some relationships accurately. The elevated rate in the grasses noted by other workers (e.g., Gaut et al. 1992) is evidently not unique to them within Poales (rate elevation compared to other monocots is evident from visual inspection of the branches subtending Cyperaceae, Ecdeiocoleaceae, Mayacaceae, Poaceae, Restionaceae, and Xyridaceae; Fig. 2). However, the branches immediately subtending Bromeliaceae, Sparganiaceae-Typhaceae, and Flagellariaceae are short relative to those of other members of Poales (Fig. 2). This disparity might explain why the relative positions of Bromeliaceae and Sparganiaceae-Typhaceae are unclear with our current taxon sampling (Fig. 1B), and why the backbone relationships inferred with regard to Flagellariaceae, Restionaceae, and the other graminid families show moderately strong conflict with those inferred by Chase et al. (2006), who recover the reverse relative arrangement of Flagellariaceae and Restionaceae.
A few other higher-order groupings are in moderately strong disagreement with clades reported in Chase et al. (2006). These include the relationships among three major clades of Alismatales. We find Tofieldiaceae to be the sister group of a moderately well-supported clade consisting of Araceae and the remaining Alismatales. This result was also observed in some analyses of the three-gene data set of Chase et al. (2000), but it conflicts with a well-supported linkage between Tofieldiaceae and alismatid families in Chase et al. (2006). In addition, among the taxa sampled in the cyperid clade, we find Mayacaceae to be the sister group of Cyperaceae and Xyridaceae, with strong support. Chase et al. (2006) instead find moderate support for a closer relationship between Mayacaceae and Cyperaceae. Addressing such conflicts may require improved taxon sampling for the current set of loci, work that is currently in progress.
Fay et al. (2000) posited that more data per taxon would be required to resolve the problematic positions of Aphyllanthaceae and Ixioliriaceae in Asparagales, as both had labile positions in their analyses and both are relatively isolated taxa. Despite the addition of four more genes to the complement employed by Fay et al. (2000), the position of Aphyllanthes in the analyses of Chase et al. (2006) and Pires et al. (2006) is still labile and weakly supported. McPherson et al. (submitted) used simulation studies to demonstrate that inference of the phylogenetic position of Aphyllanthes in Asparagales has a high error rate. Uncertainty in the placement of Aphyllanthes in Asparagaceae s.l. may be a function of its position on a long terminal branch (e.g., Fay et al. 2000; McPherson et al. submitted). Its inclusion in the analysis does not appear to affect the underlying relationships inferred for other taxa in this family, although it depresses bootstrap support values in the local clade that includes it (Asparagaceae s.l.-Alliaceae s.l.; Fig. 3).
We inferred strong support for a sister-group relationship between Ixioliriaceae and Tecophilaeaceae, a result that was seen with poor support in some trees inferred by Fay et al. (2000), and with moderate support in the analysis of Pires et al. (2006). This clade is contradicted by the four- and seven-gene analyses of Chase et al. (2006) and the two-gene analysis of Davis et al. (2004), who instead find strong to moderate support for a sister-group relationship between Ixioliriaceae and Iridaceae. Our analyses robustly resolved this midpoint of the Asparagales backbone (Fig. 1B), but the strongly discordant positions of Ixioliriaceae among studies clearly require further attention. Further taxon sampling (to density levels comparable to Pires et al. 2006) among the relatives of Ixioliriaceae and Aphyllanthaceae may help clarify their phylogenetic status.
Unusual and disparate placements of Burmanniaceae were observed in early analyses of monocot rbcL data. Gaut et al. (1992) found Burmanniaceae nested in what is now referred to as the commelinid monocots, and Duvall et al. (1993) found it nested in Asparagales. Both placements may have been due to the long branch associated with this family, coupled with the relatively limited taxon sampling used in both studies. A relatively long branch also subtends our exemplar taxon from Burmanniaceae (Burmannia capitata; Fig. 2). Among the taxa examined here, we find strong bootstrap support for its position as the sister group of Dioscoreaceae (Fig. 1B). Some members of Burmanniaceae, including some species of Burmannia, are completely mycoheterotrophic and achlorophyllous. Although photosynthesis has not been characterized physiologically in B. capitata, this species is chlorophyllous (Imhof 1999). It also has uninterrupted reading frames for all 16 protein-coding regions examined here (ndhF was not examined for this species), including all ten photosystem II genes, atpB, and rbcL. This suggests that these loci produce gene products that are functional in photosynthesis. Nonetheless, partial heterotrophy (if present in this taxon) may contribute to its relatively long subtending branch. Whatever the cause, this long branch could also result in misleadingly high bootstrap support. However, the position we inferred for Burmanniaceae (Fig. 1B) is congruent with other recent phylogenetic studies based on molecular and morphological data, which lends more credence to the idea that this taxon has been correctly placed among the deep branches of monocot phylogeny. Chase et al. (1995b, 2000, 2006) and Caddick et al. (2002) found Burmanniaceae to be nested in a redefined Dioscoreales (APG 1998; APG II 2003), although in contrast to our study they found only poor support for this clade as a whole.
Improved taxon sampling in Dioscoreales using the plastid regions sampled here should be valuable for further clarifying relationships among the constituent taxa in this order.

Contributions to Our Knowledge of Monocot Higher-Order Relationships
One of our most significant findings is the well-supported placement of Petrosaviaceae in monocot phylogeny. This family (represented here by Japonolirion osense; see Cameron et al. 2003) is strongly supported as the sister group of all monocots except Acorus and Alismatales, and is thus apparently the sole extant descendant of a very early split in monocot phylogeny. This supports recognition of the family in its own order (Petrosaviales Takht.) in rank-based classifications (see also …). The family's position was only partly resolved in the plastid-based study of Chase et al. (2000), but the moderate to strong support for the relationship observed here is also seen in the seven-gene sampling of Chase et al. (2006) and the two-gene sampling of Tamura et al. (2004). However, … used only two outgroup taxa in total, and this sparse sampling may have inflated support levels for their basal monocot inferences.
Additional major findings include a well-supported relationship between Commelinales and Zingiberales (see also Tamura et al. 2004; Chase et al. 2006; Pires et al. 2006), and strong support for a sister-group relationship between Asparagales and the commelinid monocots. The latter relationship has been seen in several other studies with comparable taxon sampling, but with only poor support (e.g., Tamura et al. 2004; Chase et al. 2006). Our most densely sampled major clade is Asparagales. All "unbracketed" families of Asparagales (in the sense of APG II 2003) have now been sampled for the regions considered here, and most of their interrelationships are inferred with strong support (Fig. 1B). The spine of inferred relationships in Asparagales largely parallels other recent studies (… Chase et al. 2006; Givnish et al. 2006; Pires et al. 2006), but is generally better supported here. The orchids are well supported here as a member of Asparagales (see also Tamura et al. 2004; Chase et al. 2006), and their inclusion helps define the deepest nodes in that order (McPherson et al. submitted).
The position of Doryanthaceae (as the sister group of a large group of Asparagales that includes Alliaceae s.l., Asparagaceae s.l., Iridaceae, Xanthorrhoeaceae s.l., and Xeronemataceae; Fig. 1B) is strongly supported in our analysis, and is also found with weak support in Fay et al. (2000) and Pires et al. (2006). Although their precise interrelationships are unclear, Agapanthaceae, Alliaceae s.s., and Amaryllidaceae are linked in a strongly supported clade (Alliaceae s.l.), a relationship seen by Meerow et al. (1999), but without strong support. Fay et al. (2000) found this relationship with strong support, although a reanalysis using standard parsimony by McPherson et al. (2004) found only moderate support using the same data. Pires et al. (2006) also find this clade with strong support, and find good support for Agapanthaceae as the sister group of a clade consisting of Alliaceae s.s. and Amaryllidaceae s.s. The larger clade (Alliaceae s.l.) is unperturbed by the inclusion of Aphyllanthes in our analyses, although its level of support is somewhat adversely affected (Fig. 3). Other aspects of Asparagales phylogeny inferred from our data are discussed in more detail in McPherson et al. (submitted).
Most other relationships that are strongly supported are also largely or completely congruent with the comparable taxonomic sampling of Chase et al. (2006). These include most relationships within Liliales, those among the four sampled families of Pandanales (Cyclanthaceae, Pandanaceae, Stemonaceae, and Velloziaceae), and most of the relationships within the commelinids, including relationships within Commelinales and most relationships within Poales. Our sequences of Petermannia F. Muell. (Petermanniaceae) were not derived from the misidentified sample included in Rudall et al. (2000); see Chase et al. (2006). The distinct position of Petermannia in Liliales could be dealt with in the APG system by recognition of Petermanniaceae (Fig. 1B), or perhaps by a substantial expansion in the circumscription of Colchicaceae. Further inference of phylogenetic relationships within these orders will be addressed in more detail elsewhere using improved taxon sampling.
The phylogenetic status of Dasypogonaceae is still unclear, although our data and others (e.g., Chase et al. 2000, 2006; Davis et al. 2004; Givnish et al. 2006) indicate that it is the sole extant representative of a lineage that diverged very early in the history of the commelinid monocots. As in other recent studies, we did not find strong support for any particular arrangement of this family, Arecales, Commelinales-Zingiberales, and Poales at the base of the commelinid clade (Fig. 1B). Various arrangements of these four lineages have been observed (e.g., Davis et al. 2004; Chase et al. 2006; Givnish et al. 2006). Tamura et al. (2004) found moderate support for a sister-group relationship between Poales and Commelinales-Zingiberales, but they did not sample Dasypogonaceae. The relationships observed here, with Dasypogonaceae as the sister group of Commelinales-Zingiberales and Arecales as the sister group of Poales, have very poor support (38% and 33% BP, respectively). However, if correct, this scenario would require either elevation of Dasypogonaceae to ordinal status or a substantial reworking of current ordinal boundaries in the commelinid monocots. Because a sister-group relationship between Commelinales and Zingiberales has strong support, it would not be acceptable to sink Dasypogonaceae into either order. If Dasypogonaceae, Commelinales, and Zingiberales were instead combined in a single order, Commelinales is the name with the earliest use at the rank of order (see APG II 2003). Most other arrangements at the base of the commelinid monocots would also require recognition of Dasypogonaceae as a distinct order in the APG system of classification. However, sister-group relationships between Dasypogonaceae and Poales or Arecales have only minimal bootstrap support (36% BP for the former, 11% BP for the latter), and if either relationship is eventually shown to be correct, Dasypogonaceae could be included in the respective order.
While Dasypogonaceae are confirmed here to be part of a deep-diverging split in the commelinid monocots, it is clear that more data are needed to satisfactorily resolve the phylogenetic and taxonomic status of this problematic family.
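The BP values quoted above are bootstrap percentages: the share of bootstrap replicate trees in which a given clade appears. As a minimal sketch of that bookkeeping, assuming each replicate tree has already been reduced to its set of clades (taxon sets), the Python below tallies clade frequencies. The ten replicate trees are hypothetical, chosen only to show how low, competing percentages arise when replicates disagree about the placement of one lineage; they are not our results.

```python
from collections import Counter


def clade(*taxa):
    """A clade is represented as an unordered set of terminal-taxon labels."""
    return frozenset(taxa)


def bootstrap_percentages(replicates):
    """Percentage of replicate trees containing each clade.

    `replicates` is a list of trees, each already reduced to a set of clades.
    """
    counts = Counter()
    for tree_clades in replicates:
        counts.update(tree_clades)  # a set, so each clade counts once per tree
    n = len(replicates)
    return {c: 100.0 * k / n for c, k in counts.items()}


# HYPOTHETICAL example: ten bootstrap trees for four commelinid lineages.
D, A, C, P = "Dasypogonaceae", "Arecales", "Commelinales-Zingiberales", "Poales"
replicates = (
    [{clade(D, C), clade(A, P)} for _ in range(4)]      # ((D,C),(A,P))
    + [{clade(D, P), clade(A, C)} for _ in range(3)]    # ((D,P),(A,C))
    + [{clade(A, P), clade(C, A, P)} for _ in range(3)]  # (D,(C,(A,P)))
)

if __name__ == "__main__":
    bp = bootstrap_percentages(replicates)
    for c, pct in sorted(bp.items(), key=lambda item: -item[1]):
        print(f"{' + '.join(sorted(c))}: {pct:.0f}% BP")
```

In this toy example, Dasypogonaceae pairs with a different lineage in different replicates, so no grouping that includes it exceeds 40% BP, whereas a grouping shared by most replicates (Arecales + Poales) reaches 70%.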

Looking Forward
The data presented here should contribute to an ongoing renaissance in our understanding of monocot systematics and evolution, which was sparked by the morphological work of Rolf Dahlgren and colleagues (e.g., Dahlgren et al. 1985) and further promoted by the first large-scale molecular studies (e.g., Chase et al. 1993). Our data provide a more robust framework for making inferences about molecular and morphological evolution, and should help with the fine-tuning of higher-order monocot classification. The largely robust framework of monocot deep phylogeny presented here, which includes exemplar taxa from all major chlorophyllous clades and many of the most problematic taxa, clearly demonstrates the value of an expanded plastid genomic sampling by yielding results that are both well supported and congruent with those inferred in other recent studies. We are continuing taxon sampling using the current genomic set in undersampled groups (such as Alismatales, Liliales, and Poales). We need to improve our understanding of relationships within each of these major clades, and to address the remaining weak nodes along the spine of monocot phylogeny (particularly the relative positions of Asparagales-commelinids, Dioscoreales, Liliales, and Pandanales). Improved taxon sampling outside the monocots may also help address the position of the monocots in the angiosperms. We expect that at least some of these problems will require collection of further data per taxon (from the plastid and/or other genomes), and likely substantially more data in a subset of cases (perhaps on the order of whole-plastid-genome sampling).
Our study demonstrates the benefits of continued expansion in plastid genome sampling for addressing unresolved problems in monocot deep phylogenetics. While there are obvious benefits to examining other genomic regions (e.g., for evidence of intergenomic incongruence, or for finding strongly supported placements of achlorophyllous mycoheterotrophic monocot groups such as Triuridaceae) and morphological characters (e.g., for finding characters that support new groupings implied by molecular data), the time has not yet come to decelerate sampling in the plastid genome for monocot systematics. Indeed, the rate of increase in plastid-based studies at all levels of plant phylogeny has not yet leveled off (e.g., Shaw et al. 2005). This genome has proven to be a workhorse of modern monocot systematics, and we predict that it will likely remain so for the foreseeable future.