University of Iowa
Iowa Research Online
Theses and Dissertations
2011

The evolutionary history of meiotic genes: early origins by duplication and subsequent losses
Arthur William Pightling
University of Iowa

Copyright 2011 Arthur Pightling
This dissertation is available at Iowa Research Online: http://ir.uiowa.edu/etd/2960

Recommended Citation
Pightling, Arthur William. "The evolutionary history of meiotic genes: early origins by duplication and subsequent losses." PhD (Doctor of Philosophy) thesis, University of Iowa, 2011. http://ir.uiowa.edu/etd/2960.

Part of the Biology Commons

THE EVOLUTIONARY HISTORY OF MEIOTIC GENES: EARLY ORIGINS BY DUPLICATION AND SUBSEQUENT LOSSES

by
Arthur William Pightling

An Abstract

Of a thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Biology in the Graduate College of The University of Iowa

May 2011

Thesis Supervisor: Associate Professor John M. Logsdon, Jr.

Meiosis is necessary for sexual reproduction in eukaryotes. Genetic recombination between non-sister homologous chromosomes is needed in most organisms for successful completion of the first meiotic division. Proteins that function during meiotic recombination have been studied extensively in model organisms. However, less is known about the evolution of these proteins, especially among protists. We searched the genomes of diverse eukaryotes, representing all currently recognized supergroups, for 26 genes encoding proteins important for different stages of interhomolog recombination. We also performed phylogenetic analyses to determine the evolutionary relationships of gene homologs. At least 23 of the genes tested (nine that are known to function only during meiosis in model organisms) are likely to have been present in the Last Eukaryotic Common Ancestor (LECA).
These genes encode products that function during: i) synaptonemal complex formation; ii) interhomolog DNA strand exchange; iii) Holliday junction resolution; and iv) sister-chromatid cohesion. These data strongly suggest that the LECA was capable of these distinct and important functions during meiosis. We also determined that several genes whose products function during both mitosis and meiosis are paralogs of genes whose products are known to function only during meiosis. Therefore, these meiotic genes likely arose by duplication events that occurred prior to the LECA.

The Rad51 protein catalyzes DNA strand exchange during both mitosis and meiosis, while Dmc1 catalyzes interhomolog DNA strand exchange only during meiosis. To study the evolution of these important proteins, we performed degenerate PCR and extensive nucleotide and protein sequence database searches to obtain data from representatives of all available eukaryotic supergroups. We also performed phylogenetic analyses on the Rad51 and Dmc1 protein sequence data obtained to evaluate their utility as phylogenetic markers. We determined that evolutionary relationships of five of the six currently recognized eukaryotic supergroups are supported with Bayesian phylogenetic analyses. Using this dataset, we also identified ten amino acid residues that are highly conserved among Rad51 and Dmc1 protein sequences and, therefore, are likely to confer protein-specific functions. Due to the distributions of these residues, they are likely to have been present in the Rad51 and Dmc1 proteins of the LECA.

To address an important issue with the gene inventory method of scientific inquiry, we developed a heuristic metric for determining whether apparent gene absences are due to limitations of the sequence search regimen or represent true losses of genes from genomes. We collected RNA polymerase I (Pol I), Replication Protein A (RPA), and DNA strand exchange (SE) sequence data from 47 diverse eukaryotes.
We then compared the numbers of apparent absences to a single measure of protein sequence length and sequence conservation (Smith-Waterman pairwise alignment (S-W) scores) obtained by comparing yeast and human protein sequence data. Using Poisson correlation regression to analyze the Pol I and RPA subunit datasets, we confirmed that S-W scores and apparent gene absences are correlated. We also determined that genes encoding products that are critical for interhomolog SE in model organisms (Rad52, Rad51, Dmc1, Rad54, and Rdh54) have been lost frequently during eukaryotic evolution. Saccharomyces cerevisiae null rad52, dmc1, rad54, and rdh54 mutant phenotypes are suppressed by rad51 overexpression or mutation. If rad51 overexpression or mutation affects other eukaryotes in a similar fashion, this phenomenon may account for frequent losses of genes whose products are critical for the completion of meiosis in model organisms.

Finally, we place this work into greater context with a review of hypotheses for the selective forces and mechanisms that resulted in the origin of meiosis. The review and the data presented in this thesis provide the basis for a model of the origin of meiotic genes in which meiosis arose from mitosis by large-scale gene duplication, following a preadaptation that served to reduce increased numbers of chromosomes (from diploid to haploid) caused by erroneous eukaryotic cell-cell fusions.

Abstract Approved:
_______________________________ Thesis Supervisor
_______________________________ Title and Department
_______________________________ Date

THE EVOLUTIONARY HISTORY OF MEIOTIC GENES: EARLY ORIGINS BY DUPLICATION AND SUBSEQUENT LOSSES

by
Arthur William Pightling

A thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Biology in the Graduate College of The University of Iowa

May 2011

Thesis Supervisor: Associate Professor John M. Logsdon, Jr.
Copyright by
ARTHUR WILLIAM PIGHTLING
2011
All Rights Reserved

Graduate College
The University of Iowa
Iowa City, Iowa

CERTIFICATE OF APPROVAL
_______________________
PH.D. THESIS
_______________

This is to certify that the PH.D. thesis of Arthur William Pightling has been approved by the Examining Committee for the thesis requirement for the Doctor of Philosophy degree in Biology at the May 2011 graduation.

Thesis Committee:
___________________________________ John M. Logsdon, Jr., Thesis Supervisor
___________________________________ Stephen D. Hendrix
___________________________________ Robert E. Malone
___________________________________ Bryant F. McAllister
___________________________________ Hallie J. Sims

For my family

ACKNOWLEDGMENTS

I would like to express my gratitude towards the many people at the University of Iowa who have contributed to my growth as a scientist. I would like to thank my thesis advisor Dr. John Logsdon for his mentorship, time, unfettered access to his laboratory, and insightful feedback. I would especially like to thank him for allowing me to work on projects he conceived, while also extending me the freedom to develop concepts of my own. I am fortunate that he funded my participation in two workshops at which I learned new molecular phylogenetic and evolutionary analyses, in addition to techniques for collecting, identifying, and culturing eukaryotic microorganisms (protists). I am very grateful to the other members of my thesis advisory committee, Dr. Stephen Hendrix, Dr. Bryant McAllister, Dr. Robert Malone, and Dr. Hallie Sims for their invaluable time, patience, constructive criticism, and helpful advice. I would also like to thank Dr. John Logsdon, Dr. Bryant McAllister, Dr. Josep Comeron, Dr. Ana Llopart, Dr. Jeff Klahn, and Dr. Maurine Neiman for the opportunity to teach the Evolution course with them nearly every semester in the past 6 years.
My teaching experience has clearly contributed to my growth as an evolutionary biologist.

I am thankful for the contributions of my collaborators to my projects. I would like to thank Matthew Brockman for his enthusiasm and unfailing extensive computer support, without which most of this thesis would not have been possible. Cindy Brochu, Abram Doval, Nicole Adams, Lauren Stefaniak, and Nevin Sebastian are thanked for their technical assistance and sequencing. I would also like to acknowledge former and current members of the Logsdon Lab for illustrative discussions. I would especially like to thank Dr. Banoo Malik for training me initially in the lab and with phylogenetic analyses, for helpful discussions, for helpful comments on chapters 2 and 3 of this thesis, and for encouraging me to collaborate with her.

The research in this thesis would not have been possible without financial support from various sources. Funds for an honorable mention for the Sally Casanova Predoctoral Fellowship at the California State University enabled me to apply to the University of Iowa’s Ph.D. program, and purchase textbooks and phylogenetic software used in the initial years of my doctoral thesis projects to launch my research. During the summers of 2004 and 2005, the University of Iowa’s Avis Cone Graduate Summer Fellowships supported me while performing laboratory research, and in 2010 a University of Iowa Graduate College Graduate Summer Fellowship supported my bioinformatic research. In 2008 I received a travel award for participation in the Bodega Bay Applied Phylogenetics Workshop, and student travel awards from the International Society of Protistologists and International Society of Evolutionary Protistology for presenting my work as a talk at the Protist 2008 conference. I was also supported as a teaching assistant in fall 2004, fall and spring of 2006, 2007, 2008, 2009 and 2010 by the Biology Department at the University of Iowa.
Otherwise my research has been supported by funding to my thesis advisor from the National Science Foundation, grants # MCB-0216702 and EF-0431117.

ABSTRACT

Meiosis is necessary for sexual reproduction in eukaryotes. Genetic recombination between non-sister homologous chromosomes is needed in most organisms for successful completion of the first meiotic division. Proteins that function during meiotic recombination have been studied extensively in model organisms. However, less is known about the evolution of these proteins, especially among protists. We searched the genomes of diverse eukaryotes, representing all currently recognized supergroups, for 26 genes encoding proteins important for different stages of interhomolog recombination. We also performed phylogenetic analyses to determine the evolutionary relationships of gene homologs. At least 23 of the genes tested (nine that are known to function only during meiosis in model organisms) are likely to have been present in the Last Eukaryotic Common Ancestor (LECA). These genes encode products that function during: i) synaptonemal complex formation; ii) interhomolog DNA strand exchange; iii) Holliday junction resolution; and iv) sister-chromatid cohesion. These data strongly suggest that the LECA was capable of these distinct and important functions during meiosis. We also determined that several genes whose products function during both mitosis and meiosis are paralogs of genes whose products are known to function only during meiosis. Therefore, these meiotic genes likely arose by duplication events that occurred prior to the LECA.

The Rad51 protein catalyzes DNA strand exchange during both mitosis and meiosis, while Dmc1 catalyzes interhomolog DNA strand exchange only during meiosis. To study the evolution of these important proteins, we performed degenerate PCR and extensive nucleotide and protein sequence database searches to obtain data from representatives of all available eukaryotic supergroups.
We also performed phylogenetic analyses on the Rad51 and Dmc1 protein sequence data obtained to evaluate their utility as phylogenetic markers. We determined that evolutionary relationships of five of the six currently recognized eukaryotic supergroups are supported with Bayesian phylogenetic analyses. Using this dataset, we also identified ten amino acid residues that are highly conserved among Rad51 and Dmc1 protein sequences and, therefore, are likely to confer protein-specific functions. Due to the distributions of these residues, they are likely to have been present in the Rad51 and Dmc1 proteins of the LECA.

To address an important issue with the gene inventory method of scientific inquiry, we developed a heuristic metric for determining whether apparent gene absences are due to limitations of the sequence search regimen or represent true losses of genes from genomes. We collected RNA polymerase I (Pol I), Replication Protein A (RPA), and DNA strand exchange (SE) sequence data from 47 diverse eukaryotes. We then compared the numbers of apparent absences to a single measure of protein sequence length and sequence conservation (Smith-Waterman pairwise alignment (S-W) scores) obtained by comparing yeast and human protein sequence data. Using Poisson correlation regression to analyze the Pol I and RPA subunit datasets, we confirmed that S-W scores and apparent gene absences are correlated. We also determined that genes encoding products that are critical for interhomolog SE in model organisms (Rad52, Rad51, Dmc1, Rad54, and Rdh54) have been lost frequently during eukaryotic evolution. Saccharomyces cerevisiae null rad52, dmc1, rad54, and rdh54 mutant phenotypes are suppressed by rad51 overexpression or mutation. If rad51 overexpression or mutation affects other eukaryotes in a similar fashion, this phenomenon may account for frequent losses of genes whose products are critical for the completion of meiosis in model organisms.
Finally, we place this work into greater context with a review of hypotheses for the selective forces and mechanisms that resulted in the origin of meiosis. The review and the data presented in this thesis provide the basis for a model of the origin of meiotic genes in which meiosis arose from mitosis by large-scale gene duplication, following a preadaptation that served to reduce increased numbers of chromosomes (from diploid to haploid) caused by erroneous eukaryotic cell-cell fusions.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER

1. GENERAL INTRODUCTION
    The origin of eukaryotes
    A comparison of mitotic and meiotic divisions
    The origin and evolution of meiotic genes
    Components of meiotic interhomolog DNA strand exchange
    Current state of the eukaryotic phylogeny
    Summary

2. A PAN-EUKARYOTIC INVENTORY OF DNA STRAND EXCHANGE COMPONENTS REVEALS PATTERNS OF CONSERVATION AND LOSS
    Abstract
    Introduction
    Methods
        Data acquisition
        Phylogenetic analyses
        Inventory assembly
    Results and discussion
        Limits of sequence detection and distribution of strand exchange genes among eukaryotes
        Suppressors of strand exchange component mutant phenotypes in Saccharomyces cerevisiae
        Conclusions

3. PHYLOGENOMIC ANALYSIS OF RECA HOMOLOGS RAD51 AND DMC1 FROM ALL SUPERGROUPS PROVIDES EVIDENCE FOR MEIOSIS IN THE LAST COMMON ANCESTOR OF EUKARYOTES
    Background
    Results and discussion
        Phylogenetic analysis of Dmc1
        Phylogenetic analysis of Rad51
        Phylogenetic analyses of Rad51 and Dmc1
        Characteristics of Rad51 and Dmc1 protein sequences
        Conclusions
    Methods
        Database searches
        Degenerate PCR
        Phylogenetic analyses

4. MEIOSIS-SPECIFIC GENES AROSE BY DUPLICATION PRIOR TO THE LAST COMMON ANCESTOR OF EUKARYOTES
    Abstract
    Introduction
    Results and discussion
        Distributions of meiotic genes
        Assessment of distributions
        Case study: the Spo11 genes
        Conclusions
    Methods
        Database searches
        Phylogenetic analyses
        Inventory assembly

5. CONCLUDING REMARKS
    Why meiosis?
    Meiosis arose from mitosis
    A model for the evolution of meiotic DNA strand exchange genes
    A model for the origin of meiosis
    Future directions

REFERENCES

LIST OF TABLES

Table
2.1 DNA strand exchange component absences from eukaryotic groups
2.2 Protein sequence comparisons between Saccharomyces cerevisiae and Homo sapiens
2.3 Protein sequence comparisons between Homo sapiens and Oryza sativa
2.4 Protein sequence comparisons between Oryza sativa and Saccharomyces cerevisiae
2.5 Functions of strand exchange protein with Saccharomyces cerevisiae null mutant phenotypes and suppressors
2.6 The most complete genomes of the genera searched during this study with web addresses
3.1 Support for eukaryotic supergroups and first order groups from phylogenetic analyses of Rad51, Dmc1, and concatenated protein sequence data
3.2 Degenerate primers and their positions
3.3 Proposed functions of residues identified during this study
4.1 Proteins involved in four general categories of meiosis and their functions
4.2 Observed numbers of sequence absences from 46 genomes, Smith-Waterman pairwise alignment scores, predicted numbers of absences, and the proportion of observed absences likely due to detection failures for 20 proteins that function during meiosis
4.3 Genome sequence databases searched with web address and references

LIST OF FIGURES

Figure
1.1 Evolutionary relationships among prokaryotes, members of six currently recognized eukaryotic supergroups and Apusozoa according to multigene phylogenetic analyses
1.2 The three-kingdom tree of life with relative order of major events during eukaryotic evolution
1.3 General schematic of mitosis and meiosis
1.4 General model of interhomolog DNA strand exchange during meiosis
1.5 A model for the origin of meiotic function by gene duplication
2.1 Phylogenetic distribution among eukaryotes of DNA strand exchange genes
2.2 Unrooted phylogenetic tree of 47 Replication Protein A – 1 (RPA1) homologs
2.3 Unrooted phylogenetic tree of 42 Replication Protein A – 2 (RPA2) homologs
2.4 Unrooted phylogenetic tree of 36 Replication Protein A – 3 (RPA3) homologs
2.5 Unrooted phylogenetic tree of 44 Replication Protein A – 1 (RPA1) homologs
2.6 Unrooted phylogenetic tree of 29 Rad52 homologs
2.7 Unrooted phylogenetic tree of 46 Rad51 homologs
2.8 Unrooted phylogenetic tree of 41 Rad51 homologs
2.9 Unrooted phylogenetic tree of 34 Rad55 homologs
2.10 Unrooted phylogenetic tree of 42 Rad57 homologs
2.11 Unrooted phylogenetic tree of 38 Rad57 homologs
2.12 Unrooted phylogenetic tree of 34 Dmc1 homologs
2.13 Unrooted phylogenetic tree of 38 Hop2 homologs
2.14 Unrooted phylogenetic tree of 41 Mnd1 homologs
2.15 Unrooted phylogenetic tree of 34 Mnd1 homologs
2.16 Unrooted phylogenetic tree of 29 Rdh54 homologs
2.17 Unrooted phylogenetic tree of 34 Rad54 homologs
2.18 Unrooted phylogenetic tree of 13 Rad59 homologs
2.19 Unrooted phylogenetic tree of 46 sets of 13 concatenated strand exchange homologs
2.20 Multiple sequence alignment of RPA1 ssDNA binding domain (DBD-A) from 54 diverse eukaryotes
2.21 Multiple sequence alignment of RPA1 ssDNA binding domain (DBD-B) from 54 diverse eukaryotes
2.22 Multiple sequence alignment of RPA1 ssDNA binding domain (DBD-C) from 54 diverse eukaryotes
2.23 Multiple sequence alignment of RPA2 ssDNA binding domain (DBD-D) from 45 diverse eukaryotes
2.24 Multiple sequence alignment of RPA1 ssDNA binding domain (DBD-F) from 54 diverse eukaryotes
2.25 Multiple sequence alignment of RPA3 ssDNA binding domain (DBD-E) from 36 diverse eukaryotes
2.26 Phylogenetic distribution among eukaryotes of RNA Polymerase I core complex subunit genes
2.27 Number of detection failures for RNA polymerase I, RPA and SE proteins as predicted by Poisson regression analysis compared with observed numbers of detection failures
3.1 Graphic representation of Rad51 or Dmc1 gene sequence fragments amplified with degenerate PCR from representatives of four eukaryotic supergroups and Apusozoa relative to Saccharomyces cerevisiae Rad51 protein sequence
3.2 Unrooted phylogenetic tree of 47 Dmc1 homologs
3.3 Unrooted phylogenetic tree of 47 Dmc1 homologs with accession numbers
3.4 Unrooted phylogenetic tree of 54 Dmc1 and RadA homologs with accession numbers
3.5 Unrooted phylogenetic tree of 105 Rad51 and Dmc1 homologs
3.6 Unrooted phylogenetic tree of 112 Rad51, Dmc1, and RadA homologs
3.7 Unrooted phylogenetic tree of 157 Rad51, Dmc1 and RadA homologs
3.8 Unrooted phylogenetic tree of 52 Rad51 homologs
3.9 Unrooted phylogenetic tree of 58 Rad51 homologs with accession numbers
3.10 Unrooted phylogenetic tree of 65 Rad51 and RadA homologs with accession numbers
3.11 Unrooted phylogenetic tree of 40 Concatenated Rad51 and Dmc1 homologs
3.12 Unrooted phylogenetic tree of 40 Concatenated Rad51 and Dmc1 homologs with accession numbers (Dmc1/Rad51)
3.13 Protein sequence alignment of prokaryotic and eukaryotic recA orthologs with amino acids conserved among 158 protein sequences indicated
3.14 p-distance matrix of prokaryotic and eukaryotic recA orthologs
4.1 Distribution of 20 homologs that function during meiosis among 46 eukaryotes representing all eukaryotic supergroups
4.2 Presence of 20 homologs that function during meiosis in the last eukaryotic common ancestor (LECA) inferred by their distribution among eukaryotic supergroups
4.3 Unrooted phylogenetic tree of 50 eukaryotic Hop1 and Rev7 homologs
4.4 Unrooted phylogenetic tree of 49 eukaryotic Rad21 and Rec8 homologs
4.5 Unrooted phylogenetic tree of 69 eukaryotic Spo11-1, Spo11-2, and Spo11-3 homologs with 6 archaebacterial Top6A homologs
4.6 Unrooted phylogenetic tree of 69 eukaryotic Spo11-1, Spo11-2, and Spo11-3 homologs
4.7 Unrooted phylogenetic tree of 81 eukaryotic Rad51 and Dmc1 homologs with 6 archaebacterial RadA homologs
4.8 Unrooted phylogenetic tree of 81 eukaryotic Rad51 and Dmc1 homologs with
4.9 Unrooted phylogenetic tree of 82 eukaryotic Hop2 and Mnd1 homologs
4.10 Unrooted phylogenetic tree of 131 eukaryotic Mlh1, Mlh2, Mlh3, and Pms1 homologs with 4 archaebacterial MutL homologs
4.11 Unrooted phylogenetic tree of 131 eukaryotic Mlh1, Mlh2, Mlh3, and Pms1 homologs
4.12 Unrooted phylogenetic tree of 113 eukaryotic Mer3, Brr2, and Slh1 homologs with 6 archaebacterial Ski2 homologs
4.13 Unrooted phylogenetic tree of 113 eukaryotic Mer3, Brr2, and Slh1 homologs
4.14 Unrooted phylogenetic tree of 183 eukaryotic Msh2, Msh3, Msh4, Msh5, and Msh6 homologs with 5 archaebacterial MutS homologs
4.15 Unrooted phylogenetic tree of 183 eukaryotic Msh2, Msh3, Msh4, Msh5, and Msh6 homologs
4.16 Radial tree topologies of archaebacterial and eukaryotic homologs
4.17 Number of detection failures as predicted by Poisson regression analysis or RNA polymerase I and Replication Protein A subunits with observed numbers of detection failures for 18 meiotic genes
5.1 General model for the evolution of DNA strand exchange genes
5.2 Alignment of conserved Rad51 and Dmc1 residues
5.3 Model for mitotic ploidy reduction in ancestral eukaryotes

PREFACE

This thesis describes research on the origins and evolution of eukaryotic gene homologs involved in different stages of meiosis. Chapters 2, 3 and 4 of this thesis are written in the form of manuscripts intended for submission to peer-reviewed journals for publication in the very near future. The chapters are formatted according to the requirements of each journal. The exception is the references, which are formatted consistently throughout in the style of Molecular Biology and Evolution. When referring to gene names italics are used, the first letter is uppercase for archaebacterial and eukaryotic genes (e.g. RadA and Rad51), genetic mutants of genes are presented in lower case (e.g. radA and rad51), and proteins are presented in normal font with the first letter in uppercase (e.g. RadA and Rad51).

The research presented in this thesis builds upon a now published phylogenomic study of 29 genes involved in meiosis in 5 of 6 currently recognized supergroups of eukaryotes (Malik et al.
2008), which itself is not included in this thesis. Dr. Banoo Malik was the primary author; I was the second author, followed by Lauren Stefaniak, Dr. Andrew Schurko, and Dr. John Logsdon. I cloned and sequenced Trichomonas vaginalis mutL homologs during my laboratory rotation project, and later analyzed homologs of the Rad52, recA and mutL gene families, contributed 40% of Figure S1 and Tables S1.1, S1.2 and S1.3, and helped revise the manuscript. Banoo advised me on the laboratory and phylogenetic techniques used in my contribution to this publication. Other specific contributions and acknowledgements for this project are detailed in the publication. The entire content of this thesis either followed from this project or was motivated by discussions with my advisor, Dr. John Logsdon, who provided advice and technical expertise throughout, with additional feedback over the years from former and current members of the Logsdon lab, as well as members of my supervisory committee.

Chapter 2 describes a bioinformatic study and is organized as a manuscript for submission to the journal Molecular Biology and Evolution. This chapter details the phylogenomic distributions of 13 genes encoding proteins that catalyze DNA strand exchange during interhomolog recombination among 47 diverse genera. This project builds upon the previously published study (Malik et al. 2008) by further investigating the distributions of Rad51, Dmc1, Rad52, Hop2, Mnd1 and other “strand exchange” genes among 34 diverse eukaryotes. These data were also used to explore a heuristic metric for distinguishing the limits of sequence detection from bona fide gene loss. I began developing this project in 2006 during the “Writing in the Natural Sciences” graduate course offered by Dr. Stephen Hendrix, with feedback on the initial draft manuscript offered by Dr. Hendrix and my classmates Rebecca Hart-Schmidt, Mike Peglar, Banoo Malik, and Min Wu.
The analyses in the current version of the manuscript took shape immediately before, during and after my December 2009 two-week visit to New York to accompany my wife during her surgery, when I was a guest in Dr. Jane Carlton’s laboratory at New York University’s Department of Medical Parasitology. Helpful discussions with Dr. Malik, Dr. Carlton and Dr. Steven Sullivan led me to devise my criteria for selecting the genomes scrutinized in this chapter, and led me to further utilize NCBI’s BLAST tools to search local databases (that I built myself on my own computer) by PSI-tBLASTn and HMMer. I conceived the project, performed all the analyses, and am the primary author of the manuscript. My advisor, Dr. John Logsdon, is the senior author, provided advice on the research design and implementation, and revised the manuscript. My thesis committee members provided helpful comments throughout the project, including detailed technical suggestions and advice on regression analyses provided by Dr. Bryant McAllister and Dr. Stephen Hendrix.

Chapter 3 is organized as a manuscript for submission to the journal BMC Evolutionary Biology. I am the primary author, followed by Rebecca Hernan, Dr. Nidhi Sahni and Dr. John Logsdon. The project builds further upon my advisor’s evolutionary analyses of Rad51 and Dmc1 protein sequences from animals, plants, and fungi initially published by Stassen et al. (1997), and Dr. Logsdon’s unpublished work on protist Rad51 and Dmc1 genes conceived and begun during his own postdoctoral research. This chapter reports bioinformatic analyses of Rad51 and Dmc1 sequence data obtained from searches of public gene and genome sequence databases and, with the help of my co-authors, from degenerate PCR experiments in the laboratory. I amplified and cloned 69% of the reported degenerate PCR products, oversaw the laboratory research of my undergraduate assistant, Rebecca Hernan, and Dr.
Nidhi Sahni during her laboratory rotation, and I searched public databases, performed phylogenetic analyses and wrote the manuscript. Nidhi amplified and cloned 14% of the reported degenerate PCR products, and Rebecca amplified and cloned 14% of the degenerate PCR products. Nevin Sebastian amplified and cloned 3% of the degenerate PCR products, overseen by Dr. Andrew Schurko. Degenerate PCR products that I isolated for several organisms were superseded by the public release of genome sequence data, and so these sequenced PCR products are excluded from the chapter. DNA samples were obtained by collaboration with Jeff Cole and Dr. Robert Molestina at the American Type Culture Collection (ATCC, Manassas, VA) and Dr. Laura Katz and her assistant Jessica Grant (Smith College, Northampton, MA). Research assistants Cindy Brochu, Abram Doval, Nicole Adams, Lauren Stefaniak and Nevin Sebastian sequenced the clones. Dr. John Logsdon conceived and initiated the project, developed the initial set of degenerate PCR primers, advised on degenerate PCR strategies and phylogenetic analysis, provided helpful discussion of research design and implementation and revised the manuscript.

In Chapter 4, a phylogenomic analysis of the distribution among 46 diverse eukaryotes of 20 genes whose products function during meiosis in model organisms is presented. The chapter is organized as a manuscript for the journal Molecular Biology and Evolution. It represents the culmination of the studies presented here and follows from Dr. Banoo Malik’s doctoral research while she was in the Logsdon lab (Malik et al. 2008). I am the primary author of this chapter; Dr. Malik (now at Dalhousie University) is co-primary author, followed by Dr. John Archibald (Dalhousie University) and Dr. John Logsdon.
Banoo’s thesis indicated that several genes encoding proteins that are known to function only during meiosis in model animals, fungi and plants actually arose early during eukaryotic evolution by gene duplication. I expanded the taxonomic sampling to include more putatively basal lineages in the diverse eukaryotic groups, learned and made use of several new applications for phylogenetic analysis and gene sequence search methods, and wrote the manuscript. Banoo identified meiotic gene models in Bigelowiella natans, provided her initial multiple sequence alignments of meiotic proteins from 2008, and helped with taxon selection and with key discussion points. B. natans is the first sequenced representative of the Rhizaria, the only eukaryotic supergroup for which we lacked genetic information in our previous phylogenomic analyses. Dr. John Archibald, his co-investigators (Dr. M.W. Gray, Dr. G.I. McFadden, Dr. P.J. Keeling, and Dr. C. Lane), and the Joint Genome Institute provided access to their data for the first Rhizarian genome sequence (of Bigelowiella natans) prior to its public release. Dr. John Logsdon advised Banoo and me on the research design and implementation, provided helpful discussion, and revised the manuscript.

My thesis committee members, Dr. Logsdon and Dr. Malik all provided helpful comments or discussion for Chapters 1 and 5.

CHAPTER 1 GENERAL INTRODUCTION

All known extant eukaryotes descended from an ancestor (Darwin 1859) that lived approximately 2.1 to 2.7 billion years ago, according to geochemical and fossil evidence (Han and Runnegar 1992; Brocks et al. 1999). Based upon the distributions of traits among eukaryotes, the last common ancestor was most likely a free-living, unicellular eukaryote that occupied moderate (mesophilic), aerobic environments and obtained nutrients by engulfing other organisms (phagocytosis) (Cavalier-Smith 2002a).
Today, a wide variety of unicellular and multicellular eukaryotes live in diverse habitats (e.g. aerobic, anaerobic, extremophilic, and mesophilic) and exhibit many different lifestyles (e.g. symbiotic, free-living, sexual, and asexual) (Knoll 2003; Adl et al. 2005). Remarkably, all extant eukaryotic lineages began their evolutionary journeys with the same genetic material (i.e. a common ancestral genome) that was subsequently shaped by random genetic mutations (Watson and Crick 1953) and natural selection (Darwin 1859). However, which genes were present within that ancestral genome and how those genes subsequently evolved are open questions. Elucidating the origins of genes that encode products responsible for important biological processes provides a means of comparing extant eukaryotes to their ancestors (Villeneuve and Hillers 2001). Although direct observation of the ancestor of extant eukaryotes is obviously impossible, inferring which genes were likely to have been present within its genome is possible (Dacks and Doolittle 2001). By comparing inferred suites of genes in the last common ancestor of eukaryotes to the suites of genes present in extant eukaryotes we can study the evolutionary histories of the genes themselves. In this way, we can gain insight into the origins and evolution of important biological reactions and we can begin to establish the order of events that occurred during the early evolution of eukaryotes (Roger 1999).

An approach to studying the origin and evolution of genes is to search for them within the genomes of diverse organisms (Dacks and Doolittle 2001). Among eukaryotes, animals, fungi, and plants are estimated to represent the global majority of named species (Fenchel and Finlay 2004). Currently six eukaryotic groups (supergroups) have been proposed on the basis of ultrastructural, genetic, and phylogenetic analyses (Figure 1.1) (Cavalier-Smith 2004; Baldauf 2008).
Animals, fungi, and plants occupy only two eukaryotic supergroups (Opisthokonta and Archaeplastida), while protists (eukaryotic organisms with unicellular, colonial, filamentous, or parenchymatous organization that lack vegetative tissue differentiation, except for reproduction (Adl et al. 2005)) are present in all six eukaryotic supergroups and are the predominant or sole occupants of four of them (Amoebozoa, Chromalveolata, Excavata, and Rhizaria; see Current state of the eukaryotic phylogeny below) (Adl et al. 2005; Adl et al. 2007). Therefore, including diverse protists in evolutionary studies is important in order to sample the full breadth of eukaryotes (Ramesh, Malik, and Logsdon 2005).

The presence of orthologs (genes inherited from common ancestors) (Ridley 2004) among groups of eukaryotes implies that those genes were present in their last common ancestor (Villeneuve and Hillers 2001; Ramesh, Malik, and Logsdon 2005). If genes are detected in the genomes of representatives of all known eukaryotic groups then they are inferred to have been present in the last common ancestor of all known eukaryotes (Koonin 2010). Apparent absences of particular genes from the genomes of eukaryotes may be observed if either the gene arose later during eukaryotic evolution (after the evolutionary divergence of lineages from other eukaryotes) or it was subsequently lost (Dacks and Doolittle 2001). The interpretation of apparently missing genes depends upon our current understanding of the evolutionary relationships among eukaryotes (i.e. the eukaryotic phylogeny).

The origin of eukaryotes

All living organisms share characteristics which indicate they arose from a common cellular ancestor (Darwin 1859) or a population of ancestral cells promiscuously exchanging molecules (Doolittle et al. 2008).
To name a few, all living things are cellular and use ATP for energy, DNA as the hereditary genetic material, a common genetic code (for the most part), and similar transcription and translation machinery including RNA (Griffiths et al. 2000; Knoll 2003). The phylogenetic tree of life, which depicts the genealogical relationships of all living organisms, is composed of three major branches (domains) occupied by eubacteria (Bacteria), archaebacteria (Archaea), and eukaryotes (Eucarya) (Figure 1.2) (Woese and Fox 1977; Woese, Kandler, and Wheelis 1990; Brown and Doolittle 1995). The position of the root of the tree of life, with Bacteria on one side and Archaea and Eucarya on the other, was determined by phylogenetic analyses (Gogarten et al. 1989; Iwabe et al. 1989). This tree topology, which proposes that eubacteria are the earliest-diverging forms of life (Pool 1990), is supported by fossil and biogeochemical data (Brocks et al. 1999; Knoll 2003). However, the relationship between Archaea and Eucarya is currently in dispute. The three domain hypothesis proposes that archaebacteria and eukaryotes are sisters (monophyletic groups that share a common ancestor) (Cavalier-Smith 1987a), while the eocyte hypothesis proposes that eukaryotes are not sisters of the archaebacteria but arose from within the archaebacterial lineage (Crenarchaeota) (Lake et al. 1984). Compelling arguments put forward by Cavalier-Smith (1987 and 2002) point out that the most parsimonious interpretation of the distributions of homologous features among the three domains is that Archaea and Eucarya are sisters (Cavalier-Smith 1987a; Cavalier-Smith 2002c). However, a relatively recent set of robust phylogenetic analyses appears to support the eocyte hypothesis (Archibald 2008; Cox et al. 2008). As this issue is currently unresolved and the effect of this distinction on the following discussion is subtle, I will, by convention, continue with the three domain model.

All extant eukaryotes share features that distinguish them from other forms of life and support their common ancestry (Maynard Smith and Szathmary 1995). The most relevant of these features to the subject of meiosis are linear chromosomes, contained within a nucleus that is part of an endomembrane system (which includes the nuclear envelope, endoplasmic reticulum, Golgi apparatus, and lysosomes) and an endoskeleton (Cavalier-Smith 2002a; Cavalier-Smith 2010). Eukaryotes also possess mitochondria (Roger 1999) (although some have highly derived forms of mitochondria called hydrogenosomes (Muller 1993) and mitosomes (Tovar, Fischer, and Clark 1999) instead), and many eukaryotic cells also contain photosynthetic plastids (Adl et al. 2005). Mereshkowsky (1910) and Koso-Polyanski (1924) first proposed that mitochondria and chloroplasts are symbionts that arose from the engulfment of bacteria by an ancestral eukaryotic cell; this forgotten concept was later independently revived by Lynn Margulis (Sagan 1967; Maynard Smith and Szathmary 1995; Knoll 2003). It is now widely accepted (Embley and Martin 2006; Poole and Penny 2007) that mitochondria and chloroplasts are the endosymbiotic descendants of bacteria (α-proteo- and cyanobacteria, respectively) that were engulfed by eukaryotes (Margulis 1970; Gray and Doolittle 1982; Gray 1989). The most convincing support for the origins of organelles by endosymbioses comes from phylogenetic analyses that indicate genes from mitochondria or chloroplasts are more closely related to bacteria than to eukaryotes (Poole and Penny 2007). While endosymbioses of cyanobacteria have likely occurred multiple times after the divergence of eukaryotes from their last common ancestor (Yoon et al. 2004), the engulfment of α-proteobacteria probably occurred one time prior to the divergence of all extant eukaryotes (Roger 1999).
These observations have led to hypotheses in which the nucleus is also proposed to be an endosymbiont, usually an archaebacterium (Lake and Rivera 1994; Horiike et al. 2001; Shinozawa, Horiike, and Hamada 2001).

If we compare the cytological and phylogenetic features of mitochondria and chloroplasts to the nucleus, several important differences are apparent (Poole and Penny 2007). Unlike mitochondria and chloroplasts, which have at least two different membranes (i.e. the original bacterial membrane surrounded by the eukaryotic endomembrane), the nuclear envelope is in dynamic continuity with the rest of the endomembrane apparatus (Margulis 1970). In addition, although endosymbioses of eubacterial and eukaryotic (secondary plastids) cells within eukaryotic host cells are well known, there are no known cases of archaebacterial intracellular endosymbionts (Poole and Penny 2007). Finally, although many genes within eukaryotic genomes have been identified as eubacterial or archaebacterial homologs (Koonin 2010), phylogenetic analyses consistently retrieve topologies in which eukaryotic genes form distinct monophyletic groups and do not arise from within eubacterial or archaebacterial groups (Poole and Penny 2007). These topologies differ from evolutionary relationships inferred from phylogenetic analyses of mitochondrial and plastid genes. The cytological and phylogenetic differences between these organelles are best explained by autogenous models of nuclear formation (Martin 1999).

According to the neomuran (“new walls”) hypothesis the most important event during eukaryotic evolution was the replacement of the peptidoglycan murein in eubacterial cell walls with N-linked glycoproteins in the common ancestor of archaebacteria and eukaryotes, resulting in a “…more flexible surface coat…” (Cavalier-Smith 2002a).
Initially, this change may have provided resistance to antibiotics similar to penicillin that disrupted peptidoglycan synthesis (Maynard Smith and Szathmary 1995). Archaebacteria may have substituted eubacterial acyl ester lipids with prenyl ether lipids, resulting in a new exoskeleton, while proto-eukaryotes retained the flexible surface (Cavalier-Smith 1987a; Cavalier-Smith 2002a). The eukaryotic membrane, along with a complex cytoskeleton, allowed for the evolution of a phagocytic lifestyle, in which particles (including other cells) are engulfed within a vacuole (Cavalier-Smith 1987a; Cavalier-Smith 2002a). In short, subsequent invaginations resulted in the formation of the endomembrane system, including the nuclear envelope (Cavalier-Smith 1987a; Cavalier-Smith 1988; Cavalier-Smith 2002d; Cavalier-Smith 2010). Internal compartmentalization may have provided spatial and temporal control of transcription and translation and protected the genomic integrity of the proto-eukaryote (Cavalier-Smith 1987a; Cavalier-Smith 1988; Cavalier-Smith 2002d; Cavalier-Smith 2010). The origin of nucleated cells, which are dramatically different from eubacterial or archaebacterial cells that have no endomembrane system or organelles, necessitated the evolution of distinctly eukaryotic modes of nuclear division (Maynard Smith and Szathmary 1995).

A comparison of mitotic and meiotic divisions

Among eukaryotes two types of nuclear division are possible: mitosis and meiosis (Figure 1.3) (Griffiths et al. 2000). The mitotic nuclear division, in which two genetically identical cells arise from one, is the only mode of replication for somatic or vegetative cells in multicellular organisms and serves as a form of asexual reproduction in unicellular organisms (Flemming 1878; Weismann, Parker, and Ronnfeldt 1893; Huxley 1942).
The generalized meiotic nuclear division, during which the genetic content of the cell products is halved, is the sole source of the cells necessary for sexual reproduction (e.g. spores or gametes) in eukaryotes (Weismann, Parker, and Ronnfeldt 1893; Churchill 1970). This halving of the genetic material during meiosis serves to maintain appropriate numbers of chromosomes when cells are subsequently fused, restoring the parental state (Weismann, Parker, and Ronnfeldt 1893).

During both mitotic and pre-meiotic interphases, the genomes of cells are replicated once (the synthetic or S-phase) (John 1990). Although mitotic and pre-meiotic S-phases are similar in that the duplication of chromosomes results in paired, nearly identical, chromosome copies (sister chromatids), there are differences that distinguish them (John 1990; DePamphilis 1996). For example, in yeast, pre-meiotic S-phase is 2-3 times longer than the mitotic S-phase in diploids (approximately 65 and 30 minutes, respectively) (Williamson et al. 1983). This phenomenon has also been observed in different animals, such as the newt Triturus (Callan 1972) and the fruit fly Drosophila (Chandley 1966). These differences may be attributed to variation in numbers and activation of replicon origins and the rate of replication fork migration (John 1990; DePamphilis 1996). It is interesting, however, that when cells undergoing pre-meiotic S-phase were removed from the anthers of Lilium and Trillium and placed in a culture medium, the cells successfully completed mitosis (Lima-de-Faria 1969; Ito and Takegami 1982), indicating that although pre-meiotic S-phase is different from mitotic S-phase, they are similar enough in these organisms that mitosis still proceeds (John 1990). Following interphase, cells undergoing either mitotic or meiotic divisions enter a stage called prophase (John 1990; DePamphilis 1996; Griffiths et al. 2000).
During mitotic prophase, the paired sister chromatids contract into a series of coils, packaging them for alignment along the metaphase plate and subsequent segregation to opposite sides of the nucleus later during the mitotic cell cycle (Figure 1.3 A. - ii.) (John 1990; Griffiths et al. 2000). Thus mitosis is distinguished by a single round of DNA replication followed by a single nuclear division (Flemming 1878). Because the number of homologous chromosome sets (ploidy – e.g. diploid or 2n) is maintained, the mitotic division is called an equational division (John 1990). In meiosis, two rounds of division (Meiosis I and Meiosis II) follow a single round of DNA replication (Figure 1.3 B.) (Churchill 1970). The first meiotic division results in the reduction of ploidy (diploid (2n) to haploid (n)) (John 1990) and so is called the reductional division, while the second meiotic division is equational because the haploid state is maintained (Weismann, Parker, and Ronnfeldt 1893; Churchill 1970). Thus, from a single diploid (2n) cell, meiosis yields four haploid (n) products (although in female meiosis not all products survive as gametes) (John 1990).

Meiotic prophase I is much longer and more complex than mitotic prophase, due primarily to the formation of bivalents (paired homologous chromosomes) during meiosis that do not form during mitosis (Figure 1.3 B. - iii). In addition, the formation of synaptonemal complexes (Carpenter 1987) and/or crossing over (chiasma) (Ruckert 1892; Janssens 1909) between homologous chromosomes are required in most (but not all) organisms for appropriate pairing and segregation (John 1990). However, it is possible that the binding of microtubules to chromosomes may be even more important than the formation of bivalents in determining the segregation patterns of the chromosomes (Simchen and Hugerat 1993).
Sets of microtubular structures (spindles), arising from barrel-shaped organelles (centrioles) located at opposite sides of the nucleus, attach to protein structures (kinetochores) that are associated with sister chromatid contact points (centromeres) (Figure 1.3 A. – iii and B. – iii and iv) (John 1990; Griffiths et al. 2000). During mitosis, each sister chromatid is attached to an opposing set of microtubules, resulting in alignment of the chromosomes between two poles (Simchen and Hugerat 1993). These microtubules exert opposing forces toward the poles (John 1990; Simchen and Hugerat 1993). When sister chromatid cohesion lapses, sisters segregate to opposite poles (John 1990; Simchen and Hugerat 1993). In meiosis I, microtubules connect to only one sister chromatid per chromosome, on opposite sides of bivalents; when attachments between homologous chromosomes lapse, sister chromatids co-segregate and homologous chromosomes move to opposite poles (John 1990; Simchen and Hugerat 1993). Thus the interactions of opposing microtubules with one or both sister chromatids (unipolar or bipolar attachment) determine whether reductional or equational divisions occur (Simchen and Hugerat 1993). The same process that is used for equational divisions during mitosis is used during meiosis I for reductional divisions, in the presence of bivalents, modified kinetochores, and persistent sister chromatid cohesion (Nicklas 1977). Indeed, yeast (Saccharomyces cerevisiae) cells that are in the process of meiosis can be transferred to a vegetative medium, resulting in diploid colonies with recombined genetic markers (Sherman and Roman 1963). These cells likely formed bivalents and crossovers (resulting in genetic recombination at high, meiotic levels) followed by mitotic (equational) divisions (Simchen and Hugerat 1993).

Given that they are both equational divisions, it is tempting to conclude that the second meiotic divisions are the same as mitotic divisions (John 1990).
It is true that the second meiotic division does not require many proteins necessary for completion of the first meiotic division, during which several specialized proteins known to function only during pairing of homologous chromosomes and formation of synaptonemal complexes and crossovers are necessary (Paques and Haber 1999). However, there are at least three important differences: 1) during mitosis, sister chromatids are associated along their entire lengths but, during meiosis II, sister chromatid cohesion is maintained only around centromeres, resulting in splayed chromosome arms; 2) paired sister chromatids in mitosis are genetically identical, while, in meiosis II, sister chromatids are not identical, due to genetic recombination; and 3) nuclei are diploid (2n) during mitosis but haploid (n) during meiosis II (John 1990; DePamphilis 1996). Therefore, although mitotic and meiotic equational divisions are, in principle, the same, they are not identical (John 1990).

Taken together, these observations indicate that, although mitosis and meiosis are very similar, hinting at a close evolutionary relationship, they are also distinguished by important functional differences. The apparent similarities presented here are confirmed by studies which indicate that many proteins necessary for the completion of mitosis are also important for completion of meiosis (Marcon and Moens 2005). Likewise, the presence of proteins known to function only during meiosis sheds light on an unparalleled evolutionary history. The functions of these proteins have been studied most often in animals, fungi, and plants (representing the eukaryotic supergroups Opisthokonta and Archaeplastida; Figure 1.1 and discussed further below), although other organisms, such as the ciliate Tetrahymena thermophila (Cole et al. 1997) (Chromalveolata) and the amoeba Entamoeba histolytica (Lopez-Casamichana et al.
Though there are differences in meiosis among different eukaryotes, the proteins studied here are highly conserved in both sequence and function. Although not all eukaryotes have been studied, the similar functions of these proteins among four of the six currently recognized eukaryotic supergroups and the high degree of amino acid sequence conservation strongly imply that the proteins fulfill the same functions in unstudied extant organisms. Furthermore, the inferred presence of genes encoding these proteins in the common ancestors of eukaryotes strongly implies that mitotic and meiotic functions were also occurring (Ramesh, Malik, and Logsdon 2005).

The origin and evolution of meiotic genes

Due to the presence of mitosis in all extant eukaryotes, it is widely accepted that this nuclear division was likely to have been present in their last common ancestor (Cavalier-Smith 1981b). Furthermore, it is widely accepted that genes encoding proteins that function during mitosis were present in the common ancestor of all extant eukaryotes (Eme et al. 2009; Wickstead, Gull, and Richards 2010). More contentious is the notion that meiosis, and genes encoding proteins that function during meiosis, were present in the ancestor of all extant eukaryotes (Malik et al. 2008). This is due primarily to the fact that, although mitosis has been observed in all extant eukaryotes, meiosis is not observed in some putatively asexual eukaryotes (Schurko and Logsdon 2008; Schurko, Neiman, and Logsdon 2009). Specifically, the apparent absences of meiosis and sexual reproduction during the lifecycles of Giardia intestinalis, Trichomonas vaginalis, and Vairimorpha necatrix led to the speculation that these organisms diverged prior to the origin of meiosis (Cavalier-Smith 1989).
Molecular phylogenetic analyses of small ribosomal subunit and translation elongation factor EF-1 alpha nucleotide sequences yielded tree topologies in which these supposed “primitive” organisms were depicted as the earliest diverging eukaryotes (Leipe et al. 1993; Kamaishi et al. 1996; Hashimoto et al. 1997). These eukaryotic phylogenies appeared to support the Archezoa hypothesis (Cavalier-Smith 1989), in which organisms with presumed ancestral features and no observed meiosis emerged early during eukaryotic evolution. Such features include a prokaryote-like transcriptional apparatus (van Keulen et al. 1991a; van Keulen et al. 1991b) and the apparent absence of mitochondria (Tovar, Fischer, and Clark 1999). Complex organisms (i.e. animals, fungi, and plants) would form a “crown” at the top of the eukaryotic tree of life (Sogin, Elwood, and Gunderson 1986; Woese, Kandler, and Wheelis 1990; Brinkmann and Philippe 2007).

Subsequent phylogenetic studies with more sophisticated methods and data, using more realistic models of protein substitution, have revealed that the placement of these organisms at the base of the original phylogenetic trees was the result of a statistical anomaly called long-branch attraction (Edlind et al. 1996; Keeling and Doolittle 1996; Hirt et al. 1997; Hirt et al. 1999; Felsenstein 2004). Although the possibility remains that T. vaginalis and G. intestinalis may be among the earliest-diverging eukaryotes, V. necatrix (a microsporidian) is now known to be a fungus (Hirt et al. 1999). The presumption that the Archezoa lack mitochondria has also been proven erroneous (Roger 1999). Instead, derived and highly reduced forms of mitochondria (mitosomes and hydrogenosomes) have been discovered in representatives of each putatively early-diverging eukaryotic lineage (Germot, Philippe, and LeGuyader 1997; Tovar, Fischer, and Clark 1999; Tovar et al. 2003; van der Giezen 2009).
However, sexual reproduction has yet to be observed among any of these lineages.

Direct observation of meiosis is often difficult or impossible with many diverse eukaryotes (Schurko and Logsdon 2008). However, we may determine whether organisms have the potential to undergo meiosis by the presence of meiotic genes in their genomes (Schurko and Logsdon 2008). We can also use the distribution of meiotic genes to infer their presence in the common ancestors of different eukaryotic groups (including the ancestor of all eukaryotes) (Dacks and Doolittle 2001; Villeneuve and Hillers 2001; Ramesh, Malik, and Logsdon 2005; Malik et al. 2008). By using phylogenetic analysis it is possible to determine when meiosis-specific genes, and meiosis, arose and to determine if any eukaryotes diverged evolutionarily prior to the origins of these meiotic genes (Ramesh, Malik, and Logsdon 2005; Malik et al. 2008). That is, it is possible to determine if apparent gene absences are primitive or derived states (Dacks and Doolittle 2001). Some eukaryotes may have diverged prior to the origin of sexual reproduction in eukaryotes while others may utilize a primitive type of meiosis (Cleveland 1947; Cleveland 1956; Cavalier-Smith 1981b; Archetti 2004).

Previously, a set of “core meiotic recombination machinery” (Spo11, Rad50, Mre11, Dmc1, Rad51, Msh4, Msh5, and Mlh1) was defined as a collection of highly conserved orthologs present in animals, fungi, and plants (Villeneuve and Hillers 2001). This list includes some components known to function only during meiosis in model organisms (Spo11, Dmc1, Msh4, and Msh5) (Bishop et al. 1992; Lichten 2001; Snowden et al. 2004). Thus the authors rightly pointed out that at least three events were important for the evolution of meiosis: endogenous double-strand DNA breaks (Spo11 (Keeney, Giroux, and Kleckner 1997)), interhomolog DNA strand exchange (Dmc1 (Bishop et al.
1992)), and resolution of Holliday junctions as crossovers (Villeneuve and Hillers 2001). Furthermore, the distribution of these genes among animals, fungi, and plants implies that they arose prior to the divergence of the eukaryotes considered. Villeneuve and Hillers suggest, therefore, that the genes arose in the common ancestor of all eukaryotes (Villeneuve and Hillers 2001). However, given current hypotheses for rooting the eukaryotic phylogeny (Stechmann and Cavalier-Smith 2002; Stechmann and Cavalier-Smith 2003a; Cavalier-Smith 2010), only the placement of the root between the Bikonta and the Unikonta (Figure 1.1) would support the conclusion that the ancestor of animals, fungi, and plants is also the last common ancestor of eukaryotes. That is, if members of either the Metamonada or the Discoba are the earliest-diverging eukaryotes (discussed further in “Current state of the eukaryotic phylogeny” below), then the meiotic genes could have arisen after their divergence. In that case, the meiotic genes would be present in the ancestor of animals, fungi, and plants but not in the ancestor of all eukaryotes. More complete sampling of the eukaryotes would be needed to determine if any organisms diverged prior to the origin of meiosis or if the last common ancestor of eukaryotes was capable of meiosis. Additional studies tested the specific hypotheses that G. intestinalis and T. vaginalis diverged prior to the origin of meiosis by determining which genes that encode products necessary for completion of meiosis are present in their genomes (Ramesh, Malik, and Logsdon 2005; Malik et al. 2008). In total, 29 genes (9 meiosis-specific) were studied among eukaryotes representing five of the six currently recognized eukaryotic supergroups. Most of the genes tested are present in G. intestinalis and T. vaginalis (21 and 27, respectively), including the meiosis-specific genes (6 and 8, respectively). These studies indicate that G. intestinalis and T. vaginalis are not candidates for ancient asexuality.
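The dependence of this inference on root placement can be illustrated with a minimal Python sketch. It is not taken from the studies cited above: the group names follow Figure 1.1, but the function name, the three root hypotheses as encoded here, and the simple rule (a gene is inferred in the last common ancestor only if it is detected on both sides of the root, ignoring lateral transfer and non-detection) are assumptions made purely for illustration.

```python
# Lineages considered (names follow Figure 1.1; Excavata split into its two groups).
ALL_GROUPS = {
    "Opisthokonta", "Amoebozoa", "Archaeplastida",
    "Chromalveolata", "Rhizaria", "Discoba", "Metamonada",
}

# Hypothetical encodings of root placements: each value is the set of
# lineages on one side of the root; all remaining lineages are on the other.
ROOT_HYPOTHESES = {
    "Unikonta/Bikonta": {"Opisthokonta", "Amoebozoa"},
    "Metamonada first": {"Metamonada"},
    "Discoba first": {"Discoba"},
}


def inferred_in_leca(groups_with_gene, one_side):
    """Assumed rule: infer presence in the last common ancestor only when
    the gene is detected in at least one lineage on each side of the root."""
    present = set(groups_with_gene)
    other_side = ALL_GROUPS - one_side
    return bool(present & one_side) and bool(present & other_side)


# A gene detected only in animals and fungi (Opisthokonta) and plants (Archaeplastida).
observed = {"Opisthokonta", "Archaeplastida"}
results = {
    name: inferred_in_leca(observed, side)
    for name, side in ROOT_HYPOTHESES.items()
}
for name, supported in results.items():
    print(f"{name}: presence in last common ancestor inferred = {supported}")
```

Under these assumptions, detection in animals, fungi, and plants alone supports presence in the last common ancestor only for the Unikonta/Bikonta root; under either excavate-first root, detection in Metamonada or Discoba is also required.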
Although previous studies have failed to produce a candidate lineage for ancient asexuality or primitive meiosis, they provided the basis for addressing open questions about when and how meiosis arose and how it subsequently evolved (Ramesh, Malik, and Logsdon 2005; Malik et al. 2008). That is, rather than using a candidate organism approach, in which different species that may represent early-diverging eukaryotic lineages are studied, here we have employed a gene-centric approach in which the distributions and phylogenetic analyses of genes are the focus. Thus the goals of this thesis were not to detect meiosis or to identify putative ancient asexual organisms per se but, instead, to study the evolution of meiotic genes in their own right. Specifically, I address the following questions: 1) Were meiotic genes present in the genome of the last eukaryotic common ancestor?; 2) By what genetic mechanisms did meiotic genes arise?; 3) Could the products that meiotic genes encoded in the last eukaryotic common ancestor have functioned during meiosis?; and 4) How have the suites of meiotic genes observed in various extant eukaryotic genomes evolved? The answers to these questions will contribute to our understanding of the origin and evolution of meiosis and, more generally, the evolution of eukaryotes.

Components of meiotic interhomolog DNA strand exchange

To add resolution to our view of the evolution of complex processes that occur during meiosis, the genes encoding proteins central to interhomolog DNA strand exchange are the focus of this thesis. Interhomolog recombination (X in Figure 1.3 – B – iii) can result in gene conversion and/or crossing-over that greatly increases the efficacy of natural selection (Rice and Chippindale 2001; Agrawal 2006; Otto and Gerstein 2006). That is, it produces novel combinations of genes (Figure 1.3 – B – v, black and grey regions of chromosomes) that, when combined with other products of meiosis (e.g.
fertilization), enable eukaryotes to respond evolutionarily to changing environments more rapidly than asexual organisms (Fisher 1930; Muller 1932; Van Valen 1973). Although many benefits of recombination are observed at the population level, its origin and persistence during the evolution of eukaryotes are more likely due to the selective benefits of the appropriate pairing and segregation of homologous chromosomes in maintaining genomic integrity (Kleckner 1996; Villeneuve and Hillers 2001; Cavalier-Smith 2002d). In most cases, interhomolog DNA strand exchange (Figure 1.3 – B – iii) is necessary for the formation of bivalents and correct segregation of homologous chromosomes to opposite poles during meiosis I (Moore and Orr-Weaver 1998). Following these observations, it has been declared that “…the very essence of sex is meiotic recombination.” (Villeneuve and Hillers 2001). Different models of meiotic recombination have been proposed, such as the synthesis-dependent strand annealing and double-strand break repair models (Paques and Haber 1999). In each of these models, the DNA strand exchange reaction that occurs between homologous chromosomes during meiosis I (Figures 1.3 and 1.4) is central to meiotic recombination (Paques and Haber 1999). Thus, the very essence of meiotic recombination is the interhomolog DNA strand exchange reaction. Therefore, the best way to gain a more complete understanding of the evolution of genes involved in meiosis and to detect any sort of “primitive” meiosis is to study the interhomolog DNA strand exchange reaction; the components involved in interhomolog DNA strand exchange during meiosis are the main focus of this thesis. Genetic recombination between sister chromatids is important for repair of double-strand DNA (dsDNA) breaks that may be caused by mutagens or by collapsed or damaged replication forks (John 1990).
During entry into meiosis, dsDNA cuts introduced by meiosis-specific proteins (Spo11-1 or Spo11-2) are repaired by recombination between either sister chromatids or homologous chromosomes (Keeney, Giroux, and Kleckner 1997; Hartung et al. 2002). In animals, fungi, and plants, several genes whose products are important for both sister chromatid and interhomolog DNA strand exchange (Rad52, Rad59, Rad51, Rad55, Rad57, Rad54, and Rdh54) and some that are known to function only during meiosis in model organisms (Dmc1, Hop2, and Mnd1) have been studied extensively (see Table 2.5 and the description of Figure 1.4 for references). A general model of the interactions of thirteen proteins that function during interhomolog DNA strand exchange is presented (Figure 1.4). This model illustrates four important steps: i) formation of a Rad51/Dmc1-ssDNA pre-synaptic filament on a 3’ ending DNA strand (A-D); ii) capture of a DNA duplex by the pre-synaptic filament (E and F); iii) search by the pre-synaptic filament for regions of DNA duplex homology (F); and iv) invasion of the DNA duplex by the pre-synaptic filament and D-loop formation (G) (Filippo, Sung, and Klein 2008). Searches for genes that encode Rad51, Rad52, Dmc1, Hop2, and Mnd1 proteins among diverse eukaryotes (i.e. all eukaryotic supergroups except Rhizaria; Figure 1.1) indicate that, given their distributions, these strand exchange components must have arisen very early during eukaryotic evolution (Villeneuve and Hillers 2001; Ramesh, Malik, and Logsdon 2005; Malik et al. 2008). Interestingly, some of these genes have not been found within the genomes of many diverse, less well-studied eukaryotes (Ramesh, Malik, and Logsdon 2005; Malik et al. 2008). These apparent absences may be due to true losses of genes from genomes or they may represent instances of non-detection (i.e. type II error).
Previous investigations of the distributions of genes cannot easily distinguish between these possibilities (Villeneuve and Hillers 2001; Ramesh, Malik, and Logsdon 2005; Malik et al. 2008). This limitation inhibits studies of the distribution of genes among eukaryotes, since the differing suites of genes found among diverse organisms cannot be verified bioinformatically and functional studies may be difficult or impossible to perform, especially with large numbers of eukaryotes. To address this issue, the distributions of 13 genes that encode meiotic interhomolog DNA strand exchange components (Figure 1.4) among 47 diverse eukaryotes (representing all supergroups except Rhizaria, Figure 1.1) were determined. In addition, these data were useful for the development of a heuristic metric for determining the likelihood that observed absences represent true losses of genes from genomes. For the first time, we were able to assess our confidence in the suites of genes observed in diverse, relatively unstudied eukaryotic genomes. This new insight allowed us to study patterns in the distributions of strand exchange genes across eukaryotes and to formulate an evolutionary hypothesis explaining these patterns. This project is presented in Chapter 2 of this thesis. Of particular interest are the eukaryotic Rad51 (Shinohara, Ogawa, and Ogawa 1992) and Dmc1 (Bishop et al. 1992) genes, whose products catalyze homologous DNA strand exchange during genetic recombination (Paques and Haber 1999). Both Rad51 and Dmc1 are related (orthologous (Ridley 2004)) to the eubacterial recA and the archaebacterial RadA genes, whose products function during homologous recombination and DNA repair (Cox 1993; Clark and Sandler 1994; Camerini-Otero and Hsieh 1995; Sandler et al. 1996). The master recombinase (Rad51) forms right-handed helical filaments on single-stranded and double-stranded DNA (Conway et al. 2004) during repair of all double-strand breaks (Shinohara, Ogawa, and Ogawa 1992).
The Dmc1 proteins function similarly, promoting interhomolog strand exchange only during meiosis in model organisms (Figure 1.1) (Bishop et al. 1992). Among animals, fungi, and plants, rad51 mutants experience reduced recombination, resulting in decreased resistance to mutagens and diminished sporulation or fertility, while mutations in vertebrates cause embryonic lethality (Bishop 1994; Bleuyard, Gallego, and White 2006). In animals, fungi, and plants, dmc1 mutations reduce or eliminate homologous recombination during meiosis (Bishop et al. 1999; Tsubouchi and Roeder 2003). Given the evolutionary relationships of eukaryotic Rad51 and Dmc1 genes to eubacterial recA and archaebacterial RadA genes and the central role of DNA strand exchange catalysis in DNA damage repair in all organisms and in meiosis in eukaryotes, elucidating the evolutionary histories of Rad51 and Dmc1 is important to understanding the origin and evolution of meiosis. Previous studies of the distributions of Rad51 and Dmc1 genes, including the one presented in Chapter 2, were limited by the availability of genome sequence data from diverse eukaryotic lineages. We have concluded that these genes arose very early during eukaryotic evolution but could not conclude whether they were present in the last common ancestor of all eukaryotes. Determining more accurately when Rad51 and Dmc1 arose during eukaryotic evolution and whether there are any other organisms or groups of organisms that lack these genes may provide valuable insight into the evolution of interhomolog DNA strand exchange during meiosis. The presence of the Dmc1 gene may also serve as a proxy for the presence of meiosis itself (Ramesh, Malik, and Logsdon 2005; Malik et al. 2008). That is, although the absence of Dmc1 does not indicate that meiosis is absent, the presence of a functional copy indicates that meiosis is likely to be present (Schurko and Logsdon 2008).
If the Dmc1 gene is found in representatives of all eukaryotic groups we can infer that their ancestor also possessed a Dmc1 gene (Dacks and Doolittle 2001; Koonin 2010) and was likely to have been capable of meiosis (Villeneuve and Hillers 2001; Ramesh, Malik, and Logsdon 2005; Malik et al. 2008; Schurko and Logsdon 2008). In Chapter 3, using a combination of extensive searches of gene and protein sequence repositories and degenerate PCR, we demonstrate that both Rad51 and Dmc1 genes are present in genomes of organisms representing all known eukaryotic supergroups (Figure 1.1) and were, therefore, likely to have been present in the genome of the last eukaryotic common ancestor. To understand the importance of specific amino acids to the functions of Rad51 and Dmc1, we aligned protein sequence data from all known eukaryotic supergroups and identified amino acids that are highly conserved among them. We also identified several amino acid residues that may confer Rad51- or Dmc1-specific functions, due to their conservation in one set of proteins but not the other, and that were likely to have been present in the last common ancestor of all extant eukaryotes. Collectively, these data imply that Rad51 and Dmc1 genes present within the genome of the last common ancestor of eukaryotes encoded proteins that functioned during both mitosis and meiosis. Finally, although the distributions of genes among diverse eukaryotes allow us to infer when meiotic genes may have arisen, more analyses are necessary to determine how they arose. What known genetic mechanisms yielded genes that encode products that function during meiosis? Phylogenetic studies of meiotic genes can provide insight into their evolutionary histories and may inform their origins (Ramesh, Malik, and Logsdon 2005; Malik et al. 2008). For example, studies indicate that Rad51 and Dmc1 genes are paralogs (genes resulting from gene duplication events (Ridley 2004)) (Ramesh, Malik, and Logsdon 2005; Lin et al.
2006; Malik et al. 2008). Since Rad51 and Dmc1 genes are orthologous to both the eubacterial recA and the archaebacterial RadA genes that are known to be important for DNA damage repair in prokaryotes (Marcon and Moens 2005), two genes, one encoding proteins that are known to catalyze DNA strand exchange during both mitosis and meiosis (Rad51) and the other encoding proteins that are known to catalyze interhomolog DNA strand exchange only during meiosis in model organisms (Dmc1), arose from a single gene that most likely encoded products involved in DNA damage repair early during eukaryotic evolution (Figure 1.5) (Ramesh, Malik, and Logsdon 2005; Lin et al. 2006; Malik et al. 2008). Thus we can study the evolutionary histories of these genes to determine when meiosis may have arisen during eukaryotic evolution and whether any organisms diverged prior to the duplication event yielding the meiosis-specific gene (Dmc1) (Figure 1.5). A similar pattern has been observed in studies of the Spo11 paralogs (Malik et al. 2007). The Spo11-1 and Spo11-2 genes encode meiosis-specific products (Atcheson et al. 1987; Keeney, Giroux, and Kleckner 1997; Hartung et al. 2002) and are paralogs of the Spo11-3 gene (Malik et al. 2007), whose products function only during vegetative growth in Arabidopsis thaliana (Hartung and Puchta 2001; Sugimoto-Shirasu et al. 2002; Yin et al. 2002). The Spo11 homologs are orthologous to the archaebacterial Top6A gene (Atcheson et al. 1987), which encodes a subunit of a type II topoisomerase that functions to separate replicated chromosomes (Bergerat et al. 1997; Nichols et al. 1999; Corbett and Berger 2003). As with Rad51 and Dmc1, the evolutionary history of Spo11 genes is similar to the model presented in Figure 1.5 (Malik et al. 2007). Indeed, Malik (2007) demonstrated that many meiosis-specific genes fit this pattern. We hypothesized, therefore, that meiosis may have arisen in toto by large-scale gene duplications, early during eukaryotic evolution.
Consistent with this hypothesis, it has been shown that large-scale gene duplication events may have occurred early during eukaryotic evolution (Zhou, Lin, and Ma 2010). Due primarily to the availability of more sensitive gene sequence search methods (see Chapter 4 Methods), more realistic models of protein substitution (e.g. LG) for phylogenetic analyses, and unprecedented access to genome sequence data for diverse eukaryotic groups, we were able to extend the previous study on the origins of meiosis-specific genes by duplication to include representatives of all known eukaryotic supergroups (Figure 1.1). We determined the eukaryote-wide distributions of twenty genes that encode products that perform five important functions during meiosis: 1) pairing of homologous chromosomes; 2) sister chromatid cohesion; 3) dsDNA cuts; 4) interhomolog DNA strand exchange; and 5) Holliday junction resolution (Table 4.1). Eighteen of the 20 genes tested fit the pattern presented in Figure 1.5. Furthermore, given their phylogenetic distributions among eukaryotic groups, these paralogs are inferred to have been present in the ancestor of all extant eukaryotes.

Current state of the eukaryotic phylogeny

An important motivation for the studies presented in Chapters 3 and 4 was to determine which components were likely to have been present in the common ancestor of eukaryotes. In order to correctly interpret the distributions of genes among eukaryotes and the phylogenetic analyses of their products, an accurate understanding of the evolutionary relationships of eukaryotes is required. The following discussion provides the appropriate framework for such studies.
Eukaryotes can be divided into at least six “supergroups” (Opisthokonta, Amoebozoa, Archaeplastida, Chromalveolata, Rhizaria, and Excavata) and at least one group of unclassified organisms (Apusozoa) on the basis of phenotypic, ultrastructural, and phylogenetic studies (Figure 1.1) (Baldauf 2003; Simpson and Roger 2004; Roger and Hug 2006; Baldauf 2008; Cavalier-Smith 2010; Roger and Simpson 2009). Two major divisions of eukaryotes (Unikonta and Bikonta) are recognized (Cavalier-Smith 2002a). The Unikonta (Opisthokonta + Amoebozoa) are named for the ancestral possession of a single flagellum and are distinguished by the fusion of three genes that encode enzymes that synthesize pyrimidine nucleotides (Cavalier-Smith 2002a; Stechmann and Cavalier-Smith 2002). The Bikonta (named for the presence of two flagella in their last common ancestor) share a similar two-gene fusion (dihydrofolate reductase and thymidylate synthase) (Stechmann and Cavalier-Smith 2002; Stechmann and Cavalier-Smith 2003b; Stechmann and Cavalier-Smith 2003a). Recently, phylogenetic analyses including sequence data from representatives of an unclassified group of eukaryotes (Apusozoa) have challenged the monophyly of Unikonta by retrieving topologies in which Apusozoa and Opisthokonta are closely related (Cavalier-Smith and Chao 2010; Parfrey et al. 2010). The presence of the previously mentioned two-gene fusion and two flagella would appear to support the placement of Apusozoa within Bikonta (Stechmann and Cavalier-Smith 2002). These conflicting data make the inclusion of representative Apusozoa important for determining which genes may have been present in the last common ancestor of eukaryotes, for they may very well be the earliest-diverged eukaryotes. While the monophyly of some proposed eukaryotic supergroups is widely accepted, other groups remain controversial.
The supergroup Chromalveolata is composed of two smaller groups, Chromista (Cryptomonads, Haptophytes, and Stramenopila) and Alveolata, whose ancestor engulfed and enslaved red algae (secondary endosymbiosis) (Figure 1.1) (Cavalier-Smith 1981a; Fast et al. 2001; Yoon et al. 2002). Chromalveolates may be sister to Archaeplastida (Cavalier-Smith 2003a). However, whether Chromalveolata are truly monophyletic has been disputed (Parfrey et al. 2006; Parfrey et al. 2010). Although evolutionary relationships inferred from phylogenetic analyses of plastid genes support monophyly, phylogenetic support for monophyly from topologies determined from analyses of nuclear genes is tenuous (Parfrey et al. 2006). In addition, recent phylogenetic analyses indicate that Rhizaria share a more recent common ancestor with the Stramenopiles and Alveolates, prompting calls for the placement of Rhizaria within the Chromista (Burki et al. 2010; Burki et al. 2007; Hackett et al. 2007; Cavalier-Smith 2010). The relationships of Cryptomonads and Haptophytes to other chromalveolates are also unknown, as phylogenetic analyses are somewhat conflicting and ambiguous (Patron, Inagaki, and Keeling 2007; Burki, Shalchian-Tabrizi, and Pawlowski 2008; Reeb et al. 2009). What we do know is that the evolutionary relationships among chromalveolates and rhizarians are far more complicated than previously supposed. Therefore, including exemplar chromalveolates and rhizarians is important to these studies. The supergroup Excavata (Discoba and Metamonada) was proposed on the basis of a ventral feeding groove (Figure 1.2) (Simpson and Patterson 1999). The Excavate hypothesis remains controversial due to conflicting evolutionary relationships implied by phylogenetic analyses. While some analyses support the monophyly of excavates (Hampl et al. 2009; Parfrey et al. 2010), others refute it (Parfrey et al. 2006), retrieving polyphyletic groups instead (a group with multiple ancestors).
Recall that apparent absences of mitochondria from excavate taxa (Trichomonas vaginalis and Giardia intestinalis) and a primitive-looking small subunit rRNA sequence in G. intestinalis initially supported the notion that the excavates are the earliest-diverging eukaryotes (Figure 1.2) (Cavalier-Smith 1987b). However, more recent studies show that relics of mitochondria (mitosomes and hydrogenosomes) are found in T. vaginalis and G. intestinalis (Muller 1993; Tovar, Fischer, and Clark 1999). These observations make the status of these excavates as the earliest-diverged eukaryotes questionable. It has also been proposed recently that the Euglenozoa (Discoba) are the extant representatives of the earliest-diverging eukaryotes, as they lack an origin recognition complex and four genes (Tom40, CenpA, Smc5 and Smc6) that are thought to be present in all other eukaryotic groups (Cavalier-Smith 2010). The point is that the question of whether Excavata are monophyletic is tied to the determination of the earliest-diverging eukaryotes. If any excavates are the earliest-diverging eukaryotes, the root of the eukaryotic phylogeny lies between Discoba and/or Metamonada and all other eukaryotes (Figure 1.1). This placement of the root requires that Excavata are paraphyletic (a group including an ancestor and some, but not all, of its descendants (Ridley 2004)). Excavates are monophyletic only if none of them are the earliest-diverging eukaryotes, and that is currently unknown. Therefore, including representative Metamonada and Discoba is important for inferring which genes were present in the last common ancestor of all eukaryotes. The root of eukaryotes has also been proposed to lie between Unikonta and Bikonta on the basis of the flagellar and gene-fusion data discussed previously (Figure 1.1) (Stechmann and Cavalier-Smith 2002).
However, the reliability of these features for placement of the root of eukaryotic life has recently been called into question by the possible evolutionary relationship of the Apusozoa to Opisthokonta discussed previously (Cavalier-Smith 2010). Placement of the root on the eukaryotic phylogeny remains one of the most vexing questions in molecular phylogenetics. It seems most likely that, despite the presence of highly derived mitochondria (mitosomes), G. intestinalis is still the best candidate for the earliest-diverged eukaryote. Whatever the answer, the ability to polarize the eukaryotic phylogeny will have significant impacts upon studies of the ancestor of eukaryotes. Since the eukaryotic phylogeny remains incompletely resolved, a gene must be detected in all eukaryotic groups to infer its presence in the ancestor of all eukaryotes. However, this approach is certain to underestimate the number of components present in the last common ancestor of eukaryotes, due to subsequent gene losses. That is, if genes were lost in some lineages then we cannot, without knowing how those lineages are related, determine whether the genes were present in the ancestor of eukaryotes. Therefore, searches for components among diverse eukaryotes combined with phylogenetic analyses that add resolution to the eukaryotic phylogeny are necessary (Cavalier-Smith 2010). In addition, inferences made with this method are always subject to revision upon the discovery of putative early-diverging eukaryotes. That is, the last eukaryotic common ancestor is defined by our knowledge of extant eukaryotes, and if new lineages are discovered then inferences regarding the common ancestor of extant eukaryotes will need to be reassessed. David Patterson (1999) has estimated that there are approximately 220 known genera whose evolutionary relationships have yet to be completely resolved.
Although most of those unclassified genera have ultrastructural identities similar to eukaryotes with well-resolved evolutionary relationships (Patterson 1999), indicating that the full breadth of eukaryotes has probably been discovered, the possibility that one (or more) may represent previously unknown lineages always exists. We performed phylogenetic analyses upon Rad51 and Dmc1 protein sequence data to determine if they are effective markers for resolving the eukaryotic phylogeny. These data are presented in Chapter 3. Products of Rad51 and Dmc1 gene sequences are well conserved among animals, fungi, and plants, with a great degree of similarity and retention of functional motifs (Stassen et al. 1997). The Rad51 gene is present in the genomes of all but one eukaryote studied (G. intestinalis) and both Rad51 and Dmc1 genes are present in single copy in most organisms (Ramesh, Malik, and Logsdon 2005; Malik et al. 2008). These qualities make Rad51 and Dmc1 protein sequences attractive markers for phylogenetic reconstruction. Since Rad51 and Dmc1 genes are paralogs, we also attempted to determine which eukaryotes are the earliest-diverging by reciprocally rooting them (paralogous rooting) (Gogarten et al. 1989; Iwabe et al. 1989; Schlegel 1994; Brown and Doolittle 1995; Baldauf, Palmer, and Doolittle 1996). Although we failed to positively place the root of eukaryotes using these methods, Rad51 and Dmc1 protein sequence data were useful for resolving five of six eukaryotic supergroups and several first order groups (Table 3.1).

Summary

The origin of meiosis is likely to have been one of the most important events in eukaryotic evolution. The effects of this event can be observed at the genetical, cytological, organismal, and population levels of eukaryotic biology.
Indeed, meiosis and sexual reproduction may have provided the genetic grist which, when subsequently acted upon by natural selection, resulted in the rapid evolution of the diverse eukaryotic lineages observed today. This thesis presents a body of work in which the distributions of meiotic genes and phylogenetic analyses of the proteins they encode were used to study the origin and evolution of meiotic genes. The study presented in Chapter 2 shows that at least eight genes whose products are known to be involved in interhomolog DNA strand exchange during meiosis arose very early during eukaryotic evolution. In addition, we applied a heuristic metric to determine if apparent gene absences are due to limitations of the gene sequence search regimen or to bona fide absences of genes from genomes. These analyses indicate that some genes are detected far less frequently than predicted and that their absences are likely to indicate true gene losses (Figure 2.1). Interestingly, some organisms have retained all of the genes tested (e.g. Saccharomyces cerevisiae and Homo sapiens) and others have retained relatively few (e.g. Caenorhabditis elegans) (Figure 2.1). Based upon these observations we propose a general hypothesis in which overexpression or mutation of the Rad51 gene may allow other components to be lost due to relaxed selection (Chapter 5). In addition, genes that encode components known to function only in complexes may be vulnerable to loss when another component of the complex is lost. The study presented in Chapter 3 focuses on the eukaryotic RecA homologs Rad51 and Dmc1. Rad51 protein functions during both mitotic DNA repair and meiosis, while Dmc1 catalyzes interhomolog DNA strand exchange only during meiosis in model organisms. We collected nucleotide and protein sequence data from databases and by using degenerate PCR. The dataset contains Rad51 and Dmc1 protein sequences from all available eukaryotic supergroups.
Therefore, Rad51 and Dmc1 genes were likely present in the ancestor of all extant eukaryotes. We also analyzed an alignment of 98 Rad51 and 51 Dmc1 protein sequences to determine which amino acid residues are conserved and, therefore, might have conferred Rad51- or Dmc1-specific activities in the common ancestor of eukaryotes (Figure 3.13). We found 18 sites among Rad51 protein sequences and 15 sites among Dmc1 protein sequences that are completely conserved and likely to have been present in the ancestor of eukaryotes. In addition, we detected 10 sites that are highly conserved in one protein and conserved but different in the other. These residues are likely to facilitate Rad51- or Dmc1-specific activities; their distributions indicate that these functions were present in the common ancestor of eukaryotes. The study presented in Chapter 4 was designed to determine the distribution of genes that encode proteins that function during different stages of interhomolog DNA strand exchange in model organisms: 1) synaptonemal complex formation; 2) interhomolog strand exchange; 3) sister chromatid cohesion; and 4) resolution of Holliday junctions as crossovers. We studied the distributions of 20 genes, 10 of which encode proteins that are known to function only during meiosis in model organisms. We determined that 19 of the genes tested are likely to have been present in the common ancestor of eukaryotes. Furthermore, phylogenetic analyses of the protein sequences indicate that all of the putative meiosis-specific genes arose by gene duplication and that they are often paralogs of genes that encode products which function during general DNA damage repair in mitotic cells. Together, the results of these studies have culminated in general models for the origin and subsequent evolution of meiotic genes that are presented in Chapter 5.
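The site classification summarized above for Chapter 3 (sites conserved within each protein, and sites conserved in both proteins but with different residues) can be illustrated with a minimal Python sketch. This is not the procedure used in the thesis: the function names, the conservation threshold, and the toy sequences are hypothetical and are meant only to show the logic of comparing two groups of aligned sequences column by column.

```python
from collections import Counter


def consensus(column, threshold):
    """Return the most common residue in an alignment column if its
    frequency meets the threshold; otherwise None. Gaps ('-') are never
    treated as conserved. Thresholds above 0.5 are assumed."""
    residue, count = Counter(column).most_common(1)[0]
    if residue != "-" and count / len(column) >= threshold:
        return residue
    return None


def classify_sites(group_a, group_b, threshold=1.0):
    """Compare two groups of equal-length aligned sequences.

    Returns (shared, specific): lists of (position, residue_a, residue_b)
    for columns conserved in both groups with the same residue (shared)
    or with different residues (specific). Positions are 0-based."""
    shared, specific = [], []
    columns = zip(zip(*group_a), zip(*group_b))
    for position, (col_a, col_b) in enumerate(columns):
        res_a = consensus(col_a, threshold)
        res_b = consensus(col_b, threshold)
        if res_a is None or res_b is None:
            continue  # not conserved in at least one group
        target = shared if res_a == res_b else specific
        target.append((position, res_a, res_b))
    return shared, specific


# Toy aligned fragments, invented for illustration (not real Rad51/Dmc1 data).
rad51_like = ["GKTQL", "GKTQL", "GKTQM"]
dmc1_like = ["GKSEL", "GKSEI", "GKSEL"]

shared, specific = classify_sites(rad51_like, dmc1_like)
print("conserved in both, same residue:", shared)
print("conserved in both, different residue:", specific)
```

Lowering the hypothetical `threshold` parameter relaxes "completely conserved" to "highly conserved"; where that cutoff should sit for real data is a judgment the sketch does not attempt to make.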
Figure 1.1: Evolutionary relationships among prokaryotes, members of six currently recognized eukaryotic supergroups and Apusozoa according to multigene phylogenetic analyses. Relationships that are well supported in the literature have solid branches while unsupported or conflicting relationships are represented by dotted lines. Although the monophyly of Rhizaria, Stramenopila, and Alveolata is well supported, the relationships among Archaeplastida, Cryptomonads, and Haptophytes within the photosynthetic ‘megagroup’ have not been established. Current hypotheses for placement of the root of eukaryotes are shown. (Baldauf and Palmer 1993; Baldauf et al. 2000; Stechmann and Cavalier-Smith 2002; Cavalier-Smith and Chao 2003a; Cavalier-Smith and Chao 2003b; Cavalier-Smith and Chao 2003c; Stechmann and Cavalier-Smith 2003b; Stechmann and Cavalier-Smith 2003a; Simpson and Roger 2004; Cavalier-Smith and Chao 2006; Kim, Simpson, and Graham 2006; Burki et al. 2007; Moreira et al. 2007; Burki, Shalchian-Tabrizi, and Pawlowski 2008; Yoon et al. 2008; Reeb et al. 2009; Roger and Simpson 2009; Cavalier-Smith and Chao 2010; Parfrey et al. 2010)

[Figure 1.1 image; recoverable labels: Apusozoa, Excavata.]

[Figure 1.2 image; recoverable labels: Bacteria; Archaea; Eucarya; Archezoa (e.g. Giardia, Trichomonas); arrows 1 and 2; meiosis; actin, tubulin cytoskeleton, phagocytosis, nucleus, and mitosis; surface N-linked glycoproteins replace murein peptidoglycans; ATP, DNA, RNA, genetic code, transcription and translation machinery.]

Figure 1.2: The three-kingdom tree of life with relative order of major events during eukaryotic evolution. A simplified universal tree of life as determined by phylogenetic analyses of ribosomal RNA nucleotide sequence data is presented (Woese and Fox 1977; Gogarten et al. 1989; Iwabe et al. 1989; Woese, Kandler, and Wheelis 1990; Brown and Doolittle 1995). Neither branch lengths nor distances between arrows depict specific amounts of time but indicate only the proposed relative order of events.
The dashed arrows are intended to illustrate two competing hypotheses: 1) the Archezoa hypothesis, in which some organisms diverged prior to the origin of meiosis (or some meiotic functions) and 2) that meiosis (including all currently known steps) was present in the last common ancestor of eukaryotes. The events are taken primarily from Cavalier-Smith’s “neomuran” hypothesis (Cavalier-Smith 1987a; Cavalier-Smith 1988; Cavalier-Smith 2002d; Cavalier-Smith 2002c; Cavalier-Smith 2002a; Cavalier-Smith 2010). However, Cavalier-Smith seems to imply that meiosis arose at the same time as mitosis, whereas meiosis is shown here to have arisen after mitosis. In Eucarya, one dashed branch is intended to indicate putative primitive eukaryotes (Archezoa (Cavalier-Smith 1989)) and the other branch represents all other eukaryotes.

Figure 1.3: General schematic of mitosis and meiosis. A. Mitosis – Chromosomes in a diploid (2n = 2) cell (i) (chromosomes are shown here condensed for convenience but are, in reality, unwound, appearing as threads) replicate (yielding 2 x 2n products) and condense, sister chromatids are tightly associated (ii). Pairs of sister chromatids (chromosomes) line up on the metaphase plate and microtubules bind kinetochores of both sister chromatids (iii) in preparation for the mitotic (equational) division (yielding 2n products) (iv). B. Meiosis – Chromosomes in a diploid (2n = 2) cell (i) (chromosomes are shown here condensed for convenience but are, in reality, unwound, appearing as threads) replicate (yielding 2 x 2n products) and condense, sister chromatids are tightly associated (ii). Homologous chromosomes pair, creating bivalents, synaptonemal complexes form (grey bars), interhomolog DNA recombination (crossing-over) occurs (chiasmata are indicated with an X; only one crossover event is shown but at least one event per chromosome arm occurs in most organisms studied), and microtubules bind only one sister chromatid per pair (iii).
This is followed by the first meiotic (reductional) division (2 x n). Pairs of non-identical sister chromatids (chromosomes), with chromosome arms splayed and only the centromeres tightly associated, align (iv) for the second meiotic (equational) division, yielding four non-identical haploid products (n) (v). This image is adapted from (Schurko, Neiman, and Logsdon 2009) with permission. Additional details were provided by (John 1990; Simchen and Hugerat 1993; Kleckner 1996).

[Figure 1.3 panel labels: A. Interphase, Mitosis (stages i-iv); B. Interphase, Meiosis I, Meiosis II (stages i-v).]

Figure 1.4: General model of interhomolog DNA strand exchange during meiosis. This model (based upon details from studies of animals, fungi, and plants) presents interactions of 13 proteins and illustrates four steps of interhomolog DNA strand exchange during meiosis: formation of a presynaptic filament on a 3’ ending DNA strand (A-D), capture of a DNA duplex by the pre-synaptic filament (E and F), search by the pre-synaptic filament for regions of DNA duplex homology (F), and invasion of the DNA duplex by the pre-synaptic filament and D-loop formation (G). Components with blue labels are known to function only during meiosis in model organisms. Exact stoichiometry is not implied. The interactions between Rad51 proteins, Rad52 and Rad59 proteins, and single-stranded DNA (A and B), the formation of Rad52/Rad59 heteroheptamers (A-C), and extension of a Rad51-ssDNA nucleoprotein filament by the Dmc1 protein (C and D) are speculative. (Brill and Stillman 1991; Bishop et al. 1992; Kadyk and Hartwell 1992; Milne and Weaver 1993; Bishop 1994; Bai and Symington 1996; Noble and Guthrie 1996; Klein 1997; Nishinaka et al. 1998; Petukhova, Stratton, and Sung 1998; Shinohara et al. 1998; Arbel, Zenvirth, and Simchen 1999; Bai, Davis, and Symington 1999; Bishop et al. 1999; Chen et al. 1999; Paques and Haber 1999; Petukhova et al.
1999; Borts, Chambers, and Abdullah 2000; Muniyappa, Anuradha, and Byers 2000; Shinohara et al. 2000; Davis and Symington 2001; Gasior et al. 2001; Masson and West 2001; Bochkareva et al. 2002; Brush 2002; Fortin and Symington 2002; Kiianitsa, Solinger, and Heyer 2002; Krejci et al. 2002; Miyagawa et al. 2002; Pellegrini et al. 2002; Solinger, Kiianitsa, and Heyer 2002; Symington 2002; Tsubouchi and Roeder 2002; Cox 2003; Davis and Symington 2003; Sugawara, Wang, and Haber 2003; Anuradha and Muniyappa 2004a; Anuradha and Muniyappa 2004b; Bishop and Zickler 2004; Chen et al. 2004; Dudas and Chovanec 2004; Grishchuk et al. 2004; Krogh and Symington 2004; Sehorn et al. 2004; Ishibashi et al. 2005; Sauvageau et al. 2005; Bleuyard, Gallego, and White 2006; Chi et al. 2006; Enomoto et al. 2006; Flaus et al. 2006; Fung et al. 2006; Henry et al. 2006; Holzen et al. 2006; Ishibashi, Kimura, and Sakaguchi 2006; Cox 2007; Feng et al. 2007; Nimonkar et al. 2007; Chen, Yang, and Pavletich 2008; Filippo, Sung, and Klein 2008; Lopez-Casamichana et al. 2008; Mozlin, Fung, and Symington 2008; Octobre et al. 2008; Pannunzio, Manthey, and Bailis 2008; Sarai et al. 2008; Chang et al. 2009; Fung, Mozlin, and Symington 2009; Kudoh et al. 2009; Sakaguchi et al. 2009; Seong et al. 2009; Latypov et al. 2010; Okorokov et al. 2010; Szekvolgyi and Nicolas 2010)

[Figure 1.4 panel annotations]
A. RPA binds ssDNA, preventing secondary structure formation; Rad51 and Rad52/59 are present as heptamers.
B. Rad52/59 recruits Rad51; the Rad52/59-Rad51 complex binds the RPA-ssDNA complex; RPA is displaced by Rad52/59.
C. Rad51 binds ssDNA, forming the presynaptic filament; Dmc1 binds, extending the presynaptic filament; the Rad55/57 heterodimer mediates filament assembly.
D. Presynaptic filament extension displaces remaining RPA; the Hop2/Mnd1 heterodimer stabilizes the presynaptic filament.
E. The Hop2/Mnd1 heterodimer captures dsDNA.
F. The Hop2/Mnd1 heterodimer stabilizes interactions between homologous DNA sequences.
G.
Rad54 and Rdh54 stimulate D-loop formation and may remove recombinational intermediates.
[Figure 1.4 key: RPA 1-3, Rad51, Rad52/59, Rad55/57, Dmc1, Hop2/Mnd1, Rad54, Rdh54.]

[Figure 1.5 labels: two clades, MEIOSIS and MITOSIS/GENERAL, each showing Protist X, Protist Y, Archaeplastida, Protist Z, Animals, and Fungi; Origin of meiotic function; Protist A; Archaea; Bacteria.]

Figure 1.5: A model for the origin of meiotic function by gene duplication. This model hypothesizes that gene duplication events yielding meiosis-specific components mark the origins of their respective meiotic functions. In addition, it suggests that some organisms (most likely protists) diverged prior to the gene duplication events and, therefore, prior to the origins of some meiotic functions. Some organisms may have primitive meiosis or none at all as the ancestral state.

CHAPTER 2
A PAN-EUKARYOTIC INVENTORY OF DNA STRAND EXCHANGE COMPONENTS REVEALS PATTERNS OF CONSERVATION AND LOSS

Abstract:
Recombination is critical for repair of DNA double-strand breaks, and the DNA strand exchange (SE) reaction is central to recombination. We present a phylogenetic inventory of ten SE component proteins (Rad52, Rad59, Rad51, Rad55, Rad57, Dmc1, Hop2, Mnd1, Rad54, and Rdh54) among 47 genera representing five eukaryotic supergroups. We aligned SE protein sequences, verified their homology by phylogenetic analyses, and used these alignments to create hidden Markov model (HMM) profiles and position-specific scoring matrices (PSSM), which we used to further scrutinize public nucleotide sequence databases. Phylogenetic analyses of all the resulting sequences confirmed orthology of the evolutionarily diverse SE component proteins. Eight of ten SE proteins (Rad52, Rad51, Rad55, Rad57, Dmc1, Hop2, Mnd1, and Rdh54) are present in five of six eukaryotic supergroups and were likely present in the common ancestor of extant eukaryotes.
An evolutionary analysis of the heterotrimeric Replication Protein A complex (RPA1, RPA2, and RPA3) is also presented. Since RPA subunit protein sequences and their single-stranded DNA binding domains are well conserved, apparent absences of RPA-coding genes from genomes most likely result from detection failures due to limitations of the search regimen. To validate the approach, we fitted a Poisson regression model to the numbers of observed RNA Polymerase I (Pol I) subunit detection failures. We then compared the numbers of RPA subunit detection failures observed to the numbers predicted by the Pol I regression analysis. The results demonstrate that the frequencies of RPA subunit detection failures and their Smith-Waterman alignment scores are strongly correlated. We then applied this approach to the SE proteins by comparing the numbers of detection failures for SE components, given their Smith-Waterman scores. Detection failures of six proteins (Rad52, Rad59, Rad51, Dmc1, Rad54, and Rdh54) occurred more frequently than predicted, indicating the likely loss of these genes from some completely sequenced genomes. The inferred losses of these genes can be explained if compensatory changes (e.g. overexpression or functional mutations) of Rad51 suppress SE component mutant phenotypes.

Introduction:
In eukaryotes, meiosis is necessary for sexual reproduction (Weismann, Parker, and Ronnfeldt 1893; Churchill 1970). During meiosis, a single round of genome-wide DNA replication is followed by two nuclear divisions (reductional and equational) (Churchill 1970). A diploid organism typically produces four haploid cells that combine with other haploid products of meiosis (e.g. spores and gametes) (Weismann, Parker, and Ronnfeldt 1893). In this manner, the chromosomes of organisms are recombined while maintaining the appropriate numbers of chromosomes (Weismann, Parker, and Ronnfeldt 1893; Cavalier-Smith 2002d).
Although there are important differences between meiosis and mitosis, during which one nuclear division follows one round of DNA replication (Flemming 1878), many proteins that function during meiosis also function during mitosis (Marcon and Moens 2005). The pairing of non-sister homologous chromosomes during the first (reductional) division, followed by their segregation to opposite spindle poles, is unique to meiosis (Simchen and Hugerat 1993; Paques and Haber 1999; Dudas and Chovanec 2004; Krogh and Symington 2004; Filippo, Sung, and Klein 2008). However, the second (equational) division that occurs during meiosis is similar (though not identical) to the single equational division of mitosis, during which sister chromatids segregate to opposite spindle poles (Nicklas 1977). Genetic recombination between homologous chromosomes is essential in most organisms for appropriate pairing and segregation during the reductional division of meiosis (Moore and Orr-Weaver 1998; Paques and Haber 1999; Dudas and Chovanec 2004; Krogh and Symington 2004; Filippo, Sung, and Klein 2008). The importance of homologous recombination may also be observed at the population level, as gene conversions and/or cross-over events may occur that increase the efficacy of natural selection (Fisher 1930; Muller 1932; Hill and Robertson 1966), allowing eukaryotes to respond evolutionarily to changing environments (Van Valen 1973; Rice and Chippindale 2001; Agrawal 2006; Otto and Gerstein 2006). Several models of recombination have been proposed, such as double-strand break repair, synthesis-dependent strand annealing, and break-induced replication (Paques and Haber 1999; Dudas and Chovanec 2004; Krogh and Symington 2004; Filippo, Sung, and Klein 2008).
Central to all of these models is the DNA strand exchange (SE) reaction, in which 3’ ends of single-stranded DNA (ssDNA) invade intact DNA duplexes (Paques and Haber 1999; Dudas and Chovanec 2004; Krogh and Symington 2004; Filippo, Sung, and Klein 2008). Double-strand DNA breaks (DSB) in mitotic cells are generally caused by mutagens and collapsed or damaged replication forks (Paques and Haber 1999; Dudas and Chovanec 2004; Krogh and Symington 2004; Filippo, Sung, and Klein 2008). During meiosis, DSBs are introduced by the Spo11 transesterase, followed by resection of the 5’ strand by nuclease activity (Lichten 2001; Krogh and Symington 2004). Several proteins important for SE activity have been studied in animals, fungi, and plants with genetics, molecular biology, and biochemistry (Brush 2002; Krogh and Symington 2004; Sakaguchi et al. 2009); however, less is known about the origins or evolution of SE components. An approach to studying the evolution of genes is to search for and compare them among diverse eukaryotes (Dacks and Doolittle 2001). The presence of orthologs among groups of eukaryotes indicates that the genes must have been present in their last common ancestor, while absences might represent either ancestral or derived states (Villeneuve and Hillers 2001; Ramesh, Malik, and Logsdon 2005). It is important to include diverse protists in order to estimate when genes most likely arose during eukaryotic evolution (Ramesh, Malik, and Logsdon 2005). Previous analyses indicate that some SE proteins (Rad52, Rad51, Dmc1, Hop2, and Mnd1) likely arose very early during eukaryotic evolution (Ramesh, Malik, and Logsdon 2005; Malik et al. 2008).
These studies provided a much needed “snapshot” of the distribution of these components in representative animals, fungi, and plants and some (mainly parasitic) protists, yet more specific conclusions regarding the evolutionary histories of SE components could not be made due to limited availability of genome sequence data from diverse lineages within these groups (Adl et al. 2005). In addition, several important mediator proteins that interact with Rad51 or Dmc1 recombinase proteins were excluded from the prior analyses. Here, we present an expanded inventory of the SE machinery (Rad52, Rad51, Dmc1, Hop2, and Mnd1 studied previously, and Rad59, Rad55, Rad57, Rad54, Rdh54, RPA1, RPA2 and RPA3 (Krogh and Symington 2004; Sakaguchi et al. 2009)) with broad taxonomic sampling (47 eukaryotes representing five eukaryotic supergroups (Adl et al. 2005)). In addition, the data collected during this study were used to address the ambiguous interpretation of apparent gene absences, which has been an important limitation of prior phylogenetic inquiries (Villeneuve and Hillers 2001; Ramesh, Malik, and Logsdon 2005; Malik et al. 2008). Current approaches do not distinguish between instances of non-detection that result from (i) failures of the search methods employed, or (ii) true absences (e.g. losses) of genes from completely sequenced genomes (Villeneuve and Hillers 2001; Ramesh, Malik, and Logsdon 2005; Malik et al. 2008). We describe a heuristic metric for detecting potential gene absences that can be applied to a broad range of diverse eukaryotes (Adl et al. 2005). The distribution of Replication Protein A complex subunits (RPA1, RPA2, RPA3) provides an empirical basis for determining the limits of sequence detection.
RPA subunits are known to function only as heterotrimeric complexes that bind ssDNA and interact with Rad52 proteins during recruitment of Rad51 proteins to pre-synaptic filaments in animals, fungi, and plants (Brill and Stillman 1991; Sakaguchi et al. 2009). In addition, we searched for ten RNA Polymerase I subunits (Kuhn et al. 2007) among the 47 taxa studied (Adl et al. 2005) and compared their distributions to the RPA and SE protein datasets. We determined that absences of at least four SE proteins in our inventory (Rad51, Dmc1, Rad54, and Rdh54) most likely represent true gene losses.

Methods:

Data acquisition
Keyword searches (e.g. Saccharomyces cerevisiae Rad51) of the National Center for Biotechnology Information (NCBI, www.ncbi.nlm.nih.gov/) non-redundant protein sequence database retrieved SE protein sequences of RPA1, RPA2, RPA3, Rad52, Rad59, Rad51, Rad55, Rad57, Dmc1, Hop2, Mnd1, Rad54, and Rdh54 (Krogh and Symington 2004; Sakaguchi et al. 2009) from representatives of animals, fungi, and plants. We also searched the clusters of euKaryotic Orthologous Groups of proteins (KOGs) database for each protein (Tatusov et al. 2003). The identities of retrieved protein sequences were initially verified by evaluating the results of bi-directional searches with the tBLASTn (Altschul et al. 1997) option of the Basic Local Alignment Search Tool (BLAST), in which the translated non-redundant nucleotide database is searched using a protein query. The set of protein (and protein-coding) sequences collected in this manner was subsequently used as queries to search additional protein, nucleotide, and expressed sequence tag (EST) databases at NCBI and other public genome sequence databases by BLASTp, tBLASTn, or BLASTn (Table 2.6). Searches were performed for all homologous protein-coding sequences available between December 2009 and June 2010.
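One common way to operationalize the bi-directional searches described above is a reciprocal best-hit test. The following is a minimal sketch under that assumption; the thesis does not state the exact acceptance criterion it used. The hit tables, sequence identifiers, and scores below are hypothetical stand-ins for parsed BLAST output, and no BLAST program is actually invoked.

```python
def best_hit(hits):
    """Return the subject ID with the highest score from a list of
    (subject_id, score) pairs, or None if there are no hits."""
    if not hits:
        return None
    return max(hits, key=lambda hit: hit[1])[0]


def is_reciprocal_best_hit(query_id, forward_hits, reverse_hits_by_subject):
    """A candidate passes if the query's best hit in the target database
    retrieves the original query as its own best hit in the reverse search."""
    candidate = best_hit(forward_hits)
    if candidate is None:
        return False
    reverse_hits = reverse_hits_by_subject.get(candidate, [])
    return best_hit(reverse_hits) == query_id


# Hypothetical example: a Rad51 query searched against a second genome,
# then the top candidate searched back against the query's genome.
forward = [("target_seq_A", 310.0), ("target_seq_B", 95.0)]
reverse = {"target_seq_A": [("query_Rad51", 305.0), ("query_Dmc1", 180.0)]}
passes = is_reciprocal_best_hit("query_Rad51", forward, reverse)
```

A candidate that fails this check is not necessarily a non-homolog: paralogs such as Rad51 and Dmc1 can retrieve one another, which is one reason phylogenetic confirmation is used in addition to BLAST.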
In an effort to identify apparently missing homologous sequences from distantly-related organisms, additional searches were performed using protein sequence queries from organisms likely to share more recent common ancestors. For example, Trypanosoma brucei protein sequences were used as additional queries for searches of sequences for a closely related kinetoplastid protist, Leishmania major (Adl et al. 2005). Identities of sequences were again confirmed with bi-directional BLASTp, BLASTx and tBLASTn searches. When multiple sequences were found for a species, only the most complete open reading frame or protein prediction was retained for our analyses. If no previously annotated protein sequence was available in a database, we annotated the nucleotide sequences manually, using Sequencher v4.5 (Genecodes, Ann Arbor, MI). Exons were identified with reference to multiple protein sequence alignments, inferred translations from BLASTx pairwise comparisons to the NCBI protein sequence database, and the locations of putative intron splice donor and acceptor sites (Griffiths et al. 2000). Multiple amino acid sequence alignments were calculated using MUSCLE v3.7 (Edgar 2004) and observed with BioEdit v7.0.5.3 (Hall 1999). To further scrutinize publicly available genome sequence data for the presence of SE protein-coding genes, we created local databases of nucleotide and predicted protein sequences for completely sequenced (Sanger sequence coverage of 8x and greater or sequenced from end-to-end) genomes and searched them using HMMER v2.3.2 (Sonnhammer et al. 1998) and tBLASTn (Altschul et al. 1997). Multiple sequence alignments of homologous amino acid sequences positively identified by reciprocal BLAST searches and phylogenetic analysis were used to calculate hidden Markov models (HMM, global and local) with HMMER v2.3.2 and position-specific scoring matrices (PSSM) using a local installation of the suite of NCBI BLAST programs.
These HMM files were then used to search protein sequence data with HMMER, and the PSSM files were used to search protein and nucleotide sequence data for homologs using PSI-BLAST and tBLASTn.

Phylogenetic analyses
We aligned all protein sequences of potential eukaryotic orthologs using MUSCLE v3.7 (Edgar 2004), manually edited them by removing ambiguously aligned columns and gaps in BioEdit v7.0.5.3 (Hall 1999), and performed phylogenetic analyses on the resulting multiple protein sequence alignments. Optimal protein substitution models and parameters were determined for each alignment independently with Modelgenerator v0.8 (Keane et al. 2006). Constant sites were excluded from analyses. Phylogenetic trees were calculated using PhyML v3.0 (Guindon et al. 2009) for 1000 replicates, and PhyloBayes v3. (Lartillot, Lepage, and Blanquart 2009), which used at least two independent chains in which maximum differences observed across all bipartitions were less than 0.10, an indicator that the chains have good convergence (Lartillot, Lepage, and Blanquart 2009). Every other tree after burn-ins (selected to minimize the differences across all bipartitions) was used to calculate consensus tree topologies in PhyloBayes (Lartillot, Lepage, and Blanquart 2009). Analyses were also performed by reciprocally rooting all paralogs (e.g. Rad54 and Rdh54) for positive identification (Ridley 2004). All strand exchange protein sequence alignments were concatenated end-to-end using BioEdit v7.0.5.3 (Hall 1999). Both unpartitioned analyses and analyses partitioned to each protein in the concatenated dataset were performed with RAxML v7.2.7 (Stamatakis, Ludwig, and Meier 2005) for 1000 replicates at the CIPRES Science Gateway v3.0 (Miller et al. 2009).

Inventory assembly
Genes were determined to be present in an organism when putative orthologs were discovered and identified with bi-directional BLAST and phylogenetic analyses (Figures 2.2-2.18) (Ramesh, Malik, and Logsdon 2005; Malik et al. 2008).
Protein sequence data for genes that encode all SE proteins (including RPA subunits) in Homo sapiens, Saccharomyces cerevisiae, and Oryza sativa (or its relative) were aligned and their Smith-Waterman pairwise alignment scores (Smith and Waterman 1981) were calculated with the PRSS/PRFX tool (http://fasta.bioch.virginia.edu/fasta_www2/fasta_www.cgi?rm=shuffle) (Tables 2.3 and 2.4). In some cases protein sequences were either not available for one representative or we were unable to align them properly (Tables 2.3 and 2.4). We also determined Smith-Waterman pairwise alignment scores (Smith and Waterman 1981) for protein sequences from genes that encode RNA Polymerase I proteins (A190, A135, AC40, AC19, AC12.2, Rpb5, Rpb6, Rpb8, Rpb10, and Rpb12 (Kuhn et al. 2007)) from H. sapiens and S. cerevisiae (Table 2.2). Poisson regression analyses (Allison 1999) were performed on counts of detection failures among 34 genomes and their respective Smith-Waterman pairwise alignment scores (Smith and Waterman 1981) for the RNA Polymerase I dataset and a combined RNA Polymerase I/RPA1-3 dataset using the genmod procedure in SAS v. 9.2 (SAS Institute Inc., Cary, NC). Graphs were created from the resulting parameter estimates and Wald 90% confidence limits using Microsoft Excel 2010, with the observed numbers of detection failures superimposed for comparison. In addition, parameter estimates and Wald 90% confidence limits from regression analyses were used to calculate the predicted numbers of detection failures given protein Smith-Waterman pairwise alignment scores listed in Tables 2.2-2.4 (Allison 1999).

Results and discussion:
An inventory of the presence of 13 component proteins predicted to catalyze the DNA strand exchange (SE) reaction (RPA1, RPA2, RPA3, Rad52, Rad59, Rad51, Rad55, Rad57, Dmc1, Hop2, Mnd1, Rad54, and Rdh54 (Krogh and Symington 2004; Sakaguchi et al. 2009)) among 47 diverse eukaryotes is presented here (Figure 2.1) (Adl et al. 2005).
This inventory includes representatives of five of the six currently recognized eukaryotic supergroups (Adl et al. 2005) (Excavata, Chromalveolata, Archaeplastida, Opisthokonta, and Amoebozoa; but not Rhizaria). Completed genome sequence and other nucleotide databases (including ESTs) were rigorously searched using HMM profiles and PSSMs created from phylogenetically verified amino acid sequences with HMMER v2.3.2 (Sonnhammer et al. 1998), PSI-BLAST, and tBLASTn (Altschul et al. 1997) (Figure 2.1). Identities of putative orthologs were confirmed with phylogenetic analyses (Ramesh, Malik, and Logsdon 2005; Malik et al. 2008) performed with PhyML v3.0 (Guindon et al. 2009) and PhyloBayes v3.1 (Lartillot, Lepage, and Blanquart 2009) (Figures 2.2-2.18). Although phylogenetic analysis of several SE proteins yielded poorly resolved phylogenies, the resolution is sufficient to establish orthology (Ramesh, Malik, and Logsdon 2005; Malik et al. 2008). To determine if either short amino acid sequences or substitution rate heterogeneity are causing phylogenetic artifacts (Felsenstein 2004), we analyzed the concatenated SE protein alignments (Figure 2.19) (Rokas et al. 2003). The concatenated protein sequence analyses strongly support the monophyly of several known groups: Opisthokonta, Chloroplastida, Stramenopila, Apicomplexa, Metamonada, Discoba and Amoebozoa (Adl et al. 2005). However, the unsupported topologies, which place some Excavata with ciliates and Amoebozoa, are most likely due to methodological issues, such as long-branch attraction (Felsenstein 2004), and are unlikely to indicate cases of lateral gene transfer (Syvanen 1985). Ten SE components (RPA1, RPA2, Rad52, Rad51, Rad55, Rad57, Dmc1, Hop2, Mnd1 and Rdh54 (Krogh and Symington 2004; Sakaguchi et al. 2009)) are present in every supergroup tested (Figure 2.1) (Adl et al. 2005).
Therefore, they most likely arose very early during eukaryotic evolution, prior to the divergence of nearly all known eukaryotes (Dacks and Doolittle 2001; Villeneuve and Hillers 2001; Ramesh, Malik, and Logsdon 2005; Malik et al. 2008). The absence of the Rad54 gene from the eukaryotic supergroups Excavata and Amoebozoa indicates that it may have arisen later, after the divergence of Excavata or Amoebozoa from other eukaryotes (Figure 2.1 and Table 2.1) (Dacks and Doolittle 2001; Villeneuve and Hillers 2001; Ramesh, Malik, and Logsdon 2005; Malik et al. 2008). We detected the Rad59 gene in Opisthokonta and Amoebozoa, which together form the metagroup Unikonta (Cavalier-Smith 2002a); the most parsimonious explanation is that Rad59 arose more recently during eukaryotic evolution than the last eukaryotic common ancestor, possibly in the last common ancestor of Unikonta (Dacks and Doolittle 2001; Villeneuve and Hillers 2001; Ramesh, Malik, and Logsdon 2005; Malik et al. 2008).

Limits of sequence detection and distribution of strand exchange genes among eukaryotes
We selected the three Replication Protein A (RPA) subunits (Sakaguchi et al. 2009) for phylogenetic comparison with the other SE proteins and with the subunits of RNA Polymerase I (Kuhn et al. 2007), with the goal of establishing a threshold for detection of their component proteins. Replication Protein A, a complex composed of RPA1 (70kDa), RPA2 (32kDa), and RPA3 (14kDa) subunits, binds ssDNA (Brill and Stillman 1991; Bochkareva et al. 2002; Brush 2002; Ishibashi et al. 2005; Ishibashi, Kimura, and Sakaguchi 2006; Chang et al. 2009; Sakaguchi et al. 2009). In humans, the RPA heterotrimer is critical for DNA metabolic pathways, such as DNA replication, DNA repair, recombination, cell cycle, and DNA damage checkpoints (Zou et al. 2006). In yeast, RPA has been implicated in DNA replication, repair, and recombination (Wold 1997).
RPA is also necessary for DNA damage repair in plants but may not be critical for DNA replication and homologous recombination in Oryza sativa (Ishibashi et al. 2005; Ishibashi, Kimura, and Sakaguchi 2006; Kimura and Sakaguchi 2006; Sakaguchi et al. 2009). During meiosis the RPA complex recruits Rad52 proteins during pre-synaptic filament formation in animals, fungi, and plants (Davis and Symington 2003; Krogh and Symington 2004; Sakaguchi et al. 2009). Two conserved domains (DBD-A and DBD-B) allow RPA1 proteins to bind ssDNA, preventing formation of secondary structures of DNA that inhibit SE (Wold 1997; Brush 2002). RPA1 monomers may bind ssDNA weakly (approximately 8 nucleotides) but binding of the RPA1 subunit interaction motif (DBD-C) with RPA2/RPA3 heterodimers (RPA2 DBD-D, RPA3 DBD-E) causes conformational changes that result in stable interactions (approximately 30 nucleotides) (Bochkareva et al. 2002). The role of RPA3, which has a single binding motif (DBD-E), is currently unclear. In Saccharomyces cerevisiae RPA1 binds only to RPA2/RPA3 heterodimers (Sakaguchi et al. 2009). Inspection of amino acid sequences among animals, fungi, and plants indicates that RPA1 protein sequences are longer and more conserved than those of RPA2 (Tables 2.2 and 2.3). In addition, the binding domains of RPA1 and RPA2 protein sequences (DBD - A-D and F) appear to be well conserved among all of the eukaryotes studied here (Figures 2.20-2.24). RPA3 has the least conserved protein sequences of the three subunits (Figure 2.25 and Tables 2.2-2.4), possibly attesting to differences in ssDNA binding among eukaryotes (Sakaguchi et al. 2009). Various RPA complexes have been observed but there are no known functions of any component outside of the trimerization core (Bochkareva et al. 2002). Also, there are no known mutant phenotype suppressors for any of the three components (Table 2.5).
Together, these data indicate that RPA trimerization is likely to be required for successful SE in all eukaryotes. Therefore, apparently missing RPA subunit genes are best explained by limitations of the search methods used (i.e. type II error). We propose that a correlation exists between the numbers of RPA subunit gene sequence detection failures and their respective protein sequence amino acid lengths and degrees of conservation (Table 2.2); this possibility is explored further below. RPA1 protein sequences were obtained for all 47 organisms studied here, while genes that encode RPA2 and RPA3 homologs were not always identified in each study organism (Figure 2.1). Among the 34 organisms included in this study with at least one genome sequence of 8.0x whole-genome shotgun sequencing coverage, there are five RPA2 and ten RPA3 gene sequence detection failures. RPA1 and RPA2 genes were likely present in the last eukaryotic common ancestor due to their presence in every supergroup tested (Villeneuve and Hillers 2001; Ramesh, Malik, and Logsdon 2005; Malik et al. 2008). The apparent absence of RPA3 genes from the Amoebozoa indicates that this subunit may have arisen later during eukaryotic evolution (Dacks and Doolittle 2001), after the divergence of Amoebozoa from other eukaryotes. However, the Amoebozoa are not candidates as the earliest-diverging eukaryotes by current hypotheses for rooting the evolutionary tree of eukaryotes (Cavalier-Smith 2002a; Stechmann and Cavalier-Smith 2002; Roger and Simpson 2009; Cavalier-Smith 2010), making it more likely that RPA3 was lost in the common ancestor of D. discoideum and Entamoeba (Dacks and Doolittle 2001; Adl et al. 2005).
To test the hypothesis that the numbers of RPA subunit sequence detection failures correlate with their protein sequence lengths and degrees of conservation, we compared RPA subunit protein sequence data to homologs of the ten RNA Polymerase I (Pol I) subunit protein sequences (A190, A135, AC40, AC19, AC12.2, Rpb5, Rpb6, Rpb8, Rpb10, and Rpb12 (Kuhn et al. 2007)) (Figures 2.3 and 2.26) by determining Smith-Waterman pairwise alignment (S-W) scores (Smith and Waterman 1981). We selected the S-W algorithm to score protein sequence length and conservation due to its ability to apply a similarity measure to protein sequences of variable lengths (Smith and Waterman 1981). S-W scores were calculated from pairwise alignment of protein sequences encoded by genes from Homo sapiens and S. cerevisiae genomes. Yeast and human gene products were selected for S-W score assessment on the basis that each genome contains all of the genes tested, providing consistency between comparisons. Poisson regression analysis (Cameron and Trivedi 1998) was performed on the numbers of sequence absences observed among 34 organisms with at least one genome sequence per supergroup of 8.0x whole-genome shotgun sequencing coverage, although Cyanidioschyzon merolae and Encephalitozoon cuniculi were included on the basis that their genomes have been sequenced from end-to-end and, therefore, are complete (Katinka et al. 2001; Matsuzaki et al. 2004) (Figure 2.1). Homo sapiens is also included since all SE, RPA and RNA Pol I component proteins were detected within its genome, although the reference human genome sequence has less than 8.0x sequence coverage (Venter et al. 2001). Regression analysis indicates that RNA Pol I S-W scores are good predictors of the numbers of observed RNA Pol I subunit detection failures (p = 0.0017) (Figure 2.27-a.) (Allison 1999).
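To illustrate why an S-W score reflects both sequence length and conservation, the following is a minimal sketch of the Smith-Waterman local alignment score. It assumes a simple match/mismatch scheme with a linear gap penalty, all with hypothetical values; the PRSS/PRFX tool used in this study applies a protein substitution matrix and its own gap penalties, so the numbers produced here are not comparable to those in Tables 2.2-2.4.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses the standard dynamic-programming recurrence in which every cell
    is floored at zero, so an alignment may start and end anywhere.
    Only two rows of the matrix are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                           # start a new local alignment
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical toy sequences: a longer conserved region scores higher
# than a shorter one, and unrelated sequences score zero.
short_conserved = smith_waterman_score("GGACGTCC", "TTACGTAA")
long_conserved = smith_waterman_score("ACGTACGT", "ACGTACGT")
unrelated = smith_waterman_score("AAAA", "TTTT")
```

Because the score accumulates over the aligned region, a long, well conserved protein pair yields a high score while a short or divergent pair yields a low one, which is the property the regression relies on.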
The numbers of observed RPA subunit detection failures relative to their S-W scores are similar to the expected numbers of detection failures, as predicted by the 90% confidence interval for the regression of RNA Pol I data (Figure 2.27-b). To increase the number of proteins included in the regression analyses, we then performed Poisson regression analyses on a combined RPA and RNA Pol I dataset (Figure 2.27-c). Regression analysis of the combined dataset indicates that S-W scores are good predictors of RPA/RNA Pol I subunit detection failures (p < 0.0001).

We detected eight SE components (Rad51, Rad55, Rad57, Dmc1, Hop2, Mnd1, and Rdh54) in all eukaryotic supergroups tested (Figure 2.1, black rows). Therefore, these components are all likely to have been present in the last eukaryotic common ancestor (Dacks and Doolittle 2001). We failed to detect the Rad54 gene in the Amoebozoa tested (Figure 2.1 and Table 2.1); our search for Rad54 genes was conducted in all available Amoebozoa sequence data (individual EST, nucleotide, and protein sequence submissions and incomplete genomes) and none was discovered. In addition, the Rad54 gene may be absent from Excavata; however, the genome coverages of Trichomonas vaginalis and several Leishmania species are below 8.0x whole-genome shotgun sequencing (Carlton et al. 2007; Aslett et al. 2010), reducing our confidence in this conclusion. The Rad59 gene appears to have arisen later during eukaryotic evolution, as it is present in only two supergroups (Opisthokonta and Amoebozoa) that form a monophyletic “metagroup” (Unikonta) (Figure 2.1) (Malik et al. 2008). The last common ancestor of extant eukaryotes may have had all of the components necessary for homologous recombination, despite the possible absence of Rad54 and Rad59 (Bai and Symington 1996; Klein 1997; Arbel, Zenvirth, and Simchen 1999).
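The Poisson regressions of detection failures on S-W scores described above can be sketched as follows. This is a minimal illustration, not the software or settings used for Figure 2.27: it fits log(mean failures) = b0 + b1 × score by Newton-Raphson in pure Python, and the function names and all numbers in the example are hypothetical. Scores are rescaled (divided by 1000) purely to keep the iteration numerically well behaved.

```python
import math


def fit_poisson(xs, ys, iterations=50):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson and return (b0, b1).

    Assumes at least two distinct x values and a positive mean of ys.
    """
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(iterations):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(y - m for y, m in zip(ys, mus))
        g1 = sum(x * (y - m) for x, y, m in zip(xs, ys, mus))
        # Fisher information matrix entries.
        h00 = sum(mus)
        h01 = sum(x * m for x, m in zip(xs, mus))
        h11 = sum(x * x * m for x, m in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1


def predicted_failures(b0, b1, x):
    """Expected number of detection failures at rescaled score x."""
    return math.exp(b0 + b1 * x)


# Hypothetical data: rescaled S-W scores and detection-failure counts.
scores = [0.1, 0.3, 0.6, 1.2, 2.5]
failures = [9, 6, 4, 1, 0]
b0, b1 = fit_poisson(scores, failures)
```

A negative fitted slope corresponds to the pattern reported here: proteins with lower S-W scores (shorter or less conserved) are expected to go undetected more often.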
We then compared the numbers of observed SE component detection failures to those predicted by our analysis of the RNA Pol I/RPA dataset (Figure 2.27-c and Table 2.2). Sequence detection failures for four SE components (Rad55, Rad57, Hop2, and Mnd1) are within the 90% confidence interval of the predicted failure range. Interestingly, Rad55 and Rad57 proteins form heterodimers that stabilize Rad51-DNA filaments, and they are not known to function as either monomers or homodimers (Bleuyard, Gallego, and White 2006; Filippo, Sung, and Klein 2008). The same is true of Hop2 and Mnd1 proteins, which stabilize Dmc1-DNA filaments (Chen et al. 2004; Henry et al. 2006). Therefore, the apparent absence of one of these proteins in the presence of the other, as in Theileria annulata, Thalassiosira pseudonana, and Phaeodactylum tricornutum, may be due to type II errors. Similarly, the joint absence of the genes that encode Hop2, Mnd1, and Dmc1 proteins may indicate that these genes were lost together (e.g. Drosophila sp., Caenorhabditis sp., Neurospora crassa, Gibberella zeae, and Ustilago maydis) (Figure 2.1 and Table 2.1).

Six SE components (Rad52, Rad59, Rad51, Dmc1, Rad54, and Rdh54) were detected less frequently than predicted among the taxa tested (Figure 2.27-c and Table 2.2). Observed numbers of detection failures are 2-4 times higher than predicted for Rad52 and Rad59 genes, whereas no sequence detection failures of Rad51, Dmc1, Rad54, and Rdh54 genes are predicted (Table 2.2). S-W scores may not adequately predict the numbers of detection failures because of either variation in genome coverage or the true absence of genes from a genome. To minimize the effects of variation in genome quality, only organisms with completed genomes were used for this analysis (Figure 2.1) (Malik et al. 2008).
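The comparison of observed with predicted detection failures can be illustrated with a simple check of whether an observed count falls inside a central 90% range of a Poisson distribution with the predicted mean. This is a hedged simplification: the interval reported in Figure 2.27 is a confidence interval for the regression, which is not necessarily the same as the Poisson quantile range computed below, and the function names and example numbers are hypothetical.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu)."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mu / i    # P(X = i) from P(X = i - 1)
        total += term
    return total


def poisson_range(mu, level=0.90):
    """Central range (lo, hi) of counts covering at least `level` probability."""
    tail = (1.0 - level) / 2.0
    lo = 0
    while poisson_cdf(lo, mu) < tail:
        lo += 1
    hi = lo
    while poisson_cdf(hi, mu) < 1.0 - tail:
        hi += 1
    return lo, hi


def within_expected(observed, mu, level=0.90):
    """True if the observed failure count lies inside the central range."""
    lo, hi = poisson_range(mu, level)
    return lo <= observed <= hi


# Hypothetical example: 5 failures predicted, 3 versus 12 observed.
consistent = within_expected(3, 5.0)
excess = within_expected(12, 5.0)
```

In this framing, a count inside the range is compatible with failure of detection alone, whereas a count well above it suggests that some absences reflect true gene losses.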
Poisson regression analysis of the RPA/RNA Pol I dataset (Figure 2.27) indicates that there is a strong correlation between S-W scores and the numbers of absences observed (p < 0.0001) (Allison 1999). The effect of variation in genome quality is therefore likely to be negligible among organisms with completed genomes. The quality of gene sequence annotations and true absences of genes from completed genomes are the most likely causes of a reduction in correlation between S-W scores and the numbers of detection failures. As mentioned previously, the Rad59 gene likely arose later during eukaryotic evolution (Malik et al. 2008), explaining some of the detection failures observed (Dacks and Doolittle 2001). Similarly, some of the sequence detection failures of the Rad54 gene may be due to its emergence later during eukaryotic evolution (Dacks and Doolittle 2001) (after the divergence of Excavata and/or Amoebozoa (Adl et al. 2005)), although failures to detect the Rad54 gene among Chromalveolata, Archaeplastida, and Opisthokonta are most likely due to subsequent losses. All apparent absences of Rad51, Dmc1, and Rdh54 genes from the 34 genomes tested could indicate true gene losses. Despite the presence and inferred importance of SE component proteins in the earliest common ancestor of eukaryotes, lineage-specific losses seem pervasive; only some animals and fungi have retained all SE component proteins studied here.

Suppressors of strand exchange component mutant phenotypes in Saccharomyces cerevisiae

Interpreting our observed distributions of SE component proteins in light of functional studies of SE components in S. cerevisiae suggests that overexpression or mutation of the Rad51 gene could suppress the mutation or loss of other SE protein-coding genes (Table 2.5). Extragenic suppressors of mutant phenotypes in S. cerevisiae are known for most of the SE components studied here except Rad51 (absent only in Giardia intestinalis) and RPA1-3 (Table 2.5).
Overexpression of the Rad51 gene in S. cerevisiae suppresses rad52, dmc1, rad55, rad57, hop2, and mnd1 mutant phenotypes (Milne and Weaver 1993; Klein 1997; Krejci et al. 2002; Tsubouchi and Roeder 2003; Henry et al. 2006; Schild and Wiese 2009). In addition, S. cerevisiae rad51 mutants demonstrate decreased recombination and reduced viability (Bishop 1994; Tsuzuki et al. 1996; Bleuyard, Gallego, and White 2006). These characteristics may account for the infrequent loss of the Rad51 gene from eukaryotes. In fungi, rad52 mutations are perhaps the most deleterious of those affecting the SE machinery, but Rad52 genes may be absent from as many as 18 of 47 genomes, representing four of the five eukaryotic supergroups in our study (Feng et al. 2007). This apparent contradiction may be explained if compensatory changes in the Rad51 gene overcome the inhibitory effects of single-stranded DNA-RPA complexes in nature (Milne and Weaver 1993; Krejci et al. 2002).

Although Dmc1 is critical for meiosis in many organisms studied, several organisms appear to be missing the genes that encode it (Bishop 1994; Bishop et al. 1999; Tsubouchi and Roeder 2003). However, in S. cerevisiae the Rad51 protein is capable of completing homologous strand exchange during meiosis when the Rad51 gene is overexpressed, and it does so without the assistance of Hop2 or Mnd1 proteins, which work in concert with Dmc1 (Table 2.2) (Tsubouchi and Roeder 2003). In addition, dmc1 mutant phenotypes may be suppressed with high copy numbers of the Rad54 gene, reducing the increased numbers of Rad51 foci that form, as in S. cerevisiae (Bishop 1994; Bishop et al. 1999). We cannot distinguish between the possibility that absences of Rad55 and Rad57 genes in our analyses are real and the possibility that their apparent absences are artifacts of the search methods used.
However, if Rad51 gene expression is increased, it is possible that enough protein is available for successful pre-synaptic filament formation despite the destabilizing effects that rad55 and rad57 mutations may have on recombination, as in S. cerevisiae (Fung, Mozlin, and Symington 2009). This observation is consistent with the suppression of rad55 and rad57 mutant phenotypes by compensatory changes in the Rad52 gene, which encodes products that recruit Rad51 to form pre-synaptic filaments in S. cerevisiae (Milne and Weaver 1993). The numbers of Hop2 and Mnd1 gene absences may also be due to failures of the search methods used; however, it is interesting that compensatory changes in the Rad51 gene also suppress hop2 or mnd1 mutant phenotypes in S. cerevisiae, possibly by creating more Dmc1 foci (Bishop 1994). Alternatively, the elimination of some Rad51 protein functions suppresses the S. cerevisiae mutant phenotypes of rad54 and rdh54 mutants (Klein 1997). The detrimental effects of rad54 and rdh54 mutations are almost certainly the result of accumulation of Rad51 proteins on DNA, since mutations in rad52, rad51, rad55, and rad57 suppress these phenotypes, eliminating or reducing the number of Rad51 proteins bound to ssDNA in S. cerevisiae (Klein 1997). Finally, the Rad59 gene, a paralog of Rad52, encodes a protein that appears to overlap functionally with Rad52 but cannot suppress a rad52 mutant phenotype in S. cerevisiae (Bai and Symington 1996). The rad59 mutation confers the most benign mutant phenotype (mildly defective mitotic recombination and decreased resistance to ionizing radiation), which is suppressed by Rad52 overexpression (Davis and Symington 2001; Davis and Symington 2003; Pannunzio, Manthey, and Bailis 2008), possibly leading to frequent losses.

Conclusions

We found that 10 of 13 strand exchange reaction components (RPA1, RPA2, Rad52, Rad51, Rad55, Rad57, Dmc1, Hop2, Mnd1, and Rad54 (Krogh and Symington 2004; Sakaguchi et al.
2009)) are present in all of the eukaryotic supergroups (Adl et al. 2005) scrutinized and thus are likely to have been present in the last common ancestor of extant eukaryotes (Figure 2.1) (Dacks and Doolittle 2001). It is possible that one component (Rad54) arose later during eukaryotic evolution (Dacks and Doolittle 2001), after the divergence of either Amoebozoa or Excavata (if either is the earliest-diverging eukaryotic lineage) (Adl et al. 2005). It is likely that Rad59 arose later during eukaryotic evolution (Dacks and Doolittle 2001), after the divergence of the Unikonta (composed of Opisthokonta and Amoebozoa) (Cavalier-Smith 2002a) from other eukaryotes. Rad54 (Petukhova, Stratton, and Sung 1998; Petukhova et al. 1999; Kiianitsa, Solinger, and Heyer 2002) and Rad59 (Bai and Symington 1996; Davis and Symington 2003; Pannunzio, Manthey, and Bailis 2008) proteins are both thought to function primarily during sister chromatid or intrachromosomal recombination.

The requirement for trimerization of RPA protein subunits appears to be conserved among eukaryotes, as heterotrimers are observed among animals, fungi, and plants (Sakaguchi et al. 2009). The presence of RPA1 subunits in every organism studied here strongly implies that RPA2 and RPA3 must also be present in these organisms; thus any apparent absences are inferred to be the result of search method detection limits (Figure 2.1). Detection of nucleotide and protein sequences is influenced by the length and degree of sequence conservation in different organisms (Pevsner 2009). The Smith-Waterman pairwise alignment algorithm (S-W) scores protein sequence data using a similarity measure that incorporates protein sequence length (Smith and Waterman 1981). Thus, we hypothesized that the number of detection failures is correlated with protein S-W scores. We tested this hypothesis with searches of RNA Pol I core complex subunits (Figure 2.26) (Kuhn et al. 2007).
In addition, we determined RNA Pol I subunit S-W scores with pairwise alignments of human and S. cerevisiae protein sequences (Figures 2.1 and 2.26). Poisson regression analyses (Allison 1999) indicate that there is a strong correlation between S-W scores and the number of undetected sequences among RNA Pol I proteins (Figure 2.27-a and Table 2.2). Furthermore, the number of detection failures predicted by the RNA Pol I regression analysis for the RPA subunits is similar to the observed numbers of RPA subunit detection failures (Table 2.2) (Allison 1999). These analyses indicate that absences among RPA components are likely due to failures of detection and may not represent true losses. We then combined the RNA Pol I and RPA datasets, performed additional regression analyses (Figure 2.27-b), and compared the numbers of predicted detection failures relative to S-W scores for the remaining SE components (Figure 2.27-c and Table 2.2). More detection failures were observed than predicted by the Pol I/RPA data for six SE components (Rad52, Rad59, Rad51, Dmc1, Rad54, and Rdh54); these absences may represent true losses of genes from genomes.

Complicating the inference of the early origins of SE component proteins are the frequent absences of SE protein-coding genes observed among diverse eukaryotes (Figure 2.1 and Table 2.1). Only eight organisms (all Opisthokonts (Adl et al. 2005)) encode all of the SE proteins studied: Homo sapiens, Mus musculus, Gallus gallus, Xenopus laevis, Danio rerio, Nematostella vectensis, Saccharomyces cerevisiae, and Kluyveromyces lactis. The organism with the fewest SE protein-coding genes (Caenorhabditis elegans, with only four) is also an Opisthokont (Figure 2.1) (Adl et al. 2005). Therefore, animals represent the greatest range in the number of SE proteins encoded.
The common ancestor of the putatively early-diverging eukaryotes (Trichomonas vaginalis and Giardia intestinalis) (Woese, Kandler, and Wheelis 1990) most likely had at least nine SE genes present in its genome (RPA1, RPA2, RPA3, Rad52, Rad51, Dmc1, Rad57, Hop2, and Mnd1). In nature, frequent losses of SE protein-coding genes may be facilitated by mutations in the Rad51 gene (Schild and Wiese 2009). Overexpression or mutation of Rad51 genes suppresses the mutant phenotypes of several SE protein-coding genes (rad52, rad59, rad55, rad57, dmc1, hop2, mnd1, rad54, and rdh54) in S. cerevisiae (Table 2.5). Rad51 mutations could result in the relaxation of selection on SE proteins, leaving the genes that encode them vulnerable to loss (Nei and Kumar 2000; Ridley 2004). Furthermore, when component proteins function in complexes, such as Hop2 and Mnd1 (Henry et al. 2006), loss of the gene encoding one component may expedite the loss of the gene encoding its partner protein; that is, the genes that encode obligate complexes likely coevolved and rely on one another to have functional value (Goh et al. 2000). As the Hop2-Mnd1 heterodimer is known to function only during meiosis, interacting with Dmc1-DNA filaments (Chen et al. 2004), the loss of the Dmc1 gene may hasten the loss of both Hop2 and Mnd1 genes. Alternatively, it is conceivable that Hop2 and Mnd1 proteins could interact with Rad51 proteins in organisms missing Dmc1 (e.g. Encephalitozoon cuniculi and Paramecium tetraurelia) (Figure 2.1).

Although the protein components of the strand exchange reaction are distributed widely across eukaryotes (Adl et al. 2005) and were likely present in their last common ancestor (Dacks and Doolittle 2001), the manner in which SE proceeds may vary greatly due to subsequent loss of a few component proteins. All extant eukaryotes inherited a complex of SE machinery that has been differentially retained over evolutionary time since the last eukaryotic common ancestor.
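The inference rule applied throughout this chapter (a gene detected in at least one member of every supergroup is inferred to have been present in the last eukaryotic common ancestor) can be written down as a short sketch. The function name, organism labels, and presence data below are hypothetical toy values chosen for illustration; they are not the distributions shown in Figure 2.1.

```python
def inferred_ancestral_genes(presence_by_supergroup):
    """Return genes detected in at least one organism of every supergroup.

    `presence_by_supergroup` maps supergroup -> organism -> set of genes
    detected in that organism.
    """
    gene_sets = []
    for organisms in presence_by_supergroup.values():
        found = set()
        for genes in organisms.values():
            found |= set(genes)  # union across organisms in the supergroup
        gene_sets.append(found)
    # A gene is inferred ancestral only if every supergroup has it.
    return set.intersection(*gene_sets) if gene_sets else set()


# Hypothetical toy inventory (not the thesis data).
toy_inventory = {
    "Opisthokonta": {"organism_a": {"Rad51", "Rad59", "Dmc1"},
                     "organism_b": {"Rad51"}},
    "Amoebozoa": {"organism_c": {"Rad51", "Rad59"}},
    "Excavata": {"organism_d": {"Rad51", "Dmc1"}},
}
ancestral = inferred_ancestral_genes(toy_inventory)
```

The rule is deliberately conservative about presence but not about absence: a gene missing from one supergroup may reflect a later origin, a lineage-specific loss, or a failure of detection, which is why the regression-based threshold of detection is needed to interpret the gaps.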
Figure 2.1: Phylogenetic distribution among eukaryotes of DNA strand exchange genes. The names of genera studied are listed. Asterisks indicate organisms with completed genomes (on the basis that they have at least one isolate genome sequence with 8.0x whole-genome shotgun coverage or a genome that was sequenced from end to end). Supergroups are presented in black rows with a summary of the genes deduced to be present in their common ancestor, and grey rows provide summaries of major Opisthokont lineages. Light grey columns designate RPA1-3, which were used to determine the thresholds of detection. Meiosis-specific proteins are presented in dark grey columns. Symbols: ‘+’ indicates that the sequence was found and phylogenetically verified; ‘(-)’ indicates that the sequence was not found and may be outside the threshold of detection; blank spaces indicate that the sequence was not found and the genome project has less than the equivalent of 8.0x Sanger whole-genome shotgun coverage; ‘-’ indicates that the sequence was not found, is within the calculated threshold of detection, and the genome project has ≥ 8.0x coverage. The tree is a cartoon that summarizes current literature (Simpson, Inagaki, and Roger 2006; Baldauf 2008; Burki, Shalchian-Tabrizi, and Pawlowski 2008; Kolisko et al. 2008; Timmermans et al. 2008; Minge et al. 2009; Reeb et al. 2009; Shadwick et al. 2009).
Figure 2.1 (presence/absence matrix not reproduced). Rows, by supergroup (asterisks indicate completed genomes): EXCAVATA: Giardia*, Trichomonas, Trypanosoma*, Leishmania, Naegleria*. CHROMALVEOLATA: Plasmodium*, Theileria*, Cryptosporidium*, Tetrahymena*, Paramecium*, Thalassiosira*, Phaeodactylum*, Phytophthora*. ARCHAEPLASTIDA: Arabidopsis*, Oryza, Physcomitrella*, Chlamydomonas*, Ostreococcus*, Cyanidioschyzon*. OPISTHOKONTA, HOLOZOA: Homo*, Mus, Monodelphis, Gallus, Xenopus, Danio*, Strongylocentrotus*, Aedes, Drosophila*, Caenorhabditis*, Apis, Tribolium, Nematostella, Trichoplax*, Monosiga*. OPISTHOKONTA, FUNGI: Saccharomyces*, Kluyveromyces, Candida albicans*, Neurospora*, Gibberella*, Magnaporthe, Aspergillus*, Schizosaccharomyces*, Coprinus*, Ustilago*, Encephalitozoon*. AMOEBOZOA: Dictyostelium*, Entamoeba*. Columns: Rpa1, Rpa2, Rpa3, Rad52, Rad59, Rad51, Rad55, Rad57, Dmc1, Hop2, Mnd1, Rad54, Rdh54.

Figure 2.2: Unrooted phylogenetic tree of 47 Replication Protein A-1 (RPA1) homologs. Trees were estimated with maximum likelihood and Bayesian inference (LG+I+G+F) from 506 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, and Excavata in brown. GenBank GenInfo Identifiers are given for all sequences unless otherwise noted (e.g. “jgi” refers to the Joint Genome Institute's public sequence databases). The consensus topology of 2 Phylobayes chains is shown. Numbers at the nodes indicate support from 1000 PhyML bootstrap replicates followed by the posterior probability estimated using Phylobayes. Scale bar: 0.5 substitutions/site.

Figure 2.3: Unrooted phylogenetic tree of 42 Replication Protein A-2 (RPA2) homologs. Trees were estimated with maximum likelihood and Bayesian inference (LG+G) from 158 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, and Excavata in brown. GenBank GenInfo Identifiers are given for all sequences unless otherwise noted (e.g. “jgi” refers to the Joint Genome Institute's public sequence databases). The consensus topology of 2 Phylobayes chains is shown. Numbers at the nodes indicate support from 1000 PhyML bootstrap replicates followed by the posterior probability estimated using Phylobayes. Scale bar: 0.5 substitutions/site.

Figure 2.4: Unrooted phylogenetic tree of 36 Replication Protein A-3 (RPA3) homologs. Trees were estimated with maximum likelihood and Bayesian inference (LG+G) from 79 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, and Excavata in brown. GenBank GenInfo Identifiers are given for all sequences unless otherwise noted (e.g. “jgi” refers to the Joint Genome Institute's public sequence databases). The consensus topology of 2 Phylobayes chains is shown. Numbers at the nodes indicate support from 1000 PhyML bootstrap replicates followed by the posterior probability estimated using Phylobayes. Scale bar: 0.5 substitutions/site.

Figure 2.5: Unrooted phylogenetic tree of 44 Replication Protein A-1 (RPA1) homologs. Trees were estimated with maximum likelihood and Bayesian inference (LG+I+G+F) from 506 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, and Excavata in brown. GenBank GenInfo Identifiers are given for all sequences unless otherwise noted (e.g. “jgi” refers to the Joint Genome Institute's public sequence databases). The consensus topology of 2 Phylobayes chains is shown. Numbers at the nodes indicate support from 1000 PhyML bootstrap replicates followed by the posterior probability estimated using Phylobayes. Scale bar: 0.5 substitutions/site.

Figure 2.6: Unrooted phylogenetic tree of 29 Rad52 homologs. Trees were estimated with maximum likelihood and Bayesian inference (LG+G) from 127 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, and Excavata in brown. GenBank GenInfo Identifiers are given for all sequences unless otherwise noted (e.g. “jgi” refers to the Joint Genome Institute's public sequence databases). The consensus topology of 2 Phylobayes chains is shown. Numbers at the nodes indicate support from 1000 PhyML bootstrap replicates followed by the posterior probability estimated using Phylobayes. Scale bar: 0.2 substitutions/site.
Figure 2.7: Unrooted phylogenetic tree of 46 Rad51 homologs. Trees were estimated with maximum likelihood and Bayesian inference (LG+G) from 312 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, and Excavata in brown. GenBank GenInfo Identifiers are given for all sequences unless otherwise noted (e.g. “jgi” refers to the Joint Genome Institute's public sequence databases). The consensus topology of 2 Phylobayes chains is shown. Numbers at the nodes indicate support from 1000 PhyML bootstrap replicates followed by the posterior probability estimated using Phylobayes. Scale bar: 0.1 substitutions/site.

Figure 2.8: Unrooted phylogenetic tree of 41 Rad51 homologs. Trees were estimated with maximum likelihood and Bayesian inference (LG+G) from 312 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, and Excavata in brown. GenBank GenInfo Identifiers are given for all sequences unless otherwise noted (e.g. “jgi” refers to the Joint Genome Institute's public sequence databases). The consensus topology of 2 Phylobayes chains is shown. Numbers at the nodes indicate support from 1000 PhyML bootstrap replicates followed by the posterior probability estimated using Phylobayes. Scale bar: 0.1 substitutions/site.

Figure 2.9: Unrooted phylogenetic tree of 34 Rad55 homologs. Trees were estimated with maximum likelihood and Bayesian inference (LG+G) from 125 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, and Excavata in brown. GenBank GenInfo Identifiers are given for all sequences unless otherwise noted (e.g. “jgi” refers to the Joint Genome Institute's public sequence databases). The consensus topology of 2 Phylobayes chains is shown. Numbers at the nodes indicate support from 1000 PhyML bootstrap replicates followed by the posterior probability estimated using Phylobayes. Scale bar: 0.5 substitutions/site.
[Figure 2.10 (tree image). Label: Rad57. Scale bar: 0.5 substitutions/site. Leaf labels (taxon, sequence identifier): Mus 148686665; Homo 20140428; Xenopus 62859281; Monodelphis 126290417; Gallus 50748752; Danio 55251032; Strongylocentrotus 115677903; Trichoplax jgi - 61282; Nematostella 162106246; Apis 110760303; Drosophila 125775489; Aedes 157109848; Tribolium 91082871; Magnaporthe 39978201; Neurospora 16416086; Aspergillus 41581328; Gibberella 46107922; Schizosaccharomyces 19114539; Candida albicans 68479930; Paramecium 145526625; Kluyveromyces 50309463; Saccharomyces 6320207; Monosiga jgi - 22811; Coprinus 169853855; Ustilago 71018023; Leishmania 154340868; Trypanosoma 71407982; Chlamydomonas jgi - 514873; Thalassiosira jgi - 2935; Phaeodactylum jgi - 53756; Ostreococcus 116056847; Oryza 50909545; Arabidopsis 15242137; Trichomonas 123402061; Plasmodium 68068013; Cryptosporidium 126644246; Phytophthora ramorum jgi - 84214; Dictyostelium 66810419; Cyanidioschyzon 151559139; Naegleria jgi – scaffold 18000071; Giardia 71079596; Entamoeba 167379316.]

Figure 2.10: Unrooted phylogenetic tree of 42 Rad57 homologs. Trees were estimated with maximum likelihood and Bayesian inference (LG+G) from 119 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, and Excavata in brown. GenBank GenInfo Identifiers are given for all sequences unless otherwise noted (e.g. “jgi” refers to the Joint Genome Institute's public sequence databases). The consensus topology of 2 Phylobayes chains is shown. Numbers at the nodes indicate support from 1000 PhyML bootstrap replicates followed by the posterior probability estimated using Phylobayes.
[Figure 2.11 (tree image). Label: Rad57. Scale bar: 0.5 substitutions/site. Leaf labels (taxon, sequence identifier): Monodelphis 126290417; Gallus 50748752; Xenopus 62859281; Mus 148686665; Homo 20140428; Danio 55251032; Strongylocentrotus 115677903; Trichoplax jgi 61282; Nematostella 162106246; Apis 110760303; Tribolium 91082871; Aedes 157109848; Drosophila 125775489; Magnaporthe 39978201; Neurospora 16416086; Aspergillus 41581328; Gibberella 46107922; Schizosaccharomyces 19114539; Candida albicans 68479930; Kluyveromyces 50309463; Saccharomyces 6320207; Monosiga jgi - 22811; Coprinus 169853855; Dictyostelium 66810419; Cryptosporidium 126644246; Ustilago 71018023; Phytophthora ramorum jgi - 84214; Naegleria jgi – scaffold 18000071; Cyanidioschyzon 151559139 128960|127722|; Trichomonas 123402061; Trypanosoma 71407982; Leishmania 154340868; Chlamydomonas jgi - 514873; Ostreococcus 116056847; Arabidopsis 15242137; Oryza 50909545; Phaeodactylum jgi - 53756; Thalassiosira jgi - 2935.]

Figure 2.11: Unrooted phylogenetic tree of 38 Rad57 homologs. Trees were estimated with maximum likelihood and Bayesian inference (LG+G) from 119 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, and Excavata in brown. GenBank GenInfo Identifiers are given for all sequences unless otherwise noted (e.g. “jgi” refers to the Joint Genome Institute's public sequence databases). The consensus topology of 2 Phylobayes chains is shown. Numbers at the nodes indicate support from 1000 PhyML bootstrap replicates followed by the posterior probability estimated using Phylobayes.
[Figure 2.12 (tree image). Label: Dmc1. Scale bar: 0.2 substitutions/site. Leaf labels (taxon, sequence identifier): Homo 13878923; Mus 6753650; Monodelphis 126339552; Gallus 118082782; Xenopus – jgi 69|1109612|1121870; Danio 63852092; Strongylocentrotus 115660762; Tribolium 91078458; Nematostella 156342885; Monosiga – jgi 11|659650|660866; Trichoplax – jgi 52181; Coprinus 6714639; Schizosaccharomyces 3176384; Aspergillus 121709155; Candida albicans 1706446; Kluyveromyces 50311197; Saccharomyces 118683; Entamoeba 67482427; Arabidopsis 21903409; Physcomitrella – jgi 9|1771650|1771838; Oryza 18700485; Ostreococcus 145352283; Chlamydomonas 158272235; Phytophthora sojae – jgi 108|233543|235442; Tetrahymena 118382143; Theileria 71028324; Cryptosporidium 209879790; Plasmodium 156097941; Trypanosoma 71659624; Leishmania 72549845; Naegleria – jgi 1|500453|501457; Trichomonas 123408121; Giardia 30578211; Cyanidioschyzon 151559145.]

Figure 2.12: Unrooted phylogenetic tree of 34 Dmc1 homologs. Trees were estimated with maximum likelihood and Bayesian inference (LG+G) from 312 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, and Excavata in brown. GenBank GenInfo Identifiers are given for all sequences unless otherwise noted (e.g. “jgi” refers to the Joint Genome Institute's public sequence databases). The consensus topology of 2 Phylobayes chains is shown. Numbers at the nodes indicate support from 1000 PhyML bootstrap replicates followed by the posterior probability estimated using Phylobayes.
[Figure 2.13 (tree image). Label: Hop2. Scale bar: 0.5 substitutions/site. Leaf labels (taxon, sequence identifier): Homo 7706577; Gallus 118103014; Mus 74225665; Monodelphis 126307900; Xenopus 148235831; Danio 50344904; Monosiga 167533307; Nematostella 156406634; Strongylocentrotus 115774724; Apis 110748910; Trichoplax 196002715; Tribolium 270009699; Aedes 157114328; Entamoeba 67474883; Saccharomyces 9755333; Candida albicans 68481069; Kluyveromyces 50308877; Coprinus 116508641; Schizosaccharomyces 67989864; Aspergillus 67525977; Encephalitozoon 85691065; Dictyostelium 66813152; Cryptosporidium 209878810; Trichomonas 123468375; Phytophthora sojae – jgi 130587; Plasmodium 124805558; Paramecium 145484139; Tetrahymena 146164587; Giardia 71069891; Leishmania 73544615; Trypanosoma 71655487; Naegleria – jgi 3930; Cyanidioschyzon 151559141; Arabidopsis 15222250; Oryza 108710703; Physcomitrella 168061248; Ostreococcus 116057249; Chlamydomonas 159474466.]

Figure 2.13: Unrooted phylogenetic tree of 38 Hop2 homologs. Trees were estimated with maximum likelihood and Bayesian inference (LG+G) from 105 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, and Excavata in brown. GenBank GenInfo Identifiers are given for all sequences unless otherwise noted (e.g. “jgi” refers to the Joint Genome Institute's public sequence databases). The consensus topology of 2 Phylobayes chains is shown. Numbers at the nodes indicate support from 1000 PhyML bootstrap replicates followed by the posterior probability estimated using Phylobayes.
[Figure 2.14 (tree image). Label: Mnd1. Scale bar: 0.2 substitutions/site. Leaf labels (taxon, sequence identifier): Homo 14149769; Mus 20380711; Danio 68354610; Monodelphis 126331473; Xenopus 147905528; Gallus 118089748; Nematostella 156402283; Strongylocentrotus 7211568; Trichoplax 196010958; Monosiga 167526144; Schizosaccharomyces 213408226; Cryptosporidium 209880586; Plasmodium 258549228; Encephalitozoon 19074771; Apis 66516862; Kluyveromyces 50306007; Saccharomyces 27808704; Theileria 71027623; Coprinus 116508756; Candida albicans 68487528; Aspergillus 6752273; Trypanosoma 71410776; Leishmania 68126682; Cyanidioschyzon 151559132; Aedes 157167978; Dictyostelium 66802388; Tetrahymena 118364583; Entamoeba 67470301; Tribolium 91088571; Oryza 115478342; Arabidopsis 30688234; Physcomitrella 168015455; Chlamydomonas 159474610; Ostreococcus 145350550; Phytophthora ramorum jgi - 97076; Phaeodactylum 219115844; Thalassiosira 224014939; Paramecium 145503596; Giardia 71072040; Naegleria jgi - 31971; Trichomonas 154413267.]

Figure 2.14: Unrooted phylogenetic tree of 41 Mnd1 homologs. Trees were estimated with maximum likelihood and Bayesian inference (LG+G) from 137 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, and Excavata in brown. GenBank GenInfo Identifiers are given for all sequences unless otherwise noted (e.g. “jgi” refers to the Joint Genome Institute's public sequence databases). The consensus topology of 2 Phylobayes chains is shown. Numbers at the nodes indicate support from 1000 PhyML bootstrap replicates followed by the posterior probability estimated using Phylobayes.
[Figure 2.15 (tree image). Label: Mnd1. Scale bar: 0.2 substitutions/site. Leaf labels (taxon, sequence identifier): Mus 20380711; Homo 14149769; Monodelphis 126331473; Xenopus 147905528; Gallus 118089748; Danio 68354610; Ostreococcus 145350550; Monosiga 167526144; Trichoplax 196010958; Strongylocentrotus 72111568; Nematostella 156402283; Entamoeba 67470301; Aspergillus 67522773; Aedes 157167978; Tribolium 91088571; Apis 66516862; Cyanidioschyzon 151559132; Encephalitozoon 19074771; Physcomitrella 168015455; Oryza 115478342; Arabidopsis 30688234; Kluyveromyces 503060; Saccharomyces 27808704; Candida albicans 68487528; Dictyostelium 66802388; Naegleria jgi - 31971; Leishmania 68126682; Trypanosoma 71410776; Cryptosporidium 209880586; Paramecium 145503596; Tetrahymena 118364583; Phytophthora ramorum jgi - 97076; Phaeodactylum 219115844; Thalassiosira 224014939.]

Figure 2.15: Unrooted phylogenetic tree of 34 Mnd1 homologs. Trees were estimated with maximum likelihood and Bayesian inference (LG+G) from 134 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, and Excavata in brown. GenBank GenInfo Identifiers are given for all sequences unless otherwise noted (e.g. “jgi” refers to the Joint Genome Institute's public sequence databases). The consensus topology of 2 Phylobayes chains is shown.
Numbers at the nodes indicate support from 1000 PhyML bootstrap replicates followed by the posterior probability estimated using Phylobayes.

[Figure 2.16 (tree image). Label: Rdh54. Scale bar: 0.5 substitutions/site. Leaf labels (taxon, sequence identifier): Mus 34785459; Homo 6912622; Monodelphis 126322473; Gallus 45382655; Xenopus 148230804; Danio 125839739; Strongylocentrotus 115657922; Trichoplax – jgi 27976; Nematostella 156379220; Apis 110760280; Aedes 157128256; Monosiga – jgi 170; Ustilago 71019185; Coprinus 116508450; Schizosaccharomyces 63054489; Aspergillus 66850516; Candida albicans 68477713; Saccharomyces 151946464; Kluyveromyces 49644752; Dictyostelium 66811190; Entamoeba 67475316; Leishmania 146087788; Trypanosoma 71651467; Ostreococcus 145350886; Physcomitrella 168048890; Paramecium 145482121; Tetrahymena 118383249; Chlamydomonas 159467693; Naegleria – jgi 43000074.]

Figure 2.16: Unrooted phylogenetic tree of 29 Rdh54 homologs. Trees were estimated with maximum likelihood and Bayesian inference (LG+G) from 495 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, and Excavata in brown. GenBank GenInfo Identifiers are given for all sequences unless otherwise noted (e.g. “jgi” refers to the Joint Genome Institute's public sequence databases). The consensus topology of 2 Phylobayes chains is shown.
Numbers at the nodes indicate support from 1000 PhyML bootstrap replicates followed by the posterior probability estimated using Phylobayes.

[Figure 2.17 (tree image). Label: Rad54. Scale bar: 0.2 substitutions/site. Leaf labels (taxon, sequence identifier): Mus 1495708; Homo 1495483; Gallus 118094595; Xenopus 47575794; Danio 41055574; Strongylocentrotus 72012428; Nematostella 156369786; Caenorhabditis 17508659; Apis 110771180; Tribolium 189238349; Aedes 157130680; Drosophila 27819922; Kluyveromyces 49640265; Saccharomyces 151943650; Candida albicans 46444289; Coprinus 116505577; Ustilago 71008587; Schizosaccharomyces 19115202; Aspergillus 40743497; Neurospora 7384851; Gibberella 46127169; Magnaporthe 22775414; Monosiga 167527295; Theileria 84998504; Plasmodium 124512694; Cryptosporidium 66361996; Thalassiosira – jgi 259430; Phaeodactylum – jgi 37863; Phytophthora sojae – jgi 112767; Ostreococcus 116059418; Chlamydomonas 159489044; Physcomitrella – jgi 172305; Oryza 50913053; Arabidopsis 9294624.]

Figure 2.17: Unrooted phylogenetic tree of 34 Rad54 homologs. Trees were estimated with maximum likelihood and Bayesian inference (LG+G) from 495 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, and Excavata in brown. GenBank GenInfo Identifiers are given for all sequences unless otherwise noted (e.g. “jgi” refers to the Joint Genome Institute's public sequence databases). The consensus topology of 2 Phylobayes chains is shown. Numbers at the nodes indicate support from 1000 PhyML bootstrap replicates followed by the posterior probability estimated using Phylobayes.
[Figure 2.18 (tree image). Label: Rad59. Scale bar: 0.5 substitutions/site. Leaf labels (taxon, sequence identifier): Mus 13385116; Homo 21717826; Monodelphis 126308554; Gallus 45383087; Xenopus 451583; Danio 55925241; Nematostella 156377005; Trichoplax – jgi 158188; Entamoeba 56465048; Dictyostelium 66810566; Candida albicans 68492202; Saccharomyces 151941940; Kluyveromyces 50308361.]

Figure 2.18: Unrooted phylogenetic tree of 13 Rad59 homologs. Trees were estimated with maximum likelihood and Bayesian inference (LG+G) from 102 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, and Excavata in brown. GenBank GenInfo Identifiers are given for all sequences unless otherwise noted (e.g. “jgi” refers to the Joint Genome Institute's public sequence databases). The consensus topology of 2 Phylobayes chains is shown. Numbers at the nodes indicate support from 1000 PhyML bootstrap replicates followed by the posterior probability estimated using Phylobayes.

[Figure 2.19 (tree image). Scale bar: 0.2 substitutions/site. Leaf labels: Mus; Homo; Monodelphis; Gallus; Xenopus; Danio; Strongylocentrotus; Nematostella; Trichoplax; Apis; Tribolium; Aedes; Drosophila; Monosiga; Caenorhabditis; Neurospora; Magnaporthe; Gibberella; Aspergillus; Schizosaccharomyces; Coprinus; Ustilago; Candida; Saccharomyces; Kluyveromyces; Arabidopsis; Oryza; Physcomitrella; Ostreococcus; Chlamydomonas; Phytophthora; Phaeodactylum; Thalassiosira; Cyanidioschyzon; Cryptosporidium; Plasmodium; Theileria; Paramecium; Tetrahymena; Giardia; Trichomonas; Trypanosoma; Leishmania; Naegleria; Dictyostelium; Entamoeba.]

Figure 2.19: Unrooted phylogenetic tree of 46 sets of 13 concatenated strand exchange homologs. Trees were estimated with partitioned maximum likelihood inference from 3084 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, and Excavata in brown. The best tree from 1000 replicates is shown.
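Figure 2.19 rests on a concatenated alignment analysed with partitioned maximum likelihood, so each gene keeps its own block of columns. The sketch below illustrates one way such a supermatrix and its partition boundaries could be assembled. The function name `concatenate`, the gap-padding of taxa that lack a gene, and the toy sequences are all assumptions made for illustration; the procedure actually used for the thesis dataset is not described in this section.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence


def concatenate(
    genes: Dict[str, Alignment],
) -> Tuple[Alignment, List[Tuple[str, int, int]]]:
    """Join per-gene alignments and record 1-based partition ranges."""
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    combined = {taxon: "" for taxon in taxa}
    partitions: List[Tuple[str, int, int]] = []
    start = 1
    for gene, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences differ in length")
        width = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon lacking this gene is padded with gaps.
            combined[taxon] += aln.get(taxon, "-" * width)
        partitions.append((gene, start, start + width - 1))
        start += width
    return combined, partitions


# Toy data, not taken from the thesis alignments.
toy_genes = {
    "Rad51": {"Homo": "MAMQ", "Mus": "MAMQ"},
    "Dmc1": {"Homo": "MKE"},
}
matrix, parts = concatenate(toy_genes)
print(matrix)
print(parts)
```

The partition list gives the column range of each gene, which is the information a partitioned analysis needs in order to fit model parameters per gene.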
Table 2.1: DNA strand exchange component absences from eukaryotic groups.

Gene: Rad52, Rad59, Rad55, Dmc1, Hop2, Mnd1, Rad54, Rdh54

Eukaryotic group:
Alveolata (Plasmodium, Theileria, Cryptosporidium, Tetrahymena, Paramecium)
Viridiplantae (Arabidopsis, Oryza, Physcomitrella, Chlamydomonas, Ostreococcus)
Endopterygota (Aedes, Drosophila, C. elegans, Apis, Tribolium)
Endopterygota (Aedes, Drosophila, C. elegans, Apis, Tribolium)
Most Fungi (except Saccharomycetales – S. cerevisiae, Kluyveromyces, Candida albicans)
Excavates (T. vaginalis, G. intestinalis)
Chromista (Thalassiosira, Phaeodactylum, Phytophthora)
Bacillariophyta (Thalassiosira, Phaeodactylum)
Diptera (Aedes, Drosophila, C. elegans, Apis)
Sordariomycetes (Neurospora, Gibberella, Magnaporthe)
Bacillariophyta (Thalassiosira, Phaeodactylum)
Sordariomycetes (Neurospora, Gibberella, Magnaporthe)
Sordariomycetes (Neurospora, Gibberella, Magnaporthe)
Ciliophora (Tetrahymena, Paramecium)
Excavates (T. vaginalis, G. intestinalis)
Most Chromalveolata (except Ciliophora – Tetrahymena, Paramecium)
Embryophyta (Arabidopsis, Oryza)

EUKARYOTES                   10        20        30        40        50        60        70        80        90       100
                     ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....
Giardia              IADLvDFG--QHTIRGRVLSRKPLATtragkp----wFSF-MLDDDSLDVKVDCFdD-AEKFSAQIsAGDIIQLEHIKISSKTPAdRRFDVSRS-DYKVSIKsGTVVTL
Trichomonas          IASLnQYLT-SWSLIVRIISKSQMRQfqgsrp--gklFSIIMRDKNNDEIKGTFFnQEAEKFENLVeQDKVYKVS--C-GRVKKAnERYNSTKS-EFEITFDsTSSIIE
Trypanosoma          IDSLsPFLGGKWWIRARVTDKSEIRTwnkpts-qgklFSFTLIDESA-SIRATVFnEAVDMFNPLIvNGQVYYFS--G-GQVKNAnRKFSNVNN-DYELSFDnTCQISA
Leishmania           IDSLtPFLGGKWWIRARVTDKTDVRTwnkpts-qgklFSFTLIDESA-AIRATVFnDAVDTFEPLIvNGQVYYFS--G-GQVKNAnRRFSNVNN-DYELTFDrSSEIML
Naegleria            ISNLnPYDK-VWVIKARVTQKSDMKHwdkgts-kgslFSIELLDEYGGQIRATFFnDVAKKYYDAIkERSVYFFS--G-GKLKDAnRKFTTIPH-PYEITFDrDTVIQN
Plasmodium           INKLsQYST-KWIIKARVQSKDNIRKfytgnk-egkvFNIELCDESG-EIKVNVFgKAVDKWYDYLeVGKIYKIS--K-GNIKSAnKKFNTLKH-DCEITLDeNSILEL
Theileria            ISDLtLYTP-KWQIRARVVFKSEIRKfnnqrg-esqlFSVDLCDSNG-EIRAVFFgESVNKWYSFLeEGQVYSIS--G-GQLKPAnKRYNNLKH-SCELILDeSSYIQL
Cryptosporidium      IKSItSYLH-RWRIIGRVISKSDVRTfssskskegkvFSFEICDAEGSQIRATCFtKAVDKFYEFLkEGEIYSFS--K-GDVKEAnAKFNKTGH-GFEIIFNeDADIQS
Tetrahymena          IRNLqPNGQ-PQTIKVRITKKGDLKSfkek---qgklFSIDVIDKFGDECSISFFnEIAEQYDGLFkVGQVIVLK--Q-FSVKV-nNNHQYNKG-DHTVTVNkESKILI
Paramecium           ISELyPGMR-GFKIKGRITSKTDITQfkngkg---ylFTIEIIDSDKQTIQGVFFnKLCDKFYDFIdIGKVYYFE--N-ASVKTNrYSSKNQNQSDYQIHFEdFSKISI
Thalassiosira        ISGLnMYSN-RWVIRAKVTNKSDVKTwsnakg-egslFSVTLLDSSGYDVKCTFFkEAVDKFYNMLeEGRVYTFS--G-GRLKVAnMAYNNCKS-QFEITFDqNSEIHL
Phaeodactylum        ISNLnMYAN-KWTIQARVTSKSDIRTwsnakg-egslFSVELLDQTQ-DVRATFFkEAVDKFYSFLqIGSVYTFS--G-GRLKVAnAQYNTCQS-NFEITFDqNSEIHL
Phytophthora         IQSLnPYAGGRWTIKARVTTRSPIKNwtnarg-sgklFSVDLLDAKGGEIRATFFkEGVDKFYDTLrEGGVYYFS--G-GKIKMAnRRFSSVDN-DYEITFDtHSDISP
Arabidopsis-A        IAALnPYQG-RWAIKARVTAKGDIRRynnakg-dgkvFSFDLLDYDGGEIRVTCFnALVDRFYDVTeVGKVYLIS--K-GSLKPAqKNFNHLKN-EWEIFLEsTSTVEL
Arabidopsis-B        LVSLnPYQG-SWTIKVRVTNKGVMRTyknarg-egcvFNVELTDEEGTQIQATMFnAAARKFYDRFeMGKVYYIS--R-GSLKLAnKQFKTVQN-DYEMTLNeNSEVEE
Arabidopsis-C        IAALnPYQG-RWTIKVRVTSKADLRRfnnprg-egklFSFDLLDADGGEIRVTCFnDAVDQFFDKIvVGNVYLIS--R-GNLKPAqKNFNHLPN-DYEIHLDsASTIQP
Oryza-A              ISALnPYQG-RWAIKARVTAKGDIRRyhnakg-dgkvFSFDLLDSDGGEIRVTCFnALLDRFYEVVeVGKVYVVS--R-GNLRPAqKNYNHLNN-EWEILLEnGSTVDL
Oryza-B              LISLnPYQG-NWIIKVRVTSKGNLRTyknarg-egcvFNVELTDVDGTQIQATMFnEAAKKFYPMFeLGKVYYIS--K-GSLRVAnKQFKTVHN-DYEMTLNeNAVVEE
Oryza-C              ITALnPYQP-KWTIKARVTAKSDIRHwsnars-sgtvFSFDLLDAQGGEIRAQCWkESADKFFGQIeVGRVYLIS--R-GSLKPAqKKYNTLNH-DYEITLDiLSTVEV
Physcomitrella-A     IAALnPYQG-RWTIKARVTSKGEIRRfhnakg-egkvFSFDMLDADGGEIRATCFnNVVDQFHDRIeVGKVYLIS--K-GSLKAAqKNFNHLKN-DWEIFLEsQSTIEP
Physcomitrella-B     ILSLnPYQG-NWTIKVRVTSKSPLRTfknarg-dgnvFNVELTDEDGTQIQATMFkEAADKFYDVLqLDKVYFIS--K-GSLRMAnKQYATVKN-DYEMTLNsNSEIVE
Chlamydomonas-B      IAQLhPYET-NWCIRAKVDRKAPLRAlpskp--dvkvMTVDLVDETGTAIQGTFWrGPAERMSEQLvEGKVYVFH--K-FKVKPAdKKYVTVKN-EYQIDFTdTTDVSE
Chlamydomonas-C      ISALnPYTA-RWAIRARVTSKGELRRwtnvrg-egkvFSFDLLDKDGGEIRATAFgAEADKFFEVVeAGAIYQIS--K-ASLTNKrPQFNHTNH-QYEIKLDrNSMVER
Ostreococcus-B       LAALnPYRT-PWTVKVKLTNKGNVREyksarg-pgkvCSVDFVDEEGTAIGATLWrEAIEKYDSVLeVGKVYYVS--K-GSLKPAdKRYSTSGN-DYEMNLDgKCEIDV
Ostreococcus-C       IHALnPYQN-RWTIRARITTPLELRSysnakg-egkvLGFQVLDADGTEIKCVCFnDTAVRLAGELrQGLVYEIS--KGAIVTPRdPRYAIY---QYEIKLDnHATFVP
Cyanidioschyzon      ISAAnPYQN-NVIIRGRVVQKGELRTysnakg-egklFSFEIADETG-NMRVTAFrEKALEAHQRIeLNGIYSIA--G-ASLKPAnAQFNHTGH-SFEMILDqNSVITQ
Homo                 IASLtPYQS-KWTICARVTNKSQIRTwsnsrg-egklFSLELVDESG-EIRATAFnEQVDKFFPLIeVNKVYYFS--K-GTLKIAnKQFTAVKN-DYEMTFNnETSVMP
Mus                  IASLtPYQS-KWTICARVTNKSQIRTwsnsrg-egklFSLELVDESG-EIRATAFnEQVDKFFPLIeVNKVYYFS--K-GALKIAnKQFSAVKN-DYEMTFNnETSVLP
Monodelphis          IASLnPYQS-KWTICARVTNKSQIRTwsnsrg-egklFSIEMVDESG-EIRATAFnDQVDKFFPLIdVNKVYYFS--K-GTLKIAnKQFTAVKN-DYEMTFNnETSVVL
Gallus               IASLnPYQS-KWTICARVTQKGQIRTwsnsrg-egklFSIELVDESG-EIRATAFnDQADKFFPLIeLNKVYYFT--K-GNLKTAnKQYTAVKN-DYEITFNnETSVVP
Xenopus              IASLnPYQS-KWTVRARVTNKGQIRTwsnsrg-egklFSIEMVDESG-EIRATAFnEQADKFFSIIeVNKVYYFS--K-GTLKIAnKQYTSVKN-DYEMTFNsETSVIP
Danio                IASLnPYQS-KWTIRARVTNKSAIRTwsnsrg-dgklFSMELVDESG-EIRATGFnNEVDKFFSLIeQGKVFYIS--K-GTLKIAnKQFSSLKN-DYEMTLNgETSIIP
Strongylocentrotus   -------------------------------------------------------------------------------------------------------------
Aedes                INSLsPYQN-KWVIRARVMSKSGIRTwsnakg-egklFSMDVMDESG-EIRVTAFkDQCDKYYDMIeVDKVYYIT--K-CQLKPAnKQYSTLKN-DYEMTMTnDTIVQE
Drosophila           ISSLsPYQN-KWVIKARVTSKTAIRTwsnarg-egklFSMDLMDESG-EIRATAFkEQCDKFYDLIeVDNVYFFS--K-CQLKPAnKQYSQLKN-DYEMTFTnETMVQP
Caenorhabditis       IAMVtPYVS-NFKIHGMVSRKEEIRTfpaknt---kvFNFEITDSNGDTIRCTAFnEVAESLYTTItENLSYYLS--G-GSVKQAnKKFNNTGH-DYEITLRsDSIIEA
Apis                 IVALsPYQN-RWVIKARVVSKSNIRTwsnsrg-egklFSMDLIDESG-EIRCTAFrNECDKFYDMLeIGKVYYIS--R-ATLKPAnKQFNNLKN-DYEMTLIgDSEIIP
Tribolium            INALtPYHN-KWVIKARVTNKSDMRTwsnsrg-egklFSFDLMDDSG-EIRCTAFrDMADKYFNYLqVDKVYYIS--K-CQLKAAnKQFNTLKN-EYEMTIGnETIIEE
Nematostella         ISGLtPYQN-RWTIRARITSKSNIRTwnnsrg-egrlFNVEMVDESG-EIRATGFnEAVDKFYQMLeVDKVFYIT--K-GSLRTAnKQYSSIRN-DYEMYLNnDTIIEP
Trichoplax           ISSLtPYQN-RWTIRTRVTSKSEIRKwsnsrg-egklFSVDLIDESG-EIRATAFrDQVEKFYDVLeVNKVYYIS--R-CSIKTAnKNFTSIKN-DYEMTFTnETAVEP
Monosiga             LTSLnPYDR-RWAIRVRVVAKPPIRTynsdrg-egkiFSVDLVDASG-EIRATGFnADCDRLYPLFeKNKVYMIQ--G-GRIKPKnRRFNQLSH-EYEITFDsTTTVTE
Saccharomyces        IEQLsPYQN-VWTIKARVSYKGEIKTwhnqrg-dgklFNVNFLDTSG-EIRATAFnDFATKFNEILqEGKVYYVS--K-AKLQPAkPQFTNLTH-PYELNLDrDTVIEE
Kluyveromyces        IEQIsPYQN-NWTIKARVSFKGDLKKwqnnrg-eghiLNVNLLDSSG-EIRATAFnDNAIKFNEILqEGKAYFVS--K-ARVQPAkPQFSNLKH-PYELSLErDCVVEE
Candida albicans     IETIsPYQN-NWTIKARVSYKGDLRTwsnskg-egkvFGFNLLDESD-EIKASAFnETAERAHKLLeEGKVYYIS--K-ARVAAArKKFNTLSH-PYELTFDkDTEITE
Neurospora           IEGLsPFSH-KWTIKARVTSKSDIKTwhkasg-egklFSVNFLDESG-EIRATGFnDQVDQFYDLLqEGQVYYIS--TPCRVQLAkKQWSNLPN-DYELTFErDTVIEK
Gibberella           IEGLsPFAH-KWTIKARVTAKSDIKTwhkatg-egklFSVNLLDESG-EIKATGFnDQCDALYDQLqEGSVYYIS--TPCRVQLAkKQFSNLPN-DYELTFErDTVVEK
Magnaporthe          IESIsPYQH-KWTIKARVSQKSDIRTwhkasg-egklFSVNLLDETG-EIKATGFnDQCDKFYDILqEGQVYYIS--TPCRVQMAkKQFTNLPN-DYELTFEdGTQIEK
Aspergillus          IEAIsPYSH-KWTIKARCTSKTNIRTwhnrnt-egrlFSVNLLDDSG-EIRATGFnDQCDMLYDVFqEGGVYYIS--N-CRVQIAkKQFTNLNN-DYELTFErDTVVEK
Schizosaccharomyces  IEGLsPYQN-KWTIRARVTNKSEIKHwhnqrg-egklFSVNLLDESG-EIRATGFnEQVDAFYDILqEGQVYFIS--K-CRVNIAkKQFSNVQN-EYELMFErDTEIKK
Coprinus             IEGLsPYQN-NWTIKARVTQKSEMKQwsnaqg-egklFNVTFMDDSG-EIRATAFnLVADDLYPKLeEGKVYYVS--K-ARVGLAkKKFSNIPN-DYELSLErNTEIEE
Ustilago             IEGLsPYQN-RWTIKARVTSKSDIRHwsnqrg-egklFSVNLLDDSG-EIKATGFnDAVDRFYPLLqENHVYLIS--K-ARVNIAkKQFSNLQN-EYEITFEnSTEIEE
Encephalitozoon      INMLnPFHN-KWAIKGRVVMKSDIRRftnqkg-egkvFNFEVSDGTA-QVKIICFsDCVDIFFPIVeVGKVYTIA--K-GTVKMAnKQYSTNPF-DYEIILDkSSEVHR
Dictyostelium        IESIaPGMNVQFTIRAMVRNKQPLKSwnkgangegklFSMELVDSTG-EIKCACFsDSINALYDCFeNGKVYFIQ--R-FFVKSAnKLYNTLSH-QSELSINsESRVMI
Entamoeba            FSLLtTFGS-KLIIKGRVVSKNDKFKya-----kgnlFSFVLQDKDGAEIKATCFnDVCDEKFDQIkVGETYYIT--KADYKQSNgKGYRSAKMIDLDMIIGkYTIIQK

Figure 2.20: Multiple sequence alignment of RPA1 ssDNA binding domain (DBD-A) from 54 diverse eukaryotes. Genus names for Excavata are highlighted with brown, Chromalveolata with orange, Archaeplastida with green, Opisthokonta with purple, and Amoebozoa with blue. Shaded columns indicate amino acids that are 75% identical.

EUKARYOTES                   10        20        30        40        50        60        70        80        90       100       110       120       130
                     ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|.
Giardia              HTHICTILAGISDATLVYSspesarQWFRLDLQVVDKTg----SIKVSLWTE----QLEPFLSAYNMTKDdAPARLTGKIVVLSRVqFRCSER-ygIQCSCVgISKVFLVDSTpskqddpSNLKEFRLWWA
Trichomonas          TVDIISIITFIGDCQTI--ktksgsSIEKRNITVSDETg----TIEVTLWGS----SATEF----D--QKeSE--------IFCVRnVTVSDF-rgVSLNVGqSATIVV---Np-p---dNDVKNIRNWYE
Trypanosoma          LVDVLAVVLNVEELGTI-VqrstgrELVRRTVKVADSTa----GIDVTLWNE----NAKEWP------HQpGT--------VLAMRqLKVGSF-dgVTLSTTmQSSFDV---Np-n---iPDVKKLREWFE
Leishmania           LVDVLGVVLKVDEVSSI-TqkstgrELVKRNVKMGDMTa----AVEVTFWND----EAKAWC------YPvGT--------VVALRqLKVGSF-hgVPFSSTyQTKIDI---Nptd---lPDVKKLATWYV
Naegleria            ILDVAGVVQNIGETKEF--ttknnrKTKRCNISLIDDSsspfcTVDLTIWGD----MCDTH----D--MQqGD--------VVILKsVRKSNY-ggVSLNTInSTRIFK---Dp-g---iPIYQQLSEWYQ
Plasmodium           LVDVIGVVLSFQELNQI-LikktgqYKEKKDLMLIDETn---eTINVTLWGE----NAVKMEEMN---ITeNC--------IICFKcLKVGEW-qgKKLESHpKTKVEI---Np-e---lDKAYTLKNWWI
Theileria            TIDVIAIVVTARDLQKI-NnkatgnNVEKRDFLLCDSTn---tTVWVTSWGQ----KTQLFNYEGD---NsHP--------LVCLKgVKVGEW-qgKKLDVQiSTQVIC---Ep-v---iPEALKLRKWWN
Cryptosporidium      SIDILGILWKASPIMTI-TikstgaDTQKRELTILDRSg---ySIDLTLWSERTNLDEGML--------AqNP--------MIAVKnAIIEEF-ngFKLKFGpNTSIEW---Npin---iEQADELRQWFQ
Tetrahymena          LIDLIVVVKADTEVKTM-IlkkdnqQQSKRDIISFDESl---iETEITLWGE----TAKDY----D--AKqGD--------IIVFKdAKIGEFkdkKQINIGyGTQIFM---Np-deqlfPQIHDVKKWYL
Paramecium           KCDVLGVIIDIKPTTQI-Mtk-sneNRSKKNITLYDQTq---rGIDIVLWGQ----QAEKWQ------FQkDE--------IVAFRgLKISDYqmvRNLTVTnSTIYEK---Nlsn---lKKINGFQEFYE
Thalassiosira        YVDILAVVKHVGDVSTI-VskksgkEMTKVDLVVEDDSg---aDVKLTLWGNSAQNAENQF--------AnCP--------VVAFKkSRLGDY-ggRSLS---GGSPTV---Np-q---iPQTNQLMQWWG
Phaeodactylum        MVDLLAVVQAVGEVATI-VskksgqELTKCDLTLIDTSa---tQITLTLWGDKAASALTDY--------NqQP--------VVAIRrARVSDY-ggRSLSL--SGSIET---Np-d---iPQTAPLQTWWR
Phytophthora         NVDVIGVVRDVGQVNEI--mskagkQLFKRDILLVDDSn---aEIKCTMWNERAQEDCSGW---------qNQ--------VLAIKgCRVSEY-ngRSISTVsSSNFTV---Np-a---iPEAGHLVTWFS
Arabidopsis-A        ILDVIGVVTSVNPSVPI--lrkngmETHRRILNLKDESg---kAVEVTLWGEFCNRDGRQLEEMVD--SAfHP--------VLAIKaGKVSDF-sgKSVGTIsSTQLFI---Np-d---fPEAHKLRTWFD
Arabidopsis-B        LIDVIGVVQSVSPTMSI-RrkndneMIPKRDITLADETk---kTVVVSLWNDLATGIGQELLDM----ADnHP--------VIAIKsLKVGAF-qgVSLSTIsRSNVVI---Np-n---sPEATKLKSWYD
Arabidopsis-C        TTDVIGIVSSISPTVAI--mrknltEVQKRSLQLKDMSg---rSVEVTMWGNFCNAEGQKLQNLCD--SGvFP--------VLALKaGRIGEF-ngKQVSTIgASQFFI---Ep-d---fPEARELRQWYE
Oryza-A              ILDIIGVVTSVNPCTTI--qrkngmETQKRTMNLKDMSg---rSVEVTMWGDFCNREGSQLQGMVE--RGiFP--------VLAVKaGKVSDF-sgKSVGTIsSTQFFI---Np-d---sAEAHSLRQWFD
Oryza-B              LVDVIGVVQSVSPTLSV-RrkidneTIPKRDIVVADDSs---kTVTISLWNDLATTTGQELLDM----VDsAP--------IIAIKsLKVSDF-qgLSLSTVgRSTIVV---Np-d---lPEAEQLRAWYD
Oryza-C              IVDLLGVVTSVSPSATI--mrkigtETRKRSIQLKDLSg---rSIEVTLWGNFCDAEGQQLQLQCD--SGsNP--------IIAFKgARVGDF-ngKSVSTIgSTQLII---Np-d---fPEVERLRQWYM
Physcomitrella-A     MIDIIGVVMSITPTVTI--trknglETQKRSLQLKDMSn---rSVELTMWGNFCNKEGQELQDLCD--SGaNP--------VLAVKaGRVSDF-sgKSVGTIsSTQLVI---Np-d---hPEARKVRDWFN
Physcomitrella-B     VADVLGVVQSVGPLTTV-NrksnndEIPKRDIVLLDQSr---qTVVLTLWNNMAVKEGASLADL----IAeSP--------ILMAKgLRLSDF-qgVSLSSTmNTMVLI---Np-v---iPDANELRTWYE
Chlamydomonas-B      PVDVMGVVLALGSYGTV-KrkadnsELPRREVTIGDQSg---kSVAITLWGDMSSTTAQQLEG-----MEgRA--------VLQVTgCRVTDY-ngCSLSTLsKSVASI---Np-e---tPAAQQMMLWYK
Chlamydomonas-C      VVDIIGVVETCEPWQTI--trrtgeETQKRSMVVRDDSg---rSIEVTLWGALVNNPGDQIEQMVR--GGgRP--------VLAAKaLRVGDY-ngKTLSTVgASALRL---Dpmd---lPAAQRVRGWYN
Ostreococcus-B       NVDVVAVVKEVSELSSI-RrksdntELNKREVVLVDDSa---kTVRLTLWNA----LAVEVGEQLA--SMtNP--------VVAIRsVRVGDY-egVSIGTVsRSDIVI---Dped---vPRAVEIKKWWS
Ostreococcus-C       MVDVIGIAYSVGDLTTI--mkrdgsETSKRSVMIRDDSd---tSIEFTLWDPHSVEIGGQIESLIA--SGeKP--------VIAVKsSRLGEF-qgKNMGTVsSTMVEI---Np-d---sSEATRMRVWFD
Cyanidioschyzon      VVDVIGIALDIGEVGEI-SskttglPVAKREVKLIDDTg---cSVALTIWGE----RARSLFSN----EDdRP--------VLLVKsAKRGDF-ngVSLSTTpSSHVEV---Np-n---iREAFELRGWFD
Homo                 LVDIIGICKSYEDATKI-TvrsnnrEVAKRNIYLMDTSg---kVVTATLWGE----DADKF----D--GSrQP--------VLAIKgARVSDF-ggRSLSVLsSSTIIA---Np-d---iPEAYKLRGWFD
Mus                  LVDIIGICKSYEDSIKI-TvksnnrEVAKRNIYLMDMSg---kVVTTTLWGE----DADKF----D--GSrQP--------VMAIKgARVSDF-ggRSLSVLsSSTVIV---Np-d---iPEAYKLRGWFD
Monodelphis          LVDIIGVCKSYEDASKV-VvkssnrEVSKRNVHLMDTSg---kVVTTTLWGE----DADRF----D--GSrQP--------VLAIKgARVSDF-ggRSLSVLsSSTILV---Np-d---iPEAFKLRGWFD
Gallus               IVDVIGICKSYEDVTKI-VvkasnrEVSKRNVHLMDTSg---kLVTATLWGN----EAEKF----D--GSrQP--------VIAIKgARVSDF-ggRSLSVLsSSTVVV---Np-d---sPEAFKLRGWFD
Xenopus              VLDIIGVCKNVEEVTKV-TiksnnrEVSKRSIHLMDSSg---kVVSTTLWGE----DADKF----D--GSrQP--------VVAIKgARLSDF-ggRSLSVLsSSTVMI---Np-d---iPEAFKLRAWFD
Danio                ILDVIGVCKNAEDVARI--mtknsrEVSKRNIQLIDMSg---rVIQLTMWGS----DAETF----D--GSgQP--------ILAIKgARLSDF-ggRSLSTLySSTVMI---Np-d---iPEAYKLRGWYD
Strongylocentrotus   MPNVIGVCKSTSDLTAV-TikssnrEVNKRSLQLVDDSq---kEVSLTLWGK----EAEDF----D--GSgNP--------VIAVKgARLSGF-ggRSLSVLqNSIFQV---Np-d---iPKAHHLKGWFD
Aedes                MIDVIGVCKEAGEVMQF-TarssgrELKKREVTLVDSSn---aAVSLTLWGD----DAQNF----N--ATnNP--------VLVIKgARVTEFgggKSLGLVaSSVLKT---Np-d---nEEAHKIRGWYL
Drosophila           AVDTIGICKEVGELQAF-TsrttnkEFKKRELTLVDMSn---aAVTLTLWGD----EAVNF----D--GHvQP--------VILVKgSRINEFnggKSLSMGgGSILKI---Np-d---iPEAHKLRGWFD
Caenorhabditis       LIDVLVVVEKMDPEATE-FtskagkSLIKREMELIDESg---aLVRLTLWGD----EATKA--LVD--DYvQK--------VIAFKgVIPREFnggFSLGTGsATRIIS---Vp-e---iAGVSELYDWYA
Apis                 IMNILGIVKYSGDLQIL-TsrnsgrELRKRDVSLVDESn---tTVTLTLWGS----QAEEF----D--GSsNP--------VLAVKgARITEFnggKNLSTLsSTVLQI---Dp-d---lPAAHRLRGWFN
Tribolium            LVDVIGICKEASEVQTF-TskstnrELRKREITLVDQSk---tSIALTLWGS----QADSF----D--ATnNP--------VVVIKgAKVGEFgggKNLSTLmSSQIKL---Np-d---iPECHRIKGWYD
Nematostella         IVDILGVVTNVGDLAQI-TtkttnkQVSKRDITLLDRSe---kSVTATLWGD----EAEKF----EEHAGkNP--------VLAIRgAKVSDF-ggRSLSVLnSSNMRV---Npvd---mKEAQVLRGWYD
Trichoplax           MIDVVGVVKSADDVVTI--ntksnrQVNKRDIELVDDSg---kVVRLTLWGT----NAEEF----D--GSqFP--------VVAVRgARVTEF-ggRSLSVVgSSQLMT---Np-d---iPEAHILRGWFD
Monosiga             FADILAVIKEVADVTTI-VtraaqkELSKREVTLVDKDn---vSLSCTLWGK----EAEGF---VDAGGHpGV--------VMAIKaARISDF-ngRSLSVAsNSNYSI---Np-d---lKEAHELKGWCV
Saccharomyces        NVDVLGIIQTINPHFEL--tsragkKFDRRDITIVDDSg---fSISVGLWNQ----QALDF----N--LPeGS--------VAAIKgVRVTDF-ggKSLSMGfSSTLIP---Np-e---iPEAYALKGWYD
Kluyveromyces        AIDVVGILKSVGPHFEL--aaksgkKFDRRDVEIVDDSg---aCISLGLWGE----QAIKF----N--LPeGS--------VVALKgVRVTDF-ngKSLSMGnTSSLFA---Np-d---iQEAYTLKGWYD
Candida albicans     IIDVLGALKTVFPPFQI-TakstgkVFDRRNILVVDETg---fGIELGLWNN----TATDF----N--IEeGT--------VVAVKgCKVSDY-dgRTLSLTqAGSIIP---Np-g---tPESFKLKGWYD
Neurospora           TVDIIGVLKEVQEVTQI-VskttqkPYDKRELTLVDNTg---ySVRCTIWGK----TATNF----D--AQpES--------IVAFKgTKVSDF-ggRSLSLLsSGTMAI---Dp-d---iPEAHHLKGWYD
Gibberella           TVDVIGVLKEVGEIGDI-TskkdgrPFQKRELTLVDDTg---fSVRVTIWGK----NANSF----D--AApES--------VVAFKgTKVSDF-ggKSLSLLsSGTMTV---Dp-d---iPDAHRLKGWYD
Magnaporthe          TVDVIGVLKDVADVTQI-TskasgkFFDKRELTLVDDSg---ySVRMTIWGK----TAQNF----D--AKsES--------VVAFKgAKVGDF-ggRSLSLLsSGTMTV---Dp-d---iPEAHRLKGWFD
Aspergillus          TIDVIGVLKEAMDVTQI-TskttnkPYDKRELIMVDNTg---fSVRLTIWGS----TAQKF----N--ASpES--------VIAFKgVKVSDF-ggRSLSLLsSGSMAV---Dp-d---iEEAHKLKGWYD
Schizosaccharomyces  IIDVIGVLQNIGPVQQI-TsratsrGFDKRDITIVDQSg---fEMRMTVWGK----QAIDF----S--VPeES--------IIAFKgVKVNDF-qgRSLSMLnSSTMTT---Dp-d---iPEAHTLKGWYD
Coprinus             ICDVIGVVKDVGEVGTI-TsrsnnrQISKRDLTLVDKSa---ySVRMTLWGK----QAEQF----K--VEpES--------IIAFKgVRVGDF-ngRNLSMTsASTMQV---Np-d---iEECFTLRGWYD
Ustilago             TCDVIGILDSYGELSEI-VskasqrPVQKRELTLVDQGn---rSVKLTLWGK----TAETFPTNAG--VDeKP--------VLAFKgVKVGDF-ggRSLSMFsSSTMLI---Np-d---iTESHVLRGWYD
Encephalitozoon      YCDTIGVVKEVYAPSTV-MvrstqsELLKRDAVLVDDGg----SVRLTLWGP----KAELE-------IEsGM--------VLALKsIKVSEF-ngISISTTgGSQVVT---Np-d---iAEAHELEGWYQ
Dictyostelium        TVDVIGAITNIDPIANL-Tsk-qgkEFTKFGITIADDTn---aSINVVFWNE----KATEV----APQVKvGD--------IIAMKgVKVSDF-sgRTLSYSfGSSFGL---Ndeq---lQETSNLRAHLQ
Entamoeba            TYDICAFLVDKGPEQTY------knEKAKVTLTFMDQSs---yAVEVDFWNE----DIDKTKD-----MEnGV--------VYVLTsLKLKEF-kyKTLTVTkATKILS---Nt-dieqyDEASLVNKFIQ

Figure 2.21: Multiple sequence alignment of RPA1 ssDNA binding domain (DBD-B) from 54 diverse eukaryotes. Genus names for Excavata are highlighted with brown, Chromalveolata with orange, Archaeplastida with green, Opisthokonta with purple, and Amoebozoa with blue. Shaded columns indicate amino acids that are 75% identical.

Figure 2.22: Multiple sequence alignment of RPA1 ssDNA binding domain (DBD-C) from 54 diverse eukaryotes. Genus names for Excavata are highlighted with brown, Chromalveolata with orange, Archaeplastida with green, Opisthokonta with purple, and Amoebozoa with blue. Shaded columns indicate amino acids that are 75% identical.
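The alignment figures shade columns in which amino acids are 75% identical. As a rough sketch of how such columns could be identified, the function below reports the columns whose most common residue reaches a given fraction of the rows. The name `conserved_columns`, the case-insensitive comparison, and the choice to count gapped rows in the denominator are assumptions made here for illustration; the captions do not say how the shading was computed, and the toy rows are not taken from the figures.

```python
from collections import Counter
from typing import List


def conserved_columns(rows: List[str], threshold: float = 0.75) -> List[int]:
    """Return 1-based indices of columns whose top residue meets the threshold."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same length")
    hits = []
    for index, column in enumerate(zip(*rows), start=1):
        # Gaps never count as a residue, but gapped rows stay in the denominator.
        counts = Counter(char.upper() for char in column if char != "-")
        if counts and counts.most_common(1)[0][1] / len(rows) >= threshold:
            hits.append(index)
    return hits


# Toy rows for illustration only.
toy_rows = ["IADL", "IASL", "IDSL", "I-NL"]
print(conserved_columns(toy_rows))  # prints [1, 4]
```

Counting gapped rows in the denominator is the stricter of the two obvious conventions; dividing by the number of non-gap residues instead would flag more columns in sparsely populated regions.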
[Multiple sequence alignment not reproduced: 54 sequences, ruler positions 10 to 240 in two blocks. Row labels, in order: Giardia, Trichomonas, Trypanosoma, Leishmania, Naegleria, Plasmodium, Theileria, Cryptosporidium, Tetrahymena, Paramecium, Thalassiosira, Phaeodactylum, Phytophthora, Arabidopsis-A, Arabidopsis-B, Arabidopsis-C, Oryza-A, Oryza-B, Oryza-C, Physcomitrella-A, Physcomitrella-B, Chlamydomonas-B, Chlamydomonas-C, Ostreococcus-B, Ostreococcus-C, Cyanidioschyzon, Homo, Mus, Monodelphis, Gallus, Xenopus, Danio, Strongylocentrotus, Aedes, Drosophila, Caenorhabditis, Apis, Tribolium, Nematostella, Trichoplax, Monosiga, Saccharomyces, Kluyveromyces, Candida albicans, Neurospora, Gibberella, Magnaporthe, Aspergillus, Schizosaccharomyces, Coprinus, Ustilago, Encephalitozoon, Dictyostelium, Entamoeba.]

[Multiple sequence alignment not reproduced: 45 sequences, ruler positions 10 to 150. Row labels, in order: Trichomonas, Trypanosoma, Leishmania, Naegleria, Theileria, Cryptosporidium, Thalassiosira, Phaeodactylum, Phytophthora, Arabidopsis-a, Arabidopsis-b, Oryza-1, Oryza-2, Oryza-3, Physcomitrella, Chlamydomonas, Ostreococcus, Cyanidioschyzon, Homo, Mus, Monodelphis, Gallus, Xenopus, Danio, Strongylocentrotus, Aedes, Drosophila, Caenorhabditis, Apis, Tribolium, Nematostella, Trichoplax, Monosiga, Saccharomyces, Kluyveromyces, Candida albicans, Neurospora, Gibberella, Magnaporthe, Aspergillus, Schizosaccharomyces, Coprinus, Ustilago, Encephalitozoon, Entamoeba.]

Figure 2.23: Multiple sequence alignment of RPA2 ssDNA binding domain (DBD-D) from 45 diverse eukaryotes. Genus names are highlighted by supergroup: Excavata in brown, Chromalveolata in orange, Archaeplastida in green, Opisthokonta in purple, and Amoebozoa in blue. Shaded columns indicate positions at which amino acids are 75% identical.
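The shading rule stated in the figure captions (columns in which amino acids are 75% identical) can be illustrated with a minimal Python sketch. The thesis does not say which software or exact rule produced the shading, so everything here is an assumption made for illustration: the function name `conserved_columns`, counting gaps as non-matching, comparing residues case-insensitively, and treating 75% as a minimum.

```python
from collections import Counter


def conserved_columns(rows, threshold=0.75, gap_chars="-."):
    """Return 0-based indices of alignment columns whose most common
    residue is shared by at least `threshold` of all rows.

    Assumptions (not taken from the thesis): gap characters never count
    as a match, and upper/lower case letters are treated as the same
    residue.
    """
    if not rows:
        return []
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")

    conserved = []
    for col in range(width):
        # Collect the non-gap residues found in this column.
        residues = [row[col].upper() for row in rows if row[col] not in gap_chars]
        if not residues:
            continue  # a column made only of gaps cannot be conserved
        top_count = Counter(residues).most_common(1)[0][1]
        # The denominator is the total number of rows, gaps included.
        if top_count / len(rows) >= threshold:
            conserved.append(col)
    return conserved


# Toy alignment of four sequences (invented data, not from the figures).
toy_rows = ["ACDE", "ACDF", "AG-E", "AGKE"]
print(conserved_columns(toy_rows))
```

In the toy input, only the first and last columns reach the threshold. Dividing by the number of non-gap residues instead of the number of rows would be an equally defensible reading of the captions and would shade gap-rich columns more readily.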
[Multiple sequence alignment not reproduced: 54 sequences, ruler positions 10 to 130. Row labels, in order: Giardia, Trichomonas, Trypanosoma, Leishmania, Naegleria, Plasmodium, Theileria, Cryptosporidium, Tetrahymena, Paramecium, Thalassiosira, Phaeodactylum, Phytophthora, Arabidopsis-A, Arabidopsis-B, Arabidopsis-C, Oryza-A, Oryza-B, Oryza-C, Physcomitrella-A, Physcomitrella-B, Chlamydomonas-B, Chlamydomonas-C, Ostreococcus-B, Ostreococcus-C, Cyanidioschyzon, Homo, Mus, Monodelphis, Gallus, Xenopus, Danio, Strongylocentrotus, Aedes, Drosophila, Caenorhabditis, Apis, Tribolium, Nematostella, Trichoplax, Monosiga, Saccharomyces, Kluyveromyces, Candida albicans, Neurospora, Gibberella, Magnaporthe, Aspergillus, Schizosaccharomyces, Coprinus, Ustilago, Encephalitozoon, Dictyostelium, Entamoeba.]

Figure 2.24: Multiple sequence alignment of RPA1 ssDNA binding domain (DBD-F) from 54 diverse eukaryotes. Genus names are highlighted by supergroup: Excavata in brown, Chromalveolata in orange, Archaeplastida in green, Opisthokonta in purple, and Amoebozoa in blue. Shaded columns indicate positions at which amino acids are 75% identical.

[Start of a further multiple sequence alignment, not reproduced: 36 row labels, ruler positions 10 to 130. Row labels, in order: Trichomonas, Naegleria, Plasmodium, Cryptosporidium, Tetrahymena, Paramecium, Thalassiosira, Phytophthora, Arabidopsis, Oryza, Physcomitrella, Chlamydomonas, Ostreococcus, Homo, Mus, Monodelphis, Gallus, Xenopus, Danio, Strongylocentrotus, Aedes, Drosophila, Apis, Tribolium, Nematostella, Saccharomyces, Kluyveromyces, Candida albicans, Neurospora, Gibberella, Magnaporthe, Aspergillus, Schizosaccharomyces, Coprinus, Ustilago, Encephalitozoon.]
-------MDNKSSTPRITCAYLS--QYVGKLVTVVGKVVQLRGEEATI---DADG-TIHAFLN--REAHLS--ANNGVQLIGKV---NPDLS-IKVLSSVDLGQ-----GVDYNLANAVVEVTH------RYK-PLF -------MSEQLSTPRITAAYLD--NFVGRVVMLVGEVTQLRGDQATV---ESDG-TVTVLLN--RDAHLS--NGNYVQVIGKV---NPDLS-IKVLTSRDLGNSVDHGPFSQQTYDEDSQLSHIPSAQPYTP-PGW --------MEQTSTPRVNCGLLD--SYVGRNVMVVGRVQQL---RGDVALIDADG-NVTANLN--RDSHLL--VGNAAQIIGKV---NPDLT-IKVLSSHDLGP-----NVDMNVSRAVVETSQ------KLK-ALF ---------MSLQTPRVLPSHLH--AFSAPPVRLLGTVTAL--HGDTATITCGTHGDVTLILK--PDSHLQ--MGKLVEVVGKVAEIDGGLG-IRVLATTDWGN---PADCDYKIYEKVVDVTH------RLK-PIF ---------MERPTPRVTKDMLP--ECSGKTVRIVGKANQV--EGETAKVDSNGSFDMHLTVD--NTLE----PNHFYEFVVSV---KPDSS-VQLLTCVDFGT-----DIDMEVYQKLVLFSH------KYN-SLF ----MSNGIEQEVTPRVNSALLS--NFQGRTIRLACKLVKFN-DNGSLTVSAADGGQVVVQLV--GEHEPI--SDTYLEIVGKV---MDPTT-IQMRGCIGLGA-----DLDMKLVNDTINLIH----DERFYGRMF ---------MEKPTPLINSSMLG--QYVGQTVRIVGKVHKV--TGNTLLMQTSDLGNVEIAMT--PDSDVS--SSTFVEVTGKV---SDAGSSFQANQIREFTTVDCGHDVDLTLVENVVQISA------AFP-NLF ---MLFLVSYLVPMYSVDVE-----NCEGQDVVVIGRLERV---EDGVVVLKCMGREVQVRH---QGVELY--RPGLVRVRGTV----ENGV-LVESSVRPVGG-----EFDMEVYGRFVAIAA------KYP-DLF

Figure 2.25: Multiple sequence alignment of RPA3 ssDNA binding domain (DBD-E) from 36 diverse eukaryotes. Excavata are highlighted with brown, Chromalveolata with orange, Archaeplastida with green, and Opisthokonta with purple. Shaded regions indicate amino acids that are 75% identical.

Table 2.2: Protein sequence comparisons between Saccharomyces cerevisiae and Homo sapiens.
Protein   Length (aa)   Identity   S-W score   Number undetected   Number undetected
                                               (observed)          (expected, 90%)
A190      1664          0.38       3611        0                   0.01
A135      1203          0.44       3185        0                   0.02
AC40      335           0.46       1038        0                   1.38
AC19      215           0.44       612         0                   3.24
AC12.2    131           0.65       382         1                   5.13
Rpb5      142           0.45       364         3                   5.32
Rpb6      70            0.74       354         1                   5.43
Rpb8      146           0.36       289         0                   6.18
Rpb10     125           0.36       257         4                   6.59
Rpb12     70            0.41       13          5                   10.73
RPA1      621           0.32       1161        0                   1.08
RPA2      273           0.26       255         5                   6.62
RPA3      110           0.14       73          10                  9.52
Rad52     471           0.24       550         12                  2.96
Rad59     238           0.10       93          27                  11.12
Rad51     400           0.66       1538        1                   0.17
Rad55     406           0.17       98          12                  10.96
Rad57     460           0.26       295         5                   6.19
Dmc1      334           0.54       1178        10                  0.48
Hop2      218           0.21       161         8                   9.13
Mnd1      219           0.28       328         5                   5.63
Rad54     898           0.45       1984        10                  0.05
Rdh54     958           0.36       1524        15                  0.18

Note: Yeast RNA Polymerase I, Replication Protein A, and strand exchange component amino acid lengths, their identities to human, Smith-Waterman scores, and the observed numbers of absences among 34 taxa with at least 8.0x whole-genome shotgun sequencing coverage (except for RPA3 and Rad59 in which H. sapiens was compared to Candida albicans) are shown. Proteins in bold function only during meiosis in model organisms.

Table 2.3: Protein sequence comparisons between Homo sapiens and Oryza sativa.

Protein   Length (aa)   Identity   S-W score   Number undetected
RPA1      616           0.34       1292        0
RPA2      270           0.30       340         5
RPA3      121           0.21       81          10
Rad52     399           0.37       343         12
Rad51     339           0.69       1585        1
Rad55     280           0.26       199         12
Rad57     346           0.36       464         5
Dmc1      340           0.63       1321        10
Hop2      217           0.38       461         8
Mnd1      205           0.42       539         5
Rad54     747           0.47       1849        10
Rdh54     910           0.40       1638        15

Note: The lengths of Homo sapiens protein sequences, identities to Oryza sativa protein sequences, Smith-Waterman scores (except for RPA3 and Rdh54 which were compared to Physcomitrella patens, and Rad52 which was compared to Cyanidioschyzon merolae), observed numbers of absences among 34 taxa with at least 8.0x whole-genome shotgun sequencing coverage are shown. Proteins in bold function only during meiosis in model organisms.
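The relationship between Smith-Waterman score and the number of detection failures reported in these tables can be illustrated with a small Poisson regression. The sketch below is a minimal illustration and not the analysis performed in this study: it assumes a log-linear model, log(mean failures) = b0 + b1 x score, fitted by Newton-Raphson with step halving, and it uses only the ten RNA Polymerase I rows of Table 2.2. The function names and the rescaling of scores by 1000 are assumptions made for the example. The fitted means it prints are maximum-likelihood estimates and are not expected to match the "expected (90%)" column of Table 2.2.

```python
import math

# Smith-Waterman scores and observed numbers of undetected genes for the
# RNA Polymerase I subunits, copied from Table 2.2.
sw_scores = [3611, 3185, 1038, 612, 382, 364, 354, 289, 257, 13]
undetected = [0, 0, 0, 0, 1, 3, 1, 0, 4, 5]


def log_likelihood(b0, b1, xs, ys):
    """Poisson log-likelihood, dropping the constant log(y!) term."""
    return sum(y * (b0 + b1 * x) - math.exp(b0 + b1 * x) for x, y in zip(xs, ys))


def fit_poisson(xs, ys, tol=1e-10, max_iter=200):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson with step halving."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(max_iter):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood.
        g0 = sum(y - m for y, m in zip(ys, mus))
        g1 = sum((y - m) * x for x, y, m in zip(xs, ys, mus))
        # Fisher information matrix (2 x 2, symmetric).
        i00 = sum(mus)
        i01 = sum(m * x for x, m in zip(xs, mus))
        i11 = sum(m * x * x for x, m in zip(xs, mus))
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        # Halve the step until the log-likelihood does not decrease.
        step = 1.0
        current = log_likelihood(b0, b1, xs, ys)
        while (log_likelihood(b0 + step * d0, b1 + step * d1, xs, ys) < current
               and step > 1e-8):
            step /= 2
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) + abs(step * d1) < tol:
            break
    return b0, b1


xs = [s / 1000.0 for s in sw_scores]  # rescale scores for numerical stability
b0, b1 = fit_poisson(xs, undetected)
predicted = [math.exp(b0 + b1 * x) for x in xs]
for s, y, p in zip(sw_scores, undetected, predicted):
    print(f"S-W score {s:5d}: observed {y}, fitted mean {p:.2f}")
```

A negative slope b1 corresponds to fewer expected detection failures for proteins with higher Smith-Waterman scores. Confidence limits such as those plotted in Figure 2.27 would need an additional step that is not shown here.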
Table 2.4: Protein sequence comparisons between Oryza sativa and Saccharomyces cerevisiae.

Protein   Length (aa)   Identity   S-W score     Number undetected
RPA1      656           0.31       1037          0
RPA2      279           0.24       275           5
RPA3      106           0.13       Unalignable   10
Rad52     318           0.42       394           12
Rad51     339           0.65       1455          1
Rad55     280           0.33       80            12
Rad57     290           0.31       229           5
Dmc1      344           0.54       1125          10
Hop2      227           0.23       160           8
Mnd1      207           0.28       255           5
Rad54     980           0.43       1562          10
Rdh54     1122          0.35       1277          15

Note: The lengths of Oryza sativa protein sequences, identities to Saccharomyces cerevisiae protein sequences, Smith-Waterman scores (except Rad52, Rad55, and Rdh54 which were compared to Cyanidioschyzon merolae, Chlamydomonas reinhardtii, and Physcomitrella, respectively), and observed numbers of absences among 34 taxa with at least 8.0x whole-genome shotgun sequencing coverage are shown. Proteins in bold function only during meiosis in model organisms.

Figure 2.26: Phylogenetic distribution among eukaryotes of RNA Polymerase I core complex subunit genes. The names of genera, the numbers of completed or nearly completed genome projects available for those genera, and the whole genome shotgun equivalent coverage of the most complete genome project are listed, except for Oryza, Mus, and Kluyveromyces, for which coverage was unavailable, and Cyanidioschyzon and E. cuniculi, which were sequenced from end to end with BAC and PCR. Grey regions indicate subunits shared by RNA Polymerase II or III. Supergroups are presented with white text on black background with a summary of the genes present. Symbols: ‘+’ indicates sequence was found and phylogenetically verified, ‘(-)’ indicates that sequence was not found and may be outside the calculated threshold of detection, blank spaces indicate sequences were not found and the genome project has less than the equivalent of 8.0X whole genome shotgun coverage.
The tree is a cartoon that summarizes current literature (Simpson, Inagaki, and Roger 2006; Baldauf 2008; Burki, Shalchian-Tabrizi, and Pawlowski 2008; Kolisko et al. 2008; Timmermans et al. 2008; Minge et al. 2009; Reeb et al. 2009; Shadwick et al. 2009).

Genera shown in Figure 2.26, as Genus (number of genome projects)(coverage of the most complete project):
EXCAVATA: Giardia (3)(11.3X), Trichomonas (1)(7.2X), Trypanosoma (4)(8.0-10.0X), Leishmania (6)(5.0X), Naegleria (1)(8.6X)
CHROMALVEOLATA: Plasmodium (8)(8.0X), Theileria (2)(8.0X), Cryptosporidium (4)(13X), Tetrahymena (1)(9.1X), Paramecium (1)(8.0X), Thalassiosira (1)(12.8X), Phaeodactylum (1)(10.4X), Phytophthora (3)(9.0X)
ARCHAEPLASTIDA: Arabidopsis (2)(8.0X), Oryza (1), Physcomitrella (1)(8.1X), Chlamydomonas (1)(12.8X), Ostreococcus (4)(8.8X), Cyanidioschyzon (1)
OPISTHOKONTA, HOLOZOA: Homo (3)(5.1X), Mus (2), Monodelphis (1)(6.8X), Gallus (1)(6.6X), Xenopus (2)(7.7X), Danio (1)(10X), Strongylocentrotus (1)(8.0X), Aedes (1)(7.6X), Drosophila (12)(8.9X), Caenorhabditis (2)(10.0X), Apis (1)(7.5X), Tribolium (1)(7.0X), Nematostella (1)(7.8X), Trichoplax (1)(8.1X), Monosiga (1)(8.4X)
OPISTHOKONTA, FUNGI: Saccharomyces (9)(10.2X), Kluyveromyces (3), Candida albicans (2)(10.0X), Neurospora (3)(8.6X), Gibberella (2)(10.0X), Magnaporthe (1)(7.0X), Aspergillus (7)(8.9X), Schizosacc. (4)(11.8X), Coprinus (1)(10.0X), Ustilago (1)(10.0X), Encephalitozoon (1)
AMOEBOZOA: Dictyostelium (2)(8.3X), Entamoeba (5)(8.0X)
Subunit columns: A190, A135, AC40, AC19, A12.2, Rpb5, Rpb6, Rpb8, Rpb10, Rpb12
[Per-genus ‘+’ and ‘(-)’ symbols of the figure are not reproduced.]

Figure 2.27: Number of detection failures for RNA Polymerase I, RPA and SE proteins as predicted by Poisson regression analysis compared with observed numbers of detection failures. (a.) Poisson regression analyses were performed using the numbers of failures to detect RNA Polymerase I subunits (A190, A135, AC40, AC19, AC12.2, Rpb5, Rpb6, Rpb8, Rpb10, and Rpb12) among 34 genera with at least one genome of 8.0X whole-genome shotgun sequencing coverage (or sequenced from end-to-end) relative to their Smith-Waterman scores. The predicted numbers of failures relative to Smith-Waterman scores (black dots) are plotted with Wald 90% confidence limits (green dots).
The observed numbers of RNA Polymerase I subunit detection failures are indicated with open circles. (b.) The numbers of Replication Protein A (RPA1-3) subunit detection failures observed (open circles) compared with the Poisson regression predictions obtained from analyses of the RNA Polymerase I dataset. (c.) The observed numbers of detection failures among strand exchange components (Rad59, Rad52, Rad51, Rad55, Rad57, Dmc1, Hop2, Mnd1, Rad54, and Rdh54) (open circles) compared with Poisson regression predictions calculated from a combined RNA Polymerase I and Replication Protein A dataset.

Table 2.5: Saccharomyces cerevisiae strand exchange gene mutant phenotypes, suppressors, and meiotic functions of their products.

Gene: Rpa1, Rpa2, Rpa3
Mutant phenotype, mitosis: Abnormal growth, and arrest during S- or M-phases (Brill and Stillman 1991), UV and MMS sensitive, and deficient in homologous recombination (Umezu et al. 1998)
Mutant phenotype, meiosis: rpa1 mutations result in reduced sporulation efficiency, severely reduced spore viability, and defective recombination (Soustelle et al. 2002)
Meiotic function: Form heterotrimeric complexes that bind ssDNA and recruit Rad52 to the Rpa-ssDNA complex (Firmenich, Elias-Arnanz, and Berg 1995; Gasior et al. 1998; Hays et al. 1998)
Suppressor: None found

Gene: Rad52
Mutant phenotype, mitosis: Increased sensitivity to ionizing radiation (Game and Mortimer 1974; Saeki, Machida, and Nakai 1980) and reduced spontaneous recombination (Petes, Malone, and Symington 1991)
Mutant phenotype, meiosis: Reduced ability to sporulate, greatly reduced spore viability (Game and Mortimer 1974), and reduced meiotic recombination (Petes, Malone, and Symington 1991)
Meiotic function: Forms heptamers that mediate displacement of RPA from ssDNA and recruits Rad51 (Shinohara, Ogawa, and Ogawa 1992; Milne and Weaver 1993; Hays, Firmenich, and Berg 1995; Sung 1997; Octobre et al. 2008)
Suppressor: Rad51 overexpression (Milne and Weaver 1993; Schild 1995; Krejci et al. 2002)

Gene: Rad59
Mutant phenotype, mitosis: Increased sensitivity to IR and mildly defective recombination (Bai and Symington 1996; Davis and Symington 2001; Davis and Symington 2003)
Mutant phenotype, meiosis: Slightly reduced sporulation efficiency and spore viability (Bai and Symington 1996)
Meiotic function: Forms homomeric rings or heteromeric rings with Rad52, functions partially overlap with Rad52, may stimulate or augment Rad52 functions (Bai and Symington 1996; Davis and Symington 2001; Pannunzio, Manthey, and Bailis 2008)
Suppressor: Rad52 overexpression (Bai and Symington 1996)

Gene: Rad51
Mutant phenotype, mitosis: Increased sensitivity to ionizing radiation (Saeki, Machida, and Nakai 1980) and reduced spontaneous recombination (Petes, Malone, and Symington 1991)
Mutant phenotype, meiosis: Decreased recombination, reduced spore viability (Petes, Malone, and Symington 1991), and failure to form Dmc1 foci (Bishop 1994)
Meiotic function: Forms helical filaments on ss- and dsDNA, catalyzes strand exchange, causes ssDNA extension and dsDNA rotational transition, may recruit Dmc1 to the pre-synaptic filament during meiosis (Nishinaka et al. 1998; Krogh and Symington 2004; Lopez-Casamichana et al. 2008)
Suppressor: None found

Gene: Dmc1
Mutant phenotype, mitosis: None (Bishop et al. 1992)
Mutant phenotype, meiosis: Defective recombination and accumulation of double-strand break recombination intermediates, failure to form normal synaptonemal complexes, and arrest late in prophase (Bishop 1994)
Meiotic function: Meiosis-specific protein with function similar to Rad51 (Bishop et al. 1992; Bishop 1994; Bishop et al. 1999; Sehorn et al. 2004; Sauvageau et al. 2005)
Suppressor: Rad54 or Rad51 overexpression (Bishop et al. 1999; Tsubouchi and Roeder 2003)

Gene: Rad55, Rad57
Mutant phenotype, mitosis: Increased sensitivity to ionizing radiation (Game and Mortimer 1974) and reduced spontaneous recombination (Petes, Malone, and Symington 1991)
Mutant phenotype, meiosis: Reduced spore viability (rad55 <25% and rad57 <3%) (Game and Mortimer 1974), decreased recombination, and failure to form Rad51 foci (Petes, Malone, and Symington 1991; Krogh and Symington 2004)
Meiotic function: Form heterodimers, stabilize Rad51-ssDNA pre-synaptic filaments (Hays, Firmenich, and Berg 1995; Bai, Davis, and Symington 1999; Bleuyard, Gallego, and White 2006; Filippo, Sung, and Klein 2008)
Suppressor: Rad51 or Rad52 overexpression (Hays, Firmenich, and Berg 1995; Johnson and Symington 1995; Schild and Wiese 2009)

Gene: Hop2, Mnd1
Mutant phenotype, mitosis: None (Leu, Chua, and Roeder 1998; Tsubouchi and Roeder 2002)
Mutant phenotype, meiosis: Defective recombination and inappropriate pairing of homologs (Leu, Chua, and Roeder 1998; Chen et al. 2004)
Meiotic function: Form heterodimers, stabilize presynaptic filaments, capture dsDNA only during meiosis in model organisms (Tsubouchi and Roeder 2002; Chen et al. 2004; Henry et al. 2006)
Suppressor: Rad51 overexpression (Henry et al. 2006)

Gene: Rad54
Mutant phenotype, mitosis: Increased sensitivity to ionizing radiation (Game and Mortimer 1974; Saeki, Machida, and Nakai 1980) and MMS (Klein 1997), reduced sister chromatid recombination (Petes, Malone, and Symington 1991), and accumulation of Rad51 foci (Arbel, Zenvirth, and Simchen 1999; Shinohara et al. 2000)
Mutant phenotype, meiosis: 30-100% reduced spore viability (Game and Mortimer 1974)
Meiotic function: Forms homodimers/oligomers, stimulates D-loop formation, dissociates Rad51-dsDNA complex (Petukhova, Stratton, and Sung 1998; Petukhova et al. 1999; Kiianitsa, Solinger, and Heyer 2002)
Suppressor: rad52, rad51, rad55, rad57 functional mutations (Klein 1997)

Gene: Rdh54
Mutant phenotype, mitosis: Diploid-specific MMS sensitivity and reduced growth (Klein 1997)
Mutant phenotype, meiosis: Reduced sporulation and spore viability (Klein 1997)
Meiotic function: Translocation activity stimulates D-loop formation and displacement of recombinational intermediates (Chi et al. 2006). Functions in diploid-specific mitotic recombination and is required for complete meiotic viability (Klein 1997)
Suppressor: rad52, rad51, rad55, rad57 functional mutations (Klein 1997)

Table 2.6: The most complete genomes of the genera searched during this study with web addresses.

Taxon                 Web address
Trichomonas           http://trichdb.org/trichdb/
Giardia               http://giardiadb.org/giardiadb/
Naegleria             http://genome.jgi-psf.org/Naegr1/Naegr1.home.html
Trypanosoma           http://tritrypdb.org/tritrypdb/
Leishmania            http://tritrypdb.org/tritrypdb/
Plasmodium            http://plasmodb.org/plasmo/
Theileria             http://www.sanger.ac.uk/Projects/T_annulata/
Cryptosporidium       http://cryptodb.org/cryptodb/
Tetrahymena           http://www.ciliate.org/
Paramecium            http://paramecium.cgm.cnrs-gif.fr/
Thalassiosira         http://genome.jgi-psf.org/Thaps3/Thaps3.home.html
Phaeodactylum         http://genome.jgi-psf.org/Phatr2/Phatr2.home.html
Phytophthora          http://genome.jgi-psf.org/Physo1_1/Physo1_1.home.html
Arabidopsis           http://www.arabidopsis.org/
Oryza                 http://www.plantgdb.org/OsGDB/
Physcomitrella        http://genome.jgi-psf.org/Phypa1_1/Phypa1_1.home.html
Chlamydomonas         http://genome.jgi-psf.org/Chlre4/Chlre4.home.html
Ostreococcus          http://genome.jgi-psf.org/Ost9901_3/Ost9901_3.home.html
Cyanidioschyzon       http://merolae.biol.s.u-tokyo.ac.jp/
Homo                  http://genome.ucsc.edu/cgi-bin/hgGateway
Mus                   http://uswest.ensembl.org/Mus_musculus/Info/Index
Monodelphis           http://www.broadinstitute.org/mammals/opossum
Gallus                http://genome.ucsc.edu/cgi-bin/hgGateway?org=chicken
Xenopus               http://genome.jgi-psf.org/Xentr4/Xentr4.home.html
Danio                 http://www.sanger.ac.uk/Projects/D_rerio/
Strongylocentrotus    http://www.hgsc.bcm.tmc.edu/project-species-oStrongylocentrotus%20purpuratus.hgsc?pageLocation=Strongylocentrotus%20purpuratus
Aedes                 http://www.nd.edu/~dseverso/genome.html
Drosophila            http://flybase.org/blast/
Caenorhabditis        http://www.wormbase.org/
Apis                  http://www.hgsc.bcm.tmc.edu/project-species-iApis%20mellifera.hgsc?pageLocation=Apis%20mellifera
Tribolium             http://www.hgsc.bcm.tmc.edu/project-species-iTribolium%20castaneum.hgsc?pageLocation=Tribolium%20castaneum
Nematostella          http://genome.jgi-psf.org/Nemve1/Nemve1.home.html
Monosiga              http://genome.jgi-psf.org/Monbr1/Monbr1.home.html
Trichoplax            http://genome.jgi-psf.org/Triad1/Triad1.home.html
Saccharomyces         http://www.yeastgenome.org/
Kluyveromyces         http://www.genome.jp/kegg-bin/show_organism?org=kla
Candida albicans      http://www.candidagenome.org/
Magnaporthe           http://www.broadinstitute.org/annotation/fungi/magnaporthe/
Neurospora            http://www.broadinstitute.org/annotation/genome/neurospora/MultiHome.html
Gibberella            http://www.broadinstitute.org/annotation/genome/fusarium_verticillioides/Info.html
Aspergillus           http://genome.jgi-psf.org/Aspni5/Aspni5.home.html
Schizosacc.           http://www.sanger.ac.uk/Projects/S_pombe/
Coprinus              http://www.broadinstitute.org/annotation/genome/coprinus_cinereus/MultiHome.html
Ustilago              http://www.broadinstitute.org/annotation/genome/ustilago_maydis/Home.html
Encephalitozoon       http://www.genome.jp/kegg-bin/show_organism?org=ecu
Dictyostelium         http://genome.jgi-psf.org/Dicpu1/Dicpu1.home.html
Entamoeba             http://www.sanger.ac.uk/Projects/Comp_Entamoeba/

CHAPTER 3
PHYLOGENETIC ANALYSIS OF RECA HOMOLOGS RAD51 AND DMC1 FROM ALL SUPERGROUPS PROVIDES EVIDENCE FOR MEIOSIS IN THE LAST COMMON ANCESTOR OF EUKARYOTES

Background:
Genetic recombination is necessary for repair of DNA double-strand breaks (DSBs), introduced during replication or exposure to mutagens, in prokaryotes and eukaryotes (West 1992; Bishop 1994; Sandler et al. 1996). Among eukaryotes, recombination is also necessary for repair of DSBs introduced during meiosis to ensure accurate pairing and segregation of chromosomes to opposite spindle poles during the first meiotic division (Bishop et al.
1992; Grishchuk et al. 2004). Eubacterial recA, archaebacterial RadA, and eukaryotic Rad51 and Dmc1 genes are homologs whose products catalyze homologous DNA strand exchange during recombination (Stassen et al. 1997; Lin et al. 2006). Rad51-ssDNA nucleoproteins seek out homologous Rad51-dsDNA complexes, promoting DNA strand exchange (Krogh and Symington 2004). Dmc1 functions similarly, promoting interhomolog DNA strand exchange, but only during meiosis in model organisms (Bishop et al. 1992; Paques and Haber 1999; Symington 2002; Krogh and Symington 2004). Saccharomyces cerevisiae and Arabidopsis thaliana rad51 mutants display increased sensitivity to DNA damaging agents and diminished sporulation or fertility, as a result of reduced mitotic recombination (Bishop 1994; Bleuyard, Gallego, and White 2006). Among vertebrates, rad51 mutants have a lethal phenotype, indicating a possible dependence upon recombination during growth and development (Tsuzuki et al. 1996). Homologous recombination during meiosis is reduced or eliminated in animal, fungal, and plant dmc1 mutants (Bishop et al. 1999; Tsubouchi and Roeder 2003). Available animal, fungal, and plant Rad51 and Dmc1 protein sequences are highly conserved, with a great degree of similarity and retention of motifs (Stassen et al. 1997). However, less is known about Rad51 and Dmc1 among diverse protist lineages. It is necessary to include protists in studies of eukaryotic evolution because they embody the greatest breadth of eukaryotic diversity and their genes may encode products with divergent functions (Sogin 1991; Dacks and Doolittle 2001).
We present analyses of the distribution, molecular phylogenetic relationships, and characteristics of Rad51 and Dmc1 protein sequences from organisms representing all currently recognized eukaryotic supergroups (Opisthokonta, Amoebozoa, Excavata, Chromalveolata, Rhizaria, and Archaeplastida) and a currently unclassified group, the Apusozoa (Cavalier-Smith 2004; Adl et al. 2005; Baldauf 2008). Previous studies confirmed the presence of Rad51 and Dmc1 in all but one eukaryotic supergroup, Rhizaria, indicating that they likely arose early during eukaryotic evolution (Ramesh, Malik, and Logsdon 2005; Lin et al. 2006; Malik et al. 2008). The monophyly of Rad51 and Dmc1 has been demonstrated previously with phylogenetic analyses (Komori et al. 2000; Ramesh, Malik, and Logsdon 2005; Lin et al. 2006; Malik et al. 2008). The observations that homologous recombination is central to meiosis (Paques and Haber 1999; Krogh and Symington 2004) and that Dmc1 catalyzes interhomolog DNA strand exchange only during the first meiotic prophase (Bishop et al. 1992) have led to the inference that the presence of a Dmc1 gene in an organism indicates that meiosis may occur (Ramesh, Malik, and Logsdon 2005). The existence of Dmc1 in the putative early diverging eukaryotes Giardia intestinalis and Trichomonas vaginalis has been cited as evidence of meiosis in the last common ancestor of eukaryotes (Ramesh, Malik, and Logsdon 2005; Lin et al. 2006; Malik et al. 2008). This view is supported by the presence of several other meiotic genes in G. intestinalis and T. vaginalis (Ramesh, Malik, and Logsdon 2005; Malik et al. 2008). However, the status of G. intestinalis and T. vaginalis as “primitive” eukaryotes is now dubious, as different hypotheses for rooting the evolutionary tree of eukaryotes have been proposed (Cavalier-Smith 2002a; Stechmann and Cavalier-Smith 2002; Roger and Simpson 2009; Cavalier-Smith 2010).
The relatively recent morphological and molecular phylogenetic analyses of unclassified eukaryotes, such as the Apusozoa, further revive the prospect that some organisms may be primitively asexual, having diverged prior to the origin of Dmc1 genes and, perhaps, meiosis. In the absence of a clearly established earliest-diverging branch on the eukaryotic tree, it is necessary to include representatives of all known eukaryotic supergroups to address the question of whether Dmc1 genes and meiosis were present in their last common ancestor. Rad51 and Dmc1 protein sequences are well conserved, approximately 350 amino acids long, and may be distinguished by inspection of multiple sequence alignments. In addition, duplications of Rad51 and Dmc1 genes appear rare and, where present, seem to have occurred recently during eukaryotic evolution (Maeshima et al. 1995; Kathiresan, Khush, and Bennett 2002; Ramesh, Malik, and Logsdon 2005; Malik et al. 2008). Only one absence of Rad51 genes, in G. intestinalis, has been confirmed (Ramesh, Malik, and Logsdon 2005). Rad51 and Dmc1 genes are themselves paralogs, which means that it might be possible to determine which eukaryotes represent the earliest-diverging lineages with reciprocal rooting (Gogarten et al. 1989; Iwabe et al. 1989; Iwabe et al. 1991). These characteristics make Rad51 and Dmc1 good candidates for phylogenetic analyses (Baldauf and Palmer 1993). Several studies have determined that Rad51 and Dmc1 nucleotide and amino acid sequences are useful phylogenetic markers, resolving relationships among animals, fungi, and plants (Stassen et al. 1997; Petersen and Seberg 2002; Petersen, Seberg, and Baden 2004). However, it is unknown whether Rad51 and/or Dmc1 protein sequence data will be useful for elucidating the relationships among eukaryotic supergroups, or for the placement of unclassified organisms within the eukaryotic tree of life.
We collected 99 Rad51 and 51 Dmc1 protein sequences (representing 97 and 50 genera, respectively) from six eukaryotic supergroups and Apusozoa. Among these sequences, degenerate PCR was used to isolate 21 new Rad51 sequences and 8 new Dmc1 sequences from evolutionarily diverse representatives of the eukaryotic supergroups Rhizaria, Excavata, Chromalveolata, and Amoebozoa, and also from unclassified Apusozoa (Ancyromonas sp. and T. trahens), for which genome sequence data were unavailable. All publicly available nucleotide and protein sequence repositories were also searched for homologs in diverse eukaryotes. To ensure that the breadth of sampling was sufficient for a eukaryote-wide study of Rad51 and Dmc1, and given the abundance of sequences from some eukaryotic groups (Fungi, Metazoa, Chloroplastida, Kinetoplastida, and Apicomplexa), discrete datasets composed of exemplars were collected for these over-represented groups, while exhaustive sequence data searches were performed for all other groups (see Methods). Phylogenetic analyses revealed no clear cases of lateral gene transfer of Rad51 or Dmc1 genes, indicating that vertical transmission is the predominant (if not exclusive) mode of inheritance. In addition, phylogenetic analyses of Rad51 and Dmc1 amino acid sequences indicated support for five of the six currently proposed eukaryotic supergroups (Table 3.1). We also scrutinized our alignments of all Rad51 and Dmc1 protein sequences obtained and compared them to archaebacterial RadA and eubacterial RecA sequences. Rad51 and Dmc1 protein sequences are highly conserved across all eukaryotic groups, including functional motifs previously identified in archaebacterial RadA protein sequences (Story, Weber, and Steitz 1992). In addition, we identify ten amino acid residues conserved across all eukaryotic supergroups, but not among prokaryotes, which may confer Rad51- and Dmc1-specific functions.
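The kind of comparison described here, looking for alignment positions that are invariant among eukaryotic sequences but differ in prokaryotic homologs, can be sketched as follows. This is an illustration only: the function name, the strict criteria (one residue shared by every eukaryotic row and carried by no prokaryotic row), and the toy sequences are assumptions made for the example, not the procedure or data used in this study.

```python
def group_specific_columns(eukaryote_rows, prokaryote_rows, gap="-"):
    """Return (column, residue) pairs for 0-based alignment columns where
    every eukaryotic row has the same residue and no prokaryotic row has it.
    All rows are assumed to come from one alignment and have equal length."""
    hits = []
    for col in range(len(eukaryote_rows[0])):
        residues = {row[col] for row in eukaryote_rows}
        # Skip columns that vary among eukaryotes or consist of gaps.
        if len(residues) != 1 or gap in residues:
            continue
        residue = next(iter(residues))
        # Keep the column only if the residue is absent from all prokaryotes.
        if all(row[col] != residue for row in prokaryote_rows):
            hits.append((col, residue))
    return hits


# Hypothetical toy alignment (not real Rad51, Dmc1, RadA, or RecA sequences).
euk = ["MKGATL", "MRGATL", "MKGSTL"]
prok = ["MAGQSI", "MAGESV"]
hits = group_specific_columns(euk, prok)
print(hits)
```

With real data, a looser threshold (for example, allowing a small fraction of exceptions) may be more appropriate than the strict all-or-none rule used in this sketch.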
Taken together, these data indicate that the functions of Rad51 and Dmc1 are likely to be conserved across all eukaryotes. Thus meiosis and mitosis most likely occurred in the last common ancestor of eukaryotes.

Results and discussion:

Phylogenetic analysis of Dmc1:
We analyzed the distribution of 51 Dmc1 genes from representatives of 50 genera, 42 of which were obtained from databases and 8 by degenerate PCR (Figures 3.1-3.6). Dmc1 is present in representatives of all six currently recognized supergroups and the unclassified Apusozoa. However, the distribution of the Dmc1 gene is uneven, since it is not detected in the genomes of entire groups of organisms, such as Diptera, Sordariomycota, or Stramenopila (except for oomycetes). Failure to detect Dmc1 among more stramenopiles is most parsimoniously interpreted as a loss following the divergence of oomycetes (Brown and Sorhannus 2010). Dmc1 gene losses have been confirmed in a few organisms that are known to undergo meiosis (e.g. Caenorhabditis elegans and Drosophila melanogaster) (Orr-Weaver 1995; Zalevsky et al. 1999). Therefore, meiosis may be accomplished without Dmc1 proteins in some organisms, and the absence of Dmc1 does not necessarily indicate the absence of meiosis, since these sexual organisms have adapted to Dmc1 loss. However, since Dmc1 is known to function only during meiosis, it is likely that the presence of the Dmc1 gene indicates that meiosis occurs (Bishop et al. 1992; Proudfoot and McCulloch 2006). Phylogenetic analyses of Dmc1 protein sequences consistently yield a single, distinct monophyletic group (Figures 3.5-3.7), indicating that the Dmc1 gene arose once during the evolutionary history of extant eukaryotes. Most organisms have a single copy of the Dmc1 gene within their genomes. Subsequent duplications of the Dmc1 gene appear to be rare, with recent duplications detected only in the genomes of G.
intestinalis (Excavata) and Oryza sativa (Archaeplastida) (Kathiresan, Khush, and Bennett 2002; Ramesh, Malik, and Logsdon 2005; Malik et al. 2008). Interestingly, G. intestinalis is also the only organism with a confirmed absence of the Rad51 gene from its genome (Ramesh, Malik, and Logsdon 2005), but whether these observations are related is currently unknown.

Phylogenetic analysis of Rad51:
We analyzed the phylogenetic relationships among 99 Rad51 protein sequences representing 97 genera (78 from databases and 21 inferred from degenerate PCR [data not shown]; Figures 3.6-3.10). Rad51 genes were retrieved from the genomes of organisms representing every currently recognized eukaryotic supergroup and two Apusozoa (T. trahens and Ancyromonas). Unlike the Dmc1 gene, the Rad51 gene appears to be present in most organisms, and so far is absent only from the genome of G. intestinalis. We also performed an extensive search for Rad51 in a related diplomonad, Spironucleus vortens (Jorgensen and Sterud 2007), in which we explored all nucleotide, protein, and EST sequence databases and attempted to amplify Rad51 with degenerate PCR; no Rad51 gene sequences were recovered. The Rad51 gene may therefore have been lost prior to the divergence of G. intestinalis and S. vortens. Like those of the Dmc1 gene, duplications of the Rad51 gene appear to be rare and relatively recent, with paralogs present only in Archaeplastida (Physcomitrella patens, Oryza sativa, and Zea mays), Xenopus laevis (Opisthokonta), and T. vaginalis (Excavata) (Maeshima et al. 1995; Stassen et al. 1997; Malik et al. 2008). One of the T. vaginalis Rad51 gene copies is a pseudogene, but both of the Xenopus Rad51 genes seem to encode functional products and are expressed (Maeshima et al. 1995; Malik et al. 2008).
There are also no clear cases of Rad51 lateral transfer, although Rad51 was discovered in the nucleomorph genomes of Bigelowiella (Rhizaria) and of the cryptophytes Hemiselmis andersenii and Guillardia theta (both Chromalveolata) (Figure 3.6). Overall, our results indicate that the Rad51 gene has been transmitted only vertically and arose once, prior to the divergence of extant eukaryotes.

Phylogenetic analyses of Rad51 and Dmc1

Recently, many relationships among diverse eukaryotes have been determined by phylogenetic analyses performed on multiple concatenated protein sequences (Figure 3.8) (Burki and Pawlowski 2006; Kim, Simpson, and Graham 2006; Burki et al. 2007; Moreira et al. 2007; Burki, Shalchian-Tabrizi, and Pawlowski 2008; Yoon et al. 2008; Reeb et al. 2009; Parfrey et al.). These eukaryotic phylogenies provide references for assessing the utility of individual nucleotide or protein sequence datasets as phylogenetic markers. We performed extensive phylogenetic analyses on Rad51 and Dmc1 individual and concatenated protein sequence datasets to test their phylogenetic utility (Figures 3.1-3.12 and Table 3.1).

The eukaryotic supergroup Opisthokonta (comprising Animalia, Fungi, and several protist groups) is unified by flat mitochondrial cristae and a 12-amino acid insertion in the translation elongation factor 1α (Baldauf and Palmer 1993; Adl et al. 2005; Steenkamp, Wright, and Baldauf 2006). Phylogenetic analyses typically provide strong support for topologies unifying animals and fungi, confirming these observations (Cavalier-Smith 1987c; Baldauf and Palmer 1993; Steenkamp, Wright, and Baldauf 2006). We obtained strong support with both maximum likelihood and Bayesian phylogenetic approaches for the monophyly of Metazoa and Fungi with Dmc1 and concatenated protein sequence alignments (Figures 1.2 and 3.5, Table 3.1).
Although opisthokont unity was not formally observed for the Rad51 protein sequence dataset, which includes the choanoflagellate Monosiga brevicollis, this was a result of the likely erroneous placement of the apusomonad T. trahens within this group (but see below) (Figure 3.8) (Adl et al. 2005).

The Unikont hypothesis proposes that the eukaryotic supergroups Opisthokonta and Amoebozoa are monophyletic, on the basis that they ancestrally possessed a single flagellum (unlike the “bikont” Excavata, Archaeplastida, Chromalveolata, Rhizaria, and Apusozoa that mostly have two flagella) and three fused genes (carbamoyl-phosphate synthase, dihydroorotase, and aspartate carbamoyltransferase), the likely result of two rare gene fusion events (Cavalier-Smith 2002a; Stechmann and Cavalier-Smith 2002; Cavalier-Smith 2003a; Stechmann and Cavalier-Smith 2003b). Phylogenetic analyses have supported the “Unikont hypothesis” (Stechmann and Cavalier-Smith 2003b; Burki, Shalchian-Tabrizi, and Pawlowski 2008); however, recent phylogenetic analyses retrieve topologies in which unclassified Apusozoa (Ancyromonas and T. trahens) are monophyletic and closely related to Opisthokonts (Kim, Simpson, and Graham 2006; Cavalier-Smith 2010). While none of our analyses retrieved topologies consistent with a common origin of the Apusozoa, Ancyromonas and T. trahens, our Bayesian analysis of the individual Rad51 protein sequences strongly supports their inclusion in the Unikont clade (Figure 3.8 and Table 3.1), and analysis of concatenated Rad51 and Dmc1 proteins moderately supports the inclusion of Ancyromonas in the Unikonts (Figure 3.11 and Table 3.1). In addition to having two emergent flagella (instead of one flagellum like other Unikonts), Apusozoa also lack the three-gene fusion. Instead, they share a fusion of two genes (dihydrofolate reductase and thymidylate synthase) that distinguishes Bikonts.
Unikonts may, therefore, represent a polyphyletic group if Apusozoa are sisters to Opisthokonta (Figure 1.2) (Stechmann and Cavalier-Smith 2002; Stechmann and Cavalier-Smith 2003a).

On the basis of strongly supported topologies obtained with molecular phylogenetic analyses of many concatenated protein sequence alignments, a “megagroup” of predominantly photosynthetic eukaryotes has been proposed (including supergroups Archaeplastida, Chromalveolata, and Rhizaria) (Burki, Shalchian-Tabrizi, and Pawlowski 2008). The supergroup Chromalveolata was proposed to include secondarily photosynthetic eukaryotes (alveolates, stramenopiles, cryptomonads, and haptophytes) that obtained plastids by endosymbiosis with red algae (Cavalier-Smith 2002b; Cavalier-Smith 2003b; Janouskovec et al. 2010) (Figure 1.2). However, molecular phylogenetic analyses rarely support the monophyly of this group (Parfrey et al. 2006). Despite the complexities of developing the protein targeting system observed in nascent plastids, recent phylogenetic analyses suggest secondary photosynthesis evolved at least twice during eukaryotic evolution (Keeling 2010).

We included a subset of chromalveolates (stramenopiles and alveolates) in our analyses. Phylogenetic analysis of the concatenated Rad51 and Dmc1 protein sequence dataset retrieved topologies consistent with the Chromalveolate hypothesis (Figure 3.11). The phylogenies of the individual Dmc1 and Rad51 protein sequences both retrieve discrepant topologies that support the grouping of stramenopiles with Chloroplastida, while red algae are most closely related to stramenopiles in the Dmc1 phylogeny (Figure 3.1) and to alveolates in the Rad51 phylogeny (Figure 3.8). These topologies could be the results of phylogenetic artifacts such as long-branch attraction (Felsenstein 2004).
However, it is noteworthy that Bayesian analyses of our concatenated Rad51 and Dmc1 dataset strongly support the monophyly of alveolates, stramenopiles, and Rhodophyceae, and that Chloroplastida are grouped with Cercozoa (Rhizaria) (Figure 3.11). It has been hypothesized that the difficulties of resolving relationships among secondarily photosynthetic eukaryotes with multigene analyses may be due to the “mosaic” nature of their nuclear genomes as a result of endosymbiotic gene transfer, resulting in conflicting phylogenetic signals (Parfrey et al.). Our analyses of Rad51 and Dmc1 failed to support subgroups within the photosynthetic megagroup such as SAR, in which Stramenopila, Alveolata, and Rhizaria share a common ancestor, or the Archaeplastida, which all have plastids obtained by primary endosymbiosis of a cyanobacterium (Adl et al. 2005; Rodriguez-Ezpeleta et al. 2005; Burki, Shalchian-Tabrizi, and Pawlowski 2008; Parfrey et al.). However, we did observe support for the monophyly of Cercozoa (Rhizaria), stramenopiles, and alveolates (Chromalveolata).

The eukaryotic supergroup Excavata (represented in our dataset by members of its subgroups Discoba and Metamonada) was proposed to describe organisms with suspension-feeding grooves (cytostomes) used to capture particles in a current produced by anterior flagella (Figure 1.2) (Simpson 2003; Adl et al. 2005). Excavates include organisms once considered to be among the earliest-diverging eukaryotes (e.g. Euglenozoa, T. vaginalis, and G. intestinalis), based upon so-called “primitive” features (like the apparent absence of organelles such as mitochondria) and early phylogenetic analyses of small ribosomal subunit sequence data, which retrieved topologies placing T. vaginalis and G. intestinalis at the base of eukaryotic trees (Woese, Kandler, and Wheelis 1990; Tovar et al. 2003; Adl et al. 2005; Cavalier-Smith 2010). However, more recent discoveries have cast doubt on whether they represent “primitive” eukaryotes. G.
intestinalis and T. vaginalis do, indeed, possess highly derived mitochondria (mitosomes and hydrogenosomes, respectively), and their placement at the base of rooted eukaryotic phylogenetic trees was most likely caused by artifacts of the phylogenetic analysis (Tovar et al. 2003; Felsenstein 2004; van der Giezen, Tovar, and Clark 2005). Similarly, Microsporidia were later determined to be fungi with mitosomes (Cavalier-Smith 1989; Hirt et al. 1999). If T. vaginalis and G. intestinalis (or any of Excavata) are the earliest-diverging eukaryotes, then Excavata would represent a paraphyletic group (a common ancestor plus some but not all of its descendants) whose members diverged separately at the base of the eukaryotic phylogenetic tree, i.e., very early during the evolution of eukaryotes. However, recent phylogenetic analyses retrieve topologies that are consistent with the monophyly of Excavata (Burki et al. 2007; Burki, Shalchian-Tabrizi, and Pawlowski 2008; Hampl et al. 2009; Parfrey et al.). Our phylogenetic analysis of the Dmc1 protein sequence dataset also supports the monophyly of Excavata, although it is not resolved by Rad51 protein sequences (Figures 3.1 and 3.8). In an attempt to determine the earliest-diverging eukaryotic lineages, we performed analyses in which one paralog was used to root the other, rather than assigning a root (Gogarten et al. 1989; Iwabe et al. 1989; Iwabe et al. 1991). However, the topologies retrieved with reciprocal rooting of Rad51 and Dmc1 protein sequences are poorly supported and discordant (Figures 3.5-3.7).
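Much of the preceding discussion turns on whether a named group is retrieved as monophyletic in a given topology. The following is a minimal sketch of that test, assuming a rooted tree encoded as nested Python tuples; the encoding, the function names, and the toy topology are illustrative assumptions and are not the trees or software used in this chapter.

```python
def leaves(tree):
    """Return the set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    result = frozenset()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree):
    """Yield the leaf set of every node in a rooted tree, leaves included."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if some node's descendants are exactly the members of `group`."""
    target = frozenset(group)
    return any(clade == target for clade in clades(tree))


# Hypothetical toy topology for illustration only.
toy_tree = ((("Giardia", "Spironucleus"), "Trichomonas"), ("Homo", "Saccharomyces"))

# All descendants of one ancestor: monophyletic in this toy tree.
metamonads = is_monophyletic(toy_tree, {"Giardia", "Spironucleus", "Trichomonas"})
# An ancestor plus only some of its descendants: not a clade.
partial = is_monophyletic(toy_tree, {"Giardia", "Trichomonas"})
print(metamonads, partial)
```

For an unrooted tree, the analogous check asks whether the group and the remaining taxa are separated by a single branch, so the outcome of the rooted check above depends on where the root is placed.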
Characteristics of Rad51 and Dmc1 protein sequences

We aligned Rad51 and Dmc1 protein sequences from representatives of all known eukaryotic supergroups and Apusozoa with representative archaebacteria (Nitrosopumilus maritimus, Cenarchaeum symbiosum, Pyrobaculum islandicum, Candidatus Korarchaeum cryptofilum, Aeropyrum pernix, Nanoarchaeum equitans, and Methanocaldococcus fervens) and eubacteria (Bacillus amyloliquefaciens and Thermus thermophilus) (Figure 3.13). Visual inspection of the central domains responsible for recombinase activity of RecA, RadA, Rad51, and Dmc1 proteins indicates that the amino acid sequences are well conserved in all domains of life (Story, Weber, and Steitz 1992). Several motifs important for RecA function are highly conserved among eukaryotes. In addition, archaebacterial RadA sequences contain all of the described functional motifs (Chen et al. 2007); it is likely that these motifs were present in the common ancestor of archaebacteria and eukaryotes, and thus were present in the last eukaryotic common ancestor.

Although Rad51 and Dmc1 perform very similar functions, Rad51 catalyzes DNA strand exchange during both mitosis and meiosis, while Dmc1 functions in interhomolog DNA strand exchange exclusively during meiosis. Specific interactions of Rad51 and Dmc1 with each other, with other proteins, and with DNA are required for successful completion of meiotic recombination (Krejci et al. 2001; Shin et al. 2003; Sugawara, Wang, and Haber 2003). However, the basis of these interactions remains largely unknown, especially for those interactions that distinguish Rad51 from Dmc1 function. We examined our multiple sequence alignments for conserved amino acid residues specific to Rad51 or Dmc1, which might confer Rad51- or Dmc1-specific activity.
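One way such a screen for paralog-specific residues could be operationalized is sketched below. It assumes two sets of sequences drawn from the same alignment (so that column indices correspond), and it flags columns where one residue reaches a conservation threshold in the first set while that same residue falls below the threshold in the second. The 0.95 default mirrors the 95% cutoff reported later in this section; the function names and toy sequences are hypothetical and do not reproduce the analysis actually performed.

```python
from collections import Counter


def column_consensus(seqs, col):
    """Most common non-gap residue in a column and its frequency over all sequences."""
    counts = Counter(s[col] for s in seqs if s[col] != "-")
    if not counts:
        return None, 0.0
    residue, n = counts.most_common(1)[0]
    return residue, n / len(seqs)


def specific_sites(group_a, group_b, threshold=0.95):
    """Columns conserved (>= threshold) in group_a whose residue is not
    conserved at that level in group_b. Returns (column, residue) pairs."""
    sites = []
    for col in range(len(group_a[0])):
        residue, freq_a = column_consensus(group_a, col)
        if freq_a < threshold:
            continue
        freq_in_b = sum(s[col] == residue for s in group_b) / len(group_b)
        if freq_in_b < threshold:
            sites.append((col, residue))
    return sites


# Hypothetical aligned fragments, not real Rad51 or Dmc1 sequences.
rad51_like = ["GKTQ", "GKTQ", "GKTQ", "GKTQ"]
dmc1_like = ["GRTQ", "GSTQ", "GKTQ", "GATQ"]

rad51_specific = specific_sites(rad51_like, dmc1_like)
dmc1_specific = specific_sites(dmc1_like, rad51_like)
print(rad51_specific, dmc1_specific)
```

Columns are zero-based here; mapping them back to S. cerevisiae Rad51 positions would require the reference sequence's gap pattern in the alignment.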
Comparison of the central domains of Rad51 and Dmc1 protein sequences from all of our representatives of six eukaryotic supergroups and Apusozoa indicates that they are conserved, likely due to common ancestry and functional constraints (summarized in Figure 3.14). By identifying residues conserved in one protein but variable or different in the other, we can generate hypotheses for future functional studies. Comparing protein sequences from representatives of the entire breadth of eukaryotic diversity enables us to pinpoint residues fundamental to Rad51 or Dmc1 function.

To examine amino acid conservation, we analyzed an alignment of 98 Rad51 and 51 Dmc1 protein sequences from all eukaryotic supergroups and Apusozoa (Figure 3.13). The central domain (S. cerevisiae Rad51 amino acid positions 90-397) was examined because it is conserved in all RecA homologs. All groups except Apusozoa and Rhizaria were represented at each amino acid position studied. Apusozoa were represented from S. cerevisiae Rad51 amino acid positions 126-356 for the aligned Rad51 proteins and positions 188-324 for the aligned Dmc1 proteins, while Rhizaria were represented from positions 188-397 in the Dmc1 alignment. We identified 18 amino acids that are completely conserved among Rad51 sequences and 15 that are completely conserved among Dmc1 sequences. Seven residues are present in at least 95% of Rad51 protein sequences but are either different or variable among Dmc1 sequences; conversely, only three residues are conserved in at least 95% of Dmc1 sequences but different or variable among Rad51 sequences. We found no cases in which one residue is ≥ 95% conserved in one protein dataset and a different residue is ≥ 95% conserved in the other dataset.

Studies in which the structures of RecA, RadA, Rad51, and Dmc1 have been analyzed have resulted in the identification of several important functional motifs and amino acid residues (Table 3.3) (Story, Weber, and Steitz 1992; Aihara et al. 1999; Pellegrini et al. 2002; Conway et al. 2004; Chen et al.
2007; Chen, Yang, and Pavletich 2008; Okorokov et al. 2010). Residues identified with these methods are also highly conserved (often 100%) in our sequence alignments. Five sites involved in ATP binding (G185, D219, E221, D280, and S281) and three sites involved in DNA binding (N325, G346, and G347) are present in all RecA, RadA, Rad51, and Dmc1 protein sequences studied here. However, specific interactions have not been proposed for several sites that we have determined are likely to be involved in Rad51- or Dmc1-specific activities (Table 3.3).

Conclusions

We isolated 8 Dmc1 and 21 Rad51 genes with degenerate PCR from eukaryotes representing four of the six currently recognized supergroups (Amoebozoa, Excavata, Chromalveolata, and Rhizaria) and the unplaced Apusozoa. In addition, we performed extensive searches of all publicly available nucleotide and amino acid sequence repositories, and identified and collected a total of 51 Dmc1 and 99 Rad51 sequences (representing 50 and 97 genera, respectively). Our phylogenetic analyses support all eukaryotic supergroups (Opisthokonta, Amoebozoa, Excavata, Chromalveolata, and Rhizaria) except Archaeplastida (Table 3.1). Support was strongest for the supergroup Opisthokonta, which was retrieved with phylogenetic analysis of Dmc1, Rad51, and concatenated protein sequences. These results are consistent with previous studies in which the support for supergroups was assessed (Parfrey et al. 2006). Dmc1 appears to retrieve known relationships well when several protein sequences representing the greatest breadth of eukaryotes are available. Consistent with the predictions of Stassen et al. (1997), our analyses of Rad51 proteins retrieve “somewhat anomalous” phylogenies, most likely due to substitution rate heterogeneity among taxa resulting in long-branch artifacts (Stassen et al. 1997; Felsenstein 2004).
Analysis of Rad51 and Dmc1 concatenated protein sequence data provides better resolution of the evolutionary relationships of eukaryotes (Figure 3.11).

We aligned Rad51 and Dmc1 protein sequences from every eukaryotic supergroup and members of the currently unclassified Apusozoa with bacterial RecA and archaebacterial RadA protein sequences (Figure 3.13). Previously identified functional motifs (Sandler et al. 1996; Chen et al. 2007; Okorokov et al. 2010) are present in all Rad51, Dmc1, and RadA proteins sampled; thus, these motifs must have been present in Rad51 and Dmc1 sequences of the last eukaryotic common ancestor. Furthermore, we identified seven sites where the amino acids are conserved among Rad51 but not in Dmc1, and three sites where the amino acids are conserved among Dmc1 but not in Rad51. These amino acids are likely to be involved in functions that are specific to Rad51 or Dmc1 but not both. Given the conservation of these amino acids in protein sequences of diverse eukaryotes, they must have been present in the last eukaryotic common ancestor as well. Thus, since both Rad51- and Dmc1-specific functions are likely to have been present in the last eukaryotic common ancestor, these results support the hypothesis that Dmc1 was both present and functioning in a meiosis-specific role.

Methods

Database searches

Keyword searches (e.g. S. cerevisiae Rad51) of the National Center for Biotechnology Information (NCBI, www.ncbi.nlm.nih.gov/) protein sequence database retrieved Rad51 and Dmc1 protein sequences for representatives of animals, fungi, and plants (Homo sapiens (Rad51 - accession number NP_002866 and Dmc1 - Q14565), Saccharomyces cerevisiae (Rad51 - CAA45563 and Dmc1 - AAA34571), and Oryza sativa (Rad51 - BAB85491 and Dmc1 - BAB85214)) (Aboussekhra et al. 1992; Bishop et al. 1992; Collins et al. 2004; Sakane et al. 2008; Kudoh et al. 2009).
In addition, the clusters of euKaryotic Orthologous Groups of proteins (KOGs) database was searched for each protein (Tatusov et al. 2003). Sequence identities were initially verified by evaluating the results of bidirectional searches with the tBLASTn (Altschul et al. 1997) option of the Basic Local Alignment Search Tool (BLAST), in which the translated nucleotide database is searched using a protein query. Rad51 and Dmc1 protein sequences collected in this manner were subsequently used as queries to search protein, nucleotide, and expressed sequence tag (EST) databases at NCBI, The Institute for Genomic Research (TIGR, www.tigr.org/tdb/euk, since moved to compbio.dfci.harvard.edu/tgi/protist.html), the Joint Genome Institute (JGI, genome.jgi-psf.org), the Canadian Protist EST Project (Taxonomically Broad Database, tbestdb.bcm.umontreal.ca), the Michigan State University Galdieria sulphuraria Database (genomics.msu.edu/galdieria; Weber et al. 2004; Barbier et al. 2005), and the Cyanidioschyzon merolae Genome Project (merolae.biol.s.u-tokyo.ac.jp/blast/blast.html; Matsuzaki et al. 2004) with BLASTp, tBLASTn, and BLASTn, as necessary, for all available Rad51 and Dmc1 sequences from January 2004 through April 2010. Due to the abundance of sequences from a few eukaryotic groups (Fungi, Metazoa, Chloroplastida, Kinetoplastida, and Apicomplexa), discrete datasets composed of exemplars were collected for these groups, while exhaustive sequence data searches were performed for all other groups, to ensure the breadth of sampling was sufficient for a eukaryote-wide study of Rad51 and Dmc1. In case sequences from distantly related organisms were missed, additional searches were performed using protein sequence queries from organisms likely to share more recent common ancestors: e.g. Trypanosoma brucei (Rad51 CAA73605, Dmc1 XP_827266 (Berriman et al.
2005)) protein sequences were used as additional queries for searches of sequences for a closely related kinetoplastid protist, Leishmania major. Identities of sequences were again confirmed with bidirectional BLASTx and tBLASTn searches. When multiple sequences were found for a species, only the most complete open reading frame or protein prediction was retained. If no previously annotated protein sequence was available in a database (or it was apparently incorrectly annotated on the basis of protein sequence alignments with other orthologs), then nucleotide sequences were annotated manually using Sequencher v4.5 (Gene Codes, Ann Arbor, MI). Exons were identified with the aid of inferred translations from BLASTx pairwise comparisons to the NCBI protein sequence database and the locations of putative intron splice donor and acceptor site sequences (e.g. G/GT to AG/G, although others may be observed among diverse eukaryotes). Additional comparisons of the inferred Rad51 and Dmc1 homologous amino acid sequences were performed with alignments created using MUSCLE v3.7 (Edgar 2004) and observed with BioEdit v7.0.5.3 (Hall 1999).

Degenerate PCR

DNA samples were obtained by collaboration with Jeff Cole and Robert Molestina at the American Type Culture Collection (ATCC, Manassas, VA), mainly from xenic monoprotistan cultures. PCR amplifications were performed using degenerate oligonucleotide primers (i.e. primers designed to correspond to highly conserved regions of protein sequence alignments and to reflect the degeneracy of the genetic code; see Table 3.2 and Figure 3.13 arrows) synthesized by Integrated DNA Technologies (IDT, Coralville, IA). Degenerate PCR primers Forward 6 and 7 and Reverse 1 were designed by JML and the remaining degenerate primers were designed by AWP (Table 3.2). Gene fragments of Rad51 and Dmc1 homologs were amplified from total DNA by PCR from representatives of four eukaryotic supergroups and Apusozoa (Figure 3.2).
Amplifications utilized 0.03 U/µl MasterTaq polymerase (5 Prime, Gaithersburg, MD) according to the manufacturer’s instructions, 0.002 U/µl Stratagene Cloned Pfu (La Jolla, CA) (to increase yields), 0.5-1 ng total DNA, 0.25 mM each dNTP (Stratagene), 1.5 mM MgCl2, and 10 µM each primer. Reaction conditions were 95°C for 2 minutes followed by 40 cycles including denaturation at 94°C for 40 seconds, with replicates annealing at temperatures of 55°C, 60°C, or 65°C for 1 minute, extension at 72°C starting at 1.5 minutes, adding 6 seconds per cycle, and ending with 10 minutes at 72°C, in Eppendorf gradient Mastercyclers (Hamburg, Germany). Resulting PCR products were analyzed for size on 2% agarose gels by electrophoresis. Initially, eight degenerate primer combinations were tested for each sample. When necessary, additional primer combinations were applied or nested amplifications were performed using diluted (1:1000) PCR products. Subsequent amplifications extended coverage of target genes by primer walking, using exact-match primers vs. degenerate primers in all possible combinations. Amplicons for Perkinsus marinus Rad51 genes were obtained with exact-match primers designed from non-overlapping partial sequences (NCBI GenInfo numbers 126277177 and 126301963, Table 3.2). Selected amplicons were fractionated and excised from 0.5% NuSieve GTG: 0.5% low-melt agarose gels (BioWhittaker [Walkersville, MD], Fisher [Pittsburgh, PA]) run at 4°C and 100 V for 40 minutes in 1x TAE buffer, and cloned directly into the pSC-A™ vector (StrataClone™ kit, Stratagene, La Jolla, CA, USA). Positive clones were identified by PCR with T3 and T7 primers to verify the presence of appropriately sized inserts (cycling conditions: 94°C for 2 minutes followed by 30 cycles at 94°C for 1 minute, 57°C for 1 minute, and 72°C for 1.5 minutes, ending with 72°C for 5 minutes [Stratagene and Promega]).
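The incremental extension step in the degenerate PCR program described above (1.5 minutes at 72°C, lengthened by 6 seconds per cycle over 40 cycles) can be made concrete with a short calculation. This sketch assumes that the first cycle uses the 1.5-minute starting value and that the 6-second increment applies from the second cycle onward; the text does not state this convention explicitly, and thermocycler firmware may differ.

```python
def extension_seconds(cycle, start_s=90, increment_s=6):
    """Extension time in seconds for a 1-based cycle number.

    Assumes cycle 1 uses the starting time and each later cycle adds
    one increment (an assumed convention, see the lead-in).
    """
    if cycle < 1:
        raise ValueError("cycle numbers start at 1")
    return start_s + increment_s * (cycle - 1)


CYCLES = 40
schedule = [extension_seconds(c) for c in range(1, CYCLES + 1)]

first_s = schedule[0]          # extension time in the first cycle
last_s = schedule[-1]          # extension time in the final cycle
total_min = sum(schedule) / 60  # cumulative extension time in minutes
print(first_s, last_s, total_min)  # 90 324 138.0
```

Under these assumptions the final cycle extends for 5.4 minutes, and extension steps alone account for 138 minutes of the run.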
At least two clones per PCR product were isolated with FastPlasmid Mini kits (5 Prime, Gaithersburg, MD) and sequenced in each direction with ABI BigDye 3.1 reagents and T3 and T7 primers, on an ABI 3730 sequencer (Applied Biosystems [Foster City, CA]). Nucleotide sequence data were assembled with Sequencher v4.5 (Gene Codes, Ann Arbor, MI) and the identities were initially verified with BLASTx searches in NCBI. If either Rad51 or Dmc1 gene sequence fragments were isolated, but not both genes, then single sequences from four or five additional clones were obtained to detect the other paralog. In total, sequences generated from both strands for at least three clones per gene were obtained. Nucleotide sequences were annotated and inferred exons were translated to proteins as described above (Database searches).

Phylogenetic analyses

We aligned all potential eukaryotic Rad51 and Dmc1 protein sequences with archaebacterial RadA protein sequences using MUSCLE v3.7, manually edited them by removing ambiguously aligned columns and gaps in BioEdit v7.0.5.3 (Hall 1999; Edgar 2004), and performed phylogenetic analyses on the multiple sequence alignment. Optimal protein substitution models and parameters were determined for each alignment independently with Modelgenerator v0.85 (Keane et al. 2006). Analyses were performed with PhyML v3.0 (Guindon et al. 2009) for 1000 replicates, and with PhyloBayes v3.1 (Lartillot, Lepage, and Blanquart 2009), which used at least two independent converged chains in which maximum differences observed across all bipartitions were less than 0.10. Every other tree after burn-ins (selected to minimize the differences across all bipartitions) was used to calculate consensus tree topologies. Only sequences that unambiguously grouped as either Rad51 or Dmc1 were retained, while those that did not most likely represented other Rad51 paralogs such as Rad55 or Rad57 (Lin et al. 2006) and were removed prior to subsequent analysis (Figure 3.2).
Uncorrected pairwise protein sequence distances were calculated with ClustalX v2.0.12 (Thompson et al. 1997). Pairs of sequences with less than 0.10 protein sequence distance were identified. One member of each such pair was removed on the basis of observed protein sequence lengths or branch lengths determined with phylogenetic analyses, usually reducing representation to one species per genus (Stiller and Harrell 2005). We removed the most divergent sequences during subsequent analyses, as necessary, to minimize the effects of long-branch attraction (Felsenstein 2004; Hampl et al. 2009).

Table 3.1: Support for eukaryotic supergroups and first order groups from phylogenetic analyses of Rad51, Dmc1, and concatenated protein sequence data.

Amoebozoa Excavata Chromalveolata Archaeplastida Rhizaria Opisthokonta Rad51 +++ +++ Dmc1 +++ +++ ++ Concat. +++ +++ ++ Rad51 Dmc1 Concat. Fungi Metazoa Centra. Mycet. Arch. Discoba Meta. Stramen. Alveolata Chloro. Rhodo. Cercozoa Apusozoa ++ +++ +++ +++ +++ +++ N/A N/A N/A N/A N/A N/A N/A N/A ++ - + - + +++ +++ +++ +++ ++ - +++ +++ +++ ++ - +++ +

Note: Support for eukaryotic groups was assessed with PhyloBayes posterior probabilities from phylogenetic analyses performed on Rad51, Dmc1, and concatenated protein sequences (Figure 1). Pluses indicate that monophyletic groups were retrieved (ignoring the placement of Apusozoa) (+++ = > 0.90, ++ = 0.70-0.90, + = < 0.70) and minuses indicate the relationship was not retrieved. N/A indicates only one representative of the group was in the alignment.

Figure 3.1: Graphic representation of Rad51 or Dmc1 gene sequence fragments amplified with degenerate PCR from representatives of four eukaryotic supergroups and Apusozoa relative to Saccharomyces cerevisiae Rad51 protein sequence. Amoebozoa are labeled with blue, Rhizaria with eggplant, Chromalveolata with orange, Excavata with brown, and Apusozoa with black. Amino acid positions are Saccharomyces cerevisiae (S.c.) Rad51 protein sequence positions. Grey bars indicate regions encoded by fragments amplified with degenerate PCR. Letters and numbers on each side of grey bars indicate degenerate primers used (Table 3.2 and Figure 3.1).

Taxa shown in Figure 3.1 (horizontal axis: S.c. Rad51 amino acid position, 85-370). Rad51: Ancyromonas, Thecamonas, Mastigamoeba, Sexangularia, Arachnula, Bodomorpha, Cercomonas, Proleptomonas, Spongomonas, Thaumatomonas, Cafeteria, Pendulomonas, Perkinsus, Pylaiella, Bodo, Diplonema, Jakoba libera, Monotrichomonas, Percolomonas, Scytomonas, Seculamonas. Dmc1: Ancyromonas, Thecamonas, Mastigamoeba, Spongomonas, Adenoides, Diplonema, Percolomonas, Scytomonas.
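The redundancy filter described in the Methods (removing one member of any pair of sequences separated by an uncorrected distance of less than 0.10) can be sketched as follows. This is an illustrative approximation rather than the ClustalX calculation itself: it assumes distance is the fraction of differing residues over columns where neither sequence has a gap, and it keeps the longer sequence of each close pair, which covers only the sequence-length criterion mentioned in the text. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected distance between two aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no shared ungapped columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def filter_redundant(alignment, threshold=0.10):
    """Drop the shorter member of every pair closer than `threshold`."""
    kept = dict(alignment)
    for name_a, name_b in combinations(list(alignment), 2):
        if name_a not in kept or name_b not in kept:
            continue  # one member was already removed
        if p_distance(alignment[name_a], alignment[name_b]) < threshold:
            len_a = len(alignment[name_a].replace("-", ""))
            len_b = len(alignment[name_b].replace("-", ""))
            del kept[name_b if len_a >= len_b else name_a]
    return kept


# Hypothetical aligned fragments for illustration only.
toy_alignment = {
    "taxonA_sp1": "MKVLAGTRSDEQ",
    "taxonA_sp2": "MKVLAGTRSD--",
    "taxonB": "MRILSGTKADNQ",
}

filtered = filter_redundant(toy_alignment)
print(sorted(filtered))
```

In this toy case the two taxonA sequences are identical over their shared columns, so the shorter one is dropped, while taxonB differs at half of its positions and is retained.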
Figure 3.2: Unrooted phylogenetic tree of 47 Dmc1 homologs. Trees were estimated with PhyML (LG+G) and PhyloBayes (LG+G) from 312 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, Rhizaria in violet, and Excavata in brown. Asterisks indicate data was obtained with degenerate PCR. The consensus topology of 2 PhyloBayes chains is shown. Scale bar: 0.1 substitutions/site.
Figure 3.3: Unrooted phylogenetic tree of 47 Dmc1 homologs with accession numbers. Trees were estimated with PhyML (LG+G) and PhyloBayes (LG+G) from 312 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, Rhizaria in violet, and Excavata in brown. Asterisks indicate data was obtained with degenerate PCR. All references are GenBank unless otherwise noted. The consensus topology of 2 PhyloBayes chains is shown. Scale bar: 0.1 substitutions/site.

Taxa and accession numbers shown in Figure 3.3: Candida glabrata 50293765; Saccharomyces 118683; Kluyveromyces 50311197; Candida albicans 1706446; Aspergillus 121709155; Schizosaccharomyces 3176384; Coprinopsis 6714639; Cryptococcus 134118469; Phycomyces jgiScaffold_3|1364891|1940257 and 1122|189|447; Batrachochytrium jgiScaffold_2|2017505|202793; Homo 13878923; Strongylocentrotus 115660762; Physarum 90192353; Mastigamoeba*; Entamoeba 67482427; Leishmania 72549845; Scytomonas*; Trypanosoma 71659624; Diplonema*; Percolomonas*; Naegleria jgiScaffold_1|500453|501457; Trichomonas 123408121; Giardia A 30578211; Giardia B 71080540; Spironucleus jgiScaffold_430|11672|12631; Thecamonas*; Pythium 166325657; Phytophthora r. jgi76896; Hyaloperonospora 199610544/64; Chlamydomonas 158272235; Chlorella jgi52039; Micromonas 226524329; Ostreococcus 145352283; Gracilaria 120463106; Galdieria Galdieria genomeScaffold 896 Oct13 2005:g78.t1; Arabidopsis 21903409; Oryza 18700485; Spongomonas*; Gymnophrys 158071814; Adenoides*; Plasmodium 68076139; Toxoplasma 237843305; Cryptosporidium 209879790; Karlodinium TBestDBKML00009877; Perkinsus TIGR1637; Sterkiella 209371672; Ancyromonas*.
Figure 3.4: Unrooted phylogenetic tree of 54 Dmc1 and RadA homologs with accession numbers. Trees were estimated with PhyML (LG+G) and PhyloBayes (LG+G) from 312 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, Rhizaria in violet, and Excavata in brown. Asterisks indicate data was obtained with degenerate PCR. All references are GenBank unless otherwise noted. The consensus topology of 2 PhyloBayes chains is shown.
[Tree not reproduced. Scale bar: 0.1 substitutions/site.]
Figure 3.5: Unrooted phylogenetic tree of 105 Rad51 and Dmc1 homologs. Trees were estimated with PhyML (LG+G) and PhyloBayes (LG+G) from 315 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, Rhizaria in violet, and Excavata in brown. Asterisks indicate data was obtained with degenerate PCR. The consensus topology of 2 PhyloBayes chains is shown.
[Tree not reproduced. Scale bar: 0.1 substitutions/site.]

Figure 3.6: Unrooted phylogenetic tree of 112 Rad51, Dmc1, and RadA homologs. Trees were estimated with PhyML (LG+G) and PhyloBayes (LG+G) from 315 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, Rhizaria in violet, and Excavata in brown. Asterisks indicate data was obtained with degenerate PCR. The consensus topology of 2 PhyloBayes chains is shown.
[Tree not reproduced. Scale bar: 0.1 substitutions/site.]
Figure 3.7: Unrooted phylogenetic tree of 157 Rad51, Dmc1 and RadA homologs. Trees were estimated with PhyloBayes (LG+G) from 314 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, Rhizaria in violet, and Excavata in brown. Asterisks indicate data was obtained with degenerate PCR. The consensus topology of 2 PhyloBayes chains is shown.
[Tree not reproduced. Scale bar: 0.1 substitutions/site.]

Figure 3.8: Unrooted phylogenetic tree of 52 Rad51 homologs. Trees were estimated with PhyML (LG+G) and PhyloBayes (LG+G) from 307 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, Rhizaria in violet, and Excavata in brown. Asterisks indicate data was obtained with degenerate PCR. The consensus topology of 2 PhyloBayes chains is shown.
[Tree not reproduced. Scale bar: 0.1 substitutions/site.]
Figure 3.9: Unrooted phylogenetic tree of 58 Rad51 homologs with accession numbers. Trees were estimated with PhyML (LG+G) and PhyloBayes (LG+G) from 307 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, Rhizaria in violet, and Excavata in brown. Asterisks indicate data was obtained with degenerate PCR. The consensus topology of 2 PhyloBayes chains is shown.
[Tree not reproduced. Scale bar: 0.1 substitutions/site.]
Figure 3.10: Unrooted phylogenetic tree of 65 Rad51 and RadA homologs with accession numbers. Trees were estimated with PhyML (LG+G) and PhyloBayes (LG+G) from 314 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, Rhizaria in violet, and Excavata in brown. Asterisks indicate data was obtained with degenerate PCR. All references are GenBank unless otherwise noted. The consensus topology of 2 PhyloBayes chains is shown.
[Tree not reproduced. Scale bar: 0.1 substitutions/site.]

Figure 3.11: Unrooted phylogenetic tree of 40 concatenated Rad51 and Dmc1 homologs. Trees were estimated with PhyML (LG+G) and PhyloBayes (LG+G) from 603 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, Rhizaria in violet, and Excavata in brown. Asterisks indicate data was obtained with degenerate PCR. The consensus topology of 2 PhyloBayes chains is shown.
[Tree not reproduced. Scale bar: 0.1 substitutions/site.]
Figure 3.12: Unrooted phylogenetic tree of 40 concatenated Rad51 and Dmc1 homologs with accession numbers (Dmc1/Rad51). Trees were estimated with PhyML (LG+G) and PhyloBayes (LG+G) from 603 aligned amino acids. Opisthokonta are highlighted in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, Rhizaria in violet, and Excavata in brown. Asterisks indicate data was obtained with degenerate PCR. All references are GenBank unless otherwise noted. The consensus topology of 2 PhyloBayes chains is shown.
[Tree not reproduced. Scale bar: 0.1 substitutions/site.]

Figure 3.13: Protein sequence alignment of prokaryotic and eukaryotic RecA orthologs with amino acids conserved among 158 protein sequences indicated. Two eubacterial RecA, seven archaebacterial RadA, 98 eukaryotic Rad51, and 51 eukaryotic Dmc1 protein sequences were aligned and analyzed for conserved amino acids. Seven exemplar Rad51 and Dmc1 protein sequences and two RadA protein sequences are presented. Amino acids that were present 100% among all domains, archaebacteria and eukaryotes only, eukaryotes only, and Rad51 or Dmc1 only are highlighted with black, blue, green and yellow respectively. In addition, sites present ≥ 95% in one eukaryotic paralog but different or variable in the other paralog are highlighted in red. Dots mark residues identified during this study for which no function has been determined. Opisthokonta are labeled in purple, Amoebozoa in blue, Chromalveolata in orange, Excavata in brown, Rhizaria in eggplant, and Apusozoa in black. Arrows indicate positions of degenerate PCR primers. Numbers indicate amino acid positions of Saccharomyces cerevisiae Rad51. Supergroups were represented at each amino acid position except Apusozoa (126-356) and Rhizaria (188-397).
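The highlighting categories described in the Figure 3.13 caption can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the function names `top_residue` and `classify_site`, the toy sequences, and the choice to ignore gap characters when computing frequencies are all assumptions made for illustration. Columns are indexed from 0 here, not by Saccharomyces cerevisiae Rad51 position.

```python
from collections import Counter

def top_residue(column):
    """Most common residue in one alignment column and its frequency.
    Gaps ('-') are ignored (an assumption, not stated in the source)."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return None, 0.0
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)

def classify_site(groups, i, threshold=0.95):
    """Classify column i. `groups` maps 'RecA', 'RadA', 'Rad51', 'Dmc1'
    to lists of equal-length aligned sequences (hypothetical layout)."""
    stats = {name: top_residue([seq[i] for seq in seqs])
             for name, seqs in groups.items()}
    # Families in which a single residue occupies 100% of sequences.
    fixed = {name: res for name, (res, freq) in stats.items() if freq == 1.0}

    def shared(names):
        # True when every named family is fixed for the same residue.
        return (all(n in fixed for n in names)
                and len({fixed[n] for n in names}) == 1)

    if shared(["RecA", "RadA", "Rad51", "Dmc1"]):
        return "all domains"
    if shared(["RadA", "Rad51", "Dmc1"]):
        return "archaea and eukaryotes"
    if shared(["Rad51", "Dmc1"]):
        return "eukaryotes"
    for a, b in (("Rad51", "Dmc1"), ("Dmc1", "Rad51")):
        res_a, freq_a = stats[a]
        res_b, freq_b = stats[b]
        # The other paralog has a different residue or is variable.
        differs = res_b != res_a or freq_b < threshold
        if freq_a == 1.0 and differs:
            return "one paralog (100%)"
        if freq_a >= threshold and differs:
            return "one paralog (>=95%)"
    return "variable"

# Toy alignment, five columns, two sequences per family (made-up data).
toy = {
    "RecA":  ["GAALM", "GAAVI"],
    "RadA":  ["GKSRP", "GKSKA"],
    "Rad51": ["GKTQC", "GKTQD"],
    "Dmc1":  ["GKTEF", "GKTDH"],
}

for i in range(5):
    print(i, classify_site(toy, i))
```

With real data, each family would hold many more sequences (for example, 98 Rad51 and 51 Dmc1 sequences), which is when the 95% threshold becomes meaningful.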
[Figure 3.13 alignment not reproduced. Sequences shown: Rad51 from Saccharomyces, Entamoeba, Oryza, Plasmodium, Trypanosoma, Cercomonas*, and Ancyromonas*; Dmc1 from Saccharomyces, Entamoeba, Oryza, Plasmodium, Trypanosoma, Gymnophrys, and Amastigomonas*; RadA from Cenarchaeum and Nanoarchaeum; RecA from Bacillus and Thermus. Annotated regions: HhH (dsDNA binding), subunit rotation, subunit polymerization, Walker A (triphosphate binding), Walker B (Mg++ binding), Loop L1 (ssDNA binding), and Loop L2 (ssDNA binding).]

[Figure 3.14 matrix not reproduced. Average p-distances given in the figure: Dmc1 versus RecA 0.77, RadA 0.55, Rad51 0.46, and Dmc1 0.34; Rad51 versus RecA 0.80, RadA 0.57, and Rad51 0.29.]
0.28 0.32 0.29 0.28 0.36 0.54 0.61 0.76 0.81 0.77 Ent. 0.40-0.49 0.31 0.26 0.21 0.34 0.57 0.56 0.77 0.80 0.76 Ory. 0.30 0.30 0.40 0.57 0.57 0.76 0.82 0.78 Pla. Dmc1 0.50-0.59 0.26 0.36 0.33 0.55 0.56 0.51 0.56 0.55 0.53 0.53 0.78 0.74 0.77 0.82 0.77 0.82 0.76 0.77 0.84 0.76 0.76 0.74 0.74 0.85 0.76 Try. Gym. Ama. Met. Sul. RadA 0.60-0.69 0.70-0.79 0.30 0.29 0.26 Esc. Bac. RecA 0.80-0.89 Figure 3.14: p-distance matrix of prokaryotic and eukaryotic RecA orthologs. Uncorrected distances between eukaryotic Rad51 and Dmc1, archaebacterial RadA, and eubacterial RecA protein sequences. All currently recognized eukaryotic supergroups are represented (purple=Opisthokonta, light blue=Amoebozoa, green=Archaeplastida, orange=Chromalveolata, brown=Excavata, blue=Rhizaria), and the currently unclassified Apusozoa (Ancyromonas and T. trahens). Calculations were performed with MEGA. Asterisks designate sequences obtained with degenerate PCR. 129 130 Table 3.2: Degenerate primers and their positions. Name Nucleotide Sequence Amino Acid Sequence I K G L S D/E I K G L S D/E I K G L S D/E I K G I S D/E I K G I S D/E E M F G E F R/S D R E G T F R/S T E G T F R/S P TNQVVA TNQVVAH G G H/N I F A KGKGE KGRGET forward 1 ATC AAG GGC TTR AGY GA forward 2 ATC AAG GGA CTN TCN GA forward 3 ATC AAG GGA CTN AGY GA forward 4 ATC AAG GGC ATH TCN GA forward 5 ATC AAG GGC ATH AGY GA forward 6 GAG ATG TTC GGC GAR TTY MG forward 7 GAC AGG GAA GGC ACN TTY MG forward 8 ACT GAA GGC ACN TTY MGN CC reverse 1 AGC GAC GAC YTG RTT NGT reverse 2 ATG CGC GAC NAC YTG RTT reverse 3 TGC GAA DAT RTK NCC NCC reverse 4 TTC ACC YTT NCC YTT reverse 5 AGT CTC ACC ACK NCC YTT Perkinsus ATT GAC CAG GGC ATA GGT IDQGIGT forward Perkinsus AAT TCT GAG CGC AAC AGG TT N L L R S E F/L reverse Note: positions are relative to the Saccharomyces cerevisiae Rad51 protein Position 122 122 122 122 122 184 221 222 331 332 355 370 371 98 290 131 Table 3.3: Proposed functions of residues identified during this study. 
Protein RecA RadA Rad51 Dmc1 RadA Rad51 Dmc1 Rad51 Dmc1 Rad51 ≠ Dmc1
Residue G185 D219 E221 D280 S281 N325 G346 G347
Function ATP binding DNA binding R287 DNA binding F144 E176 F224 R225 L285 L296 H356 E182 E295 R299 A351 H352 A320 K371 E382 E186 R293 G294 K343 K191 Q193 N254 A265 R308 D332 I349 Subunit binding DNA binding
References (Pellegrini et al. 2002) (Story, Weber, and Steitz 1992) (Okorokov et al. 2010) (Chen et al. 2007) (Conway et al. 2004) (Grigorescu et al. 2009) (Okorokov et al. 2010) (Pellegrini et al. 2002) (Seong et al. 2009) (Shin et al. 2003) (Story, Weber, and Steitz 1992) Rad52 binding Subunit binding Rad54 binding DNA binding (Okorokov et al. 2010) (Pellegrini et al. 2002) (Shin et al. 2003) (Story, Weber, and Steitz 1992) ATP binding Subunit binding Subunit/BRC4 DNA binding (Chen et al. 2007) (Shin et al. 2003) (Story, Weber, and Steitz 1992)
Dmc1 ≠ Rad51 Q301 Subunit binding (Shin et al. 2003)

Note: Functions were determined by analysis of RecA, RadA, Rad51, and Dmc1 protein structures. Amino acids were either identified in 100% of sequences at that position or 95% among one eukaryotic paralog but different or variable in the other. Amino acid positions are relative to the Saccharomyces cerevisiae Rad51 protein.

CHAPTER 4
MEIOSIS-SPECIFIC GENES AROSE BY DUPLICATION PRIOR TO THE LAST COMMON ANCESTOR OF EUKARYOTES

Abstract

Meiosis is a distinct and nearly universal feature of eukaryotes. However, the origins and evolutionary histories of genes that encode proteins that function during meiosis remain largely unknown. Whether the last eukaryotic common ancestor (LECA) was capable of meiosis is unknown. It is also unknown whether meiosis in the LECA used the same machinery that extant eukaryotes use to complete important meiotic functions, such as: 1) synaptonemal complex formation; 2) interhomolog DNA strand exchange; 3) Holliday junction resolution; and 4) sister chromatid cohesion.
We present our inventory of 20 genes whose products catalyze these important functions (Hop1, Rad21, Rec8, Spo11-1, Spo11-2, Spo11-3, Rad51, Dmc1, Hop2, Mnd1, Pms1, Mlh1, Mlh2, Mlh3, Msh2, Msh3, Msh4, Msh5, Msh6, and Mer3) among 46 diverse eukaryotes. For the first time, genomes of representatives from all eukaryotic supergroups and the Apusozoa (Thecamonas trahens) were tested for the presence of these meiotic components. We used alignments of phylogenetically verified protein sequence data to search nucleotide, EST, and protein sequence repositories. We determined that 10 of 20 genes are present in all eukaryotic supergroups and the unclassified Apusozoa, and 19 were likely present in the LECA. We also performed phylogenetic analyses on the protein sequence data obtained for all of the eukaryotes tested, revealing a pattern of gene duplications, most prior to the LECA. Many genes that encode proteins known to function only during meiosis in model organisms are paralogs of genes whose products also function during mitotic DNA damage repair or maintenance. In addition, these genes most likely arose by duplication of genes involved in DNA damage repair. These data indicate that meiosis itself likely arose by gene duplication.

Introduction

The evolutionary forces that gave rise to meiosis are unknown (Cavalier-Smith 2002d; d'Erfurth et al. 2009; Wilkins and Holliday 2009; Bernstein and Bernstein 2010). Efforts to collect data on the origins of meiotic genes are ongoing (Villeneuve and Hillers 2001; Ramesh, Malik, and Logsdon 2005; Malik et al. 2008; Cavalier-Smith 2010; Wickstead, Gull, and Richards 2010). Meiosis is distinguished from mitosis by two nuclear divisions (reductional and equational) following a single genome-wide replication event (generally resulting in four genetically distinct haploid products), rather than one nuclear division (generally resulting in two genetically identical diploid products).
Despite this dramatic difference, many events that occur during meiosis are analogous to events in mitosis, which itself depends upon functional components of somatic DNA mismatch and damage repair (Borts, Chambers, and Abdullah 2000; Marcon and Moens 2005; d'Erfurth et al. 2009; Wilkins and Holliday 2009; Bernstein and Bernstein 2010). Furthermore, some genes that encode products that process DNA damage in all domains of life are homologous to genes that encode proteins that function during meiosis in eukaryotes (Ramesh, Malik, and Logsdon 2005; Malik et al. 2007; Malik et al. 2008). These observations underlie the generally held notion that meiosis arose from mitosis early during eukaryotic evolution (Wilkins and Holliday 2009). Determining when meiotic genes appeared during the history of eukaryotes (especially those that function only during meiosis in model organisms) and what genetic mechanisms were responsible will help to clarify how meiosis arose (Ramesh, Malik, and Logsdon 2005). Differences between meiotic and mitotic forms of nuclear division are apparent during meiotic prophase I, during which interactions between homologous chromosomes ensure their appropriate alignment and subsequent segregation (Dudas and Chovanec 2004; Krogh and Symington 2004; Filippo, Sung, and Klein 2008; d'Erfurth et al. 2009; Wilkins and Holliday 2009). Events necessary for completion of meiotic prophase I in many eukaryotes include: 1) synaptonemal complex formation; 2) interhomolog DNA strand exchange; 3) sister chromatid cohesion; and 4) Holliday junction resolution (d'Erfurth et al. 2009; Wilkins and Holliday 2009).
Several studies established that genes that encode products known to function during these events in model organisms arose very early during eukaryotic evolution (Paques and Haber 1999; Villeneuve and Hillers 2001; Dudas and Chovanec 2004; Krogh and Symington 2004; Ramesh, Malik, and Logsdon 2005; Filippo, Sung, and Klein 2008; Malik et al. 2008; Wickstead, Gull, and Richards 2010). Definitive evidence that genes involved in different stages of meiosis were present in the last common ancestor of all eukaryotes has not been produced because of several limiting conditions, including: a) failure to place a root on the eukaryotic phylogeny (Baldauf 2003; Simpson and Roger 2004; Roger and Simpson 2009); b) lack of genome-sequence data for all eukaryotic supergroups; and c) the existence of currently unclassified eukaryotes. Because evolutionary relationships among eukaryotes are not completely resolved, any unsampled supergroup or unclassified eukaryote may be the earliest-diverging lineage, and its exclusion from analyses could result in misestimation of the presence of genes in the common ancestor of extant eukaryotes. Furthermore, biased taxonomic sampling of eukaryotes has been problematic for phylogenetic analyses (Dunn et al. 2008), clouding any conclusions regarding the evolution of meiotic genes. We performed extensive searches of sequence databases for 20 genes that encode products involved in sister chromatid cohesion, pairing of homologous chromosomes, synaptonemal complex formation, and interhomolog DNA strand exchange (Hop1, Rad21, Rec8, Spo11-1, Spo11-2, Spo11-3, Rad51, Dmc1, Hop2, Mnd1, Pms1, Mlh1, Mlh2, Mlh3, Msh2, Msh3, Msh4, Msh5, Msh6, and Mer3) in the genomes of 46 diverse eukaryotes. Ten of these genes (Hop1, Rec8, Spo11-1, Spo11-2, Dmc1, Hop2, Mnd1, Msh4, Msh5, and Mer3) are known to function only during meiosis in model organisms (Malik et al. 2007).
Access to newly sequenced genomes of eukaryotes from previously neglected groups greatly increased the breadth of sampling and provided our first glimpses of the suites of meiotic genes present in the supergroup Rhizaria (Bigelowiella natans), the first-order group Haptophyta (Emiliania huxleyi), and currently unclassified Apusozoa (Thecamonas trahens). Distributions of 10 genes that encode proteins that function during four distinct stages of meiosis (eight that are meiosis-specific) indicate these genes were present in the last common ancestor of all eukaryotes. These analyses also provide data supporting the additional presence of Spo11-3 and Msh3 in the last common ancestor of eukaryotes; only Rec8 and Spo11-2 may have arisen later during eukaryotic evolution. Eukaryotic homologs that encode products that function during mitosis and DNA repair are frequently paralogs of meiosis-specific gene products. Several genes most likely arose from other genes that encode products that function during DNA repair, replication, or transformation (i.e. orthologs of archaebacterial Ski2, Top6A, RadA, MutL, and MutS genes). The bulk of meiotic genes tested arose once by duplication prior to the last common ancestor of eukaryotes (Malik et al. 2007; Bernstein and Bernstein 2010). The presence of so many genes in the last common ancestor that encode products necessary for a range of important steps during meiosis in extant eukaryotes strongly implies that meiosis in the last common ancestor was similar to meiosis observed in most eukaryotes today.

Results and discussion

Distributions of meiotic genes

We present the distribution of 20 genes that encode proteins that function during meiosis (Hop1, Rad21, Rec8, Spo11-1, Spo11-2, Spo11-3, Rad51, Dmc1, Hop2, Mnd1, Mlh1-3, Pms1, Msh2-6, and Mer3) among 46 diverse eukaryotes (Figure 4.1).
These genes were selected on the basis that their products are important for four different stages of meiosis: 1) synaptonemal complex formation; 2) interhomolog DNA strand exchange; 3) sister chromatid cohesion; and 4) Holliday junction resolution (Table 4.1) (Kleckner 1996; d'Erfurth et al. 2009). The taxonomic sampling includes representatives of every currently recognized eukaryotic supergroup and the Apusozoa (Thecamonas trahens). Ten genes (Rad51, Dmc1, Hop2, Mnd1, Mlh1, Mlh3, Msh2, Msh4, Msh5, and Msh6) are present in representatives of every supergroup and T. trahens, implying that they were present in the last eukaryotic common ancestor (LECA) (Figure 4.2). An additional five genes (Hop1, Rad21, Spo11-1, Pms1, and Mer3) are missing from representatives of at least one eukaryotic supergroup and/or T. trahens (Figures 4.1 and 4.2) but are likely to have been present in the LECA, given their distributions and our current understanding of the evolutionary relationships among eukaryotes (Figure 1.2). In addition, we interpreted the distribution of genes in the context of phylogenetic analyses performed on translated amino acid sequences of putative paralogs, with and without products of archaebacterial orthologs (Figures 4.3-4.15). Tree topologies retrieved with phylogenetic analyses of protein sequences translated from the 15 genes inferred to have been present in the LECA feature strongly supported monophyletic clades for many paralogs. Similarly, strongly supported topologies from analyses including Spo11-3, Msh3, Mlh2, and Rec8 protein sequences support the monophyly of their genes (Figures 4.5, 4.6, 4.14, and 4.15). Because these paralogs arose simultaneously during eukaryotic evolution, the distribution of one paralog can be inferred to be true for the other. Because the Spo11-1 gene is inferred to have been present in the LECA and the Spo11-3 gene arose at the same time, Spo11-3 is also likely to have been present in the LECA.
This inference is especially important if genes are apparently absent from particular groups (e.g. Discoba and/or Metamonada) that have been hypothesized as the earliest-diverging eukaryotes. Absences from such groups could indicate that a gene was not present in the LECA but arose early during eukaryotic evolution, after the divergence of some eukaryotes. Thus, the Spo11-3, Msh3, Mlh2, and Rec8 genes were likely present in the LECA (Figure 4.2). Only one gene (Spo11-2) may have arisen later during eukaryotic evolution given its distribution (Figures 4.1, 4.5, and 4.6). The Spo11-2 gene is apparently absent from genomes of the Metamonada tested (Trichomonas vaginalis, Giardia intestinalis, and Spironucleus vortens). In addition, the phylogenetic analyses of Spo11-2 protein sequence data (Figures 4.5 and 4.6) retrieve topologies in which the Spo11-2 clade is nested within the Spo11-1 clade. This indicates that the Spo11-2 gene may have arisen later during eukaryotic evolution, if the Metamonada are the earliest-diverging eukaryotes (Figure 1.2). The phylogenetic analyses performed here also indicate that all of the meiotic genes tested arose by gene duplication (Figure 4.16) and that many are orthologous to archaebacterial genes that encode proteins that function during DNA damage repair. Interestingly, several genes that encode proteins that function only during meiosis in model organisms (Hop1, Rec8, Spo11-1, Spo11-2, Dmc1, Msh4, Msh5, and Mer3) are paralogs of genes whose products also function during meiosis, mitosis, DNA damage repair, or maintenance (Rev7, Rad21, Spo11-3, Rad51, Msh2, Msh3, Msh6, Brr2, and Slh1) (Table 4.2). Further, some genes are orthologs of archaebacterial genes whose products function during DNA damage repair or maintenance (Top6A, Ski2, RadA, MutS, and MutL). Whether the duplications of meiosis-specific genes occurred simultaneously, due to large-scale genome duplication events, is unknown.
However, prior studies indicate that great numbers of gene duplications are likely to have occurred in the LECA (Zhou, Lin, and Ma 2010). Although we cannot be certain that the gene duplication events yielding meiosis-specific genes mark the origin of meiosis itself, these gene duplications most certainly resulted in the meiotic functions observed today.

Assessment of distributions

To determine the likelihood that observed gene absences indicate true losses of genes from genomes (Figures 4.1 and 4.17 and Table 4.2), the heuristic metric developed in Chapter 2 was applied. Here, the proportions of observed absences explained by sequence detection failures (type II error) were estimated. For observed absences of any of the Dmc1, Pms1, Msh3, Msh4, Msh6, and Mer3 genes, there is a ~1-10% chance that the gene is present in the genome sequence but was not detected. If a given organism's genome is well covered, then the gene has most likely been lost by the organism (e.g. Ustilago maydis Rad51). However, if there is a possibility that the genome sequencing is incomplete, the gene may be present in the genome but not in the genome assembly (e.g. Bigelowiella natans Pms1). The data analyzed here indicate that sporadic secondary gene losses occur frequently among diverse eukaryotes (Figures 4.1 and 4.17), a pattern first demonstrated among genes that encode DNA strand exchange proteins (Chapter 2).

Case study: the Spo11 genes

Some apparent absences of meiotic genes are more ambiguous. For example, the absence of the Spo11-1 and Spo11-2 genes from the genome sequences of D. purpureum and Polysphondylium pallidum may be due to either true losses of genes from the genomes or false negatives caused by sequence detection failures.
Assessment of Spo11-1 and Spo11-2 protein sequences (meiosis-specific transesterases that introduce dsDNA breaks necessary for homologous recombination (Keeney, Giroux, and Kleckner 1997; Baudat and Keeney 2001; Lichten 2001; Szekvolgyi and Nicolas 2010)) indicates that a high proportion of observed absences are likely due to false negatives caused by the inability to detect gene sequences (0.67 and 0.45, respectively) (Table 4.2). In addition, only the genome of D. purpureum has been completely sequenced. The authors of a previous study in which the distribution of Spo11-1, Spo11-2, and Spo11-3 genes was examined hypothesized that the observed absences of Spo11-1 and Spo11-2 genes from the genome of D. discoideum may be due to incomplete genome sequence coverage or a result of mutagenesis during axenic cultivation (Malik et al. 2007). However, the additional absence of these genes from D. purpureum implies that the Spo11-1 and Spo11-2 genes were absent in the common ancestor of the two Dictyostelium species. Recent population genetic data indicate that D. discoideum populations display a rapid decay of linkage disequilibrium and recombinant genotypes, consistent with meiotic recombination (Flowers et al. 2010). In addition, formation of macrocysts (resulting from the fusion of two haploid cells) has been observed with D. purpureum (Mehdiabadi et al. 2009) and D. giganteum (Mehdiabadi et al. 2010), and synaptonemal complexes have been observed in D. discoideum (Okada et al. 1986). Therefore, it is likely that meiosis and homologous recombination occur in D. purpureum. There are three possibilities that explain the apparent absences of the Spo11-1 and Spo11-2 genes in D. purpureum: i) the rate of dsDNA breaks is sufficiently high to stimulate interhomolog DNA strand exchange without Spo11-1 or Spo11-2; ii) another nuclease is introducing dsDNA breaks; or iii) the sequences have diverged, making them difficult to detect. This study cannot distinguish among these possibilities.
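The proportions quoted above compare the number of detection failures expected for a protein with the number of absences actually observed, and the Methods state that the expected numbers come from a Poisson regression on Smith-Waterman scores. A minimal sketch of that calculation follows. It is an illustration under stated assumptions, not the procedure used in Chapter 2: the function names, the single-predictor log-linear model, and the training values are all hypothetical, and the ratio is our reading of the Figure 4.1 legend.

```python
import math

def fit_poisson(scores, failures, max_iter=100, tol=1e-12):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson, where x is the centred
    and scaled alignment score. Returns a function: score -> expected count."""
    n = len(scores)
    centre = sum(scores) / n
    spread = max(abs(s - centre) for s in scores) or 1.0
    x = [(s - centre) / spread for s in scores]
    b0 = math.log(max(sum(failures) / n, 1e-9))  # start at the mean count
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(y - m for y, m in zip(failures, mu))
        g1 = sum((y - m) * xi for y, m, xi in zip(failures, mu, x))
        # Fisher information (negative Hessian), a 2 x 2 matrix.
        h00 = sum(mu)
        h01 = sum(m * xi for m, xi in zip(mu, x))
        h11 = sum(m * xi * xi for m, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            break
        # Newton step: solve the 2 x 2 system explicitly.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return lambda score: math.exp(b0 + b1 * (score - centre) / spread)

def failure_ratio(expected_failures, observed_absences):
    """Expected detection failures divided by observed absences."""
    if observed_absences == 0:
        return None  # no absences observed, so there is nothing to explain
    return expected_failures / observed_absences

# Hypothetical training data: alignment scores of control proteins and the
# number of genomes in which each control protein went undetected.
scores = [100, 200, 400, 800, 1600]
failures = [9, 7, 4, 2, 0]
expected = fit_poisson(scores, failures)
```

Under this reading, a ratio near or above 1 suggests that detection failures alone could account for the observed absences, whereas a ratio near 0 favors true gene loss.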

Conclusions

We performed an extensive search for homologs of 20 genes that encode products that are known to catalyze at least four important tasks during meiosis: 1) synaptonemal complex formation; 2) homologous recombination; 3) Holliday junction resolution; and 4) sister chromatid cohesion (Table 4.2). Ten genes (Rad51, Dmc1, Hop2, Mnd1, Mlh1, Mlh3, Msh2, Msh4, Msh5, and Msh6) are present in the genomes of representatives from every currently recognized eukaryotic supergroup and the Apusozoa (Thecamonas trahens) (Figures 4.1 and 4.2). Some genes are absent from the genomes of one or more eukaryotic supergroups or T. trahens (Hop1, Rad21, Spo11-1, Pms1, and Mer3). However, based upon our current understanding of the evolutionary relationships of eukaryotes (Figure 1.2), we determined that these genes are likely to have been present in the last eukaryotic common ancestor (LECA). We also performed phylogenetic analyses on the proteins translated from all of the genes collected (Figures 4.3-4.15). We used protein sequences of paralogs and archaebacterial orthologs to root the phylogenies. On the basis of these analyses we determined that an additional four genes (Rec8, Spo11-3, Mlh2, and Msh3) are likely to have been present in the LECA, despite their apparent absences from representatives of multiple eukaryotic supergroups. Only one gene (Spo11-2) may have arisen later during eukaryotic evolution, based upon its distribution and phylogenetic analyses that retrieve topologies in which the Spo11-2 clade is nested within the Spo11-1 clade. Frequently, we observed that genes arose by duplication, often in the LECA, of genes that are likely to have encoded proteins that functioned during DNA damage repair (Figure 4.16).
In addition, we noticed that many homologs are paralogs in which at least one gene encodes products that function only during meiosis in model organisms and at least one other paralog encodes products that function during both meiosis and mitosis. Nearly all of the genes here (except Hop2 and Mnd1, for which no other eukaryotic or archaebacterial orthologs have been identified) likely arose by duplications of genes that encode DNA repair proteins, yielding multiple genes whose products are both meiosis-specific and generalist in nature, within the LECA. These data are most consistent with the possibility that meiosis arose from mitosis (Marcon and Moens 2005; d'Erfurth et al. 2009; Wilkins and Holliday 2009).

Methods

Database Searches

Keyword searches (e.g. Saccharomyces cerevisiae Rad51) of the National Center for Biotechnology Information (NCBI, www.ncbi.nlm.nih.gov/) protein sequence database retrieved protein sequences for representatives of animals, fungi, and plants. In addition, the Clusters of euKaryotic Orthologous Groups of proteins (KOGs) database for each protein was accessed (Tatusov et al. 2000). Sequence identities were initially verified using the tBLASTn (Altschul et al. 1997) option of the Basic Local Alignment Search Tool (BLAST), in which a translated nucleotide database is searched with a protein query, and evaluating the results (bidirectional BLAST). These protein sequences were subsequently used as queries to search genome sequence databases at NCBI and other publicly available sites (Table 4.3) with BLASTp, tBLASTn, and BLASTn, as necessary, for all Hop1, Rev7, Rad21, Rec8, Spo11-1, Spo11-2, Spo11-3, Rad51, Dmc1, Hop2, Mnd1, Mlh1-3, Pms1, Msh2-6, Mer3, Slh1, and Brr2 sequences available for a set of 46 taxa from June through August 2010. Once additional protein sequence data were obtained, searches were also performed using protein sequence data from closely related organisms likely to share more recent common ancestors as queries.
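The bidirectional BLAST check mentioned above can be illustrated with a small sketch that operates on hit tables already parsed from BLAST output. It assumes a strict reciprocal-best-hit criterion, which is one common reading of a bidirectional search and not necessarily the exact criterion applied here. The function name, identifiers, and bit scores are hypothetical.

```python
def reciprocal_best_hits(forward_hits, reverse_hits):
    """Return {query: subject} pairs whose best hits point at each other.

    forward_hits: query id -> list of (subject id, bit score) from searching
                  a target genome with reference proteins.
    reverse_hits: subject id -> list of (hit id, bit score) from searching
                  the reference protein set with each candidate subject.
    """
    def best(hits):
        # Highest bit score wins; None when there were no hits at all.
        return max(hits, key=lambda h: h[1])[0] if hits else None

    confirmed = {}
    for query, hits in forward_hits.items():
        candidate = best(hits)
        if candidate is None:
            continue
        if best(reverse_hits.get(candidate, [])) == query:
            confirmed[query] = candidate
    return confirmed

# Hypothetical hit tables for three reference proteins.
forward = {
    "ScRad51": [("gene_A", 310.0), ("gene_B", 180.0)],
    "ScDmc1": [("gene_B", 290.0), ("gene_A", 200.0)],
    "ScHop1": [("gene_C", 60.0)],
}
reverse = {
    "gene_A": [("ScRad51", 305.0), ("ScDmc1", 190.0)],
    "gene_B": [("ScDmc1", 280.0), ("ScRad51", 175.0)],
    "gene_C": [("ScRev7", 75.0), ("ScHop1", 58.0)],  # best reverse hit is a paralog
}
confirmed = reciprocal_best_hits(forward, reverse)
```

In this toy case the Hop1 candidate is rejected because its best reverse hit is the paralog Rev7, which is the kind of ambiguity that the phylogenetic verification step is meant to resolve.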
Identities of sequences were again confirmed with bidirectional BLAST (BLASTx and tBLASTn, as necessary). When necessary, phylogenetically verified (see below) protein sequences were aligned with MUSCLE v3.7 (Edgar 2004) and used to create position specific scoring matrices (PSSMs) with the tBLASTn module (available at http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download). Matrices were then used as queries with the PSI-BLAST module to search nucleotide genome sequence databases. When multiple sequences were found for a species, only the most complete was retained. If no previously annotated protein sequence was available in a database, then nucleotide sequences were annotated by hand, using Sequencher v4.5 (Genecodes, Ann Arbor, MI). Exons were identified on the basis of inferred translations using BLASTx pairwise comparisons to the NCBI protein sequence database and locations of putative intron splice donor and acceptor site sequences (e.g. G/GT to AG/G, although others may be observed among diverse eukaryotes). Additional comparisons of resulting amino acid sequences to other homologs were performed with alignments created using MUSCLE v3.7 (Edgar 2004) and observed with BioEdit v7.0.5.3 (Hall 1999).

Phylogenetic analyses

We aligned all potential eukaryotic protein sequences with archaebacterial protein sequences using MUSCLE v3.7 (Edgar 2004), manually edited them (removing ambiguously aligned columns and gaps) with BioEdit v7.0.5.3 (Hall 1999), and performed phylogenetic analyses on the set. Optimal protein substitution models and parameters were determined for each alignment independently with Modelgenerator v0.85 (Keane et al. 2006). Analyses were performed with RAxML v7.2.7 (Stamatakis, Hoover, and Rougemont 2008), for 1000 replicates at the CIPRES Science Gateway v3.0 (Miller et al. 2009).
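Part of the alignment editing described above, the removal of gapped columns, is mechanical and can be sketched in a few lines. The editing in this study was done by hand in BioEdit and also removed ambiguously aligned columns, which requires judgment that this sketch does not attempt. The function name, gap characters, and toy alignment are assumptions for illustration only.

```python
def drop_gapped_columns(alignment, gap_chars="-~"):
    """Remove every alignment column in which any sequence has a gap.

    alignment: dict mapping sequence name -> aligned sequence string.
    All sequences must have the same aligned length.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must all have the same length")
    width = lengths.pop() if lengths else 0
    # Keep a column only if no sequence shows a gap character there.
    keep = [
        i for i in range(width)
        if all(seq[i] not in gap_chars for seq in alignment.values())
    ]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }

# Toy alignment with hypothetical sequence names.
toy = {"seq1": "MK-LV", "seq2": "MKALV", "seq3": "M-ALV"}
trimmed = drop_gapped_columns(toy)
```

The number of columns that survive corresponds to the "aligned amino acids" count reported in each figure legend.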

Inventory Assembly

Genes were determined to be present in an organism when putative orthologs were discovered and identified with phylogenetic analyses. To determine the numbers of observed sequence absences attributable to failures of the sequence detection regimen, Smith-Waterman pairwise alignment scores (Homo sapiens versus Saccharomyces cerevisiae) were calculated with the PRSS/PRFX tool (http://fasta.bioch.virginia.edu/fasta_www2/fasta_www.cgi?rm=shuffle) (Table 2.2). The numbers of sequence detection failures expected for each protein, given its Smith-Waterman score, were determined with a Poisson regression analysis of protein sequence data previously collected (Chapter 2) for ten RNA Polymerase I and three Replication Protein A subunits among diverse eukaryotes with completed genome sequences.

Figure 4.1: Distribution of 20 homologs that function during meiosis among 46 eukaryotes representing all eukaryotic supergroups. Cells filled in with color (Opisthokonta in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, Rhizaria in eggplant, Excavata in brown, and Apusozoa in black) indicate the homolog was found and phylogenetically verified. Labels of proteins known to function only during meiosis in model organisms are blue. Shades of grey indicate the proportion of observed absences attributed to sequence detection failures, estimated from Smith-Waterman pairwise alignment scores (Saccharomyces cerevisiae versus Homo sapiens) (see Methods). Darker greys indicate the gene is not present in the genome sequence sampled while lighter greys indicate the gene may be present but was not detected. Black protein labels identify sequences discovered in all eukaryotes sampled. Asterisks identify completed genome sequences (>8.0x WGS coverage or sequenced end-to-end).
[Figure 4.1 grid. Legend: ratio of number of undetected sequences expected to observed: no failures observed; 0.01-0.10; 0.11-0.20; 0.21-0.50; 0.51-1.00; > 1.00.
Proteins: Hop1, Rad21, Rec8, Spo11-1, Spo11-2, Spo11-3, Rad51, Dmc1, Hop2, Mnd1, Pms1, Mlh1, Mlh2, Mlh3, Msh2, Msh3, Msh4, Msh5, Msh6, Mer3.
Taxa:
METAMONADA: Trichomonas vaginalis, Giardia intestinalis*, Spironucleus vortens
DISCOBA: Naegleria gruberi*, Leishmania major/donovani, Trypanosoma cruzi*
ARCHAEPLASTIDA: Arabidopsis thaliana*, Oryza sativa, Physcomitrella patens*, Chlamydomonas reinhardtii*, Chlorella sp.*, Ostreococcus tauri*, Galdieria sulphuraria
HAPTOPHYTA: Emiliania huxleyi*
STRAMENOPILA: Thalassiosira pseudonana*, Phaeodactylum tricornutum*, Fragilariopsis cylindrus, Phytophthora ramorum/sojae*, Aureococcus anophagefferens, Blastocystis hominis
ALVEOLATA: Plasmodium vivax*, Toxoplasma gondii, Theileria parva/annulata*, Cryptosporidium muris*, Perkinsus marinus, Paramecium tetraurelia*
RHIZARIA: Bigelowiella natans
AMOEBOZOA: Entamoeba dispar, Dictyostelium purpureum*, Polysphondylium pallidum
HOLOZOA: Homo sapiens*, Ciona*, Nematostella vectensis, Trichoplax adhaerens*, Monosiga brevicollis*, Salpingoeca rosetta*, Capsaspora owczarzaki*
FUNGI: Saccharomyces cerevisiae*, Aspergillus fumigatus*, Ustilago maydis*, Cryptococcus neoformans, Laccaria bicolor*, Coprinopsis cinerea*, Mucor circinelloides*, Batrachochytrium dendrobatidis*
APUSOZOA: Thecamonas trahens*]

[Figure 4.2 grid. Rows: LECA, Metamonada, Discoba, Archaeplastida, Chromalveolata, Rhizaria, Amoebozoa, Opisthokonta, Apusozoa. Columns: the 20 proteins listed for Figure 4.1, grouped by function: synaptonemal complex formation, sister-chromatid cohesion, dsDNA break formation, DNA strand exchange, Holliday-junction resolution.]

Figure 4.2: Presence of twenty homologs that function during meiosis in the last eukaryotic common ancestor (LECA) inferred by their distribution among eukaryotic supergroups.
Cells filled in with color (Opisthokonta in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, Rhizaria in eggplant, Excavata in brown, and Apusozoa in black) indicate the homolog was found and phylogenetically verified within that group. Labels of proteins known to function only during meiosis in model organisms are blue. Black plusses indicate the gene was most likely present in the LECA based upon its distribution among all eukaryotic supergroups, red plusses indicate the presence of the gene in the LECA on the basis of phylogenetic analyses.

[Figure 4.3 tree. Clades: Hop1, Rev7. Scale bar: 0.5 substitutions/site.]

Figure 4.3: Unrooted phylogenetic tree of 50 eukaryotic Hop1 and Rev7 homologs. Trees were estimated with maximum likelihood inference (LG+G; 1000 replicates) from 129 aligned amino acids. Opisthokonta are labeled with purple, Amoebozoa with blue, Archaeplastida with green, Chromalveolata with orange, Rhizaria with eggplant, Excavata with brown, and Apusozoa with black. Labels of proteins known to function only during meiosis in model organisms are blue. The best RAxML v7.2.7 tree is shown.

Figure 4.4: Unrooted phylogenetic tree of 49 eukaryotic Rad21 and Rec8 homologs. Trees were estimated with maximum likelihood inference (LG+G; 1000 replicates) from 171 aligned amino acids. Opisthokonta are labeled with purple, Amoebozoa with blue, Archaeplastida with green, Chromalveolata with orange, Rhizaria with eggplant, Excavata with brown, and Apusozoa with black. Labels of proteins known to function only during meiosis in model organisms are blue. The best RAxML v7.2.7 tree is shown.

[Figure 4.4 tree. Clades: Rad21, Rec8. Scale bar: 0.2 substitutions/site.]

Figure 4.5: Unrooted phylogenetic tree of 69 eukaryotic Spo11-1, Spo11-2, and Spo11-3 homologs with 6 archaebacterial Top6A homologs. Trees were estimated with maximum likelihood inference (LG+G+I; 1000 replicates) from 170 aligned amino acids. Opisthokonta are labeled with purple, Amoebozoa with blue, Archaeplastida with green, Chromalveolata with orange, Rhizaria with eggplant, Excavata with brown, and Apusozoa with black. Labels of proteins known to function only during meiosis in model organisms are blue. The best RAxML v7.2.7 tree is shown.
[Figure 4.5 tree. Clades: Spo11-2, Spo11-1, Spo11-3, Top6A. Scale bar: 0.2 substitutions/site.]

[Figure 4.6 tree. Clades: Spo11-2, Spo11-1, Spo11-3. Scale bar: 0.2 substitutions/site.]

Figure 4.6: Unrooted phylogenetic tree of 69 eukaryotic Spo11-1, Spo11-2, and Spo11-3 homologs. Trees were estimated with maximum likelihood inference (LG+G+I; 1000 replicates) from 170 aligned amino acids. Opisthokonta are labeled with purple, Amoebozoa with blue, Archaeplastida with green, Chromalveolata with orange, Rhizaria with eggplant, Excavata with brown, and Apusozoa with black. Labels of proteins known to function only during meiosis in model organisms are blue. The best RAxML v7.2.7 tree is shown.

Figure 4.7: Unrooted phylogenetic tree of 81 eukaryotic Rad51 and Dmc1 homologs with 6 archaebacterial RadA homologs. Trees were estimated with maximum likelihood inference (LG+G; 1000 replicates) from 305 aligned amino acids. Opisthokonta are labeled with purple, Amoebozoa with blue, Archaeplastida with green, Chromalveolata with orange, Rhizaria with eggplant, Excavata with brown, and Apusozoa with black. Labels of proteins known to function only during meiosis in model organisms are blue. The best RAxML v7.2.7 tree is shown.
[Tree image for Figure 4.7 not reproduced. Clade labels: Rad51, Dmc1, RadA. Scale bar: 0.1 substitutions/site.]

[Tree image for Figure 4.8 not reproduced. Clade labels: Rad51, Dmc1. Scale bar: 0.1 substitutions/site.]

Figure 4.8: Unrooted phylogenetic tree of 81 eukaryotic Rad51 and Dmc1 homologs. Trees were estimated with maximum likelihood inference (LG+G; 1000 replicates) from 305 aligned amino acids. Opisthokonta are labeled with purple, Amoebozoa with blue, Archaeplastida with green, Chromalveolata with orange, Rhizaria with eggplant, Excavata with brown, and Apusozoa with black. Labels of proteins known to function only during meiosis in model organisms are blue. The best RAxML v7.2.7 tree is shown.
[Tree image for Figure 4.9 not reproduced. Clade labels: Hop2, Mnd1. Scale bar: 0.2 substitutions/site.]

Figure 4.9: Unrooted phylogenetic tree of 82 eukaryotic Hop2 and Mnd1 homologs. Trees were estimated with maximum likelihood inference (LG+G; 1000 replicates) from 98 aligned amino acids. Opisthokonta are labeled with purple, Amoebozoa with blue, Archaeplastida with green, Chromalveolata with orange, Rhizaria with eggplant, Excavata with brown, and Apusozoa with black. Labels of proteins known to function only during meiosis in model organisms are blue. The best RAxML v7.2.7 tree is shown.

Figure 4.10: Unrooted phylogenetic tree of 131 eukaryotic Mlh1, Mlh2, Mlh3, and Pms1 homologs with 4 archaebacterial MutL homologs. Trees were estimated with maximum likelihood inference (LG+G+F; 1000 replicates) from 185 aligned amino acids.
Opisthokonta are labeled with purple, Amoebozoa with blue, Archaeplastida with green, Chromalveolata with orange, Rhizaria with eggplant, Excavata with brown, and Apusozoa with black. The best RAxML v7.2.7 tree is shown.

[Tree image for Figure 4.10 not reproduced. Clade labels: Mlh1, Mlh2, Mlh3, Pms1, MutL. Scale bar: 0.5 substitutions/site.]

[Tree image for Figure 4.11 not reproduced. Clade labels: Mlh1, Mlh2, Mlh3, Pms1. Scale bar: 0.2 substitutions/site.]

Figure 4.11: Unrooted phylogenetic tree of 131 eukaryotic Mlh1, Mlh2, Mlh3, and Pms1 homologs. Trees were estimated with maximum likelihood inference (LG+G+F; 1000 replicates) from 185 aligned amino acids. Opisthokonta are labeled with purple, Amoebozoa with blue, Archaeplastida with green, Chromalveolata with orange, Rhizaria with eggplant, Excavata with brown, and Apusozoa with black. The best RAxML v7.2.7 tree is shown.

Figure 4.12: Unrooted phylogenetic tree of 113 eukaryotic Mer3, Brr2, and Slh1 homologs with 6 archaebacterial Ski2 homologs. Trees were estimated with maximum likelihood inference (LG+G+I; 1000 replicates) from 338 aligned amino acids. Opisthokonta are labeled with purple, Amoebozoa with blue, Archaeplastida with green, Chromalveolata with orange, Rhizaria with eggplant, Excavata with brown, and Apusozoa with black. Labels of proteins known to function only during meiosis in model organisms are blue. The best RAxML v7.2.7 tree is shown.
[Tree image for Figure 4.12 not reproduced. Clade labels: Mer3, Brr2, Slh1, Ski2. Scale bar: 0.1 substitutions/site.]

Figure 4.13: Unrooted phylogenetic tree of 113 eukaryotic Mer3, Brr2, and Slh1 homologs.
Trees were estimated with maximum likelihood inference (LG+G+I; 1000 replicates) from 338 aligned amino acids. Opisthokonta are labeled with purple, Amoebozoa with blue, Archaeplastida with green, Chromalveolata with orange, Rhizaria with eggplant, Excavata with brown, and Apusozoa with black. Labels of proteins known to function only during meiosis in model organisms are blue. The best RAxML v7.2.7 tree is shown.

[Tree image for Figure 4.13 not reproduced. Clade labels: Mer3, Brr2, Slh1. Scale bar: 0.1 substitutions/site.]

Figure 4.14: Unrooted phylogenetic tree of 183 eukaryotic Msh2, Msh3, Msh4, Msh5, and Msh6 homologs with 5 archaebacterial MutS homologs. Trees were estimated with maximum likelihood inference (LG+G+F; 1000 replicates) from 259 aligned amino acids. Opisthokonta are labeled with purple, Amoebozoa with blue, Archaeplastida with green, Chromalveolata with orange, Rhizaria with eggplant, Excavata with brown, and Apusozoa with black. Labels of proteins known to function only during meiosis in model organisms are blue. The best RAxML v7.2.7 tree is shown.

[Tree image for Figure 4.14 not reproduced. Clade labels: Msh2, Msh3, Msh4, Msh5, Msh6, MutS. Scale bar: 0.2 substitutions/site.]

[Tree image for Figure 4.15 not reproduced. Clade labels: Msh2, Msh3, Msh4, Msh5, Msh6. Scale bar: 0.5 substitutions/site.]

Figure 4.15: Unrooted phylogenetic tree of 183 eukaryotic Msh2, Msh3, Msh4, Msh5, and Msh6 homologs. Trees were estimated with maximum likelihood inference (LG+G+F; 1000 replicates) from 259 aligned amino acids. Opisthokonta are labeled with purple, Amoebozoa with blue, Archaeplastida with green, Chromalveolata with orange, Rhizaria with eggplant, Excavata with brown, and Apusozoa with black. Labels of proteins known to function only during meiosis in model organisms are blue. The best RAxML v7.2.7 tree is shown.

[Radial tree images for Figure 4.16 not reproduced. Trees are marked with the letters A through E defined in the caption.]

Figure 4.16: Radial tree topologies of archaebacterial and eukaryotic homologs. Phylogenetic analyses were performed using a maximum likelihood approach on protein sequences of eukaryotic homologs encoding products that function during meiosis with paralogs and archaebacterial homologs. Blue bubbles indicate proteins known to function only during meiosis in model organisms and pink bubbles indicate proteins that function during meiosis, mitosis, and/or DNA mismatch repair. Branch colors of eukaryotes correspond to supergroups (Opisthokonta in purple, Amoebozoa in blue, Archaeplastida in green, Chromalveolata in orange, Rhizaria in eggplant, Excavata in brown, and Apusozoa in black). Letters indicate that at least one of the proteins in the tree is important for synaptonemal complex formation (A), sister chromatid cohesion (B), double-strand breaks (C), DNA strand exchange (D), or Holliday junction resolution (E).
Table 4.1: Proteins involved in four general categories of meiosis and their functions. Names of proteins known to function only during meiosis are bolded.

Category | Protein | Function
Synaptonemal complex formation and pairing of homologous chromosomes | Hop1 | Binds discrete sites on axial elements and promotes synapsis between homologous DNA duplexes during meiotic prophase I (Anuradha and Muniyappa 2004a; Anuradha and Muniyappa 2004b; Latypov et al.). Hop1 is a paralog of Rev7, a gene encoding an accessory subunit of DNA polymerase zeta that is involved in translesion synthesis during post-replication repair and dsDNA break repair (Acharya et al. 2005; Kolas and Durocher 2006; Lee and Myung 2008).
Sister-chromatid cohesion | Rad21, Rec8 | Members of cohesin complexes that bind Smc1/Smc3 heterodimers, forming large rings around chromosomes during S-phase. Proteolytic cleavage by the separase triggers sister-chromatid disjunction during Mitotic Anaphase (Rad21) or Meiotic Anaphase II (Rec8) (Gruber, Haering, and Nasmyth 2003).
Double-strand DNA breaks | Spo11-1, Spo11-2, Spo11-3 | Form dimers that cut dsDNA, generating 5'-nucleoprotein linkages on either side of the break that may become sites of recombination. Monomers are removed from the ends as oligonucleotide-bound complexes, leaving ssDNA tails. Spo11-1 and -2 function only during meiosis. In plants, Spo11-3 functions during vegetative growth (Lin and Smith 1994; Keeney, Giroux, and Kleckner 1997; Dernburg et al. 1998; Baudat and Keeney 2001; Hartung et al. 2002; Sugimoto-Shirasu et al. 2002; Szekvolgyi and Nicolas).
DNA strand exchange | Rad51, Dmc1 | Both form helical filaments on ss- and ds-DNA, catalyze strand exchange, and cause ssDNA extension and dsDNA rotational transition during Mitotic prophase (Rad51) and meiotic prophase I (Rad51/Dmc1). Rad51 may recruit Dmc1 to the pre-synaptic filament (Nishinaka et al. 1998; Krogh and Symington 2004; Lopez-Casamichana et al. 2008).
DNA strand exchange | Hop2, Mnd1 | Together, they form heterodimers that stabilize Dmc1 ssDNA pre-synaptic filaments and stabilize dsDNA during DNA strand exchange (Chen et al. 2004; Henry et al. 2006).
Holliday junction resolution | Mlh1-3, Pms1 | Mlh1 forms heterodimers with Pms1, Mlh2, and Mlh3. Mlh1-Pms1 functions during mismatch repair, interacting with Msh2/3 and Msh2/6 heterodimers. During meiosis Mlh1/2 and Mlh1/3 function to resolve heteroduplexes (Hunter and Borts 1997; Borts, Chambers, and Abdullah 2000; Hoffmann et al. 2003).
Holliday junction resolution | Msh2,3,6; Msh4,5 | Form heterodimer sliding clamps that may diffuse along duplex DNA adjacent to mismatches (Msh2/3 or Msh2/6), marking the location of the lesion and signaling downstream machinery. During meiosis Msh4/5 form clamps that bind Holliday junctions and Msh2/6 (Snowden et al. 2004).
Holliday junction resolution | Mer3 | A helicase with roles in synaptonemal complex formation, crossover interference, and unwinding of Holliday junctions during meiosis (Bishop and Zickler 2004; Borner, Kleckner, and Hunter 2004; Sugawara et al. 2009; Wang et al. 2009). Mer3 is paralogous to Slh1, which encodes a putative RNA helicase involved in translation inhibition of non-poly(A) mRNAs and is required for suppressing dsRNA viruses, and to Brr2, which encodes an RNA helicase required for activation of spliceosomal catalysis (Noble and Guthrie 1996; de la Cruz, Kressler, and Linder 1999; Searfoss, Dever, and Wickner 2001).

Figure 4.17: Number of detection failures as predicted by Poisson regression analysis of RNA Polymerase I and Replication Protein A subunits with observed numbers of detection failures for 18 meiotic genes.
The numbers of observed meiotic gene detection failures (indicated with open circles) are plotted against the natural logarithm of Smith-Waterman pairwise alignment scores of Homo sapiens and Saccharomyces cerevisiae. Poisson regression analyses were performed on the observed numbers of failures to detect RNA Polymerase I subunits (A190, A135, AC40, AC19, AC12.2, Rpb5, Rpb6, Rpb8, Rpb10, and Rpb12) and Replication Protein A subunits (RPA1-3) among 34 taxa with at least 8.0X whole-genome shotgun sequencing coverage relative to Smith-Waterman scores (Methods). The predicted numbers of failures relative to Smith-Waterman scores (black dots) were plotted with Wald 90% confidence limits (green dots). Shades of grey indicate the proportion of observed absences attributed to sequence detection failures, estimated from Smith-Waterman pairwise alignment scores (S. cerevisiae versus H. sapiens) (see Methods). Darker greys indicate the gene is not present in the genome sequence sampled while lighter greys indicate the gene may be present but was not detected. Black labels identify sequences discovered in all eukaryotes sampled.

Table 4.2: Observed numbers of sequence absences from 46 genomes, Smith-Waterman pairwise alignment scores, predicted numbers of absences, and the proportion of observed absences likely due to detection failures for 20 proteins that function during meiosis.

Protein | # Observed absences | Smith-Waterman alignment score | # Predicted absences | Ratio (Exp./Obs.)
Hop1 | 14 | 261 | 9.24 | 0.66
Rad21 | 8 | 225 | 10.26 | 1.28
Rec8 | 34 | 102 | 14.66 | 0.43
Spo11-1 | 15 | 233 | 10.03 | 0.67
Spo11-2 | 23 | 220 | 10.41 | 0.45
Spo11-3 | 31 | 480 | 4.90 | 0.16
Rad51 | 2 | 1538 | 0.23 | 0.11
Dmc1 | 8 | 1178 | 0.65 | 0.08
Hop2 | 7 | 161 | 12.35 | 1.76
Mnd1 | 3 | 328 | 7.61 | 2.54
Pms1 | 3 | 1544 | 0.22 | 0.07
Mlh1 | 0 | 1664 | 0.16 | 0.00
Mlh2 | 35 | 405 | 6.09 | 0.17
Mlh3 | 15 | 427 | 5.71 | 0.38
Msh2 | 0 | 2285 | 0.03 | 0.00
Msh3 | 22 | 1534 | 0.23 | 0.01
Msh4 | 12 | 1175 | 0.65 | 0.05
Msh5 | 11 | 868 | 0.59 | 0.14
Msh6 | 3 | 1780 | 0.11 | 0.04
Mer3 | 16 | 1572 | 0.21 | 0.01

Note: Smith-Waterman alignment scores were calculated from pairwise alignments of Saccharomyces cerevisiae and Homo sapiens protein sequences and used to determine the numbers of absences predicted due to detection failures. The ratio of expected and observed values indicates the proportion of observed absences that are likely due to sequence detection failures. Protein names in bold indicate meiosis-specific function.

Table 4.3: Genome sequence databases searched with web address and references.

Organism | Database | Web address | Reference
Trichomonas vaginalis | TrichDB | http://trichdb.org/trichdb/ | (Aurrecoechea et al. 2009a)
Giardia intestinalis | GiardiaDB | http://giardiadb.org/giardiadb/ | (Aurrecoechea et al. 2009a)
Spironucleus vortens | JGI | http://genome.jgi-psf.org/Spivo0/Spivo0.home.html |
Naegleria gruberi | JGI | http://genome.jgi-psf.org/Naegr1/Naegr1.home.html |
Leishmania and Trypanosoma | TriTrypDB | http://tritrypdb.org/tritrypdb/ | (Aslett et al. 2010)
Physcomitrella patens | JGI | http://genome.jgi-psf.org/physcomitrella/physcomitrella.home.html | (Rensing et al. 2008)
Chlamydomonas reinhardtii | JGI | http://genome.jgi-psf.org/chlamy/chlamy.home.html | (Merchant et al. 2007)
Chlorella | JGI | http://genome.jgi-psf.org/ChlNC64A_1/ChlNC64A_1.home.html |
Ostreococcus tauri | JGI | http://genome.jgi-psf.org/Ostta4/Ostta4.home.html | (Palenik et al. 2007)
Galdieria sulphuraria | The Galdieria sulphuraria Genome Project | http://genomics.msu.edu/cgi-bin/galdieria/blast.cgi | (Barbier et al. 2005)
Emiliania huxleyi | JGI | http://genome.jgi-psf.org/Emihu1/Emihu1.home.html |
Thalassiosira pseudonana | JGI | http://genome.jgi-psf.org/Thaps3/Thaps3.home.html | (Armbrust et al. 2004)
Phaeodactylum tricornutum | JGI | http://genome.jgi-psf.org/Phatr2/Phatr2.home.html | (Bowler et al. 2008)
Fragilariopsis cylindrus | JGI | http://genome.jgi-psf.org/Fracy1/Fracy1.home.html |
Phytophthora ramorum | JGI | http://genome.jgi-psf.org/Phyra1_1/Phyra1_1.home.html | (Tyler et al. 2006)
P. sojae | JGI | http://genome.jgi-psf.org/Physo1_1/Physo1_1.home.html | (Tyler et al. 2006)
Aureococcus anophagefferens | JGI | http://genome.jgi-psf.org/Auran1/Auran1.home.html |
Plasmodium vivax | PlasmoDB | http://plasmodb.org/plasmo/ | (Aurrecoechea et al. 2009b)
Toxoplasma gondii | ToxoDB | http://toxodb.org/toxo/ | (Aurrecoechea et al. 2007)
Cryptosporidium muris | CryptoDB | http://cryptodb.org/cryptodb/ | (Heiges et al. 2006)
Paramecium tetraurelia | Paramecium DB | http://paramecium.cgm.cnrs-gif.fr/ | (Arnaiz et al. 2007)
Dictyostelium purpureum | JGI | http://genome.jgi-psf.org/Dicpu1/Dicpu1.home.html |
Ciona intestinalis | JGI | http://genome.jgi-psf.org/Cioin2/Cioin2.home.html |
Nematostella vectensis | JGI | http://genome.jgi-psf.org/Nemve1/Nemve1.home.html | (Putnam et al. 2007)
Trichoplax adhaerens | JGI | http://genome.jgi-psf.org/Triad1/Triad1.home.html | (Srivastava et al. 2008)
Monosiga brevicollis | JGI | http://genome.jgi-psf.org/Monbr1/Monbr1.home.html | (King et al. 2008)
Capsaspora owczarzaki, Thecamonas trahens, Salpingoeca rosetta | BROAD Origins of Multicellularity Database | http://www.broadinstitute.org/annotation/genome/multicellularity_project/GenomeDescriptions.html#%3Ci%3ESalpingoeca_rosetta%3C/i%3Eformerly_known_as_%3Ci%3EProterospongia_sp.%3C/i%3E_ATCC_50818] |
Laccaria bicolor | JGI | http://genome.jgi-psf.org/Lacbi1/Lacbi1.home.html | (Martin et al. 2008)
Coprinus cinereus | JGI | http://genome.jgi-psf.org/Copci1/Copci1.home.html |
Mucor circinelloides | JGI | http://genome.jgi-psf.org/Mucci2/Mucci2.home.html |
Batrachochytrium dendrobatidis | JGI | http://genome.jgi-psf.org/Batde5/Batde5.home.html |

CHAPTER 5 CONCLUDING REMARKS

The studies presented in this thesis were designed to provide insight into the evolutionary history of meiosis. Although different hypotheses for the prevalence and maintenance of meiosis at the population level are well developed, scant data elucidating the origin and subsequent evolution of meiotic genes are available. In the following text, I will bring the research presented in this thesis into context by outlining our current understanding of meiosis at the levels of populations, individuals, and genes. I will also present a unifying hypothesis for the origin of meiosis. Finally, I will suggest experiments that will further elucidate the origin and evolution of meiosis.

Why meiosis?

Meiosis is necessary for sexual reproduction in eukaryotes (Weismann, Parker, and Ronnfeldt 1893). Two (usually haploid) products of meiosis (e.g. spores and gametes) are combined (cell fusion), yielding offspring with the parental numbers of homologous chromosomes (usually diploid) (Figure 1.3 – B) (Weismann, Parker, and Ronnfeldt 1893). Thus, the halving of organisms’ genomes during meiosis ensures the maintenance of ploidy in offspring; sexual reproduction results in the alternation of haploid and diploid phases in eukaryotic life cycles (Maynard Smith and Szathmary 1995).
There are several costs associated with sexual reproduction: 1) the time and energy to switch from mitotic to meiotic cell divisions; 2) the search for appropriate mating partners; 3) the risks of failing to find appropriate mates; 4) the risk of contracting sexually transmitted diseases; 5) the disruption of genomes that are well adapted to their environments; and 6) the transmission of only half of genetic material to offspring (the twofold cost of sex) (Nei 1967; Lewontin 1971; Feldman 1972; Maynard Smith 1978; Michod and Levin 1988; Kondrashov 1993; Barton and Charlesworth 1998; West, Lively, and Read 1999; Otto and Lenormand 2002). These costs of sexual reproduction and meiosis would seem to be prohibitively expensive, giving a fitness advantage to asexually reproducing populations. Despite the costs associated with sexual reproduction, it is pervasive among eukaryotes (Bell 1982). Obligate asexual lineages are uncommon among eukaryotes and persist for relatively short periods of evolutionary time (White 1978; Bell 1982; Richards 1986). These observations raise the following question: Why should so many eukaryotes take these risks? In essence, this is the “paradox of sex” (Michod and Levin 1988; Kondrashov 1993; Barton and Charlesworth 1998; West, Lively, and Read 1999; Otto and Lenormand 2002). Some organisms reduce the costs of sex by alternating between sexual and asexual modes of reproduction (facultative sex) (Dacks and Roger 1999). However, the questions of why facultatively sexual organisms should bother with meiosis at all and why a great number of organisms rely exclusively upon sexual reproduction remain. The question of why eukaryotes undergo the costly process of sexual reproduction is often answered with the benefits of genetic recombination to produce variable offspring, upon which natural selection can act (Fisher 1930; Muller 1932; Hill and Robertson 1966), especially in response to changing environments (Van Valen 1973).
In fact, that recombination increases the efficacy of natural selection has been demonstrated convincingly in the laboratory with the fruit fly Drosophila melanogaster (Rice and Chippindale 2001) and the green alga Chlamydomonas reinhardtii (Kaltz and Bell 2002). In these organisms, populations that underwent multiple generations of sexual reproduction were more fit than asexually reproducing populations. Populations with genetic recombination may also be able to purge deleterious mutations more rapidly than exclusively asexual populations (Muller 1964; Kondrashov 1988). Although various hypotheses have been offered, revealing differences of opinion regarding the importance of selection for positive mutations and the elimination of deleterious mutations (i.e. the roles of natural selection and random genetic drift, respectively), there is little doubt that the population-level effects of genetic recombination provide sufficient selection for the long-term maintenance of meiosis (Otto and Gerstein 2006). More contentious is the notion that there are short-term benefits of genetic recombination at the level of the individual. Prior to the origin of meiosis, eukaryotes must have relied only upon asexual modes of reproduction (probably mitosis) (Szathmary and Smith 1995). However, once meiosis arose, these organisms (like many extant eukaryotes) were probably facultative sexual reproducers; sometimes they reproduced sexually and sometimes they reproduced asexually. Therefore, to study the selective advantages of meiosis that led to its origin, we can observe the conditions in which extant facultatively sexual organisms undergo meiosis (Michod, Bernstein, and Nedelcu 2008). It is well known that several unicellular eukaryotes that normally divide by mitosis will occasionally, when exposed to environmental stressors, divide by meiosis (Michod, Bernstein, and Nedelcu 2008). For example, both Saccharomyces cerevisiae (Herskowitz 1988) and C.
reinhardtii (Sager and Granick 1954) will switch from mitotic to meiotic divisions in nutrient-poor media and Volvox carteri will undergo meiosis during heat shock (Kirk and Kirk 1986). Thus it is tempting to conclude that these organisms undergo meiosis to introduce variability to their offspring that may deal better with these stressful environments than their parents (Otto and Lenormand 2002; Otto and Gerstein 2006; Otto 2008). However, in the green alga C. reinhardtii, researchers found that calculated fitnesses of sexually reproducing populations were lower than those of asexually reproducing populations during the first generation (Colegrave, Kaltz, and Bell 2002). Only after subsequent episodes of sexual reproduction did fitnesses of the sexually reproducing populations exceed those of asexually reproducing populations (Colegrave, Kaltz, and Bell 2002). This negative, early effect of genetic recombination was also shown separately in D. melanogaster (Charlesworth and Barton 1996). The observations that sexually reproducing organisms are initially less fit than asexually reproducing populations may be explained by a concept called recombination load (Charlesworth and Barton 1996; Colegrave, Kaltz, and Bell 2002). Simply put, the variation in the fitness of a population increases during the first sexually reproducing generation due to genetic recombination (Otto and Lenormand 2002). Previously linked genes become shuffled by genetic recombination, producing novel combinations of genes (Agrawal 2006). The first, most obvious, problem is that combinations of genes that have been selected for in a given environment are broken apart (Charlesworth and Barton 1996). However, many organisms benefit from inheriting combinations of genes that increase their fitnesses, while other organisms inherit deleterious combinations (Kouyos, Otto, and Bonhoeffer 2006).
This is why genetic recombination increases the efficacy of natural selection; beneficial combinations of genes are selected for, increasing in populations, and deleterious combinations are purged from populations (Feldman, Christiansen, and Brooks 1980; Kondrashov 1984; Kondrashov 1988). The problem is that, initially, the increased fitness provided by the beneficial combinations of genes is outweighed by the decrease in fitness caused by the deleterious combinations of genes and the breaking apart of previously fit genomes (Otto and Lenormand 2002). Theoretically, there are conditions (weak and negative epistasis) in which short-term advantages of genetic recombination could be realized and positively selected (Otto and Lenormand 2002). However, evidence that these conditions exist in nature is weak (Elena and Lenski 1997; Rice 2002; Bonhoeffer et al. 2004). For these reasons, it is unlikely that the production of variable offspring upon which natural selection acts could have provided the immediate selective benefits required for the origin of meiosis. Some environmental conditions, such as exposure to metabolically or environmentally produced oxygen-containing compounds, can result in double-strand DNA breaks (i.e. oxidative stress) (Nedelcu and Michod 2003; Nedelcu, Marcu, and Michod 2004). While single-strand DNA damage can be repaired using the complementary strand of DNA in a helix, double-strand DNA damage requires recombination with homologous chromosomes (Michod, Bernstein, and Nedelcu 2008). Thus meiosis may have arisen as an adaptation for DNA damage repair (Bernstein et al. 1984). This possibility is evidenced by S. pombe and V. carteri, both of which undergo meiosis in response to oxidative stress (Bernstein and Johns 1989; Nedelcu and Michod 2003).
Furthermore, the connection between DNA damage and recombination is supported by the observations that mutations in recombination genes make cells sensitive to UV damage and exposure of cells to mutagens increases recombination rates (Bernstein and Bernstein 1991). Of course, the hypothesis in which double-strand DNA damage repair supplies the primary benefit of meiosis relies upon the presence of at least a diploid number of homologous chromosomes. Indeed, diploid yeast cells are more resistant to DNA damage than haploid cells (Herskowitz 1988). However, unlike many diploid eukaryotes (e.g. metazoans (Schrader and Hughes-Schrader 1931)) whose haploid states are transient and associated exclusively with sexual reproduction (e.g. gametes), many eukaryotes, especially unicellular ones, experience longer haploid stages (Lewis 1985). The question, then, is why would organisms risk the integrity of their genomes by having extended haploid lifecycle stages? If DNA damage occurs during the haploid state, no other appropriate haploid cells are available for cell fusion, and replication of chromosomes is not possible due to the damage, the cells would seem to be in danger. The benefits of having haploid stages during the lifecycles of unicellular eukaryotes must outweigh the risks of their diminished capacities to repair double-strand DNA damage. Another hypothesis posits that environmental stressors (e.g. oxidative stress and starvation) may provide an advantage to organisms with haploid-diploid ploidy cycles (Maynard Smith and Szathmary 1995). That is, during times of oxidative stress, diploidy may be beneficial for repair of double-strand DNA damage, while, during starvation, haploids may benefit from faster growth relative to diploids (Cleveland 1947; Szathmary et al. 1990; Hurst and Nurse 1991). Haploid populations of S. cerevisiae do, indeed, have higher fitnesses than diploid populations in nutrient-limiting environments (Adams and Hansche 1974).
It has been argued, however, that, since the ancestral eukaryote in which meiosis arose was probably phagotrophic, diploidy should have been favored during periods of starvation as diploids are larger and should, therefore, be able to engulf larger prey (Lewis 1985; Maynard Smith and Szathmary 1995). This logic leads us back to the conclusion that diploidy should be beneficial and haploidy should be rare, begging, once again, the question of why eukaryotes should exist as haploids at all. Furthermore, these hypotheses suffer from lack of supporting data (Kondrashov 1994). There is no evidence that ancestral eukaryotes would have been subjected to such alternating environments. The fact that the majority of genetic mutations are deleterious (Lewontin 1974) would seem to support the notion that diploidy should be selectively advantageous, due to the presence of two copies of every gene (Otto and Goldstein 1992). If an allele carrying a deleterious mutation can be masked by the presence of a wildtype allele in a homologous chromosome (i.e. low dominance) then fitnesses of organisms may not be affected (Crow and Kimura 1965; Maynard Smith 1978; Charlesworth 1991; Kondrashov and Crow 1991; Perrot, Richerd, and Valero 1991). Mathematical models confirm this prediction, but only in cases of large genomes with relatively high rates of genetic recombination (Perrot, Richerd, and Valero 1991; Otto and Goldstein 1992). This is because increasing numbers of heterozygous loci result in greater ability to mask deleterious alleles (Otto and Goldstein 1992). However, the increased fitness of heterozygotes (heterosis) has a price: the maintenance of deleterious alleles in the population (mutation load) (Kondrashov 1994). In haploid cells with small genomes and low levels of genetic recombination, deleterious mutations are unlikely to persist as selection should efficiently purge them from the population (Scudo 1967). 
The ancestral eukaryotes in which meiosis arose most likely had few chromosomes (maybe only one) and relatively low rates of genetic recombination (Michod and Levin 1988). Therefore, they would have benefited from maintaining a haploid number of chromosomes (Otto and Goldstein 1992). In ameiotic haploid organisms, diploidization might occur either endogenously, due to errors that occur during mitosis (Cleveland 1947; Hurst and Nurse 1991), or exogenously, due to the fusion of two haploid cells (Cavalier-Smith 1975). This diploidization is likely to have reduced the fitnesses of the ameiotic ancestral eukaryotes in which meiosis arose (Otto and Goldstein 1992). Therefore, I argue that meiosis arose due to the selective benefits of retrieving haploid numbers of chromosomes after spontaneous diploidization. Furthermore, I propose also that meiosis could only have arisen in the presence of a strong constant selective pressure. Therefore, the cause of these diploidization events is important to consider. If mutations occurred that resulted in endogenous diploidization then they would simply have been selected against and purged from the population. Thus, cytological sources of diploidization would not have provided the constant selective pressures necessary for the origin of meiosis. However, diploidization that occurred because of repeated fusions of two haploid eukaryotes could have provided a constant selective force. Such fusions could be explained as an artifact of the ancestral eukaryotes’ phagocytic lifestyle. That is, when one cell attempted to engulf another eukaryotic cell (either by accident or cannibalism), their membranes could have occasionally fused. Unlike endogenous sources of diploidization, such an exogenous source could not easily be purged from populations. A method of identification could have evolved in order to avoid fusions but, if cannibalization was common, then chemical signaling may not have been selectively advantageous.
Of course, the problem could have been remedied by abandoning their phagotrophic lifestyles but this solution would have required a switch to another food source, an endeavor that would most certainly have required many changes to the cells. Simply put, meiosis was a less costly way, evolutionarily speaking, to cope with constant and spontaneous diploidization than eliminating the cause (phagocytosis) altogether. This scenario provides the constant, immediate selective benefits to individuals that would have been necessary for meiosis to arise.

Meiosis arose from mitosis

There are two main theories for the origin of meiotic genes: 1) meiotic genes arose directly from prokaryotic genes encoding products that were involved primarily in transformation (reviewed in (Bernstein and Bernstein 2010)); and 2) meiotic genes arose from genes encoding products that were involved primarily in mitosis (reviewed in (Wilkins and Holliday 2009)). Distinguishing between these possibilities is important to our understanding of the origin and evolution of meiosis. Prokaryotic organisms are able to exchange genetic material via parasexual processes (i.e. conjugation (Lederberg and Tatum 1946), transformation (Griffith 1928; Avery, Macleod, and McCarty 1944), and transduction (Lederberg et al. 1951)), utilizing recombination enzymes that are also important for DNA damage repair (Maynard Smith and Szathmary 1995). In this regard, prokaryotic parasexual processes are analogous to sexual reproduction in eukaryotes. More specifically, recombination of prokaryotic genomes during transformation appears similar to meiotic recombination in eukaryotes (Bernstein and Bernstein 2010). Indeed, many genes necessary for bacterial transformation are orthologs of genes necessary for recombination of nonsister homologous chromosomes during meiosis (Marcon and Moens 2005). In addition, bacterial and eukaryotic orthologs may have similar functions during transformation and meiosis, respectively.
For example, bacterial RecA, which stimulates DNA strand exchange during transformation, is orthologous to the eukaryotic gene encoding Dmc1, which stimulates interhomolog DNA strand exchange during meiosis in most eukaryotes. In addition, both transformation and meiosis can be induced by similar types of stress. Following these observations, it has been proposed that meiosis in eukaryotes arose immediately from eubacterial transformation (Bernstein and Bernstein 2010). This hypothesis explains the evolution of sex as a continuous evolutionary process from bacteria to eukaryotes (Bernstein and Bernstein 2010). Central to the argument that meiotic recombination in eukaryotes arose directly from eubacterial transformation is the observation that many genes were horizontally transferred from mitochondria (likely the result of the engulfment of eubacteria by early eukaryotes (Margulis 1970)) to the nuclear genome of eukaryotes (Gabaldon and Huynen 2003). Eubacterial recA homologs and eukaryotic recA homologs (Rad51 and Dmc1) share a high level of sequence similarity (0.20 and 0.23, respectively; Figure 3.14). Therefore, eukaryotic Rad51 and Dmc1 homologs may have arisen from recA orthologs that were transferred from eubacteria after their engulfment by eukaryotes (Lin et al. 2006). However, this model also predicts that Rad51 and Dmc1 should be more closely related to eubacterial recA genes than to archaebacterial RadA genes and distance analyses indicate that Rad51 and Dmc1 are most similar to archaebacterial RadA genes (0.43 and 0.45, respectively; Figure 3.14). Also, phylogenetic analyses indicate that Rad51 and Dmc1 share a more recent ancestor with archaebacterial RadA genes than with eubacterial recA genes (Stassen et al. 1997; Lin et al. 2006). In sum, these data indicate that eukaryotes inherited a recA homolog (RadA) vertically from archaebacteria and not horizontally from a eubacterium.
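The distance comparison in the preceding paragraph can be illustrated with a minimal sketch. It assumes that the four values quoted from Figure 3.14 are pairwise similarity scores in which a higher value means a closer relationship; the dictionary layout and the function name `closest_prokaryotic_homolog` are hypothetical and are not part of the analyses performed in this thesis.

```python
# Pairwise similarity values quoted in the text (Figure 3.14).
# Assumption: higher values indicate more similar sequences.
SIMILARITY = {
    "Rad51": {"eubacterial recA": 0.20, "archaebacterial RadA": 0.43},
    "Dmc1": {"eubacterial recA": 0.23, "archaebacterial RadA": 0.45},
}


def closest_prokaryotic_homolog(scores):
    """Return the name of the prokaryotic homolog with the highest similarity."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for gene, scores in SIMILARITY.items():
        print(gene, "is most similar to", closest_prokaryotic_homolog(scores))
```

Under that assumption, both eukaryotic paralogs are assigned to archaebacterial RadA rather than eubacterial recA, which is the pattern expected under vertical inheritance from archaebacteria.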
Since the first eukaryotes were certain to have been capable of nuclear divisions (i.e. mitosis), it is most likely that mitosis arose very early during eukaryotic evolution. The protoeukaryotes could have been mitotically dividing organisms that were also capable of bacteria-like transformation. Then meiosis could have arisen from transformation in the presence of mitosis. The crux of this argument is that meiotic recombination originating from bacterial transformation would have been a continuous evolutionary process (Bernstein and Bernstein 2010). That is, if mitosis arose first and there was neither bacteria-like transformation nor meiosis then a gap exists, during which eukaryotes did not undergo genetic recombination or sex (Bernstein and Bernstein 2010). This argument assumes that nonsister homologous recombination did not occur during mitosis. However, crossing over has been shown in animal and fungal vegetative cells (mitotic crossing-over), albeit at much lower frequencies than meiotic crossing over (Cardoso et al.; Xu and Rubin 1993). Genetic recombination could have occurred if protoeukaryotes were capable of mitosis but neither transformation nor meiosis; there need not have been a “sex gap” during eukaryotic evolution if mitosis arose first. Phylogenetic and distance analyses of the translated protein sequences of eukaryotic and prokaryotic recA homologs indicate that the eukaryotic Rad51 and Dmc1 genes are paralogs (Figure 3.14). That is, the genes encoding Rad51, which functions during both mitotic and meiotic DNA strand exchange reactions in model organisms, and Dmc1, which functions only during meiotic DNA strand exchange reactions in model organisms, arose by a single gene duplication event that occurred during eukaryotic evolution.
There are three possible outcomes of gene duplication events: 1) the “extra” gene copy quickly degrades and its products (if any) do not function (nonfunctionalization) (Ohno 1970); 2) a division of labor occurs such that the two gene copies encode products that perform distinct complementary functions previously accomplished by the products of a single gene (subfunctionalization) (Force et al. 1999); or, 3) one gene copy, free from the constraints of purifying selection, is free to mutate and its products then perform novel functions (neofunctionalization) (Ohno 1970). Since both the eukaryotic Rad51 and Dmc1 genes have been retained, either subfunctionalization or neofunctionalization of the genes occurred after they arose. Either the ancestral gene encoded products that functioned during both mitotic and meiotic DNA strand exchange and the duplication event yielded genes whose products divided these functions or the ancestral gene encoded products that functioned during only one reaction (mitotic or meiotic DNA strand exchange) and the gene duplication event resulted in the origin of a novel function. Put another way, either both mitotic and meiotic DNA strand exchange reactions were present at the time of the gene duplication or one arose from the other. In addition to Dmc1 and Rad51, there are several genes whose products are known to function only during meiosis in model organisms that are paralogs of genes encoding products that function during both mitosis and meiosis (Chapter 4). These genes encode products that are involved in several important (if not critical) events that occur during meiosis, including sister chromatid cohesion, dsDNA cutting, DNA strand exchange, and Holliday junction resolution. Thus the phenomenon is not restricted to genes encoding products involved in DNA strand exchange reactions but includes many other genes necessary for successful completion of meiosis.
Additionally, the distributions of these genes among diverse eukaryotes and phylogenetic analyses indicate that many of these duplications occurred prior to the common ancestor of all extant eukaryotes, making it possible that they all occurred at the same time or during a very small window during eukaryotic evolution. It is likely that either both mitosis and meiosis were present at the time of the gene duplication event(s) or one arose from the other. Since the earliest eukaryotes were also most likely haploid, it seems unlikely that meiosis could have been the primary means of reproduction as it would have required two rounds of DNA synthesis or some combination of cell fusions and DNA synthesis to obtain the appropriate numbers of chromosomes. Although some single-celled organisms (e.g. Saccharomyces cerevisiae) have haploid stages of their lifecycles, during which they fuse with other cells to form diploid cells that may ultimately undergo meiosis, most of their nuclear divisions are mitotic (Herskowitz 1988). These observations and the greater cytological and genetical complexity of meiosis (Chapter 1) indicate that mitosis most likely arose first and meiosis is a derived process that arose later during eukaryotic evolution (Cavalier-Smith 1981b; Simchen and Hugerat 1993; Wilkins and Holliday 2009). In total, these data indicate that meiosis may have arisen from mitosis de novo as a result of one or more large-scale gene duplication events. Below, I propose an evolutionary model that includes a preadaptation that could have provided the selective benefits necessary for such a profound event to occur.

A model for the evolution of meiotic DNA strand exchange genes

The results obtained and the observations made during the studies performed in Chapters 2 through 4 revealed three major points regarding the evolution of meiotic DNA strand exchange genes: 1) While meiotic DNA strand exchange genes are often lost, Rad51 appears to be present in all but one eukaryotic genome studied; 2) In Saccharomyces cerevisiae, rad51 functional mutations or Rad51 overexpression rescue(s) the null mutant phenotypes of other DNA strand exchange genes studied in Chapter 2 (Table 2.5); and, 3) Rad51 and Dmc1 may have overlapping functions in some organisms, such that one paralog may perform the activities of the other. These points have culminated in a model of meiotic DNA strand exchange gene evolution that explains the various complements of genes observed in different eukaryotes (Figure 5.1). The presence of ten DNA strand exchange genes (Rad52, Rad59, Rad51, Rad55, Rad57, Dmc1, Hop2, Mnd1, Rad54, and Rdh54) in representative genomes of all the eukaryotic supergroups studied in Chapter 2 (Opisthokonta, Amoebozoa, Archaeplastida, Chromalveolata, and Excavata) indicates that they were likely to have been present in the last eukaryotic common ancestor. It is, therefore, feasible that the ancestor of eukaryotes was capable of meiotic DNA strand exchange and meiosis (Figures 2.1 and 5.1 – A). Also, given their distributions, two genes (Rad59 and Rad54) may have arisen later during eukaryotic evolution (Figures 2.1 and 5.1 – B). So, although eukaryotes began with a core set of meiotic DNA strand exchange machinery, additional genes have since been added during the evolution of different eukaryotic lineages. However, the distributions of meiotic DNA strand exchange genes also indicate that frequent independent losses of important genes have occurred.
These apparently contradictory observations raise the question: How can eukaryotes lose genes so important for meiotic DNA strand exchange in model organisms and, by inference, in the last common ancestor of all extant eukaryotes? Only one organism (Giardia intestinalis) is confirmed to be without a Rad51 gene, while other genes have often been lost (Figures 2.1, 3.5, and 4.1). Hence, I hypothesized that a connection exists between the nearly ubiquitous presence of Rad51 among eukaryotes and the frequent loss of other meiotic DNA strand exchange genes in independent eukaryotic lineages. As it happens, these observations can be explained by the following: 1) There are no known suppressors of rad51 animal or fungal null mutant phenotypes; and 2) Overexpression or functional mutations of the Rad51 gene suppress rad52, dmc1, rad55, rad57, hop2, mnd1, rad54, and rdh54 Saccharomyces cerevisiae null mutants (Milne and Weaver 1993; Klein 1997; Bishop et al. 1999; Krejci et al. 2002; Tsubouchi and Roeder 2003; Henry et al. 2006; Schild and Wiese 2009). I hypothesized that changes in Rad51 expression or changes in its coding sequence may result in the relaxation of purifying selection on meiotic DNA strand exchange genes (such as the Dmc1 gene) (Figure 5.1 – C). That is, when overexpressed or mutated, Rad51 products may ‘fill-in’ for missing components, performing their functions, or rendering the functions of other gene products altogether unnecessary. Such a dynamic would theoretically result in relaxation of the normally purifying selection that serves to preserve genes in populations of organisms. This relaxation of selection may then result in the loss of DNA strand exchange genes (Figure 5.1 – D). In addition, some meiotic DNA strand exchange components are known to interact only with a limited set of proteins (e.g. Hop2 and Mnd1 proteins only interact with Dmc1) (Chen et al. 2004; Henry et al. 2006).
Therefore, the loss of the Dmc1 gene may leave the Hop2 and Mnd1 genes vulnerable to loss (Figure 5.1 – E). Finally, the complements of meiotic DNA strand exchange genes and the interactions of their products may provide a feedback loop in which subsequent mutations changing the expression of Rad51 genes or creating beneficial functional mutants further alter gene combinations (Figure 5.1 – F). Although lineage-specific genes and protein interactions are almost certainly affecting the complements of DNA strand exchange genes observed in different eukaryotes, this general model provides a eukaryote-wide hypothesis for understanding their evolution. There is some preliminary evidence which suggests that one meiotic DNA strand exchange protein may perform the functions of another. As stated previously, G. intestinalis is the only organism known to lack a Rad51 gene. However, during the study presented in Chapter 3, a search was conducted in the genome of a closely related diplomonad (Spironucleus vortens) with database mining and degenerate PCR, and no Rad51 gene was found (data not shown). Interestingly, G. intestinalis does contain two copies of the Dmc1 gene, both appearing to encode proteins that function during nuclear divisions in cysts (Poxleitner et al. 2008). It is possible that one copy of Dmc1 may encode products that perform the functions normally completed by Rad51. The Dmc1 proteins of G. intestinalis and S. vortens appear to have residues that are highly conserved among Rad51 protein sequences, with G. intestinalis Dmc1-A appearing slightly more ‘Rad51-like’ than Dmc1-B, especially at amino acid positions 331 and 332 (Figure 5.2). Residue D332 has been determined in eubacterial RecA and archaebacterial RadA proteins to bind DNA (Story, Weber, and Steitz 1992; Shin et al. 2003; Chen et al. 2007).
The functions of these amino acids in Rad51 and Dmc1 proteins are unknown and it is possible that residues 331 and 332 are responsible for Rad51- or Dmc1-specific functions. Further studies will be needed to determine if these residues confer Rad51- or Dmc1-specific functions, but the possibility that one paralog may perform the functions of another in G. intestinalis is intriguing. Whether these sites are useful as diagnostic characters for Rad51- or Dmc1-function (regardless of the paralog being observed) and if there is any functional significance of variations at these sites are questions worthy of scientific investigation. The point here is that the functions of meiotic DNA strand exchange genes and the interactions between them may be more dynamic than previously supposed.

A model for the origin of meiosis

The results of the scientific studies presented in this thesis have culminated in a cohesive model for the origin of meiosis that I will now present. There are four main events that distinguish meiosis from mitosis: 1) the pairing of homologous chromosomes during meiosis I; 2) DNA strand exchange (recombination) between non-sister homologous chromosomes; 3) sister-chromatid cohesion that persists through the first meiotic division; and 4) the absence of DNA replication (S-phase) upon entering the second meiotic division (Wilkins and Holliday 2009). Although there are other differences between meiosis and mitosis (described in Chapter 1 and summarized in Figure 1.3), understanding the possible origins of these four novel steps is considered by many to be necessary for surmising the origin of meiosis itself (Kleckner 1996; Villeneuve and Hillers 2001; Wilkins and Holliday 2009). That all (or almost all) of these steps, each requiring its own set of specialized machinery, are necessary in eukaryotes for successful completion of meiosis would seem to exclude any gradualist explanations for the origin of meiosis.
However, to suggest that these complex processes could have arisen simultaneously seems to defy logic. For these reasons, the origin of meiosis is considered one of the most formidable problems in evolutionary studies (Maynard Smith 1978; Hamilton 1999; Wilkins and Holliday 2009). The following evolutionary model, including mechanisms for the origins of the novel steps described above, explains the origin of meiosis in a manner that is both feasible and testable. Although I find much agreement with models presented by other researchers (especially (Wilkins and Holliday 2009) and (Cavalier-Smith 2002d)), the timing of important events and the mechanisms proposed here, responsible for the origins of the pairing of homologous chromosomes, prolonged sister-chromatid cohesion, and meiotic recombination, are, I think, unique. I have shown that several genes whose products are known to function only during meiosis in model organisms must have been present in the common ancestor to all known extant eukaryotes (Chapters 3 and 4). Therefore, meiosis must have arisen in eukaryotes that existed prior to the last eukaryotic common ancestor. Such eukaryotes were probably phagotrophic, single-celled organisms with haploid numbers of chromosomes (possibly one chromosome) contained within nuclei (Figure 5.3 – A) (Cavalier-Smith 1975; Hurst and Nurse 1991; Cavalier-Smith 2002a; Wilkins and Holliday 2009). Like many extant eukaryotes, ancestral eukaryotes may have frequently engulfed prokaryotic organisms and, occasionally, other eukaryotes (Figure 5.3 – B) (Adl et al. 2005). It is possible that during phagocytosis, rather than one eukaryotic cell engulfing and digesting another eukaryotic cell, the cell membranes became fused, especially if the cells were genetically identical (Figure 5.3 – C).
That the fusion of haploid eukaryotic cells may have been the precursor to meiosis is not a new concept, having been proposed on numerous occasions, probably due to its similarity to syngamy during the haploid-diploid cycles of many extant eukaryotes (Maynard Smith and Szathmary 1995). Eukaryotic cell fusions could have been followed quickly by fusions of the nuclear envelopes (Figure 5.3 – D). Again, such fusions may have resembled the nuclear fusions observed during the sexual haploid-diploid lifecycles of extant eukaryotes (Wilkins and Holliday 2009). The fusion of two haploid eukaryotic cells would, of course, have yielded a single diploid eukaryotic cell (Figure 5.3 – E). I believe that life would have proceeded somewhat normally for such newly formed diploid eukaryotes until they attempted to undergo mitosis. The cells may have entered pre-mitotic S phase (DNA synthesis), copying each of the chromosomes present in the newly diploid nuclei. However, due to changes in gene expression levels and/or stoichiometry of protein and DNA molecules caused by the presence of diploid numbers of chromosomes, mitosis may not have proceeded normally. Recall that, during mitosis, Rad51 proteins function during DNA strand exchange between sister chromatids (and, rarely, between non-sister homologous chromosomes) (Nishinaka et al. 1998; Krogh and Symington 2004; Lopez-Casamichana et al. 2008), while, during meiosis, Dmc1 proteins are necessary for DNA strand exchange between non-sister homologous chromosomes (Bishop et al. 1992; Bishop 1994; Bishop et al. 1999; Sehorn et al. 2004; Sauvageau et al. 2005) (Figure 1.4 and Table 2.5). Interhomolog DNA strand exchange during meiosis in Saccharomyces cerevisiae dmc1 null mutants is greatly reduced (Table 2.5) (Bishop 1994). However, overexpression of Rad51 significantly diminishes this phenotype, stimulating interhomolog DNA strand exchange (Bishop et al. 1999; Tsubouchi and Roeder 2003).
In Chapters 3 and 4 I demonstrated that both Rad51 and Dmc1 genes are present in representatives of all known eukaryotic supergroups and, so, are likely to have been present in the last common ancestor of all extant eukaryotes (Figures 3.2 – 3.12 and 4.1). However, mitosis most likely arose prior to the origin of meiosis (Cavalier-Smith 1981b; Simchen and Hugerat 1993). It is well known that Rad51 and Dmc1 are paralogs (genes arising from a common ancestral gene by duplication) (Stassen et al. 1997; Ramesh, Malik, and Logsdon 2005; Lin et al. 2006; Malik et al. 2008). Therefore, it is also likely that the ancestor of Rad51 and Dmc1 was most similar to Rad51 (I will call this ancestral gene Rad51’), encoding products that functioned during mitotic DNA strand exchange. Thus, I propose that the change from a haploid to a diploid number of chromosomes resulted in ‘overexpression’ of Rad51’ genes, increasing the numbers of Rad51’ proteins relative to the numbers of DNA molecules. In addition to DNA strand exchange between sister chromatids, this overexpression could have stimulated DNA strand exchange between non-sister homologs (Figure 5.3 – F).

Because pairing of non-sister homologous chromosomes is important for successful completion of the reductional division of meiosis, its origin is considered key to the origin of meiosis itself (Wilkins and Holliday 2009). However, without sustained pairing of sister chromatids through the first division and monopolar attachment of spindles to each chromosome, equational divisions are equally likely to occur (Watanabe and Nurse 1999; Toth et al. 2000; Yokobayashi, Yamamoto, and Watanabe 2003; Hauf and Watanabe 2004). Although meiosis could have evolved with the equational division occurring first and the reductional division occurring second, rather than the other way around, we can imagine a mechanism by which sister chromatids could have stayed bound until the second division.
The Rad21 gene encodes products that bind sister chromatids during mitosis (Table 4.1) (Gruber, Haering, and Nasmyth 2003). During meiosis, Rec8, a paralog of Rad21 (Parisi et al. 1999), performs a similar function (Parisi et al. 1999; Watanabe and Nurse 1999; Toth et al. 2000; Gruber, Haering, and Nasmyth 2003; Yokobayashi, Yamamoto, and Watanabe 2003). Like Dmc1, Rec8 proteins are known to function only during meiosis in model organisms (Parisi et al. 1999; Watanabe and Nurse 1999; Toth et al. 2000; Yokobayashi, Yamamoto, and Watanabe 2003). Again, because we expect that eukaryotes were capable of mitotic nuclear divisions prior to the origin of meiotic divisions, we also expect that the ancestor of Rad21 and Rec8 was most similar to Rad21, encoding products that functioned in a manner similar to Rad21 proteins in extant organisms (I will call this ancestral gene Rad21’). In S. cerevisiae, expression of Rad21 from a Rec8 promoter in null rec8 mutants results in meiosis-like monopolar attachment of microtubules to chromosomes, rather than the mitosis-like bipolar attachment normally seen in null rec8 mutants (Figure 1.3) (Toth et al. 2000). In Schizosaccharomyces pombe null rec8 mutants, Rad21 relocates to centromeres (Yokobayashi, Yamamoto, and Watanabe 2003). Both experiments resulted in equational, rather than reductional, divisions during meiosis I (Figure 1.3) (Toth et al. 2000; Yokobayashi, Yamamoto, and Watanabe 2003). That is, although Rad21 attaches to centromeres and monopolar attachment of microtubules is rescued, the reductional division is not. However, I suggest that Rad21 overexpression in addition to Rad51 overexpression may restore the reductional division during meiosis I in yeast rec8/dmc1 double null mutants.
Similarly, changes in the numbers of Rad21’ proteins relative to the numbers of DNA molecules in primitive eukaryotes (in the presence of increased numbers of Rad51’ proteins) could have resulted in the monopolar attachment of mitotic spindles to chromosomes and in extended sister-chromatid cohesion. Essentially, a meiosis I-like reductional division may have resulted.

Although I cannot find other examples in the current data, it is possible that additional genes acted similarly when overexpressed to achieve pairing of homologous chromosomes or suppression of DNA synthesis upon entering the second round of meiosis in ancestral eukaryotes. However, these steps may be otherwise explained (Wilkins and Holliday 2009). In S. cerevisiae, pairing of homologous chromosomes occurs during the G1 lifecycle stage (prior to pre-meiotic DNA synthesis) (Weiner and Kleckner 1994). Pairing is interrupted during the pre-meiotic S-phase and restored during meiotic prophase I. Like the pre-meiotic pairing that occurs during G1, meiotic pairing initially occurs in the absence of meiotic recombination and synaptonemal complex formation (Burgess, Kleckner, and Weiner 1999). Similarly, in mitotically dividing S. cerevisiae cells, pairing of homologous chromosomes occurs during G1, is interrupted during pre-mitotic S-phase, and is restored during G2 (Burgess, Kleckner, and Weiner 1999). Pairing of non-sister homologous chromosomes in somatic cells has also been observed in Diptera and a variety of plants (Stack and Brown 1969). Therefore, a mechanism for homologous pairing may have existed in ancestral eukaryotes prior to the origin of meiosis. In addition, changes in the interactions of mitotic spindles with homolog kinetochores could have contributed to prolonged sister-chromatid cohesion through the first division (Wilkins and Holliday 2009).
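The ploidy changes implied by this model (Figure 5.3) can be followed with a short bookkeeping sketch. It is only an illustration of the chromosome arithmetic, not a simulation of any biological mechanism, and the names used (Nucleus, fuse, s_phase, reductional_division, equational_division) are hypothetical labels introduced here. The sketch assumes that a nucleus can be summarized by its number of chromosome sets and the number of chromatids per chromosome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nucleus:
    chromosome_sets: int            # 1 = haploid, 2 = diploid
    chromatids_per_chromosome: int  # 1 before S phase, 2 after S phase

    @property
    def dna_content(self) -> int:
        """DNA content in multiples of one unreplicated haploid genome."""
        return self.chromosome_sets * self.chromatids_per_chromosome


def fuse(a: Nucleus, b: Nucleus) -> Nucleus:
    """Cell and nuclear fusion (Figure 5.3, C to E): chromosome sets add up."""
    if a.chromatids_per_chromosome != 1 or b.chromatids_per_chromosome != 1:
        raise ValueError("this sketch only fuses unreplicated nuclei")
    return Nucleus(a.chromosome_sets + b.chromosome_sets, 1)


def s_phase(n: Nucleus) -> Nucleus:
    """DNA synthesis: every chromosome gains a sister chromatid."""
    if n.chromatids_per_chromosome != 1:
        raise ValueError("chromosomes are already replicated")
    return Nucleus(n.chromosome_sets, 2)


def reductional_division(n: Nucleus) -> list:
    """Homologs separate while sister chromatids stay together."""
    if n.chromosome_sets % 2 != 0:
        raise ValueError("a reductional division needs paired homologs")
    half = Nucleus(n.chromosome_sets // 2, n.chromatids_per_chromosome)
    return [half, half]


def equational_division(n: Nucleus) -> list:
    """Sister chromatids separate; the number of chromosome sets is unchanged."""
    if n.chromatids_per_chromosome != 2:
        raise ValueError("there are no sister chromatids to separate")
    daughter = Nucleus(n.chromosome_sets, 1)
    return [daughter, daughter]


haploid = Nucleus(1, 1)
diploid = fuse(haploid, haploid)                # one diploid nucleus
replicated = s_phase(diploid)                   # four genome copies in total
after_first = reductional_division(replicated)  # two nuclei, sisters still bound
# DNA synthesis is suppressed here, so each nucleus divides equationally as it is.
products = [cell for n in after_first for cell in equational_division(n)]
```

Under these assumptions the four products are haploid and unreplicated. If s_phase were followed directly by equational_division, as in an ordinary mitosis, the two daughters would instead remain diploid, which is why the reductional division and the suppression of a second round of DNA synthesis are both needed to return to the haploid state.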
The suppression of DNA synthesis after one (reductional) division, as cells enter into a second (equational) division, distinguishes meiosis from mitosis. In Xenopus laevis, S. cerevisiae, and S. pombe, pre-mitotic DNA synthesis is stimulated by a licensing reaction, in which a complex (composed of Mcm2-7) is loaded onto chromatin by Origin Recognition Complexes (ORCs), Cdc6, and Cdt1 (Blow and Dutta 2005). In budding and fission yeasts, the activities of these Mcm complex ‘loaders’ are downregulated by Cyclin-Dependent Kinases (CDKs) during S-phase and early mitosis, preventing DNA synthesis (Broek et al. 1991; Hayles et al. 1994; Dahmann, Diffley, and Nasmyth 1995; Diffley 1996; Piatti et al. 1996). In animals, upregulation of CDKs and inhibition of Cdt1 by geminin act together to suppress DNA synthesis during mitosis (Wohlschlegel et al. 2000; Tada et al. 2001; Lee et al. 2004). In animals and fission yeast, overexpression of Cdt1 and Cdc6 results in extensive re-replication during mitosis (Nishitani et al. 2000; Vaziri et al. 2003; Thomer et al. 2004; Arias and Walter 2005). However, in S. cerevisiae and X. laevis, significant re-replication of DNA occurs only when CDKs or geminin are inactivated (Nguyen, Co, and Li 2001; Li and Blow 2005). Overexpression of CDKs and/or geminin should, then, suppress DNA synthesis. It is possible that, in ancestral eukaryotes, changes in the numbers and/or stoichiometry of CDKs caused by the presence of diploid numbers of chromosomes in otherwise haploid cells resulted in suppression of DNA synthesis after the first division. At that point, the cells could have simply entered into a normal mitotic division, yielding haploid cells (Figure 5.3 – G). Assuming the presence of small genomes, selection likely favored haploid cells over diploid cells early during eukaryotic evolution. Therefore, diploid eukaryotes arising from the fusion of two haploid eukaryotes should have been at a selective disadvantage.
This selective force may have resulted in further refinement of the process described here. Eventually, large-scale gene duplication events, possibly due to frequent unequal pairing of non-sister homologous chromosomes, yielded the many paralogous gene groups seen today (Figure 4.16). The presence of gene paralogs allowed for divisions of labor to occur (Ohno 1970; Ridley 2004), such that some genes would encode products that functioned predominantly during mitosis while, on the occasions in which cell fusions occurred, the other genes could have functioned to reduce the numbers of chromosomes. As genomes became recombined, the longer-term benefits of genetic recombination may have been realized and meiosis would have become even more refined, including enhanced mechanisms for cell fusions, dsDNA cuts, crossing-over, cross-over interference, and synaptonemal complex formation. In summary, the data presented in this thesis support the idea that meiosis arose from mitosis by large-scale gene duplication following a preadaptation that served to reduce increased numbers of chromosomes (from diploid to haploid) caused by erroneous eukaryotic cell-cell fusions.

Future directions

The model for the origin of meiosis presented in this thesis makes two major predictions: 1) during mitosis, overexpression of Rad51 and Rad21 genes should promote reductional divisions; and 2) meiosis should be possible in the absence of meiosis-specific machinery. Both of these predictions can be tested using modern genetic techniques. However, it should be noted that only positive results would be informative, while negative results could arise from behaviors that evolved since meiosis arose (e.g. cell cycle checkpoints). Below, I suggest experiments designed to explore the models presented in this thesis and provide further insight into the origin and evolution of meiosis.
Whether reductional divisions can be produced during mitosis, and whether the reductional division in dmc1/rec8 null mutants can be rescued by overexpression of Rad51 and Rad21, could be tested with Saccharomyces cerevisiae. As described previously, increasing Rad51 copy number in S. cerevisiae null dmc1 mutants rescues the null mutant phenotype (Bishop et al. 1999), characterized by defective recombination, accumulation of double-strand break recombination intermediates, failure to form normal synaptonemal complexes, and arrest late in meiotic prophase I (Bishop 1994) (Table 2.5). Recall that there is no dmc1 null mutant phenotype during mitosis (Bishop et al. 1992). Interestingly, overexpression of another gene involved in meiosis (Rad54) also rescues null dmc1 phenotypes (Tsubouchi and Roeder 2003), but it probably does so by removing double-strand break recombination intermediates (Petukhova, Stratton, and Sung 1998; Petukhova et al. 1999; Kiianitsa, Solinger, and Heyer 2002), rather than by performing the job of Dmc1, as the increased numbers of Rad51 proteins are likely to do. Although overexpressing Rad51 rescues the null dmc1 mutant phenotype in S. cerevisiae, it is unlikely to rescue a null dmc1/rec8 double mutant on its own. The S. cerevisiae null rec8 mutant phenotype includes the loss of monopolar attachment of microtubules and equational divisions during meiosis I (Parisi et al. 1999; Watanabe and Nurse 1999). The increased expression of Rad21 in rec8 mutants rescues monopolar attachment of microtubules but does not restore reductional divisions (Toth et al. 2000). I propose that Rad51 and Rad21 overexpression in UV-irradiated S. cerevisiae cells may rescue the reductional divisions in null rec8 mutants by increasing the numbers of DNA strand exchange events and extending sister-chromatid cohesion through meiosis I. In addition, I propose that Rad51 and Rad21 overexpression should rescue reductional divisions in dmc1/rec8 double mutants.
These experiments would test the idea that overexpression of paralogs whose products function during both mitosis and meiosis (e.g. Rad51 and Rad21) results in the completion of functions normally fulfilled by products that function only during meiosis (e.g. Dmc1 and Rec8). To test whether overexpression of Rad51 and Rad21 is sufficient to explain the origin of meiosis, the same experiments could be performed during mitosis. If reductional divisions are observed, then the overexpression of genes is sufficient to explain the origin of meiosis.

Figure 5.1: General model for the evolution of DNA strand exchange genes. A) Many DNA strand exchange genes arose very early during eukaryotic evolution; B) additional components may have arisen later by gene duplication; C) Rad51 gene overexpression or mutation results in relaxed selection for retention of other components; D) some components may be lost; E) other components known to function only in complexes may be lost (i.e. Hop2/Mnd1 heterodimers are known only to function with Dmc1 proteins); and F) suites of genes result in further selection for rad51 mutations. Components shown in bold are known to function only during meiosis in model organisms.
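The question raised earlier in this chapter, whether the conserved sites are useful as diagnostic characters, can be made concrete with a short sketch that scores a residue signature written in the style of Figure 5.2. This is a minimal illustration, not an analysis performed in this thesis. It assumes that the rows of Figure 5.2 reading A-A-Y-T-H-G-VD are Rad51 sequences and the rows reading F-V-K-N-PG at positions 286 to 332 are Dmc1 sequences, and it ignores positions 248 and 265, where Dmc1 is poorly conserved. The names DIAGNOSTIC, parse_signature, and score are hypothetical.

```python
# Residue positions follow Saccharomyces cerevisiae Rad51 numbering (Figure 5.2).
FIGURE_POSITIONS = (248, 265, 286, 288, 302, 318, 331, 332)

# Assumed diagnostic residues, read off the rows of Figure 5.2:
# position -> (Rad51-like residue, Dmc1-like residue)
DIAGNOSTIC = {
    286: ("Y", "F"),
    288: ("T", "V"),
    302: ("H", "K"),
    318: ("G", "N"),
    331: ("V", "P"),
    332: ("D", "G"),
}


def parse_signature(text):
    """Turn a string such as 'A-A-F-V-K-N-VD' into {position: residue}."""
    residues = text.replace("-", "")
    if len(residues) != len(FIGURE_POSITIONS):
        raise ValueError("expected one residue for each Figure 5.2 position")
    return dict(zip(FIGURE_POSITIONS, residues))


def score(text):
    """List the diagnostic positions that look Rad51-like, Dmc1-like, or neither."""
    signature = parse_signature(text)
    result = {"rad51_like": [], "dmc1_like": [], "other": []}
    for position, (rad51_residue, dmc1_residue) in DIAGNOSTIC.items():
        residue = signature[position]
        if residue == rad51_residue:
            result["rad51_like"].append(position)
        elif residue == dmc1_residue:
            result["dmc1_like"].append(position)
        else:
            result["other"].append(position)
    return result


giardia_a = score("A-A-F-V-K-N-VD")   # Giardia A row of Figure 5.2
yeast_rad51 = score("A-A-Y-T-H-G-VD")  # Saccharomyces Rad51 row of Figure 5.2
```

Under these assumptions the Giardia A sequence scores as Dmc1-like at positions 286, 288, 302, and 318 but Rad51-like at positions 331 and 332, which is the mixed pattern that motivates the question about paralog-specific function in G. intestinalis.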
Protein  Taxon            248  265  286  288  302  318  331/2
Rad51    Saccharomyces    A    A    Y    T    H    G    VD
Rad51    Homo             A    A    Y    T    H    G    VD
Rad51    Entamoeba        A    A    Y    T    H    S    VD
Rad51    Oryza            A    A    Y    T    H    G    VD
Rad51    Plasmodium       A    A    Y    S    H    G    VD
Rad51    Trichomonas      A    A    Y    T    H    G    VD
Rad51    Cercomonas       A    A    Y    T    H    G    VD
Dmc1     Giardia A        A    A    F    V    K    N    VD
Dmc1     Giardia B        M    L    F    V    K    N    VD
Dmc1     Spironucleus     M    I    F    V    K    N    VD
Dmc1     Saccharomyces    S    L    F    V    K    N    PG
Dmc1     Homo             L    V    F    V    K    N    PG
Dmc1     Entamoeba        L    V    F    V    K    N    PG
Dmc1     Oryza            I    L    F    V    K    N    PG
Dmc1     Plasmodium       L    S    F    V    K    N    PG
Dmc1     Trichomonas      L    A    F    V    T    N    PD
Dmc1     Gymnophrys       T    V    F    V    K    N    PG
Percent identity, Rad51   85   97   84   89   78   92   75/97
Percent identity, Dmc1    47   40   95   82   68   100  97/64

Figure 5.2: Alignment of conserved Rad51 and Dmc1 residues. Percent identities determined from the alignments of 98 Rad51 and 51 Dmc1 protein sequences (Chapter 3) are indicated. Amino acid residues that are at least 75% conserved in Rad51 are highlighted in yellow, Dmc1 in green. The actual percent identities are provided for each paralog below. The Saccharomyces cerevisiae Rad51 amino acid residue position is indicated above for reference. Representatives are provided here for each eukaryotic supergroup (Opisthokonta are labeled purple, Amoebozoa blue, Archaeplastida green, Chromalveolata orange, Excavata brown, and Rhizaria eggplant). Giardia intestinalis and Spironucleus vortens Dmc1 protein sequence data are also provided for comparison.

Figure 5.3: Model for mitotic ploidy reduction in ancestral eukaryotes.
A) Two genetically similar/identical nucleated cells (possibly sisters) with haploid numbers of linear chromosomes
B) One cell attempts to engulf the other
C) Fusion of cell membranes
D) Fusion of nuclear envelopes
E) Single nucleus with diploid number of chromosomes
F) Entry into “mitosis”, DNA synthesis, Rad51 and Rad21 overexpression, DNA strand exchange, and pairing of homologous chromosomes
G) CDK overexpression, suppression of DNA synthesis, and entry into normal haploid mitosis

REFERENCES

Aboussekhra, A., R. Chanet, A. Adjiri, and F. Fabre. 1992.
Semidominant suppressors of srs2 helicase mutation of Saccharomyces cerevisiae map in the RAD51 gene, whose sequence predicts a protein with similarities to prokaryotic recA proteins. Molecular and Cellular Biology 12:3224-3234. Acharya, N., L. Haracska, R. E. Johnson, I. Unk, S. Prakash, and L. Prakash. 2005. Complex formation of yeast Rev1 and Rev7 proteins: a novel role for the polymerase-associated domain. Molecular and Cellular Biology 25:9734-9740. Adams, J., and P. E. Hansche. 1974. Population Studies in Microorganisms .1. Evolution of Diploidy in Saccharomyces cerevisiae. Genetics 76:327-338. Adl, S. M., B. S. Leander, A. G. B. Simpson, J. M. Archibald, O. R. Anderson, D. Bass, S. S. Bowser, G. Brugerolle, M. A. Farmer, S. Karpov, M. Kolisko, C. E. Lane, D. J. Lodge, D. G. Mann, R. Meisterfeld, L. Mendoza, O. Moestrup, S. E. Mozley-Standridge, A. V. Smirnov, and F. Spiegel. 2007. Diversity, nomenclature, and taxonomy of protists. Systematic Biology 56:684-689. Adl, S. M., A. G. Simpson, M. A. Farmer, R. A. Andersen, O. R. Anderson, J. R. Barta, S. S. Bowser, G. Brugerolle, R. A. Fensome, S. Fredericq, T. Y. James, S. Karpov, P. Kugrens, J. Krug, C. E. Lane, L. A. Lewis, J. Lodge, D. H. Lynn, D. G. Mann, R. M. McCourt, L. Mendoza, O. Moestrup, S. E. Mozley-Standridge, T. A. Nerad, C. A. Shearer, A. V. Smirnov, F. W. Spiegel, and M. F. Taylor. 2005. The new higher level classification of eukaryotes with emphasis on the taxonomy of protists. J Eukaryot Microbiol 52:399-451. Agrawal, A. F. 2006. Evolution of sex: Why do organisms shuffle their genotypes? Current Biology 16:R696-R704. Aihara, H., Y. Ito, H. Kurumizaka, S. Yokoyama, and T. Shibata. 1999. The N-terminal domain of the human Rad51 protein binds DNA: Structure and a DNA binding surface as revealed by NMR. Journal of Molecular Biology 290:495-504. Allison, P. D. 1999. Logistic Regression Using SAS Theory and Application. SAS Institute, Inc., Cary, NC. Altschul, S. F., T. L. Madden, A. A. Schaffer, J. 
H. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25:3389-3402. Anuradha, S., and K. Muniyappa. 2004a. Saccharomyces cerevisiae Hop1 zinc finger motif is the minimal region required for its function in vitro. Journal of Biological Chemistry 279:28961-28969. Anuradha, S., and K. Muniyappa. 2004b. Meiosis-specific yeast Hop1 protein promotes synapsis of double-stranded DNA helices via the formation of guanine quartets. Nucleic Acids Research 32:2378-2385. Arbel, A., D. Zenvirth, and G. Simchen. 1999. Sister chromatid-based DNA repair is mediated by RAD54, not by DMC1 or TID1. EMBO J 18:2648-2658. Archetti, M. 2004. Loss of complementation and the logic of two-step meiosis. Journal of Evolutionary Biology 17:1098-1105. Archibald, J. M. 2008. The eocyte hypothesis and the origin of eukaryotic cells. Proc Natl Acad Sci U S A 105:20049-20050. Arias, E. E., and J. C. Walter. 2005. Replication-dependent destruction of Cdt1 limits DNA replication to a single round per cell cycle in Xenopus egg extracts. Genes Dev 19:114-126. Armbrust, E. V., J. A. Berges, C. Bowler, B. R. Green, D. Martinez, N. H. Putnam, S. Zhou, A. E. Allen, K. E. Apt, M. Bechner, M. A. Brzezinski, B. K. Chaal, A. Chiovitti, A. K. Davis, M. S. Demarest, J. C. Detter, T. Glavina, D. Goodstein, M. Z. Hadi, U. Hellsten, M. Hildebrand, B. D. Jenkins, J. Jurka, V. V. Kapitonov, N. Kroger, W. W. Lau, T. W. Lane, F. W. Larimer, J. C. Lippmeier, S. Lucas, M. Medina, A. Montsant, M. Obornik, M. S. Parker, B. Palenik, G. J. Pazour, P. M. Richardson, T. A. Rynearson, M. A. Saito, D. C. Schwartz, K. Thamatrakoln, K. Valentin, A. Vardi, F. P. Wilkerson, and D. S. Rokhsar. 2004. The genome of the diatom Thalassiosira pseudonana: Ecology, evolution, and metabolism. Science 306:79-86. Arnaiz, O., S. Cain, J. Cohen, and L. Sperling. 2007.
ParameciumDB: a community resource that integrates the Paramecium tetraurelia genome sequence with genetic data. Nucleic Acids Res 35:D439-444. Aslett, M., C. Aurrecoechea, M. Berriman, J. Brestelli, B. P. Brunk, M. Carrington, D. P. Depledge, S. Fischer, B. Gajria, X. Gao, M. J. Gardner, A. Gingle, G. Grant, O. S. Harb, M. Heiges, C. Hertz-Fowler, R. Houston, F. Innamorato, J. Iodice, J. C. Kissinger, E. Kraemer, W. Li, F. J. Logan, J. A. Miller, S. Mitra, P. J. Myler, V. Nayak, C. Pennington, I. Phan, D. F. Pinney, G. Ramasamy, M. B. Rogers, D. S. Roos, C. Ross, D. Sivam, D. F. Smith, G. Srinivasamoorthy, C. J. Stoeckert, Jr., S. Subramanian, R. Thibodeau, A. Tivey, C. Treatman, G. Velarde, and H. Wang. 2010. TriTrypDB: a functional genomic resource for the Trypanosomatidae. Nucleic Acids Res 38:D457-462. Atcheson, C. L., B. DiDomenico, S. Frackman, R. E. Esposito, and R. T. Elder. 1987. Isolation, DNA sequence, and regulation of a meiosis-specific eukaryotic recombination gene. Proc Natl Acad Sci U S A 84:8035-8039. Aurrecoechea, C., J. Brestelli, B. P. Brunk, J. M. Carlton, J. Dommer, S. Fischer, B. Gajria, X. Gao, A. Gingle, G. Grant, O. S. Harb, M. Heiges, F. Innamorato, J. Iodice, J. C. Kissinger, E. Kraemer, W. Li, J. A. Miller, H. G. Morrison, V. Nayak, C. Pennington, D. F. Pinney, D. S. Roos, C. Ross, C. J. Stoeckert, Jr., S. Sullivan, C. Treatman, and H. Wang. 2009a. GiardiaDB and TrichDB: integrated genomic resources for the eukaryotic protist pathogens Giardia lamblia and Trichomonas vaginalis. Nucleic Acids Res 37:D526-530. Aurrecoechea, C., J. Brestelli, B. P. Brunk, J. Dommer, S. Fischer, B. Gajria, X. Gao, A. Gingle, G. Grant, O. S. Harb, M. Heiges, F. Innamorato, J. Iodice, J. C. Kissinger, E. Kraemer, W. Li, J. A. Miller, V. Nayak, C. Pennington, D. F. Pinney, D. S. Roos, C. Ross, C. J. Stoeckert, Jr., C. Treatman, and H. Wang. 2009b. PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res 37:D539-543.
Aurrecoechea, C., M. Heiges, H. Wang, Z. Wang, S. Fischer, P. Rhodes, J. Miller, E. Kraemer, C. J. Stoeckert, Jr., D. S. Roos, and J. C. Kissinger. 2007. ApiDB: integrated resources for the apicomplexan bioinformatics resource center. Nucleic Acids Res 35:D427-430. Avery, O. T., C. M. Macleod, and M. McCarty. 1944. Studies on the Chemical Nature of the Substance Inducing Transformation of Pneumococcal Types: Induction of Transformation by a Desoxyribonucleic Acid Fraction Isolated from Pneumococcus Type III. J Exp Med 79:137-158. Bai, Y., A. P. Davis, and L. S. Symington. 1999. A novel allele of RAD52 that causes severe DNA repair and recombination deficiencies only in the absence of RAD51 or RAD59. Genetics 153:1117-1130. Bai, Y., and L. S. Symington. 1996. A Rad52 homolog is required for RAD51-independent mitotic recombination in Saccharomyces cerevisiae. Genes Dev 10:2025-2037. Baldauf, S. L. 2008. An overview of the phylogeny and diversity of eukaryotes. Journal of Systematics and Evolution 46:263-273. Baldauf, S. L. 2003. The deep roots of eukaryotes. Science 300:1703-1706. Baldauf, S. L., and J. D. Palmer. 1993. Animals and fungi are each other's closest relatives - congruent evidence from multiple proteins. Proceedings of the National Academy of Sciences of the United States of America 90:11558-11562. Baldauf, S. L., J. D. Palmer, and W. F. Doolittle. 1996. The root of the universal tree and the origin of eukaryotes based on elongation factor phylogeny. Proceedings of the National Academy of Sciences of the United States of America 93:7749-7754. Baldauf, S. L., A. J. Roger, I. Wenk-Siefert, and W. F. Doolittle. 2000. A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 290:972-977. Barbier, G., C. Oesterhelt, M. D. Larson, R. G. Halgren, C. Wilkerson, R. M. Garavito, C. Benning, and A. P. M. Weber. 2005.
Comparative genomics of two closely related unicellular thermo-acidophilic red algae, Galdieria sulphuraria and Cyanidioschyzon merolae, reveals the molecular basis of the metabolic flexibility of Galdieria sulphuraria and significant differences in carbohydrate metabolism of both algae. Plant Physiology 137:460-474. Barton, N. H., and B. Charlesworth. 1998. Why sex and recombination? Science 281:1986-1990. Baudat, F., and S. Keeney. 2001. Meiotic recombination: Making and breaking go hand in hand. Current Biology 11:R45-R48. Bell, G. 1982. The Masterpiece of Nature: The Evolution and Genetics of Sexuality. University of California Press, Berkeley. Bergerat, A., B. de Massy, D. Gadelle, P. C. Varoutas, A. Nicolas, and P. Forterre. 1997. An atypical topoisomerase II from Archaea with implications for meiotic recombination. Nature 386:414-417. Bernstein, C., and H. Bernstein. 1991. Aging, Sex and DNA Repair. Academic Press, San Diego. Bernstein, C., and V. Johns. 1989. Sexual Reproduction as a Response to H2O2 Damage in Schizosaccharomyces pombe. Journal of Bacteriology 171:1893-1897. Bernstein, H., and C. Bernstein. 2010. Evolutionary Origin of Recombination during Meiosis. Bioscience 60:498-505. Bernstein, H., H. C. Byerly, F. A. Hopf, and R. E. Michod. 1984. Origin of Sex. Journal of Theoretical Biology 110:323-351. Berriman, M., E. Ghedin, C. Hertz-Fowler, G. Blandin, H. Renauld, D. C. Bartholomeu, N. J. Lennard, E. Caler, N. E. Hamlin, B. Haas, W. Bohme, L. Hannick, M. A. Aslett, J. Shallom, L. Marcello, L. H. Hou, B. Wickstead, U. C. M. Alsmark, C. Arrowsmith, R. J. Atkin, A. J. Barron, F. Bringaud, K. Brooks, M. Carrington, I. Cherevach, T. J. Chillingworth, C. Churcher, L. N. Clark, C. H. Corton, A. Cronin, R. M. Davies, J. Doggett, A. Djikeng, T. Feldblyum, M. C. Field, A. Fraser, I. Goodhead, Z. Hance, D. Harper, B. R. Harris, H. Hauser, J. Hostetter, A. Ivens, K. Jagels, D. Johnson, J. Johnson, K. Jones, A. X. Kerhornou, H. Koo, N. Larke, S. Landfear, C. Larkin, V. Leech, A. Line, A. Lord, A. MacLeod, P. J. Mooney, S. Moule, D. M. A. Martin, G. W. Morgan, K. Mungall, H. Norbertczak, D. Ormond, G. Pai, C. S. Peacock, J. Peterson, M. A. Quail, E. Rabbinowitsch, M. A. Rajandream, C. Reitter, S. L. Salzberg, M. Sanders, S. Schobel, S. Sharp, M. Simmonds, A. J. Simpson, L. Talton, C. M. R. Turner, A. Tait, A. R. Tivey, S. Van Aken, D. Walker, D. Wanless, S. L. Wang, B. White, O. White, S. Whitehead, J. Woodward, J. Wortman, M. D. Adams, T. M. Embley, K. Gull, E. Ullu, J. D. Barry, A. H. Fairlamb, F. Opperdoes, B. G. Barret, J. E. Donelson, N. Hall, C. M. Fraser, S. E. Melville, and N. M. El-Sayed. 2005. The genome of the African trypanosome Trypanosoma brucei. Science 309:416-422. Bishop, D. K. 1994. recA homologs Dmc1 and Rad51 interact to form multiple nuclear complexes prior to meiotic chromosome synapsis. Cell 79:1081-1092. Bishop, D. K., Y. Nikolski, J. Oshiro, J. Chon, M. Shinohara, and X. Chen. 1999. High copy number suppression of the meiotic arrest caused by a dmc1 mutation: REC114 imposes an early recombination block and RAD54 promotes a DMC1-independent DSB repair pathway. Genes Cells 4:425-444. Bishop, D. K., D. Park, L. Xu, and N. Kleckner. 1992. DMC1: a meiosis-specific yeast homolog of E. coli recA required for recombination, synaptonemal complex formation, and cell cycle progression. Cell 69:439-456. Bishop, D. K., and D. Zickler. 2004. Early decision; meiotic crossover interference prior to stable strand exchange and synapsis. Cell 117:9-15. Bleuyard, J. Y., M. E. Gallego, and C. I. White. 2006. Recent advances in understanding of the DNA double-strand break repair machinery of plants. DNA Repair (Amst) 5:1-12. Blow, J. J., and A. Dutta. 2005. Preventing re-replication of chromosomal DNA. Nat Rev Mol Cell Biol 6:476-486. Bochkareva, E., S. Korolev, S. P. Lees-Miller, and A. Bochkarev. 2002. Structure of the RPA trimerization core and its role in the multistep DNA-binding mechanism of RPA. EMBO J 21:1855-1863. Bonhoeffer, S., C. Chappey, N. T. Parkin, J. M. Whitcomb, and C. J. Petropoulos. 2004. Evidence for positive epistasis in HIV-1. Science 306:1547-1550. Borner, G. V., N. Kleckner, and N.
Hunter. 2004. Crossover/noncrossover differentiation, synaptonemal complex formation, and regulatory surveillance at the leptotene/zygotene transition of meiosis. Cell 117:29-45. Borts, R. H., S. R. Chambers, and M. F. Abdullah. 2000. The many faces of mismatch repair in meiosis. Mutat Res 451:129-150. Bowler, C., A. E. Allen, J. H. Badger, J. Grimwood, K. Jabbari, A. Kuo, U. Maheswari, C. Martens, F. Maumus, R. P. Otillar, E. Rayko, A. Salamov, K. Vandepoele, B. Beszteri, A. Gruber, M. Heijde, M. Katinka, T. Mock, K. Valentin, F. Verret, J. A. Berges, C. Brownlee, J. P. Cadoret, A. Chiovitti, C. J. Choi, S. Coesel, A. De Martino, J. C. Detter, C. Durkin, A. Falciatore, J. Fournet, M. Haruta, M. J. Huysman, B. D. Jenkins, K. Jiroutova, R. E. Jorgensen, Y. Joubert, A. Kaplan, N. Kroger, P. G. Kroth, J. La Roche, E. Lindquist, M. Lommer, V. Martin-Jezequel, P. J. Lopez, S. Lucas, M. Mangogna, K. McGinnis, L. K. Medlin, A. Montsant, M. P. Oudot-Le Secq, C. Napoli, M. Obornik, M. S. Parker, J. L. Petit, B. M. Porcel, N. Poulsen, M. Robison, L. Rychlewski, T. A. Rynearson, J. Schmutz, H. Shapiro, M. Siaut, M. Stanley, M. R. Sussman, A. R. Taylor, A. Vardi, P. von Dassow, W. Vyverman, A. Willis, L. S. Wyrwicz, D. S. Rokhsar, J. Weissenbach, E. V. Armbrust, B. R. Green, Y. Van de Peer, and I. V. Grigoriev. 2008. The Phaeodactylum genome reveals the evolutionary history of diatom genomes. Nature 456:239-244. Brill, S. J., and B. Stillman. 1991. Replication factor-A from Saccharomyces cerevisiae is encoded by three essential genes coordinately expressed at S phase. Genes Dev 5:1589-1600. Brinkmann, H., and H. Philippe. 2007. The diversity of eukaryotes and the root of the eukaryotic tree. Pp. 20-37. Eukaryotic Membranes and Cytoskeleton: Origins and Evolution. Springer-Verlag Berlin, Berlin. Brocks, J. J., G. A. Logan, R. Buick, and R. E. Summons. 1999. Archean molecular fossils and the early rise of eukaryotes. Science 285:1033-1036. Broek, D., R. Bartlett, K. 
Crawford, and P. Nurse. 1991. Involvement of p34cdc2 in establishing the dependency of S phase on mitosis. Nature 349:388-393. Brown, J. R., and W. F. Doolittle. 1995. Root of the Universal Tree of Life Based on Ancient Aminoacyl-Transfer-RNA Synthetase Gene Duplications. Proceedings of the National Academy of Sciences of the United States of America 92:2441-2445. Brown, J. W., and U. Sorhannus. 2010. A molecular genetic timescale for the diversification of autotrophic stramenopiles (Ochrophyta): substantive underestimation of putative fossil ages. PLoS One 5. Brush, G. S. 2002. Recombination functions of Replication Protein A. Current Organic Chemistry 6:795-813. Burgess, S. M., N. Kleckner, and B. M. Weiner. 1999. Somatic pairing of homologs in budding yeast: existence and modulation. Genes Dev 13:1627-1641. Burki, F., A. Kudryavtsev, M. V. Matz, G. V. Aglyamova, S. Bulman, M. Fiers, P. J. Keeling, and J. Pawlowski. Evolution of Rhizaria: new insights from phylogenomic analysis of uncultivated protists. BMC Evol Biol 10:377. Burki, F., and J. Pawlowski. 2006. Monophyly of Rhizaria and multigene phylogeny of unicellular bikonts. Mol Biol Evol 23:1922-1930. Burki, F., K. Shalchian-Tabrizi, M. Minge, A. Skjaeveland, S. I. Nikolaev, K. S. Jakobsen, and J. Pawlowski. 2007. Phylogenomics reshuffles the eukaryotic supergroups. PLoS One 2:e790. Burki, F., K. Shalchian-Tabrizi, and J. Pawlowski. 2008. Phylogenomics reveals a new 'megagroup' including most photosynthetic eukaryotes. Biol Lett 4:366-369. Callan, H. G. 1972. Replication of DNA in Chromosomes of Eukaryotes. Proceedings of the Royal Society of London Series B-Biological Sciences 181:19-&. Camerini-Otero, R. D., and P. Hsieh. 1995. Homologous recombination proteins in prokaryotes and eukaryotes. Annu Rev Genet 29:509-552. Cameron, A. C., and P. K. Trivedi. 1998. Regression analysis of count data. Cambridge University Press, Cambridge, UK; New York, NY, USA. Cardoso, R. A., L. T. Pires, T. D. Zucchi, F. D.
Zucchi, and T. M. Zucchi. Mitotic crossing-over induced by two commercial herbicides in diploid strains of the fungus Aspergillus nidulans. Genet Mol Res 9:231-238. Carlton, J. M., R. P. Hirt, J. C. Silva, A. L. Delcher, M. Schatz, Q. Zhao, J. R. Wortman, S. L. Bidwell, U. C. Alsmark, S. Besteiro, T. Sicheritz-Ponten, C. J. Noel, J. B. Dacks, P. G. Foster, C. Simillion, Y. Van de Peer, D. Miranda-Saavedra, G. J. Barton, G. D. Westrop, S. Muller, D. Dessi, P. L. Fiori, Q. Ren, I. Paulsen, H. Zhang, F. D. Bastida-Corcuera, A. Simoes-Barbosa, M. T. Brown, R. D. Hayes, M. Mukherjee, C. Y. Okumura, R. Schneider, A. J. Smith, S. Vanacova, M. Villalvazo, B. J. Haas, M. Pertea, T. V. Feldblyum, T. R. Utterback, C. L. Shu, K. Osoegawa, P. J. de Jong, I. Hrdy, L. Horvathova, Z. Zubacova, P. Dolezal, S. B. Malik, J. M. Logsdon, Jr., K. Henze, A. Gupta, C. C. Wang, R. L. Dunne, J. A. Upcroft, P. Upcroft, O. White, S. L. Salzberg, P. Tang, C. H. Chiu, Y. S. Lee, T. M. Embley, G. H. Coombs, J. C. Mottram, J. Tachezy, C. M. Fraser-Liggett, and P. J. Johnson. 2007. Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis. Science 315:207-212. Carpenter, A. T. C. 1987. Gene Conversion, Recombination Nodules, and the Initiation of Meiotic Synapsis. Bioessays 6:232-236. Cavalier-Smith, T. 1987a. The origin of eukaryotic and archaebacterial cells. Ann N Y Acad Sci 503:17-54. Cavalier-Smith, T. 2002a. The phagotrophic origin of eukaryotes and phylogenetic classification of protozoa. International Journal of Systematic and Evolutionary Microbiology 52:297-354. 204 Cavalier-Smith, T. 2003a. Protist phylogeny and the high-level classification of Protozoa. European Journal of Protistology 39:338-348. Cavalier-Smith, T. 1987b. The origin of cells: A symbiosis between genes, catalysts, and membranes. Cold Spring Harb Symp Quant Biol 52:805-824. Cavalier-Smith, T. 2004. Only six kingdoms of life. Proc Biol Sci 271:1251-1262. Cavalier-Smith, T. 2009. 
Kingdoms Protozoa and Chromista and the eozoan root of the eukaryotic tree. Biol Lett 6:342-345. Cavalier-Smith, T. 1981a. Eukaryotic kingdoms: Seven or nine? Biosystems 14:461-481. Cavalier-Smith, T. 1975. Origin of Nuclei and of Eukaryotic Cells. Nature 256:463-468. Cavalier-Smith, T. 2002b. Chloroplast evolution: secondary symbiogenesis and multiple losses. Curr Biol 12:R62-64. Cavalier-Smith, T. 1988. Origin of the cell nucleus. Bioessays 9:72-78. Cavalier-Smith, T. 1989. Molecular phylogeny. Archaebacteria and Archezoa. Nature 339:l00-01. Cavalier-Smith, T. 2002c. The neomuran origin of archaebacteria, the negibacterial root of the universal tree and bacterial megaclassification. Int J Syst Evol Microbiol 52:7-76. Cavalier-Smith, T. 2003b. Genomic reduction and evolution of novel genetic membranes and protein-targeting machinery in eukaryote-eukaryote chimaeras (meta-algae). Philos Trans R Soc Lond B Biol Sci 358:109-133; discussion 133-104. Cavalier-Smith, T. 2010. Origin of the cell nucleus, mitosis and sex: Roles of intracellular coevolution. Biol Direct 5:7. Cavalier-Smith, T. 2002d. Origins of the machinery of recombination and sex. Heredity 88:125-141. Cavalier-Smith, T. 1981b. The origin and early evolution of the eukaryotic cell in M. J. Carlile, J. F. Collins, and B. E. B. Moseley, eds. Molecular and Cellular Aspects of Microbial Evolution. Cambridge University Press, Cambridge, UK. Cavalier-Smith, T. 1987c. The origin of fungi and pseudofungi. Pp. 339-353 in A. D. M. Rayner, ed. Evolutionary biology of fungi. Cambridge Univ. Press, Cambridge. Cavalier-Smith, T., and E. E. Chao. 2010. Phylogeny and evolution of Apusomonadida (Protozoa: Apusozoa): New genera and species. Protist. Cavalier-Smith, T., and E. E. Chao. 2003a. Phylogeny and classification of phylum Cercozoa (Protozoa). Protist 154:341-358. Cavalier-Smith, T., and E. E. Chao. 2003b. Phylogeny of choanozoa, apusozoa, and other protozoa and early eukaryote megaevolution. J Mol Evol 56:540-563. 
205 Cavalier-Smith, T., and E. E. Chao. 2003c. Molecular phylogeny of centrohelid heliozoa, a novel lineage of bikont eukaryotes that arose by ciliary loss. J Mol Evol 56:387396. Cavalier-Smith, T., and E. E. Chao. 2006. Phylogeny and megasystematics of phagotrophic heterokonts (kingdom Chromista). J Mol Evol 62:388-420. Chandley, A. C. 1966. Studies on Oogenesis in Drosophila Melanogaster with 3hThymidine Label. Experimental Cell Research 44:201-&. Chang, Y. X., L. Gong, W. Y. Yuan, X. W. Li, G. X. Chen, X. H. Li, Q. F. Zhang, and C. Y. Wu. 2009. Replication Protein A (RPA1a) is required for meiotic and somatic DNA repair but is dispensable for DNA replication and homologous recombination in rice. Plant Physiology 151:2162-2173. Charlesworth, B. 1991. Evolution - When to Be Diploid. Nature 351:273-274. Charlesworth, B., and N. H. Barton. 1996. Recombination load associated with selection for increased recombination. Genetical Research 67:27-41. Chen, G., S. S. F. Yuan, W. Liu, Y. Xu, K. Trujillo, B. W. Song, F. Cong, S. P. Goff, Y. Wu, R. Arlinghaus, D. Baltimore, P. J. Gasser, M. S. Park, P. Sung, and E. Lee. 1999. Radiation-induced assembly of Rad51 and Rad52 recombination complex requires ATM and c-Abl. Journal of Biological Chemistry 274:12748-12752. Chen, L. T., T. P. Ko, Y. C. Chang, K. A. Lin, C. S. Chang, A. H. J. Wang, and T. F. Wang. 2007. Crystal structure of the left-handed archaeal RadA helical filament: identification of a functional motif for controlling quaternary structures and enzymatic functions of RecA family proteins. Nucleic Acids Research 35:17871801. Chen, Y. K., C. H. Leng, H. Olivares, M. H. Lee, Y. C. Chang, W. M. Kung, S. C. Ti, Y. H. Lo, A. H. Wang, C. S. Chang, D. K. Bishop, Y. P. Hsueh, and T. F. Wang. 2004. Heterodimeric complexes of Hop2 and Mnd1 function with Dmc1 to promote meiotic homolog juxtaposition and strand assimilation. Proc Natl Acad Sci U S A 101:10572-10577. Chen, Z. C., H. J. Yang, and N. P. Pavletich. 2008. 
Mechanism of homologous recombination from the RecA-ssDNA/dsDNA structures. Nature 453:489-U483. Chi, P., Y. Kwon, C. Seong, A. Epshtein, I. Lam, P. Sung, and H. L. Klein. 2006. Yeast recombination factor Rdh54 functionally interacts with the Rad51 recombinase and catalyzes Rad51 removal from DNA. J Biol Chem 281:26268-26279. Churchill, F. B. 1970. Hertwig, Weismann, and the meaning of the reduction division, circa 1890. Isis 61:429-457. Clark, A. J., and S. J. Sandler. 1994. Homologous genetic recombination: the pieces begin to fall into place. Crit Rev Microbiol 20:125-142. Cleveland, L. R. 1956. Brief Accounts of the Sexual Cycles of the Flagellates of Cryptocercus. Journal of Protozoology 3:161-180. Cleveland, L. R. 1947. The Origin and Evolution of Meiosis. Science 105:287-289. 206 Cole, E. S., D. Cassidy-Hanley, J. Hemish, J. Tuan, and P. J. Bruns. 1997. A mutational analysis of conjugation in Tetrahymena thermophila. 1. Phenotypes affecting early development: meiosis to nuclear selection. Dev Biol 189:215-232. Colegrave, N., O. Kaltz, and G. Bell. 2002. The ecology and genetics of fitness in Chlamydomonas. VIII. The dynamics of adaptation to novel environments after a single episode of sex. Evolution 56:14-21. Collins, J. E., C. Wright, C. A. Edwards, M. P. Davis, J. A. Grinham, C. G. Cole, M. E. Goward, B. Aguado, M. Mallya, Y. Mokrab, E. J. Huckle, D. M. Beare, and I. Dunham. 2004. A genome annotation-driven approach to cloning the human ORFeome. Genome Biology 5:11. Conway, A. B., T. W. Lynch, Y. Zhang, G. S. Fortin, C. W. Fung, L. S. Symington, and P. A. Rice. 2004. Crystal structure of a Rad51 filament. Nature Structural & Molecular Biology 11:791-796. Corbett, K. D., and J. M. Berger. 2003. Structure of the topoisomerase VI-B subunit: implications for type II topoisomerase mechanism and evolution. EMBO J 22:151-163. Cox, C. J., P. G. Foster, R. P. Hirt, S. R. Harris, and T. M. Embley. 2008. The archaebacterial origin of eukaryotes. 
Proc Natl Acad Sci U S A 105:20356-20361. Cox, M. M. 2007. Motoring along with the bacterial RecA protein. Nature Reviews Molecular Cell Biology 8:127-138. Cox, M. M. 2003. The bacterial RecA protein as a motor protein. Annual Review of Microbiology 57:551-577. Cox, M. M. 1993. Relating biochemistry to biology: how the recombinational repair function of RecA protein is manifested in its molecular properties. Bioessays 15:617-623. Crow, J. F., and M. Kimura. 1965. Evolution in sexual and asexual populations. American Naturalist 99:439-450. d'Erfurth, I., S. Jolivet, N. Froger, O. Catrice, M. Novatchkova, and R. Mercier. 2009. Turning meiosis into mitosis. PLoS Biol 7:e1000124. Dacks, J., and A. J. Roger. 1999. The first sexual lineage and the relevance of facultative sex. J Mol Evol 48:779-783. Dacks, J. B., and W. F. Doolittle. 2001. Reconstructing/deconstructing the earliest eukaryotes: How comparative genomics can help. Cell 107:419-425. Dahmann, C., J. F. Diffley, and K. A. Nasmyth. 1995. S-phase-promoting cyclindependent kinases prevent re-replication by inhibiting the transition of replication origins to a pre-replicative state. Curr Biol 5:1257-1269. Darwin, C. 1859. On the origin of species by means of natural selection. J. Murray, London,. 207 Davis, A. P., and L. S. Symington. 2001. The yeast recombinational repair protein Rad59 interacts with Rad52 and stimulates single-strand annealing. Genetics 159:515525. Davis, A. P., and L. S. Symington. 2003. The Rad52-Rad59 complex interacts with Rad51 and Replication Protein A. DNA Repair 2:1127-1134. de la Cruz, J., D. Kressler, and P. Linder. 1999. Unwinding RNA in Saccharomyces cerevisiae: DEAD-box proteins and related families. Trends Biochem Sci 24:192198. DePamphilis, M. L. 1996. DNA replication in eukaryotic cells. Cold Spring Harbor Laboratory Press, [Plainview, New York]. Dernburg, A. F., K. McDonald, G. Moulder, R. Barstead, M. Dresser, and A. M. Villeneuve. 1998. Meiotic recombination in C. 
elegans initiates by a conserved mechanism and is dispensable for homologous chromosome synapsis. Cell 94:387-398. Diffley, J. F. 1996. Once and only once upon a time: specifying and regulating origins of DNA replication in eukaryotic cells. Genes Dev 10:2819-2830. Doolittle, W. F., C. L. Nesbo, E. Bapteste, and O. Zhaxybayeva. 2008. Lateral gene transfer. Pp. 45-79 in M. Pagel, and A. Pomiankowski, eds. In Evolutionary Genomics and Proteomics. Sinauer. Dudas, A., and M. Chovanec. 2004. DNA double-strand break repair by homologous recombination. Mutation Research-Reviews in Mutation Research 566:131-167. Dunn, C. W., A. Hejnol, D. Q. Matus, K. Pang, W. E. Browne, S. A. Smith, E. Seaver, G. W. Rouse, M. Obst, G. D. Edgecombe, M. V. Sorensen, S. H. D. Haddock, A. Schmidt-Rhaesa, A. Okusu, R. M. Kristensen, W. C. Wheeler, M. Q. Martindale, and G. Giribet. 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745-U745. Edgar, R. C. 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:1-19. Edlind, T. D., J. Li, G. S. Visvesvara, M. H. Vodkin, G. L. McLaughlin, and S. K. Katiyar. 1996. Phylogenetic analysis of beta-tubulin sequences from amitochondrial protozoa. Molecular Phylogenetics and Evolution 5:359-367. Elena, S. F., and R. E. Lenski. 1997. Test of synergistic interactions among deleterious mutations in bacteria. Nature 390:395-398. Embley, T. M., and W. Martin. 2006. Eukaryotic evolution, changes and challenges. Nature 440:623-630. Eme, L., D. Moreira, E. Talla, and C. Brochier-Armanet. 2009. A complex cell division machinery was present in the last common ancestor of eukaryotes. PLoS One 4:e5021. 208 Enomoto, R., T. Kinebuchi, M. Sato, H. Yagi, H. Kurumizaka, and S. Yokoyama. 2006. Stimulation of DNA strand exchange by the human TBPIP/Hop2-Mnd1 complex. Journal of Biological Chemistry 281:5575-5581. Fast, N. M., J. C. Kissinger, D. S. Roos, and P. J. Keeling. 2001. 
Nuclear-encoded, plastid-targeted genes suggest a single common origin for apicomplexan and dinoflagellate plastids. Mol Biol Evol 18:418-426. Feldman, M. W. 1972. Selection for Linkage Modification .1. Random Mating Populations. Theoretical Population Biology 3:324-&. Feldman, M. W., F. B. Christiansen, and L. D. Brooks. 1980. Evolution of Recombination in a Constant Environment. Proceedings of the National Academy of Sciences of the United States of America-Biological Sciences 77:4838-4841. Felsenstein, J. 2004. Inferring Phylogenies. Pp. 664. Sinauer Associates, Inc., Suderland, MA. Fenchel, T., and B. J. Finlay. 2004. The ubiquity of small species: Patterns of local and global diversity. Bioscience 54:777-784. Feng, Q., L. During, A. A. de Mayolo, G. Lettier, M. Lisby, N. Erdeniz, U. H. Mortensen, and R. Rothstein. 2007. Rad52 and Rad59 exhibit both overlapping and distinct functions. DNA Repair 6:27-37. Filippo, J. S., P. Sung, and H. Klein. 2008. Mechanism of eukaryotic homologous recombination. Annual Review of Biochemistry 77:229-257. Firmenich, A. A., M. Elias-Arnanz, and P. Berg. 1995. A novel allele of Saccharomyces cerevisiae RFA1 that is deficient in recombination and repair and suppressible by RAD52. Mol Cell Biol 15:1620-1631. Fisher, R. A. 1930. The genetical theory of natural selection. The Clarendon press, Oxford,. Flaus, A., D. M. A. Martin, G. J. Barton, and T. Owen-Hughes. 2006. Identification of multiple distinct Snf2 subfamilies with conserved structural motifs. Nucleic Acids Research 34:2887-2905. Flemming, W. 1878. Zur Kenntniss der Zelle und ihrer Theilungs-Erscheinungen. Schriften des Naturwissenschaftlichen Vereins für Schleswig-Holstein 3:23-27. Flowers, J. M., S. I. Li, A. Stathos, G. Saxer, E. A. Ostrowski, D. C. Queller, J. E. Strassmann, and M. D. Purugganan. 2010. Variation, Sex, and Social Cooperation: Molecular Population Genetics of the Social Amoeba Dictyostelium discoideum. Plos Genetics 6:-. Force, A., M. Lynch, F. B. 
Pickett, A. Amores, Y. L. Yan, and J. Postlethwait. 1999. Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531-1545. Fortin, G. S., and L. S. Symington. 2002. Mutations in yeast Rad51 that partially bypass the requirement for Rad55 and Rad57 in DNA repair by increasing the stability of Rad51-DNA complexes. Embo Journal 21:3160-3170. 209 Fung, C. W., G. S. Fortin, S. E. Peterson, and L. S. Symington. 2006. The rad51-K191R ATPase-defective mutant is impaired for presynaptic filament formation. Mol Cell Biol 26:9544-9554. Fung, C. W., A. M. Mozlin, and L. S. Symington. 2009. Suppression of the doublestrand-break-repair defect of the Saccharomyces cerevisiae rad57 mutant. Genetics 181:1195-1206. Gabaldon, T., and M. A. Huynen. 2003. Reconstruction of the proto-mitochondrial metabolism. Science 301:609. Game, J. C., and R. K. Mortimer. 1974. A genetic study of x-ray sensitive mutants in yeast. Mutat Res 24:281-292. Gasior, S. L., H. Olivares, U. Ear, D. M. Hari, R. Weichselbaum, and D. K. Bishop. 2001. Assembly of RecA-like recombinases: Distinct roles for mediator proteins in mitosis and meiosis. Proceedings of the National Academy of Sciences of the United States of America 98:8411-8418. Gasior, S. L., A. K. Wong, Y. Kora, A. Shinohara, and D. K. Bishop. 1998. Rad52 associates with RPA and functions with rad55 and rad57 to assemble meiotic recombination complexes. Genes Dev 12:2208-2221. Germot, A., H. Philippe, and H. LeGuyader. 1997. Evidence for loss of mitochondria in Microsporidia from a mitochondrial-type HSP70 in Nosema locustae. Molecular and Biochemical Parasitology 87:159-168. Gogarten, J. P., H. Kibak, P. Dittrich, L. Taiz, E. J. Bowman, B. J. Bowman, M. F. Manolson, R. J. Poole, T. Date, T. Oshima, J. Konishi, K. Denda, and M. Yoshida. 1989. Evolution of the Vacuolar H+-Atpase - Implications for the Origin of Eukaryotes. Proceedings of the National Academy of Sciences of the United States of America 86:6661-6665. 
Goh, C. S., A. A. Bogan, M. Joachimiak, D. Walther, and F. E. Cohen. 2000. Coevolution of proteins with their interaction partners. Journal of Molecular Biology 299:283-293. Gray, M. W. 1989. The evolutionary origins of organelles. Trends Genet 5:294-299. Gray, M. W., and W. F. Doolittle. 1982. Has the endosymbiont hypothesis been proven? Microbiol Rev 46:1-42. Griffith, F. 1928. The Significance of Pneumococcal Types. J Hyg (Lond) 27:113-159. Griffiths, A. J. F., J. H. Miller, D. T. Suzuki, R. C. Lewontin, and W. M. Gelbart. 2000. Introduction to genetic analysis. W.H. Freeman and Co., New York. Grigorescu, A. A., J. H. A. Vissers, D. Ristic, Y. Z. Pigli, T. W. Lynch, C. Wyman, and P. A. Rice. 2009. Inter-subunit interactions that coordinate Rad51s activities. Nucleic Acids Research 37:557-567. Grishchuk, A. L., R. Kraehenbuehl, M. Molnar, O. Fleck, and J. Kohli. 2004. Genetic and cytological characterization of the RecA-homologous proteins Rad51 and Dmc1 of Schizosaccharomyces pombe. Curr Genet 44:317-328. 210 Gruber, S., C. H. Haering, and K. Nasmyth. 2003. Chromosomal cohesin forms a ring. Cell 112:765-777. Guindon, S., F. Delsuc, J. F. Dufayard, and O. Gascuel. 2009. Estimating maximum likelihood phylogenies with PhyML. Methods Mol Biol 537:113-137. Hackett, J. D., H. S. Yoon, S. Li, A. Reyes-Prieto, S. E. Rummele, and D. Bhattacharya. 2007. Phylogenomic analysis supports the monophyly of cryptophytes and haptophytes and the association of rhizaria with chromalveolates. Mol Biol Evol 24:1702-1713. Hall, T. Z. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl. Acids Symp. Ser. 41:95-98. Hamilton, W. J. 1999. Evolution of Sex. Oxford University Press, Oxford. Hampl, V., L. Hug, J. W. Leigh, J. B. Dacks, B. F. Lang, A. G. Simpson, and A. J. Roger. 2009. Phylogenomic analyses support the monophyly of Excavata and resolve relationships among eukaryotic "supergroups". 
Proc Natl Acad Sci U S A 106:3859-3864. Han, T. M., and B. Runnegar. 1992. Megascopic eukaryotic algae from the 2.1-billionyear-old Negaunee iron-formation, Michigan. Science 257:232-235. Hartung, F., K. J. Angelis, A. Meister, I. Schubert, M. Melzer, and H. Puchta. 2002. An archaebacterial topoisomerase homolog not present in other eukaryotes is indispensable for cell proliferation of plants. Curr Biol 12:1787-1791. Hartung, F., and H. Puchta. 2001. Molecular characterization of homologues of both subunits A (SPO11) and B of the archaebacterial topoisomerase 6 in plants. Gene 271:81-86. Hashimoto, T., Y. Nakamura, T. Kamaishi, and M. Hasegawa. 1997. Early evolution of eukaryotes inferred from protein phylogenies of translation elongation factors 1 alpha and 2. Archiv Fur Protistenkunde 148:287-295. Hauf, S., and Y. Watanabe. 2004. Kinetochore orientation in mitosis and meiosis. Cell 119:317-327. Hayles, J., D. Fisher, A. Woollard, and P. Nurse. 1994. Temporal order of S phase and mitosis in fission yeast is determined by the state of the p34cdc2-mitotic B cyclin complex. Cell 78:813-822. Hays, S. L., A. A. Firmenich, and P. Berg. 1995. Complex formation in yeast doublestrand break repair: participation of Rad51, Rad52, Rad55, and Rad57 proteins. Proc Natl Acad Sci U S A 92:6925-6929. Hays, S. L., A. A. Firmenich, P. Massey, R. Banerjee, and P. Berg. 1998. Studies of the interaction between Rad52 protein and the yeast single-stranded DNA binding protein RPA. Mol Cell Biol 18:4400-4406. 211 Heiges, M., H. Wang, E. Robinson, C. Aurrecoechea, X. Gao, N. Kaluskar, P. Rhodes, S. Wang, C. Z. He, Y. Su, J. Miller, E. Kraemer, and J. C. Kissinger. 2006. CryptoDB: a Cryptosporidium bioinformatics resource update. Nucleic Acids Res 34:D419-422. Henry, J. M., R. Camahort, D. A. Rice, L. Florens, S. K. Swanson, M. P. Washburn, and J. L. Gerton. 2006. Mnd1/Hop2 facilitates Dmc1-dependent interhomolog crossover formation in meiosis of budding yeast. Mol Cell Biol 26:2913-2923. 
Herskowitz, I. 1988. Life-Cycle of the Budding Yeast Saccharomyces cerevisiae. Microbiological Reviews 52:536-553. Hill, W. G., and Robertson. 1966. Effect of Linkage on Limits to Artificial Selection. Genetical Research 8:269-&. Hirt, R. P., B. Healy, C. R. Vossbrinck, E. U. Canning, and T. M. Embley. 1997. A mitochondrial Hsp70 orthologue in Vairimorpha necatrix: Molecular evidence that microsporidia once contained mitochondria. Current Biology 7:995-998. Hirt, R. P., J. M. Logsdon, B. Healy, M. W. Dorey, W. F. Doolittle, and T. M. Embley. 1999. Microsporidia are related to Fungi: Evidence from the largest subunit of RNA Polymerase II and other proteins. Proceedings of the National Academy of Sciences of the United States of America 96:580-585. Hoffmann, E. R., P. V. Shcherbakova, T. A. Kunkel, and R. H. Borts. 2003. MLH1 mutations differentially affect meiotic functions in Saccharomyces cerevisiae. Genetics 163:515-526. Holzen, T. M., P. P. Shah, H. A. Olivares, and D. K. Bishop. 2006. Tid1/Rdh54 promotes dissociation of Dmc1 from nonrecombinogenic sites in meiotic chromatin. Genes & Development 20:2593-2604. Horiike, T., K. Hamada, S. Kanaya, and T. Shinozawa. 2001. Origin of eukaryotic cell nuclei by symbiosis of Archaea in Bacteria is revealed by homology-hit analysis. Nature Cell Biology 3:210-214. Hunter, N., and R. H. Borts. 1997. Mlh1 is unique among mismatch repair proteins in its ability to promote crossing-over during meiosis. Genes Dev 11:1573-1582. Hurst, L. D., and P. Nurse. 1991. A Note on the Evolution of Meiosis. Journal of Theoretical Biology 150:561-563. Huxley, J. 1942. Evolution, the modern synthesis. G. Allen & Unwin ltd, London,. Ishibashi, T., S. Kimura, and K. Sakaguchi. 2006. A higher plant has three different types of RPA heterotrimeric complex. J Biochem 139:99-104. Ishibashi, T., A. Koga, T. Yamamoto, Y. Uchiyama, Y. Mori, J. Hashimoto, S. Kimura, and K. Sakaguchi. 2005. Two types of Replication Protein A in seed plants. 
FEBS J 272:3270-3281. Ito, M., and M. H. Takegami. 1982. Commitment of Mitotic Cells to Meiosis during the G2 Phase of Pre-Meiosis. Plant and Cell Physiology 23:943-952. 212 Iwabe, N., K. Kuma, M. Hasegawa, S. Osawa, and T. Miyata. 1989. Evolutionary Relationship of Archaebacteria, Eubacteria, and Eukaryotes Inferred from Phylogenetic Trees of Duplicated Genes. Proceedings of the National Academy of Sciences of the United States of America 86:9355-9359. Iwabe, N., K. Kuma, H. Kishino, M. Hasegawa, and T. Miyata. 1991. Evolution of RnaPolymerases and Branching Patterns of the 3 Major Groups of Archaebacteria. Journal of Molecular Evolution 32:70-78. Janouskovec, J., A. Horak, M. Obornik, J. Lukes, and P. J. Keeling. 2010. A common red algal origin of the apicomplexan, dinoflagellate, and heterokont plastids. Proceedings of the National Academy of Sciences of the United States of America 107:10949-10954. Janssens, F. A. 1909. La Theorie de la chiasmatypie. Noouvelle interpretation des cineses de maturation. La Cellule 25:387-411. John, B. 1990. Meiosis. Cambridge University Press, Cambridge [England] ; New York. Johnson, R. D., and L. S. Symington. 1995. Functional differences and interactions among the putative RecA homologs Rad51, Rad55, and Rad57. Mol Cell Biol 15:4843-4850. Jorgensen, A., and E. Sterud. 2007. Phylogeny of Spironucleus (Eopharyngia : Diplomonadida : Hexamitinae). Protist 158:247-254. Kadyk, L. C., and L. H. Hartwell. 1992. Sister chromatids are preferred over homologs as substrates for recombination repair in Saccharomyces cerevisiae. Genetics 132:387-402. Kaltz, O., and G. Bell. 2002. The ecology and genetics of fitness in Chlamydomonas. XII. Repeated sexual episodes increase rates of adaptation to novel environments. Evolution 56:1743-1753. Kamaishi, T., T. Hashimoto, Y. Nakamura, F. Nakamura, S. Murata, N. Okada, K. Okamoto, M. Shimizu, and M. Hasegawa. 1996. 
Protein phylogeny of translation elongation factor EF-1 alpha suggests microsporidians are extremely ancient eukaryotes. Journal of Molecular Evolution 42:257-263. Kathiresan, A., G. S. Khush, and J. Bennett. 2002. Two rice DMC1 genes are differentially expressed during meiosis and during haploid and diploid mitosis. Sexual Plant Reproduction 14:257-267. Katinka, M. D., S. Duprat, E. Cornillot, G. Metenier, F. Thomarat, G. Prensier, V. Barbe, E. Peyretaillade, P. Brottier, P. Wincker, F. Delbac, H. El Alaoui, P. Peyret, W. Saurin, M. Gouy, J. Weissenbach, and C. P. Vivares. 2001. Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature 414:450-453. Keane, T. M., C. J. Creevey, M. M. Pentony, T. J. Naughton, and J. O. McLnerney. 2006. Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. Bmc Evolutionary Biology 6:17. 213 Keeling, P. J. 2010. The endosymbiotic origin, diversification and fate of plastids. Philosophical Transactions of the Royal Society B-Biological Sciences 365:729748. Keeling, P. J., and W. F. Doolittle. 1996. Alpha-tubulin from early-diverging eukaryotic lineages and the evolution of the tubulin family. Molecular Biology and Evolution 13:1297-1305. Keeney, S., C. N. Giroux, and N. Kleckner. 1997. Meiosis-specific DNA double-strand breaks are catalyzed by Spo11, a member of a widely conserved protein family. Cell 88:375-384. Kiianitsa, K., J. A. Solinger, and W. D. Heyer. 2002. Rad54 protein exerts diverse modes of ATPase activity on duplex DNA partially and fully covered with Rad51 protein. J Biol Chem 277:46205-46215. Kim, E., A. G. B. Simpson, and L. E. Graham. 2006. Evolutionary relationships of apusomonads inferred from taxon-rich analyses of 6 nuclear encoded genes. Molecular Biology and Evolution 23:2455-2466. Kimura, S., and K. Sakaguchi. 2006. DNA repair in plants. Chem Rev 106:753-766. King, N., M. J. 
Westbrook, S. L. Young, A. Kuo, M. Abedin, J. Chapman, S. Fairclough, U. Hellsten, Y. Isogai, I. Letunic, M. Marr, D. Pincus, N. Putnam, A. Rokas, K. J. Wright, R. Zuzow, W. Dirks, M. Good, D. Goodstein, D. Lemons, W. Li, J. B. Lyons, A. Morris, S. Nichols, D. J. Richter, A. Salamov, J. G. Sequencing, P. Bork, W. A. Lim, G. Manning, W. T. Miller, W. McGinnis, H. Shapiro, R. Tjian, I. V. Grigoriev, and D. Rokhsar. 2008. The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature 451:783-788. Kirk, D. L., and M. M. Kirk. 1986. Heat-Shock Elicits Production of Sexual Inducer in Volvox. Science 231:51-54. Kleckner, N. 1996. Meiosis: how could it work? Proc Natl Acad Sci U S A 93:81678174. Klein, H. L. 1997. RDH54, a RAD54 homologue in Saccharomyces cerevisiae, is required for mitotic diploid-specific recombination and repair and for meiosis. Genetics 147:1533-1543. Knoll, A. H. 2003. Life on a young planet : the first three billion years of evolution on Earth. Princeton University Press, Princeton, N.J. Kolas, N. K., and D. Durocher. 2006. DNA repair: DNA polymerase zeta and Rev1 break in. Curr Biol 16:R296-299. Kolisko, M., I. Cepicka, V. Hampl, J. Leigh, A. J. Roger, J. Kulda, A. G. Simpson, and J. Flegr. 2008. Molecular phylogeny of diplomonads and enteromonads based on SSU rRNA, alpha-tubulin and HSP90 genes: implications for the evolutionary history of the double karyomastigont of diplomonads. BMC Evol Biol 8:205. Komori, K., T. Miyata, H. Daiyasu, H. Toh, H. Shinagawa, and Y. Ishino. 2000. Domain analysis of an archaeal RadA protein for the strand exchange activity. Journal of Biological Chemistry 275:33791-33797. 214 Kondrashov, A. S. 1993. Classification of Hypotheses on the Advantage of Amphimixis. Journal of Heredity 84:372-387. Kondrashov, A. S. 1984. Deleterious Mutations as an Evolutionary Factor .1. The Advantage of Recombination. Genetical Research 44:199-217. Kondrashov, A. S. 1994. 
The Asexual Ploidy Cycle and the Origin of Sex. Nature 370:213-216. Kondrashov, A. S. 1988. Deleterious mutations and the evolution of sexual reproduction. Nature 336:435-440. Kondrashov, A. S., and J. F. Crow. 1991. Haploidy or diploidy: which is better? Nature 351:314-315. Koonin, E. V. 2010. The origin and early evolution of eukaryotes in the light of phylogenomics. Genome Biology 11:-. Kouyos, R. D., S. P. Otto, and S. Bonhoeffer. 2006. Effect of varying epistasis on the evolution of recombination. Genetics 173:589-597. Krejci, L., J. Damborsky, B. Thomsen, M. Duno, and C. Bendixen. 2001. Molecular dissection of interactions between Rad51 and members of the recombinationrepair group. Molecular and Cellular Biology 21:966-976. Krejci, L., B. Song, W. Bussen, R. Rothstein, U. H. Mortensen, and P. Sung. 2002. Interaction with Rad51 is indispensable for recombination mediator function of Rad52. J Biol Chem 277:40132-40141. Krogh, B. O., and L. S. Symington. 2004. Recombination proteins in yeast. Annu. Rev. Genet. 38:233-271. Kudoh, A., S. Iwahori, Y. Sato, S. Nakayama, H. Isomura, T. Murata, and T. Tsurumi. 2009. Homologous recombinational repair factors are recruited and loaded onto the viral DNA genome in Epstein-Barr virus replication compartments. Journal of Virology 83:6641-6651. Kuhn, C. D., S. R. Geiger, S. Baumli, M. Gartmann, J. Gerber, S. Jennebach, T. Mielke, H. Tschochner, R. Beckmann, and P. Cramer. 2007. Functional architecture of RNA Polymerase I. Cell 131:1260-1272. Lake, J. A., E. Henderson, M. Oakes, and M. W. Clark. 1984. Eocytes - a New Ribosome Structure Indicates a Kingdom with a Close Relationship to Eukaryotes. Proceedings of the National Academy of Sciences of the United States of America-Biological Sciences 81:3786-3790. Lake, J. A., and M. C. Rivera. 1994. Was the Nucleus the 1st Endosymbiont? Proceedings of the National Academy of Sciences of the United States of America 91:2880-2881. Lartillot, N., T. Lepage, and S. Blanquart. 2009. 
PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286-2288. 215 Latypov, V., M. Rothenberg, A. Lorenz, G. Octobre, O. Csutak, E. Lehmann, J. Loidl, and J. Kohli. 2010. Roles of Hop1 and Mek1 in Meiotic Chromosome Pairing and Recombination Partner Choice in Schizosaccharomyces pombe. Molecular and Cellular Biology 30:1570-1581. Lederberg, J., E. M. Lederberg, N. D. Zinder, and E. R. Lively. 1951. Recombination analysis of bacterial heredity. Cold Spring Harb Symp Quant Biol 16:413-443. Lederberg, J., and E. L. Tatum. 1946. Gene recombination in Escherichia coli. Nature 158:558. Lee, C., B. Hong, J. M. Choi, Y. Kim, S. Watanabe, Y. Ishimi, T. Enomoto, S. Tada, and Y. Cho. 2004. Structural basis for inhibition of the replication licensing factor Cdt1 by geminin. Nature 430:913-917. Lee, K. Y., and K. J. Myung. 2008. PCNA modifications for regulation of postreplication repair pathways. Molecules and Cells 26:5-11. Leipe, D. D., J. H. Gunderson, T. A. Nerad, and M. L. Sogin. 1993. Small subunit ribosomal RNA+ of Hexamita inflata and the quest for the first branch in the eukaryotic tree. Mol Biochem Parasitol 59:41-48. Leu, J. Y., P. R. Chua, and G. S. Roeder. 1998. The meiosis-specific Hop2 protein of S. cerevisiae ensures synapsis between homologous chromosomes. Cell 94:375-386. Lewis, W. M. 1985. Nutrient Scarcity as an Evolutionary Cause of Haploidy. American Naturalist 125:692-701. Lewontin, R. C. 1971. Effect of Genetic Linkage on Mean Fitness of Population. Proceedings of the National Academy of Sciences of the United States of America 68:984-&. Lewontin, R. C. 1974. The genetic basis of evolutionary change. Columbia University Press, New York,. Li, A., and J. J. Blow. 2005. Cdt1 downregulation by proteolysis and geminin inhibition prevents DNA re-replication in Xenopus. EMBO J 24:395-404. Lichten, M. 2001. Meiotic recombination: Breaking the genome to save it. Current Biology 11:R253-R256. 
Lima-de-Faria, A. 1969. Handbook of molecular cytology. Pp. xv, 1508 p. with illus. Frontiers of biology (Amsterdam), v. 15. North-Holland Pub. Co., Amsterdam,. Lin, Y., and G. R. Smith. 1994. Transient, meiosis-induced expression of the rec6 and rec12 genes of Schizosaccharomyces pombe. Genetics 136:769-779. Lin, Z. G., H. Z. Kong, M. Nei, and H. Ma. 2006. Origins and evolution of the recA/RAD51 gene family: Evidence for ancient gene duplication and endosymbiotic gene transfer. Proceedings of the National Academy of Sciences of the United States of America 103:10328-10333. 216 Lopez-Casamichana, M., E. Orozco, L. A. Marchat, and C. Lopez-Camarillo. 2008. Transcriptional profile of the homologous recombination machinery and characterization of the EhRAD51 recombinase in response to DNA damage in Entamoeba histolytica. Bmc Molecular Biology 9:16. Maeshima, K., K. Morimatsu, A. Shinohara, and T. Horii. 1995. Rad51 Homologs in Xenopus laevis - 2 Distinct Genes Are Highly Expressed in Ovary and Testis. Gene 160:195-200. Malik, S.-B., A. W. Pightling, L. M. Stefaniak, A. M. Schurko, and J. M. Logsdon. 2008. An expanded inventory of conserved meiotic genes provides evidence for sex in Trichomonas vaginalis. PLoS One 3:1-13. Malik, S. B. 2007. The Early Evolution of Meiotic Genes. Pp. 238. Biology. University of Iowa, Iowa City. Malik, S. B., M. A. Ramesh, A. M. Hulstrand, and J. M. Logsdon, Jr. 2007. Protist homologs of the meiotic Spo11 gene and topoisomerase VI reveal an evolutionary history of gene duplication and lineage-specific loss. Mol Biol Evol 24:28272841. Marcon, E., and P. B. Moens. 2005. The evolution of meiosis: recruitment and modification of somatic DNA-repair proteins. Bioessays 27:795-808. Margulis, L. 1970. Origin of eukaryotic cells; evidence and research implications for a theory of the origin and evolution of microbial, plant, and animal cells on the Precambrian earth. Yale University Press, New Haven,. Martin, F., A. Aerts, D. Ahren, A. Brun, E. G. 
Danchin, F. Duchaussoy, J. Gibon, A. Kohler, E. Lindquist, V. Pereda, A. Salamov, H. J. Shapiro, J. Wuyts, D. Blaudez, M. Buee, P. Brokstein, B. Canback, D. Cohen, P. E. Courty, P. M. Coutinho, C. Delaruelle, J. C. Detter, A. Deveau, S. DiFazio, S. Duplessis, L. FraissinetTachet, E. Lucic, P. Frey-Klett, C. Fourrey, I. Feussner, G. Gay, J. Grimwood, P. J. Hoegger, P. Jain, S. Kilaru, J. Labbe, Y. C. Lin, V. Legue, F. Le Tacon, R. Marmeisse, D. Melayah, B. Montanini, M. Muratet, U. Nehls, H. Niculita-Hirzel, M. P. Oudot-Le Secq, M. Peter, H. Quesneville, B. Rajashekar, M. Reich, N. Rouhier, J. Schmutz, T. Yin, M. Chalot, B. Henrissat, U. Kues, S. Lucas, Y. Van de Peer, G. K. Podila, A. Polle, P. J. Pukkila, P. M. Richardson, P. Rouze, I. R. Sanders, J. E. Stajich, A. Tunlid, G. Tuskan, and I. V. Grigoriev. 2008. The genome of Laccaria bicolor provides insights into mycorrhizal symbiosis. Nature 452:88-92. Martin, W. 1999. A briefly argued case that mitochondria and plastids are descendants of endosymbionts, but that the nuclear compartment is not. Proceedings of the Royal Society of London Series B-Biological Sciences 266:1387-1395. Masson, J. Y., and S. C. West. 2001. The Rad51 and Dmc1 recombinases: a non-identical twin relationship. Trends in Biochemical Sciences 26:131-136. 217 Matsuzaki, M., O. Misumi, I. T. Shin, S. Maruyama, M. Takahara, S. Y. Miyagishima, T. Mori, K. Nishida, F. Yagisawa, Y. Yoshida, Y. Nishimura, S. Nakao, T. Kobayashi, Y. Momoyama, T. Higashiyama, A. Minoda, M. Sano, H. Nomoto, K. Oishi, H. Hayashi, F. Ohta, S. Nishizaka, S. Haga, S. Miura, T. Morishita, Y. Kabeya, K. Terasawa, Y. Suzuki, Y. Ishii, S. Asakawa, H. Takano, N. Ohta, H. Kuroiwa, K. Tanaka, N. Shimizu, S. Sugano, N. Sato, H. Nozaki, N. Ogasawara, Y. Kohara, and T. Kuroiwa. 2004. Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D. Nature 428:653-657. Maynard Smith, J. 1978. The evolution of sex. Cambridge University Press, Cambridge [Eng.] 
; New York. Maynard Smith, J., and E. Szathmary. 1995. The major transitions in evolution. Freeman, Oxford. Mehdiabadi, N. J., M. R. Kronforst, D. C. Queller, and J. E. Strassmann. 2009. Phylogeny, Reproductive Isolation and Kin Recognition in the Social Amoeba Dictyostelium purpureum. Evolution 63:542-548. Mehdiabadi, N. J., M. R. Kronforst, D. C. Queller, and J. E. Strassmann. 2010. Phylogeography and sexual macrocyst formation in the social amoeba Dictyostelium giganteum. Bmc Evolutionary Biology 10:-. Merchant, S. S.S. E. ProchnikO. VallonE. H. HarrisS. J. KarpowiczG. B. WitmanA. TerryA. SalamovL. K. Fritz-LaylinL. Marechal-DrouardW. F. MarshallL. H. QuD. R. NelsonA. A. SanderfootM. H. SpaldingV. V. KapitonovQ. RenP. FerrisE. LindquistH. ShapiroS. M. LucasJ. GrimwoodJ. SchmutzP. CardolH. CeruttiG. ChanfreauC. L. ChenV. CognatM. T. CroftR. DentS. DutcherE. FernandezH. FukuzawaD. Gonzalez-BallesterD. Gonzalez-HalphenA. HallmannM. HanikenneM. HipplerW. InwoodK. JabbariM. KalanonR. KurasP. A. LefebvreS. D. LemaireA. V. LobanovM. LohrA. ManuellI. MeierL. MetsM. MittagT. MittelmeierJ. V. MoroneyJ. MoseleyC. NapoliA. M. NedelcuK. NiyogiS. V. NovoselovI. T. PaulsenG. PazourS. PurtonJ. P. RalD. M. RianoPachonW. RiekhofL. RymarquisM. SchrodaD. SternJ. UmenR. WillowsN. WilsonS. L. ZimmerJ. AllmerJ. BalkK. BisovaC. J. ChenM. EliasK. GendlerC. HauserM. R. LambH. LedfordJ. C. LongJ. MinagawaM. D. PageJ. PanW. PootakhamS. RojeA. RoseE. StahlbergA. M. TerauchiP. YangS. BallC. BowlerC. L. DieckmannV. N. GladyshevP. GreenR. JorgensenS. MayfieldB. MuellerRoeberS. RajamaniR. T. SayreP. BroksteinI. DubchakD. GoodsteinL. HornickY. W. HuangJ. JhaveriY. LuoD. MartinezW. C. NgauB. OtillarA. PoliakovA. PorterL. SzajkowskiG. WernerK. ZhouI. V. GrigorievD. S. Rokhsar, and A. R. Grossman. 2007. The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science 318:245-250. Michod, R. E., H. Bernstein, and A. M. Nedelcu. 2008. 
Adaptive value of sex in microbial pathogens. Infect Genet Evol 8:267-285. Michod, R. E., and B. R. Levin. 1988. The Evoluton of Sex: An Examination of Current Ideas. Sinauer Press, Sunderland, MA. Miller, M., M. Holder, R. Vos, P. Midford, T. Liebowitz, L. Chan, P. Hoover, and T. Warnow. 2009. The CIPRES Portals. 218 Milne, G. T., and D. T. Weaver. 1993. Dominant negative alleles of Rad52 reveal a DNA repair/recombination complex including Rad51 and Rad52. Genes Dev 7:17551765. Minge, M. A., J. D. Silberman, R. J. S. Orr, T. Cavalier-Smith, K. Shalchian-Tabrizi, F. Burki, A. Skjaeveland, and K. S. Jakobsen. 2009. Evolutionary position of breviate amoebae and the primary eukaryote divergence. Proceedings of the Royal Society B-Biological Sciences 276:597-604. Miyagawa, K., T. Tsuruga, A. Kinomura, K. Usui, M. Katsura, S. Tashiro, H. Mishima, and K. Tanaka. 2002. A role for RAD54B in homologous recombination in human cells. Embo Journal 21:175-180. Moore, D. P., and T. L. Orr-Weaver. 1998. Chromosome segregation during meiosis: building an unambivalent bivalent. Curr Top Dev Biol 37:263-299. Moreira, D., S. von der Heyden, D. Bass, P. Lopez-Garcia, E. Chao, and T. CavalierSmith. 2007. Global eukaryote phylogeny: Combined small- and large-subunit ribosomal DNA trees support monophyly of Rhizaria, Retaria and Excavata. Molecular Phylogenetics and Evolution 44:255-266. Mozlin, A. M., C. W. Fung, and L. S. Symington. 2008. Role of the Saccharomyces cerevisiae Rad51 paralogs in sister chromatid recombination. Genetics 178:113126. Muller, H. J. 1964. The Relation of Recombination to Mutational Advance. Mutat Res 106:2-9. Muller, H. J. 1932. Some genetic aspects of sex. American Naturalist 66:118-138. Muller, M. 1993. The hydrogenosome. J Gen Microbiol 139:2879-2889. Muniyappa, K., S. Anuradha, and B. Byers. 2000. Yeast meiosis-specific protein Hop1 binds to G4 DNA and promotes its formation. Molecular and Cellular Biology 20:1361-1369. Nedelcu, A. M., O. Marcu, and R. E. 
Michod. 2004. Sex as a response to oxidative stress: a twofold increase in cellular reactive oxygen species activates sex genes. Proc Biol Sci 271:1591-1596. Nedelcu, A. M., and R. E. Michod. 2003. Sex as a response to oxidative stress: the effect of antioxidants on sexual induction in a facultatively sexual lineage. Proc Biol Sci 270 Suppl 2:S136-139. Nei, M. 1967. Modification of Linkage Intensity by Natural Selection. Genetics 57:625. Nei, M., and S. Kumar. 2000. Molecular Evolution and Phylogenetics. Oxford University Press, Oxford. Nguyen, V. Q., C. Co, and J. J. Li. 2001. Cyclin-dependent kinases prevent DNA rereplication through multiple mechanisms. Nature 411:1068-1073. 219 Nichols, M. D., K. DeAngelis, J. L. Keck, and J. M. Berger. 1999. Structure and function of an archaeal topoisomerase VI subunit with homology to the meiotic recombination factor Spo11. EMBO J 18:6177-6188. Nicklas, R. B. 1977. Chromosome distribution: experiments on cell hybrids and in vitro. Philos Trans R Soc Lond B Biol Sci 277:267-276. Nimonkar, A. V., I. Amitani, R. J. Baskin, and S. C. Kowalczykowski. 2007. Single molecule Imaging of Tid1/Rdh54, a rad54 homolog that translocates on duplex DNA and can disrupt joint molecules. Journal of Biological Chemistry 282:30776-30784. Nishinaka, T., A. Shinohara, Y. Ito, S. Yokoyama, and T. Shibata. 1998. Base pair switching by interconversion of sugar puckers in DNA extended by proteins of RecA-family: a model for homology search in homologous genetic recombination. Proc Natl Acad Sci U S A 95:11071-11076. Nishitani, H., Z. Lygerou, T. Nishimoto, and P. Nurse. 2000. The Cdt1 protein is required to license DNA for replication in fission yeast. Nature 404:625-628. Noble, S. M., and C. Guthrie. 1996. Identification of novel genes required for yeast premRNA splicing by means of cold-sensitive mutations. Genetics 143:67-80. Octobre, G., A. Lorenz, J. Loidl, and J. Kohli. 2008. 
The Rad52 homologs Rad22 and rtil of Schizosaccharomyces pombe are not essential for meiotic interhomolog DNA strand exchange, but are required for meiotic intrachromosomal recombination and mating type-related DNA repair. Genetics 178:2399-2412. Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, Berlin, New York,. Okada, H., Y. Hirota, R. Moriyama, Y. Saga, and K. Yanagisawa. 1986. Nuclear-Fusion in Multinucleated Giant-Cells during the Sexual Development of Dictyostelium discoideum. Developmental Biology 118:95-102. Okorokov, A. L., Y. L. Chaban, D. V. Bugreev, J. Hodgkinson, A. V. Mazin, and E. V. Orlova. 2010. Structure of the hDmc1-ssDNA Filament Reveals the Principles of Its Architecture. PLoS One 5:14. Orr-Weaver, T. L. 1995. Meiosis in Drosophila: seeing is believing. Proc Natl Acad Sci U S A 92:10443-10449. Otto, S. 2008. Sexual reproduction and the evolution of sex. Nature Education 1. Otto, S. P., and A. C. Gerstein. 2006. Why have sex? The population genetics of sex and recombination. Biochemical Society Transactions 34:519-522. Otto, S. P., and D. B. Goldstein. 1992. Recombination and the Evolution of Diploidy. Genetics 131:745-751. Otto, S. P., and T. Lenormand. 2002. Resolving the paradox of sex and recombination. Nature Reviews Genetics 3:252-261. 220 Palenik, B., J. Grimwood, A. Aerts, P. Rouze, A. Salamov, N. Putnam, C. Dupont, R. Jorgensen, E. Derelle, S. Rombauts, K. Zhou, R. Otillar, S. S. Merchant, S. Podell, T. Gaasterland, C. Napoli, K. Gendler, A. Manuell, V. Tai, O. Vallon, G. Piganeau, S. Jancek, M. Heijde, K. Jabbari, C. Bowler, M. Lohr, S. Robbens, G. Werner, I. Dubchak, G. J. Pazour, Q. Ren, I. Paulsen, C. Delwiche, J. Schmutz, D. Rokhsar, Y. Van de Peer, H. Moreau, and I. V. Grigoriev. 2007. The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation. Proc Natl Acad Sci U S A 104:7705-7710. Pannunzio, N. R., G. M. Manthey, and A. M. Bailis. 2008. 
RAD59 is required for efficient repair of simultaneous double-strand breaks resulting in translocations in Saccharomyces cerevisiae. DNA Repair 7:788-800. Paques, F., and J. E. Haber. 1999. Multiple pathways of recombination induced by double-strand breaks in Saccharomyces cerevisiae. Microbiology and Molecular Biology Reviews 63:349-+. Parfrey, L. W., E. Barbero, E. Lasser, M. Dunthorn, D. Bhattacharya, D. Patterson, and L. Katz. 2006. Evaluating Support for the Current Classification of Eukaryotic Diversity. Plos Genetics 2:2062-2073. Parfrey, L. W., J. Grant, Y. I. Tekle, E. Lasek-Nesselquist, H. G. Morrison, M. L. Sogin, D. J. Patterson, and L. A. Katz. 2010. Broadly sampled multigene analyses yield a well-resolved eukaryotic tree of life. Syst Biol. Parisi, S., M. J. McKay, M. Molnar, M. A. Thompson, P. J. van der Spek, E. van DrunenSchoenmaker, R. Kanaar, E. Lehmann, J. H. Hoeijmakers, and J. Kohli. 1999. Rec8p, a meiotic recombination and sister chromatid cohesion phosphoprotein of the Rad21p family conserved from fission yeast to humans. Mol Cell Biol 19:3515-3528. Patron, N. J., Y. Inagaki, and P. J. Keeling. 2007. Multiple gene phylogenies support the monophyly of cryptomonad and haptophyte host lineages. Curr Biol 17:887-891. Patterson, D. J. 1999. The diversity of eukaryotes. American Naturalist 154:S96-S124. Pellegrini, L., D. S. Yu, T. Lo, S. Anand, M. Lee, T. L. Blundell, and A. R. Venkitaraman. 2002. Insights into DNA recombination from the structure of a RAD51-BRCA2 complex. Nature 420:287-293. Perrot, V., S. Richerd, and M. Valero. 1991. Transition from haploidy to diploidy. Nature 351:315-317. Petersen, G., and O. Seberg. 2002. Molecular evolution and phylogenetic application of DMC1. Molecular Phylogenetics and Evolution 22:43-50. Petersen, G., O. Seberg, and C. Baden. 2004. A phylogenetic analysis of the genus Psathyrostachys (Poaceae) based on one nuclear gene, three plastid genes, and morphology. Plant Systematics and Evolution 249:99-110. 
221 Petes, T. D., R. E. Malone, and L. S. Symington. 1991. Recombination in Yeast. Pp. 407521 in J. R. Broach, E. W. Jones, and J. R. Pringle, eds. The Molecular and Cellular Biology of the Yeast Saccharomyces: Genome Dynamics, Protein Synthesis, and Energetics. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY. Petukhova, G., S. Stratton, and P. Sung. 1998. Catalysis of homologous DNA pairing by yeast Rad51 and Rad54 proteins. Nature 393:91-94. Petukhova, G., S. Van Komen, S. Vergano, H. Klein, and P. Sung. 1999. Yeast Rad54 promotes Rad51-dependent homologous DNA pairing via ATP hydrolysis-driven change in DNA double helix conformation. J Biol Chem 274:29453-29462. Pevsner, J. 2009. Bioinformatics and Functional Genomics. Wiley-Blackwell, Hoboken, NJ. Piatti, S., T. Bohm, J. H. Cocker, J. F. Diffley, and K. Nasmyth. 1996. Activation of Sphase-promoting CDKs in late G1 defines a "point of no return" after which Cdc6 synthesis cannot promote DNA replication in yeast. Genes Dev 10:1516-1531. Pool, R. 1990. The Third Kingdom of Life. Science 247:159. Poole, A. M., and D. Penny. 2007. Evaluating hypotheses for the origin of eukaryotes. Bioessays 29:74-84. Poxleitner, M. K., M. L. Carpenter, J. J. Mancuso, C. J. R. Wang, S. C. Dawson, and W. Z. Cande. 2008. Evidence for karyogamy and exchange of genetic material in the binucleate intestinal parasite Giardia intestinalis. Science 319:1530-1533. Proudfoot, C., and R. McCulloch. 2006. Trypanosoma brucei DMC1 does not act in DNA recombination, repair or antigenic variation in bloodstream stage cells. Mol Biochem Parasitol 145:245-253. Putnam, N. H., M. Srivastava, U. Hellsten, B. Dirks, J. Chapman, A. Salamov, A. Terry, H. Shapiro, E. Lindquist, V. V. Kapitonov, J. Jurka, G. Genikhovich, I. V. Grigoriev, S. M. Lucas, R. E. Steele, J. R. Finnerty, U. Technau, M. Q. Martindale, and D. S. Rokhsar. 2007. Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science 317:86-94. Ramesh, M. 
A., S. B. Malik, and J. M. Logsdon. 2005. A phylogenomic inventory of meiotic genes: Evidence for sex in Giardia and an early eukaryotic origin of meiosis. Current Biology 15:185-191. Reeb, V. C., M. T. Peglar, H. S. Yoon, J. R. Bai, M. Wu, P. Shiu, J. L. Grafenberg, A. Reyes-Prieto, S. E. Rummele, J. Gross, and D. Bhattacharya. 2009. Interrelationships of chromalveolates within a broadly sampled tree of photosynthetic protists. Molecular Phylogenetics and Evolution 53:202-211. 222 Rensing, S. A., D. Lang, A. D. Zimmer, A. Terry, A. Salamov, H. Shapiro, T. Nishiyama, P. F. Perroud, E. A. Lindquist, Y. Kamisugi, T. Tanahashi, K. Sakakibara, T. Fujita, K. Oishi, I. T. Shin, Y. Kuroki, A. Toyoda, Y. Suzuki, S. Hashimoto, K. Yamaguchi, S. Sugano, Y. Kohara, A. Fujiyama, A. Anterola, S. Aoki, N. Ashton, W. B. Barbazuk, E. Barker, J. L. Bennetzen, R. Blankenship, S. H. Cho, S. K. Dutcher, M. Estelle, J. A. Fawcett, H. Gundlach, K. Hanada, A. Heyl, K. A. Hicks, J. Hughes, M. Lohr, K. Mayer, A. Melkozernov, T. Murata, D. R. Nelson, B. Pils, M. Prigge, B. Reiss, T. Renner, S. Rombauts, P. J. Rushton, A. Sanderfoot, G. Schween, S. H. Shiu, K. Stueber, F. L. Theodoulou, H. Tu, Y. Van de Peer, P. J. Verrier, E. Waters, A. Wood, L. Yang, D. Cove, A. C. Cuming, M. Hasebe, S. Lucas, B. D. Mishler, R. Reski, I. V. Grigoriev, R. S. Quatrano, and J. L. Boore. 2008. The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science 319:64-69. Rice, W. R. 2002. Experimental tests of the adaptive significance of sexual recombination. Nature Reviews Genetics 3:241-251. Rice, W. R., and A. K. Chippindale. 2001. Sexual recombination and the power of natural selection. Science 294:555-559. Richards, A. J. 1986. Plant breeding systems. G. Allen & Unwin, London ; Boston. Ridley, M. 2004. Evolution. Blackwell Science Ltd., Malden, MA. Rodriguez-Ezpeleta, N., H. Brinkmann, S. C. Burey, B. Roure, G. Burger, W. Loffelhardt, H. J. Bohnert, H. Philippe, and B. F. 
Lang. 2005. Monophyly of primary photosynthetic eukaryotes: green plants, red algae, and glaucophytes. Curr Biol 15:1325-1330. Roger, A. J. 1999. Reconstructing early events in eukaryotic evolution. American Naturalist 154:S146-S163. Roger, A. J., and L. A. Hug. 2006. The origin and diversification of eukaryotes: problems with molecular phylogenetics and molecular clock estimation. Pp. 1039-1054. Royal Society. Roger, A. J., and A. G. B. Simpson. 2009. Evolution: Revisiting the Root of the Eukaryote Tree. Current Biology 19:R165-R167. Rokas, A., B. L. Williams, N. King, and S. B. Carroll. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798-804. Ruckert, J. 1892. Zur Entwicklungsgeschichte des Ovarioleies bei Selachien. Anatomischer Anzeiger 7:107-158. Saeki, T., I. Machida, and S. Nakai. 1980. Genetic control of diploid recovery after gamma-irradiation in the yeast Saccharomyces cerevisiae. Mutat Res 73:251-265. Sagan, L. 1967. On the origin of mitosing cells. J Theor Biol 14:255-274. Sager, R., and S. Granick. 1954. Nutritional Control of Sexuality in Chlamydomonas reinhardtii. Journal of General Physiology 37:729-742. 223 Sakaguchi, K., T. Ishibashi, Y. Uchiyama, and K. Iwabata. 2009. The multi-Replication Protein A (RPA) system - a new perspective. Febs Journal 276:943-963. Sakane, I., C. Kamataki, Y. Takizawa, M. Nakashima, S. Toki, H. Ichikawa, S. Ikawa, T. Shibata, and H. Kurumizaka. 2008. Filament formation and robust strand exchange activities of the rice DMC1A and DMC1B proteins. Nucleic Acids Research 36:4266-4276. Sandler, S. J., L. H. Satin, H. S. Samra, and A. J. Clark. 1996. recA-like genes from three archaean species with putative protein products similar to Rad51 and Dmc1 proteins of the yeast Saccharomyces cerevisiae. Nucleic Acids Res 24:2125-2132. Sarai, N., W. Kagawa, N. Fujikawa, K. Saito, J. Hikiba, K. Tanaka, K. Miyagawa, H. Kurumizaka, and S. Yokoyama. 2008. 
Biochemical analysis of the N-terminal domain of human RAD54B. Nucleic Acids Research 36:5441-5450. Sauvageau, S., A. Z. Stasiak, I. Banville, M. Ploquin, A. Stasiak, and J. Y. Masson. 2005. Fission yeast Rad51 and Dmc1, two efficient DNA recombinases forming helical nucleoprotein filaments. Mol Cell Biol 25:4377-4387. Schild, D. 1995. Suppression of a new allele of the yeast RAD52 gene by overexpression of RAD51, mutations in srs2 and ccr4, or mating-type heterozygosity. Genetics 140:115-127. Schild, D., and C. Wiese. 2009. Overexpression of RAD51 suppresses recombination defects: a possible mechanism to reverse genomic instability. Nucleic Acids Res. Schlegel, M. 1994. Molecular Phylogeny of Eukaryotes. Trends in Ecology & Evolution 9:330-335. Schrader, F., and S. Hughes-Schrader. 1931. Haploidy in Metazoa. Quarterly Review of Biology 6:411-438. Schurko, A. M., and J. M. Logsdon. 2008. Using a meiosis detection toolkit to investigate ancient asexual "scandals" and the evolution of sex. Bioessays 30:579-589. Schurko, A. M., M. Neiman, and J. M. Logsdon, Jr. 2009. Signs of sex: what we know and how we know it. Trends Ecol Evol 24:208-217. Scudo, F. M. 1967. Selection on Both Haplo and Diplophase. Genetics 56:693-&. Searfoss, A., T. E. Dever, and R. Wickner. 2001. Linking the 3 ' poly(A) tail to the subunit joining step of translation initiation: Relations of Pab1p, eukaryotic translation initiation factor 5B (Fun12p), and Ski2p-Slh1p. Molecular and Cellular Biology 21:4900-4908. Sehorn, M. G., S. Sigurdsson, W. Bussen, V. M. Unger, and P. Sung. 2004. Human meiotic recombinase Dmc1 promotes ATP-dependent homologous DNA strand exchange. Nature 429:433-437. Seong, C., S. Colavito, Y. Kwon, P. Sung, and L. Krejci. 2009. Regulation of Rad51 Recombinase Presynaptic Filament Assembly via Interactions with the Rad52 Mediator and the Srs2 Anti-recombinase. Journal of Biological Chemistry 284:24363-24371. 224 Shadwick, L. L., F. W. Spiegel, J. D. Shadwick, M. W. 
Brown, and J. D. Silberman. 2009. Eumycetozoa = Amoebozoa?: SSUrDNA phylogeny of protosteloid slime molds and its significance for the amoebozoan supergroup. PLoS One 4:e6754. Sherman, F., and H. Roman. 1963. Evidence for 2 Types of Allelic Recombination in Yeast. Genetics 48:255-&. Shin, D. S., L. Pellegrini, D. S. Daniels, B. Yelent, L. Craig, D. Bates, D. S. Yu, M. K. Shivji, C. Hitomi, A. S. Arvai, N. Volkmann, H. Tsuruta, T. L. Blundell, A. R. Venkitaraman, and J. A. Tainer. 2003. Full-length archaeal Rad51 structure and mutants: mechanisms for RAD51 assembly and control by BRCA2. Embo Journal 22:4566-4576. Shinohara, A., H. Ogawa, and T. Ogawa. 1992. Rad51 protein involved in repair and recombination in S. cerevisiae is a RecA-like protein. Cell 69:457-470. Shinohara, A., M. Shinohara, T. Ohta, S. Matsuda, and T. Ogawa. 1998. Rad52 forms ring structures and co-operates with RPA in single-strand DNA annealing. Genes Cells 3:145-156. Shinohara, M., S. L. Gasior, D. K. Bishop, and A. Shinohara. 2000. Tid1/Rdh54 promotes colocalization of Rad51 and Dmc1 during meiotic recombination. Proc Natl Acad Sci U S A 97:10814-10819. Shinozawa, T., T. Horiike, and K. Hamada. 2001. Does endosymbiosis explain the origin of the nucleus? Reply. Nature Cell Biology 3:E173-E174. Simchen, G., and Y. Hugerat. 1993. What Determines Whether Chromosomes Segregate Reductionally or Equationally in Meiosis. Bioessays 15:1-8. Simpson, A. G. 2003. Cytoskeletal organization, phylogenetic affinities and systematics in the contentious taxon Excavata (Eukaryota). Int J Syst Evol Microbiol 53:17591777. Simpson, A. G., Y. Inagaki, and A. J. Roger. 2006. Comprehensive multigene phylogenies of excavate protists reveal the evolutionary positions of "primitive" eukaryotes. Mol Biol Evol 23:615-625. Simpson, A. G. B., and D. J. Patterson. 1999. The ultrastructure of Carpediemonas membranifera (Eukaryota) with reference to the "Excavate hypothesis". European Journal of Protistology 35:353-370. 
Simpson, A. G. B., and A. J. Roger. 2004. The real 'kingdoms' of eukaryotes. Current Biology 14:R693-R696. Smith, T. F., and M. S. Waterman. 1981. Identification of common molecular subsequences. J Mol Biol 147:195-197. Snowden, T., S. Acharya, C. Butz, M. Berardini, and R. Fishel. 2004. hMSH4-hMSH5 recognizes Holliday Junctions and forms a meiosis-specific sliding clamp that embraces homologous chromosomes. Mol Cell 15:437-451. 225 Sogin, M., H. Elwood, and J. Gunderson. 1986. Evolutionary diversity of eukaryotic small-subunit rRNA genes. Proceedings of the National Academy of Sciences of the United States of America 83:1383-1387. Sogin, M. L. 1991. Early evolution and the origin of eukaryotes. Curr Opin Genet Dev 1:457-463. Solinger, J. A., K. Kiianitsa, and W. D. Heyer. 2002. Rad54, a Swi2/Snf2-like recombinational repair protein, disassembles Rad51:dsDNA filaments. Mol Cell 10:1175-1188. Sonnhammer, E. L., S. R. Eddy, E. Birney, A. Bateman, and R. Durbin. 1998. Pfam: multiple sequence alignments and HMM-profiles of protein domains. Nucleic Acids Res 26:320-322. Soustelle, C., M. Vedel, R. Kolodner, and A. Nicolas. 2002. Replication Protein A is required for meiotic recombination in Saccharomyces cerevisiae. Genetics 161:535-547. Srivastava, M., E. Begovic, J. Chapman, N. H. Putnam, U. Hellsten, T. Kawashima, A. Kuo, T. Mitros, A. Salamov, M. L. Carpenter, A. Y. Signorovitch, M. A. Moreno, K. Kamm, J. Grimwood, J. Schmutz, H. Shapiro, I. V. Grigoriev, L. W. Buss, B. Schierwater, S. L. Dellaporta, and D. S. Rokhsar. 2008. The Trichoplax genome and the nature of placozoans. Nature 454:955-960. Stack, S. M., and W. V. Brown. 1969. Somatic pairing, reduction and recombination: an evolutionary hypothesis of meiosis. Nature 222:1275-1276. Stamatakis, A., P. Hoover, and J. Rougemont. 2008. A Rapid Bootstrap Algorithm for the RAxML Web Servers. Systematic Biology 57:758-771. Stamatakis, A., T. Ludwig, and H. Meier. 2005. 
RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 21:456-463. Stassen, N. Y., J. M. Logsdon, G. J. Vora, H. H. Offenberg, J. D. Palmer, and M. E. Zolan. 1997. Isolation and characterization of rad51 orthologs from Coprinus cinereus and Lycopersicon esculentum, and phylogenetic analysis of eukaryotic recA homologs. Current Genetics 31:144-157. Stechmann, A., and T. Cavalier-Smith. 2002. Rooting the eukaryote tree by using a derived gene fusion. Science 297:89-91. Stechmann, A., and T. Cavalier-Smith. 2003a. The root of the eukaryote tree pinpointed. Curr Biol 13:R665-666. Stechmann, A., and T. Cavalier-Smith. 2003b. Phylogenetic analysis of eukaryotes using heat-shock protein Hsp90. J Mol Evol 57:408-419. Steenkamp, E. T., J. Wright, and S. L. Baldauf. 2006. The protistan origins of animals and fungi. Molecular Biology and Evolution 23:93-106. 226 Stiller, J. W., and L. Harrell. 2005. The largest subunit of RNA Polymerase II from the Glaucocystophyta: functional constraint and short-branch exclusion in deep eukaryotic phylogeny. BMC Evol Biol 5:71. Story, R. M., I. T. Weber, and T. A. Steitz. 1992. The structure of the E. coli recA protein monomer and polymer. Nature 355:318-325. Sugawara, H., K. Iwabata, A. Koshiyama, T. Yanai, Y. Daikuhara, S. H. Namekawa, F. N. Hamada, and K. Sakaguchi. 2009. Coprinus cinereus Mer3 is required for synaptonemal complex formation during meiosis. Chromosoma 118:127-139. Sugawara, N., X. Wang, and J. E. Haber. 2003. In Vivo roles of Rad52, Rad54, and Rad55 proteins in Rad51-mediated recombination. Molecular Cell 12:209-219. Sugimoto-Shirasu, K., N. J. Stacey, J. Corsar, K. Roberts, and M. C. McCann. 2002. DNA topoisomerase VI is essential for endoreduplication in Arabidopsis. Curr Biol 12:1782-1786. Sung, P. 1997. Function of yeast Rad52 protein as a mediator between Replication Protein A and the Rad51 recombinase. J Biol Chem 272:28194-28197. Symington, L. S. 2002. 
Role of RAD52 epistasis group genes in homologous recombination and double-strand break repair. Microbiology and Molecular Biology Reviews 66:630-+. Syvanen, M. 1985. Cross-Species Gene-Transfer - Implications for a New Theory of Evolution. Journal of Theoretical Biology 112:333-343. Szathmary, E., I. Scheuring, M. Kotsis, and I. Gladkih. 1990. Sexuality of eukaryotic unicells: hyperbolic growth, coesixtence of facultative parthenogens, and the repair hypothesis. Pp. 279-287 in J. Maynard Smith, and G. Vida, eds. Organizational Constraints on the Dynamics of Evolution. Manchester University Press, Manchester. Szathmary, E., and J. M. Smith. 1995. The Major Evolutionary Transitions. Nature 374:227-232. Szekvolgyi, L., and A. Nicolas. 2010. From meiosis to postmeiotic events: Homologous recombination is obligatory but flexible. Febs Journal 277:571-589. Tada, S., A. Li, D. Maiorano, M. Mechali, and J. J. Blow. 2001. Repression of origin assembly in metaphase depends on inhibition of RLF-B/Cdt1 by geminin. Nat Cell Biol 3:107-113. Tatusov, R. L., N. D. Fedorova, J. D. Jackson, A. R. Jacobs, B. Kiryutin, E. V. Koonin, D. M. Krylov, R. Mazumder, S. L. Mekhedov, A. N. Nikolskaya, B. S. Rao, S. Smirnov, A. V. Sverdlov, S. Vasudevan, Y. I. Wolf, J. J. Yin, and D. A. Natale. 2003. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4:41. Tatusov, R. L., M. Y. Galperin, D. A. Natale, and E. V. Koonin. 2000. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Research 28:33-36. 227 Thomer, M., N. R. May, B. D. Aggarwal, G. Kwok, and B. R. Calvi. 2004. Drosophila double-parked is sufficient to induce re-replication during development and is regulated by cyclin E/CDK2. Development 131:4807-4818. Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. 
Nucleic Acids Research 25:4876-4882. Timmermans, M. J., D. Roelofs, J. Marien, and N. M. van Straalen. 2008. Revealing pancrustacean relationships: phylogenetic analysis of ribosomal protein genes places Collembola (springtails) in a monophyletic Hexapoda and reinforces the discrepancy between mitochondrial and nuclear DNA markers. BMC Evol Biol 8:83. Toth, A., K. P. Rabitsch, M. Galova, A. Schleiffer, S. B. Buonomo, and K. Nasmyth. 2000. Functional genomics identifies monopolin: a kinetochore protein required for segregation of homologs during meiosis i. Cell 103:1155-1168. Tovar, J., A. Fischer, and C. G. Clark. 1999. The mitosome, a novel organelle related to mitochondria in the amitochondrial parasite Entamoeba histolytica. Mol Microbiol 32:1013-1021. Tovar, J., G. Leon-Avila, L. B. Sanchez, R. Sutak, J. Tachezy, M. van der Giezen, M. Hernandez, M. Muller, and J. M. Lucocq. 2003. Mitochondrial remnant organelles of Giardia function in iron-sulphur protein maturation. Nature 426:172-176. Tsubouchi, H., and G. S. Roeder. 2003. The importance of genetic recombination for fidelity of chromosome pairing in meiosis. Dev Cell 5:915-925. Tsubouchi, H., and G. S. Roeder. 2002. The Mndl protein forms a complex with Hop2 to promote homologous chromosome pairing and meiotic double-strand break. Molecular and Cellular Biology 22:3078-3088. Tsuzuki, T., Y. Fujii, K. Sakumi, Y. Tominaga, K. Nakao, M. Sekiguchi, A. Matsushiro, Y. Yoshimura, and MoritaT. 1996. Targeted disruption of the Rad51 gene leads to lethality in embryonic mice. Proc Natl Acad Sci U S A 93:6236-6240. Tyler, B. M., S. Tripathy, X. Zhang, P. Dehal, R. H. Jiang, A. Aerts, F. D. Arredondo, L. Baxter, D. Bensasson, J. L. Beynon, J. Chapman, C. M. Damasceno, A. E. Dorrance, D. Dou, A. W. Dickerman, I. L. Dubchak, M. Garbelotto, M. Gijzen, S. G. Gordon, F. Govers, N. J. Grunwald, W. Huang, K. L. Ivors, R. W. Jones, S. Kamoun, K. Krampis, K. H. Lamour, M. K. Lee, W. H. McDonald, M. Medina, H. J. Meijer, E. K. 
Nordberg, D. J. Maclean, M. D. Ospina-Giraldo, P. F. Morris, V. Phuntumart, N. H. Putnam, S. Rash, J. K. Rose, Y. Sakihama, A. A. Salamov, A. Savidor, C. F. Scheuring, B. M. Smith, B. W. Sobral, A. Terry, T. A. TortoAlalibo, J. Win, Z. Xu, H. Zhang, I. V. Grigoriev, D. S. Rokhsar, and J. L. Boore. 2006. Phytophthora genome sequences uncover evolutionary origins and mechanisms of pathogenesis. Science 313:1261-1266. Umezu, K., N. Sugawara, C. Chen, J. E. Haber, and R. D. Kolodner. 1998. Genetic analysis of yeast RPA1 reveals its multiple functions in DNA metabolism. Genetics 148:989-1005. 228 van der Giezen, M. 2009. Hydrogenosomes and mitosomes: Conservation and evolution of functions. Journal of Eukaryotic Microbiology 56:221-231. van der Giezen, M., J. Tovar, and C. G. Clark. 2005. Mitochondrion-derived organelles in protists and fungi. Int Rev Cytol 244:175-225. van Keulen, H., S. R. Campbell, S. L. Erlandsen, and E. L. Jarroll. 1991a. Cloning and restriction enzyme mapping of ribosomal DNA of Giardia duodenalis, Giardia ardeae and Giardia muris. Mol Biochem Parasitol 46:275-284. van Keulen, H., S. Horvat, S. L. Erlandsen, and E. L. Jarroll. 1991b. Nucleotide sequence of the 5.8S and large subunit rRNA genes and the internal transcribed spacer and part of the external spacer from Giardia ardeae. Nucleic Acids Res 19:6050. Van Valen, L. 1973. A new evolutionary law. Evol. Theory 1:1-30. Vaziri, C., S. Saxena, Y. Jeon, C. Lee, K. Murata, Y. Machida, N. Wagle, D. S. Hwang, and A. Dutta. 2003. A p53-dependent checkpoint pathway prevents rereplication. Mol Cell 11:997-1008. 229 Venter, J. C., M. D. Adams, E. W. Myers, P. W. Li, R. J. Mural, G. G. Sutton, H. O. Smith, M. Yandell, C. A. Evans, R. A. Holt, J. D. Gocayne, P. Amanatides, R. M. Ballew, D. H. Huson, J. R. Wortman, Q. Zhang, C. D. Kodira, X. H. Zheng, L. Chen, M. Skupski, G. Subramanian, P. D. Thomas, J. Zhang, G. L. Gabor Miklos, C. Nelson, S. Broder, A. G. Clark, J. Nadeau, V. A. McKusick, N. Zinder, A. J. 
Levine, R. J. Roberts, M. Simon, C. Slayman, M. Hunkapiller, R. Bolanos, A. Delcher, I. Dew, D. Fasulo, M. Flanigan, L. Florea, A. Halpern, S. Hannenhalli, S. Kravitz, S. Levy, C. Mobarry, K. Reinert, K. Remington, J. Abu-Threideh, E. Beasley, K. Biddick, V. Bonazzi, R. Brandon, M. Cargill, I. Chandramouliswaran, R. Charlab, K. Chaturvedi, Z. Deng, V. Di Francesco, P. Dunn, K. Eilbeck, C. Evangelista, A. E. Gabrielian, W. Gan, W. Ge, F. Gong, Z. Gu, P. Guan, T. J. Heiman, M. E. Higgins, R. R. Ji, Z. Ke, K. A. Ketchum, Z. Lai, Y. Lei, Z. Li, J. Li, Y. Liang, X. Lin, F. Lu, G. V. Merkulov, N. Milshina, H. M. Moore, A. K. Naik, V. A. Narayan, B. Neelam, D. Nusskern, D. B. Rusch, S. Salzberg, W. Shao, B. Shue, J. Sun, Z. Wang, A. Wang, X. Wang, J. Wang, M. Wei, R. Wides, C. Xiao, C. Yan, A. Yao, J. Ye, M. Zhan, W. Zhang, H. Zhang, Q. Zhao, L. Zheng, F. Zhong, W. Zhong, S. Zhu, S. Zhao, D. Gilbert, S. Baumhueter, G. Spier, C. Carter, A. Cravchik, T. Woodage, F. Ali, H. An, A. Awe, D. Baldwin, H. Baden, M. Barnstead, I. Barrow, K. Beeson, D. Busam, A. Carver, A. Center, M. L. Cheng, L. Curry, S. Danaher, L. Davenport, R. Desilets, S. Dietz, K. Dodson, L. Doup, S. Ferriera, N. Garg, A. Gluecksmann, B. Hart, J. Haynes, C. Haynes, C. Heiner, S. Hladun, D. Hostin, J. Houck, T. Howland, C. Ibegwam, J. Johnson, F. Kalush, L. Kline, S. Koduru, A. Love, F. Mann, D. May, S. McCawley, T. McIntosh, I. McMullen, M. Moy, L. Moy, B. Murphy, K. Nelson, C. Pfannkoch, E. Pratts, V. Puri, H. Qureshi, M. Reardon, R. Rodriguez, Y. H. Rogers, D. Romblad, B. Ruhfel, R. Scott, C. Sitter, M. Smallwood, E. Stewart, R. Strong, E. Suh, R. Thomas, N. N. Tint, S. Tse, C. Vech, G. Wang, J. Wetter, S. Williams, M. Williams, S. Windsor, E. Winn-Deen, K. Wolfe, J. Zaveri, K. Zaveri, J. F. Abril, R. Guigo, M. J. Campbell, K. V. Sjolander, B. Karlak, A. Kejariwal, H. Mi, B. Lazareva, T. Hatton, A. Narechania, K. Diemer, A. Muruganujan, N. Guo, S. Sato, V. Bafna, S. Istrail, R. Lippert, R. Schwartz, B. 
Walenz, S. Yooseph, D. Allen, A. Basu, J. Baxendale, L. Blick, M. Caminha, J. Carnes-Stine, P. Caulk, Y. H. Chiang, M. Coyne, C. Dahlke, A. Mays, M. Dombroski, M. Donnelly, D. Ely, S. Esparham, C. Fosler, H. Gire, S. Glanowski, K. Glasser, A. Glodek, M. Gorokhov, K. Graham, B. Gropman, M. Harris, J. Heil, S. Henderson, J. Hoover, D. Jennings, C. Jordan, J. Jordan, J. Kasha, L. Kagan, C. Kraft, A. Levitsky, M. Lewis, X. Liu, J. Lopez, D. Ma, W. Majoros, J. McDaniel, S. Murphy, M. Newman, T. Nguyen, N. Nguyen, M. Nodell, S. Pan, J. Peck, M. Peterson, W. Rowe, R. Sanders, J. Scott, M. Simpson, T. Smith, A. Sprague, T. Stockwell, R. Turner, E. Venter, M. Wang, M. Wen, D. Wu, M. Wu, A. Xia, A. Zandieh, and X. Zhu. 2001. The sequence of the human genome. Science 291:1304-1351.
Villeneuve, A. M., and K. J. Hillers. 2001. Whence meiosis? Cell 106:647-650.
Wang, H., Z. Xu, L. Gao, and B. Hao. 2009. A fungal phylogeny based on 82 complete genomes using the composition vector method. BMC Evol Biol 9:195.
Watanabe, Y., and P. Nurse. 1999. Cohesin Rec8 is required for reductional chromosome segregation at meiosis. Nature 400:461-464.
Watson, J. D., and F. H. C. Crick. 1953. Genetical Implications of the Structure of Deoxyribonucleic Acid. Nature 171:964-967.
Weber, A. P. M., C. Oesterhelt, W. Gross, A. Brautigam, L. A. Imboden, I. Krassovskaya, N. Linka, J. Truchina, J. Schneidereit, H. Voll, L. M. Voll, M. Zimmermann, A. Jamai, W. R. Riekhof, B. Yu, R. M. Garavito, and C. Benning. 2004. EST-analysis of the thermo-acidophilic red microalga Galdieria sulphuraria reveals potential for lipid A biosynthesis and unveils the pathway of carbon export from rhodoplasts. Plant Molecular Biology 55:17-32.
Weiner, B. M., and N. Kleckner. 1994. Chromosome pairing via multiple interstitial interactions before and during meiosis in yeast. Cell 77:977-991.
Weismann, A., W. N. Parker, and H. Ronnfeldt. 1893. The germ-plasm: a theory of heredity. C. Scribner's Sons, New York.
West, S.
A., C. M. Lively, and A. F. Read. 1999. A pluralist approach to sex and recombination. Journal of Evolutionary Biology 12:1003-1012.
West, S. C. 1992. Enzymes and molecular mechanisms of genetic recombination. Annu Rev Biochem 61:603-640.
White, M. J. D. 1978. Modes of Speciation. Freeman, San Francisco.
Wickstead, B., K. Gull, and T. A. Richards. 2010. Patterns of kinesin evolution reveal a complex ancestral eukaryote with a multifunctional cytoskeleton. BMC Evolutionary Biology 10:-.
Wilkins, A. S., and R. Holliday. 2009. The evolution of meiosis from mitosis. Genetics 181:3-12.
Williamson, D. H., L. H. Johnston, D. J. Fennell, and G. Simchen. 1983. The Timing of the S-Phase and Other Nuclear Events in Yeast Meiosis. Experimental Cell Research 145:209-217.
Woese, C. R., and G. E. Fox. 1977. Phylogenetic Structure of Prokaryotic Domain Primary Kingdoms. Proceedings of the National Academy of Sciences of the United States of America 74:5088-5090.
Woese, C. R., O. Kandler, and M. L. Wheelis. 1990. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proceedings of the National Academy of Sciences of the United States of America 87:4576-4579.
Wohlschlegel, J. A., B. T. Dwyer, S. K. Dhar, C. Cvetic, J. C. Walter, and A. Dutta. 2000. Inhibition of eukaryotic DNA replication by geminin binding to Cdt1. Science 290:2309-2312.
Wold, M. S. 1997. Replication Protein A: A heterotrimeric, single-stranded DNA-binding protein required for eukaryotic DNA metabolism. Annual Review of Biochemistry 66:61-92.
Xu, T., and G. M. Rubin. 1993. Analysis of genetic mosaics in developing and adult Drosophila tissues. Development 117:1223-1237.
Yin, Y., H. Cheong, D. Friedrichsen, Y. Zhao, J. Hu, S. Mora-Garcia, and J. Chory. 2002. A crucial role for the putative Arabidopsis topoisomerase VI in plant growth and development. Proc Natl Acad Sci U S A 99:10191-10196.
Yokobayashi, S., M. Yamamoto, and Y. Watanabe. 2003.
Cohesins determine the attachment manner of kinetochores to spindle microtubules at meiosis I in fission yeast. Mol Cell Biol 23:3965-3973.
Yoon, H. S., J. Grant, Y. I. Tekle, M. Wu, B. C. Chaon, J. C. Cole, J. M. Logsdon, D. J. Patterson, D. Bhattacharya, and L. A. Katz. 2008. Broadly sampled multigene trees of eukaryotes. BMC Evolutionary Biology 8:-.
Yoon, H. S., J. D. Hackett, C. Ciniglia, G. Pinto, and D. Bhattacharya. 2004. A molecular timeline for the origin of photosynthetic eukaryotes. Molecular Biology and Evolution 21:809-818.
Yoon, H. S., J. D. Hackett, G. Pinto, and D. Bhattacharya. 2002. The single, ancient origin of chromist plastids. Proc Natl Acad Sci U S A 99:15507-15512.
Zalevsky, J., A. J. MacQueen, J. B. Duffy, K. J. Kemphues, and A. M. Villeneuve. 1999. Crossing over during Caenorhabditis elegans meiosis requires a conserved MutS-based pathway that is partially dispensable in budding yeast. Genetics 153:1271-1283.
Zhou, X. F., Z. G. Lin, and H. Ma. 2010. Phylogenetic detection of numerous gene duplications shared by animals, fungi and plants. Genome Biology 11:-.
Zou, Y., Y. Y. Liu, X. M. Wu, and S. M. Shell. 2006. Functions of human Replication Protein A (RPA): From DNA replication to DNA damage and stress responses. Journal of Cellular Physiology 208:267-273.