Student Reader

Eukaryotic genome

A mammalian chromosome is massive, with tens to hundreds of megabases of DNA. The entire haploid human genome contains about 3 billion base pairs of DNA.

[Figure: sister chromatids]

Only about 5% of this DNA encodes functional RNAs or proteins or controls their production. Eukaryotic chromosomes are bound to structural proteins to form chromatin. During metaphase, chromatin is highly condensed into the recognizable structure shown in the figure. During interphase, chromatin is highly decondensed so that regulatory proteins can access the DNA.

During evolution large rearrangements can occur in the size and number of chromosomes.

Chromosomal rearrangements are rare but extremely important for speciation, because they can make productive mating impossible. The number, sizes and shapes of the metaphase chromosomes constitute the karyotype, which is distinctive for each species. During metaphase, individual chromosomes are distinguished by banding patterns and chromosome painting.

A syntenic region contains genes that are found in the same order in different species, although not always on the same chromosome. For example, the Indian Muntjac has three large chromosomes and a tiny X chromosome; the very similar Reeves Muntjac has just as much DNA -- and often in the same sequence -- but divided among 23 chromosomes.
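As a rough illustration of what "genes in the same order" means, the sketch below finds the longest run of genes that appears in the same order in two gene lists. The gene names and the list-of-strings encoding are hypothetical, and real synteny detection also has to handle inversions, gaps and orthology calls, none of which is modeled here.

```python
def longest_shared_block(order_a, order_b):
    """Return the longest run of genes that occurs contiguously,
    in the same order, in both gene lists."""
    best_len, best_end = 0, 0
    prev = [0] * (len(order_b) + 1)
    for i, gene_a in enumerate(order_a, start=1):
        curr = [0] * (len(order_b) + 1)
        for j, gene_b in enumerate(order_b, start=1):
            if gene_a == gene_b:
                # Extend the shared run that ended at the previous gene pair.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return order_a[best_end - best_len:best_end]


# Hypothetical gene orders along one chromosome in two species.
species_1 = ["g1", "g2", "g3", "g4", "g5"]
species_2 = ["g9", "g3", "g4", "g5", "g1"]
shared = longest_shared_block(species_1, species_2)
```

Here the shared block is the run g3, g4, g5, even though it sits at a different position in each list.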

Chromosome structure

Chromosome structure changes during the cell cycle.
Interphase: During interphase, chromosomes are highly decondensed in most regions, allowing regulatory proteins access for transcription and replication. Within the nucleus, individual chromosomes occupy diffuse but non-overlapping domains.
M phase: During mitosis, duplicated chromosomes condense into defined sister chromatids so that they can be segregated before cytokinesis. After chromosome condensation, the nuclear envelope breaks down in a process controlled by the nuclear lamina so that the chromosomes can segregate to opposite ends. At metaphase, chromosomes are aligned along the metaphase plate; the sister chromatids then split at the centromere and segregate to opposite poles of the dividing cell.

The three critical elements of a eukaryotic chromosome required for normal chromosome replication and mitotic segregation are a replication origin (ARS, autonomously replicating sequence), a centromere (CEN) and telomeres (TEL).

These elements were identified using yeast: adding telomeric DNA to a DNA molecule that contains an ARS and a centromere allows it to be maintained as a linear chromosome. Yeast artificial chromosomes containing ARS, CEN and TEL elements allowed the cloning of large fragments of human chromosomes.

Plasmid | Recipient | Growth of progeny of transfected cell | Mitotic segregation in progeny of transfected cell | Observation
LEU+, circular | LEU- yeast | None | - | Transfection with a LEU+ plasmid does not alone restore LEU to a LEU- cell.
LEU+ ARS+, circular | LEU- yeast | Some | Poor | Replication occurs, but poor segregation means only ~10% of progeny carry the plasmid.
LEU+ ARS+ CEN+, circular | LEU- yeast | Yes | Good | A centromeric (CEN) genome fragment is needed for strong segregation.
LEU+ ARS+ CEN+, linear | LEU- yeast | None | - | Linearization (via restriction enzymes) of a TEL- circular plasmid makes it unstable.
LEU+ ARS+ CEN+ TEL+, linear | LEU- yeast | Yes | Good | Linear plasmids must carry the telomeric (TEL) fragment at each end to remain stable in progeny cells.

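
The pattern in these plasmid experiments can be summarized as a small decision function. This is only a sketch of the logic: the function name, the set-of-strings encoding and the return labels are hypothetical, and any combination not listed in the experiments (for example a linear plasmid with ARS and TEL but no CEN) is an extrapolation rather than a reported result.

```python
def plasmid_outcome(elements, topology):
    """Predict (growth, segregation) for a LEU+ plasmid in LEU- yeast.

    elements: set drawn from {"ARS", "CEN", "TEL"} (hypothetical encoding)
    topology: "circular" or "linear"
    """
    if topology not in ("circular", "linear"):
        raise ValueError("topology must be 'circular' or 'linear'")
    if "ARS" not in elements:
        # Without a replication origin the plasmid is not copied.
        return ("None", None)
    if topology == "linear" and "TEL" not in elements:
        # A linear molecule without telomeres is unstable.
        return ("None", None)
    if "CEN" not in elements:
        # Replicates, but segregates poorly without a centromere.
        return ("Some", "Poor")
    return ("Yes", "Good")


yac_like = plasmid_outcome({"ARS", "CEN", "TEL"}, "linear")
```

Read top to bottom, the checks mirror the order in which the three elements were shown to be required: replication first, end stability for linear molecules, then segregation.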
The centromere is the region of the chromosome where the sister chromatids are held together.

The centromere assembles a structure called the kinetochore, which is required for attachment to microtubules during alignment at the metaphase plate, splitting of the sister chromatids, and movement to the spindle poles. Because of the nature of DNA replication, a linear chromosome also requires special sequences at its ends, called telomeres. DNA synthesis is initiated from an RNA primer, which is degraded after priming. At the chromosome ends, the loss of the terminal primer on the lagging strand leaves a gap that cannot be filled in, so sequence information is lost with each round of replication.
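
The arithmetic of this end-replication problem can be sketched with a toy model that follows one chromosome end through successive rounds of replication. The numbers used (a 10-nucleotide loss per round, a 100 bp telomere) are illustrative assumptions, not measurements, and the model ignores telomerase and any other lengthening mechanism.

```python
def lengths_over_rounds(start_bp, loss_per_round_bp, rounds):
    """Length of one chromosome end after each round of replication,
    assuming a fixed loss per round (toy model, no telomerase)."""
    lengths = [start_bp]
    for _ in range(rounds):
        lengths.append(max(0, lengths[-1] - loss_per_round_bp))
    return lengths


def rounds_before_gene_loss(telomere_bp, loss_per_round_bp):
    """Number of rounds a telomere of the given length can absorb
    before the loss starts to reach the sequence beyond it."""
    if loss_per_round_bp <= 0:
        raise ValueError("loss_per_round_bp must be positive")
    return telomere_bp // loss_per_round_bp


history = lengths_over_rounds(1000, 10, 3)
buffer_rounds = rounds_before_gene_loss(100, 10)
```

In this model the telomere acts as a buffer: the longer it is relative to the loss per round, the more divisions can occur before unique sequence is eroded.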


Consensus splice site sequences occur several times within the long introns of many mammalian genes, yet the spliceosomal snRNPs and splicing factors join the correct splice sites with high fidelity. They do so through exon definition (also called exon recognition), which is performed by the Cross Exon Recognition Complex (CERC). In addition to their splice sites, exons are marked by exonic splicing enhancers (ESEs), which bind serine/arginine-rich (SR) proteins that carry one or two RNP RNA-binding domains.

The exon definition and spliceosome complexes begin assembly as early as transcription; however, exon definition complexes identify exons before spliceosomes identify introns. Exons can thus be inserted into an intron (for example, via recombination) and remain functional, a capacity that enhances the evolution of new genes. Because exons are recognized as units, splice site or ESE mutations can cause either skipping of the entire exon or, less commonly, activation of a nearby cryptic splice site. Many human diseases arise from splice site or ESE mutations that produce aberrant mRNAs.

U1 snRNP: Recognizes and interacts with the pre-mRNA consensus 5' splice site.
U2 snRNP: Recognizes and interacts with the branch point A.
SR protein: Serine/arginine-rich protein that binds to exonic splicing enhancers.
U2AF: Recognizes the pyrimidine-rich region (65 kDa subunit) and the 3' splice site AG (35 kDa subunit).


Introns contain sequences that direct the splicing apparatus during RNA splicing, which is part of RNA processing:

  • Introns begin and end with splice sites that conform to consensus sequences.
  • Introns nearly always begin with a GU encompassed within a larger 5’ splice site consensus.
  • Introns nearly always end with the branch point sequence, several pyrimidines and an AG.
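
The rules in the list above can be turned into a simple checker. This is a minimal sketch, not a splice site predictor: the tract length, the pyrimidine threshold and the test for "an A somewhere upstream" as a stand-in for the branch point are all assumptions chosen for illustration, and the example sequence is made up.

```python
def check_intron(rna, tract_len=12, min_pyrimidine_fraction=0.7):
    """Check a candidate intron against the GU ... branch A ... Py tract ... AG pattern.

    tract_len and min_pyrimidine_fraction are illustrative assumptions.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA or RNA letters
    tract = rna[-(tract_len + 2):-2]     # bases just before the final AG
    pyrimidines = sum(base in "CU" for base in tract)
    upstream = rna[2:-(tract_len + 2)]   # between the 5' GU and the tract
    return {
        "starts_with_GU": rna.startswith("GU"),
        "ends_with_AG": rna.endswith("AG"),
        "pyrimidine_rich_tract": len(tract) == tract_len
        and pyrimidines / tract_len >= min_pyrimidine_fraction,
        "branch_A_candidate": "A" in upstream,
    }


# Made-up example: 5' splice site, branch region, pyrimidine tract, AG.
example = "GUAAGU" + "AUCGG" + "CUAAC" + "UUCUUUCUCCUUC" + "AG"
result = check_intron(example)
```

For the made-up example every check passes; changing the first two bases, or replacing the tract with purines, makes the corresponding entry false.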

Gene families

Class | Length | Copy # | Fraction of genome
Solitary genes | Variable | 1 | ~15%
Gene families | Variable | 2 to ~1,000 | ~15%

A significant percentage of human genes are members of gene families. In some cases, the multiple copies allow increased production of identical gene products, as with the rRNA genes. In other cases, the different family members have different but related functions, as with the β-globins. About half of all human genes are solitary genes, like the SUR2 gene. This means that there is only one gene of that sequence and function in the haploid genome.

About half of all human protein-coding genes are duplicated or are members of a gene family with >2 closely related genes. For example, the globin genes are members of a gene family. The β-globin genes on chromosome 11 have exons that are >90% identical. They are also >80% identical to the β-globin genes on another chromosome. The different β-globins have evolved different oxygen affinities and transport properties and are adapted for use in different situations. For example, ε-globin is expressed early in development to take up oxygen from the maternal hemoglobin in the placenta.

How did the duplicated genes arise? Gene duplication occurs by unequal crossing-over between homologous repeats during homologous recombination in meiosis. Duplicated genes do not necessarily remain linked at the same chromosomal locus; later events can move them to other locations in the genome.
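
Unequal crossing-over can be illustrated with a toy model in which each chromosome is a list of labeled segments. The segment labels and the index-based breakpoints are hypothetical; the point is only that when the repeats pair out of register, one recombinant gains a gene copy and the other loses it.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Exchange chromosome arms at (possibly different) breakpoints.

    break_a and break_b are list indices; equal values model a normal,
    in-register crossover, unequal values a misaligned one.
    """
    recombinant_1 = chrom_a[:break_a] + chrom_b[break_b:]
    recombinant_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return recombinant_1, recombinant_2


# A gene flanked by two copies of the same repeat (hypothetical labels).
chrom = ["repeat", "globin", "repeat"]

# Misaligned pairing: the second repeat of one homolog pairs with the
# first repeat of the other, and the crossover occurs within the repeat.
duplication, deletion = unequal_crossover(chrom, chrom, break_a=3, break_b=1)
```

The first product carries two copies of the gene and the second carries none, while the total number of segments across both products is unchanged.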

Also, there is unclassified spacer DNA that accounts for ~25% of the genome.
