A genomic library contains all the sequences present in the genome of an organism (apart from any sequences, such as telomeres, that cannot be readily cloned). It is a collection of cloned, restriction-enzyme-digested DNA fragments containing at least one copy of every DNA sequence in a genome. The entire genome of an organism is represented as a set of DNA fragments inserted into a vector molecule.
Size of Genomic Library:
It is possible to calculate the number (N) of recombinants (plaques or colonies) that must be in a genomic library to give a particular probability of obtaining a given sequence.
The formula is:
$$N = \frac{\ln(1 - P)}{\ln(1 - f)}$$
where P is the desired probability and f is the fraction of the genome in one insert. For example, for a probability of 0.99 with insert sizes of 20 kb, the values for the E. coli ($4.6 \times 10^{6}$ bp) and human ($3 \times 10^{9}$ bp) genomes are:
$$N_{E.\,coli} = \frac{\ln(1 - 0.99)}{\ln\left[1 - \dfrac{2 \times 10^{4}}{4.6 \times 10^{6}}\right]} = 1.1 \times 10^{3}$$
$$N_{human} = \frac{\ln(1 - 0.99)}{\ln\left[1 - \dfrac{2 \times 10^{4}}{3 \times 10^{9}}\right]} = 6.9 \times 10^{5}$$
These values explain why it is possible to make good genomic libraries from prokaryotes in plasmids where the insert size is 5-10 kb, as only a few thousand recombinants will be needed.
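The calculation above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `clones_needed` is a hypothetical helper, and the 5 kb plasmid insert is just the lower end of the 5-10 kb range mentioned in the text.

```python
import math

def clones_needed(p, insert_bp, genome_bp):
    """N = ln(1 - P) / ln(1 - f), where f = insert size / genome size."""
    f = insert_bp / genome_bp
    return math.ceil(math.log(1 - p) / math.log(1 - f))

# Genome sizes and insert sizes as quoted in the text.
ecoli = clones_needed(0.99, 20_000, 4.6e6)   # roughly 1.1 x 10^3
human = clones_needed(0.99, 20_000, 3e9)     # roughly 6.9 x 10^5
plasmid = clones_needed(0.99, 5_000, 4.6e6)  # E. coli library with 5 kb inserts

print(f"E. coli, 20 kb inserts: {ecoli}")
print(f"Human, 20 kb inserts:   {human}")
print(f"E. coli, 5 kb inserts:  {plasmid}")
```

With 5 kb inserts the E. coli library still needs only a few thousand recombinants, which is the point made in the paragraph above.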
Types of Genomic Libraries:
Depending on the source of DNA used for the construction of a genomic library, it is of the following two types:
(a) Nuclear Genomic Library:
This is a genomic library that includes the total DNA content of the nucleus. To make such a library, the nuclear DNA is specifically extracted and used for construction of the library.
(b) Organelle Genomic Library:
In this case the nuclear DNA is excluded and the library targets the total DNA of the mitochondria, the chloroplasts, or both.
Construction of genomic library
Construction of a genomic DNA library involves isolation, purification and fragmentation of genomic DNA followed by cloning of the fragmented DNA using suitable vectors. Eukaryotic cell nuclei are isolated and the DNA is purified by digestion with protease and organic (phenol-chloroform) extraction. The derived genomic DNA is too large to incorporate into a vector and needs to be broken up into fragments of a desirable size. Fragmentation of DNA can be achieved by physical or enzymatic methods. The library created contains representative copies of all DNA fragments present within the genome.
Mechanisms for cleaving DNA
(a) Physical method
It involves mechanical shearing of genomic DNA using a narrow-gauge syringe needle or sonication to break up the DNA into suitable size fragments that can be cloned. Typically, an average DNA fragment size of about 20 kb is desirable for cloning into λ based vectors. DNA fragmentation is random which may result in variable sized DNA fragments. This method requires large quantities of DNA.
(b) Enzymatic method
• It involves the use of restriction enzymes for the fragmentation of purified DNA.
• This method is limited by the distribution of recognition sites for the restriction enzyme: where sites lie close together, digestion generates DNA fragments shorter than the desired size.
• If a gene to be cloned contains multiple recognition sites for a particular restriction enzyme, complete digestion will generate fragments that are generally too small to clone. As a consequence, the gene may not be represented within the library.
• To overcome this problem, partial digestion of the DNA molecule is usually carried out using a known quantity of restriction enzyme to obtain fragments of ideal size.
• Two factors govern the selection of the restriction enzyme: the type of ends (blunt or sticky) generated by the enzyme, and the susceptibility of the enzyme to chemical modification of bases, such as methylation, which can inhibit enzyme activity.
• The fragments of desired size can be recovered by either agarose gel electrophoresis or sucrose gradient centrifugation and ligated to suitable vectors.
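The difference between complete and partial digestion can be sketched with a small simulation. The assumptions here are not from the text: the "genome" is a random sequence with equal base frequencies, the enzyme recognizes the 4-bp site GATC (the site recognized by Sau3AI), and partial digestion is modelled crudely as each site being cut independently with a fixed probability. Under the equal-frequency assumption an n-bp site is expected about once every 4^n bp, so about every 256 bp for a 4-bp site.

```python
import random

def find_sites(seq, site):
    """Return the start index of every occurrence of `site` in `seq`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def digest(seq, site, cut_fraction, rng):
    """Cut at each site with probability `cut_fraction`; return fragment lengths."""
    cuts = [p for p in find_sites(seq, site) if rng.random() < cut_fraction]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]

def mean_size(fragments):
    return sum(fragments) / len(fragments)

rng = random.Random(0)  # fixed seed so the run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(200_000))

complete = digest(genome, "GATC", 1.0, rng)   # every site is cut
partial = digest(genome, "GATC", 0.02, rng)   # about 2% of sites are cut

print(f"complete digest: {len(complete)} fragments, mean {mean_size(complete):.0f} bp")
print(f"partial digest:  {len(partial)} fragments, mean {mean_size(partial):.0f} bp")
```

In this toy model the complete digest gives fragments of a few hundred base pairs, far too small for a λ vector, while cutting only a small fraction of the sites gives fragments in the kilobase range.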
Cloning of genomic DNA
Various vectors are available for cloning large DNA fragments. λ phage, yeast artificial chromosomes, bacterial artificial chromosomes etc. are considered suitable vectors for larger DNA fragments, and λ replacement vectors like λDASH and EMBL3 are preferred for construction of genomic DNA libraries. T4 DNA ligase is used to ligate the selected DNA sequence into the vector.
(1) λ replacement vectors
The λEMBL series of vectors are widely used for genomic library construction. In derivatives such as λDASH, the multiple cloning sites flanking the stuffer fragment contain opposed promoters for the T3 and T7 RNA polymerases. Restriction digestion of the recombinant vector generates short fragments of insert DNA left attached to these promoters, from which RNA probes for the ends of the DNA insert can be transcribed. These probes can be made conveniently, directly from the vector, without recourse to sub-cloning.
(2) High-capacity vectors
The high-capacity cloning vectors used for the construction of genomic libraries are cosmids, bacterial artificial chromosomes (BACs), P1-derived artificial chromosomes (PACs) and yeast artificial chromosomes (YACs). They are designed to carry DNA inserts much larger than those accepted by λ replacement vectors, so fewer recombinants need to be screened to identify a particular gene of interest.
|Vector||Insert size||Features|
|λ phage||Up to 20-30 kb||Genome size about 48.5 kb, efficient packaging system, replacement vectors usually employed, used to study individual genes.|
|Cosmids||Up to 40 kb||Contain the cos site of λ phage to allow packaging, propagate in E. coli as plasmids, useful for sub-cloning of DNA inserts from YAC, BAC, PAC etc.|
|Fosmids||35-45 kb||Contain the F-plasmid origin of replication and the λ cos site, low copy number, stable.|
|Bacterial artificial chromosomes (BACs)||Up to 300 kb||Based on F-plasmid, relatively large and high capacity vectors.|
|P1 artificial chromosomes (PACs)||Up to 300 kb||Derived from DNA of P1 bacteriophage, combine the features of P1 and BACs, used to clone larger genes and in physical mapping, chromosome walking as well as shotgun sequencing of complex genomes.|
|Yeast artificial chromosomes (YACs)||Up to 2000 kb||Allow identification of successful transformants. (BAC clones are highly stable and highly efficient.)|
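To see how much the vector capacity changes the screening effort, the library-size formula N = ln(1 - P)/ln(1 - f) can be applied to each vector class for the human genome. This is a rough sketch: the insert sizes are round numbers based on the capacities listed above, P is fixed at 0.99, and `clones_needed` is a hypothetical helper name.

```python
import math

HUMAN_GENOME_BP = 3e9

# Illustrative insert sizes (bp), based on the capacities listed above.
CAPACITY_BP = {
    "lambda replacement": 20_000,
    "cosmid": 40_000,
    "BAC / PAC": 300_000,
    "YAC": 2_000_000,
}

def clones_needed(p, insert_bp, genome_bp):
    """N = ln(1 - P) / ln(1 - f), where f = insert size / genome size."""
    f = insert_bp / genome_bp
    return math.ceil(math.log(1 - p) / math.log(1 - f))

library_sizes = {
    vector: clones_needed(0.99, size, HUMAN_GENOME_BP)
    for vector, size in CAPACITY_BP.items()
}

for vector, n in library_sizes.items():
    print(f"{vector:20s} {n:>9,d} clones")
```

Under these assumptions the number of clones falls from several hundred thousand for a λ replacement vector to a few thousand for a YAC.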
Disadvantages of genomic library
• Genomic libraries from eukaryotes with very large genomes contain a large proportion of DNA that does not code for proteins, such as repetitive DNA and regulatory regions, which makes them less than ideal.
• A genomic library from a eukaryotic organism will not work if the screening method requires expression of the gene.
Storage of Genomic Library:
Once a genomic library has been made, it forms a useful resource for subsequent experiments as well as for the initial purpose for which it was produced. Therefore, it is necessary to store it safely for future use. A random library will consist of a test tube containing a suspension of bacteriophage particles (for a phage vector) or bacterial cells (for a plasmid vector).
The libraries are stored at −80°C. Bacterial cells in a plasmid library are protected from the adverse effects of freezing by glycerol, while phage libraries are cryoprotected by dimethyl sulfoxide (DMSO).
Disadvantages of Genomic Library:
The main reason behind making a genomic library is to identify a clone from the library which encodes a particular gene or genes of interest. Genomic libraries are particularly useful when you are working with prokaryotic organisms, which have relatively small genomes.
On the face of it, genome libraries might be expected to be less practical when you are working with eukaryotes, which have very large genomes containing a lot of DNA which does not code for proteins.
A library representing a eukaryotic organism would contain a very large number of clones, many of which would contain non-coding DNA such as repetitive DNA and regulatory regions. Also, eukaryotic genes often contain introns, which are non-coding regions interrupting the coding sequence.
These regions are copied into the primary transcript (pre-mRNA) in the nucleus but are spliced out before the mature mRNA is exported to the cytoplasm for translation into protein. Prokaryotic organisms are unable to carry out this processing, so the mature mRNA cannot be made in E. coli and the protein will not be expressed.
If your screening method requires that the gene be expressed it will not work with a genomic library from a eukaryotic organism.
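The effect of an unspliced intron can be illustrated with a toy sequence. Everything in this sketch is invented for illustration: the exon and intron sequences, the exon coordinates, and the helper names `splice` and `codons_until_stop`. It simply reads codons from the first base until a stop codon is reached, with and without removal of the intron.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def splice(genomic, exons):
    """Join the exon intervals (start, end) of a genomic sequence."""
    return "".join(genomic[start:end] for start, end in exons)

def codons_until_stop(seq):
    """Read codons from position 0 until the first in-frame stop codon."""
    codons = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons

# Hypothetical gene: exon 1 + intron (GT...AG) + exon 2.
exon1, intron, exon2 = "ATGGCC", "GTAAGTTAACAG", "AAATGA"
genomic = exon1 + intron + exon2
exons = [(0, 6), (18, 24)]

unspliced = codons_until_stop(genomic)               # what E. coli would read
spliced = codons_until_stop(splice(genomic, exons))  # mature coding sequence

print("unspliced:", unspliced)
print("spliced:  ", spliced)
```

Read without splicing, the intron contributes extra codons and an in-frame stop, so the product differs from the one encoded by the spliced message.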
Applications of Genomic Library:
A genomic library has the following applications:
1. It helps in the determination of the complete genome sequence of a given organism.
2. It serves as a source of genomic sequence for generation of transgenic animals through genetic engineering.
3. It helps in the study of the function of regulatory sequences in vitro.
4. It helps in the study of genetic mutations in cancer tissues.
5. It helps in the identification of novel pharmaceutically important genes.
6. It helps us in understanding the complexity of genomes.
cDNA Library
In higher eukaryotes, gene expression is tissue-specific. Only certain cell types show moderate to high expression of a single gene or a group of genes. For example, the genes encoding globin proteins are expressed only in erythrocyte precursor cells, called reticulocytes. Using this information, a target gene can be cloned by isolating the mRNA from a specific tissue. The specific DNA sequences are synthesized as copies of the mRNAs of a particular cell type and cloned into bacteriophage vectors. cDNA (complementary DNA) is produced from fully transcribed mRNA and therefore contains only the expressed genes of an organism. Clones of such DNA copies of mRNAs are called cDNA clones.
A cDNA library is a collection of cloned cDNA fragments, inserted into a population of host cells, that together represent some portion of the transcriptome of an organism. In eukaryotic cells, the mRNA is spliced before translation into protein, so the DNA synthesized from the spliced mRNA does not contain the introns of the gene. As a result, the amino acid sequence of the expressed protein can be deduced directly from the cDNA sequence, which is the main advantage of cDNA cloning over genomic DNA cloning.
Construction of a cDNA Library
The construction of a cDNA library involves the following steps:
1. Isolation of mRNA
2. Synthesis of first and second strand of cDNA
3. Incorporation of cDNA into a vector
4. Cloning of cDNAs
Synthesis of first and second strand of cDNA
• mRNA, being single-stranded, cannot be cloned as such and is not a substrate for DNA ligase. It is first converted into DNA before insertion into a suitable vector. This is achieved using reverse transcriptase (RNA-dependent DNA polymerase, or RTase) obtained from avian myeloblastosis virus (AMV).
• A short oligo(dT) primer is annealed to the poly(A) tail of the mRNA.
• Reverse transcriptase extends the 3´-end of the primer using the mRNA molecule as a template, producing a cDNA:mRNA hybrid.
• The mRNA can be removed from the cDNA:mRNA hybrid by RNase H or alkaline hydrolysis to give a single-stranded (ss) cDNA molecule.
• No separate primer is required for second-strand synthesis: the 3´-end of the ss-cDNA folds back to form a short hairpin loop and serves as its own primer, providing the free 3´-OH required for synthesis of the complementary strand.
• The ss-cDNA is then converted into double-stranded (ds) cDNA by either RTase or E. coli DNA polymerase.
• The hairpin loop is cleaved and the ds-cDNA trimmed with S1 nuclease to obtain a blunt-ended ds-cDNA molecule, which can then be tailed with C residues using terminal transferase and ligated into a vector.
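At the sequence level, the two synthesis steps above amount to taking reverse complements. The following is a minimal sketch with an invented mRNA sequence; the function names are hypothetical, and the hairpin priming and S1 trimming steps are not modelled.

```python
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def first_strand(mrna):
    """First-strand cDNA (5'->3'): DNA reverse complement of the mRNA."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))

def second_strand(cdna):
    """Second-strand cDNA (5'->3'): reverse complement of the first strand."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(cdna))

# Invented mRNA (5'->3') with a short poly(A) tail.
mrna = "AUGGCCAAAUGA" + "A" * 8

ss_cdna = first_strand(mrna)       # begins with the oligo(dT)-primed T run
sense_strand = second_strand(ss_cdna)

print("mRNA:         ", mrna)
print("first strand: ", ss_cdna)
print("second strand:", sense_strand)
```

The first strand starts with a run of T residues, corresponding to the oligo(dT) primer annealed to the poly(A) tail, and the second strand has the same sequence as the mRNA with T in place of U.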
Incorporation of cDNA into a vector
To prepare the ds-cDNA for cloning, its blunt-ended termini are modified so that it can be ligated into a vector. Since blunt-end ligation is inefficient, short restriction-site linkers are first ligated to both ends.
A linker is a double-stranded DNA segment with a recognition site for a particular restriction enzyme. It is 10-12 base pairs long and is prepared by hybridizing chemically synthesized complementary oligonucleotides. The blunt-ended ds-cDNAs are ligated to the linkers by the DNA ligase from bacteriophage T4.
The resulting double-stranded cDNAs with linkers at both ends are treated with a restriction enzyme specific for the linker, generating cDNA molecules with sticky ends. Problems arise when the cDNA itself has a site for the restriction enzyme that cleaves the linkers. This can be overcome by treating the cDNA with an appropriate modification enzyme (methylase), which methylates specific bases within the restriction-site sequence and thereby protects any internal recognition site from digestion by preventing the restriction enzyme from binding.
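The internal-site problem can be shown with a short sketch. The assumptions are illustrative and not taken from the text: the linker enzyme is EcoRI (site GAATTC, cut after the first G), the linker is a 12-bp palindromic sequence containing one such site, the cDNA sequence is invented, and methylase treatment is modelled simply as a flag that protects sites lying wholly within the cDNA.

```python
ECORI_SITE = "GAATTC"      # EcoRI cuts G^AATTC
LINKER = "CCGGAATTCCGG"    # illustrative 12-bp linker with one EcoRI site

def find_sites(seq, site=ECORI_SITE):
    """Return the start index of every occurrence of `site` in `seq`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def digest_linkered(cdna, protect_internal):
    """Top-strand fragments after digesting linker-cDNA-linker.

    `protect_internal=True` models methylation of the cDNA before the
    linkers are added, so sites inside the cDNA are not cleaved.
    """
    construct = LINKER + cdna + LINKER
    first, last = len(LINKER), len(LINKER) + len(cdna)
    cuts = []
    for pos in find_sites(construct):
        inside_cdna = first <= pos and pos + len(ECORI_SITE) <= last
        if inside_cdna and protect_internal:
            continue
        cuts.append(pos + 1)  # cut between G and AATTC
    bounds = [0] + cuts + [len(construct)]
    return [construct[a:b] for a, b in zip(bounds, bounds[1:])]

cdna = "ATGGCTGAATTCAAGTAA"  # invented cDNA with one internal EcoRI site

unprotected = digest_linkered(cdna, protect_internal=False)
protected = digest_linkered(cdna, protect_internal=True)

print("without methylation:", unprotected)
print("with methylation:   ", protected)
```

Without protection the cDNA is split at its internal site, whereas with protection only the linkers are cut and the central fragment carries the intact cDNA.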
Ligation of the digested ds-cDNA into a vector is the final step in the construction of a cDNA library. The vectors (e.g. plasmid or bacteriophage) should be restricted with the same restriction enzyme used for linkers. The E. coli cells are transformed with the recombinant vector, producing a library of plasmid or λ clones. These clones contain cDNA corresponding to a particular mRNA.
cDNAs are usually cloned in phage insertion vectors. Bacteriophage vectors offer the following advantages over plasmid vectors:
• They are more suitable when a large number of recombinants is required for cloning low-abundance mRNAs, as recombinant phages are produced by in vitro packaging.
• Large numbers of phage clones can be stored and handled more easily than bacterial colonies carrying plasmids.
Plasmid vectors are used extensively for cDNA cloning, particularly in the isolation of the desired cDNA sequence involving the screening of a relatively small number of clones.
Commonly used vectors for cDNA cloning and expression
|Vector||Features|
|Lambda gt10 and Lambda gt11||DNA inserts of 7.6 kb and 7.2 kb, respectively, inserted at a unique EcoRI cloning site; recombinant Lambda gt10 selected on the basis of plaque morphology; Lambda gt11 has the E. coli LacZ gene: LacZ and the cDNA-encoded protein are expressed as a fusion protein.|
|Lambda ZAP series (phasmids)||Up to 10 kb DNA insert, therefore most cDNAs can be cloned; polylinker has six cloning sites; T3 and T7 RNA polymerase sites flank the polylinker so that riboprobes of both strands can be prepared; these features are contained in the plasmid vector pBluescript, which is inserted into the phage genome; the plasmid containing the cDNA is recovered simply by co-infecting the bacteria with a helper f1 phage that helps excise it from the phage genome.|
Problems in cDNA preparation
• Long mRNA sequences result in inefficient synthesis of full-length cDNA. This causes problems during expression, as the cDNA may not contain the entire coding sequence of the gene. It arises because of the poor processivity of RTase purified from avian myeloblastosis virus (AMV) or produced in E. coli from the gene of Moloney murine leukemia virus (MMLV).
• Use of S1 nuclease, the enzyme used to trim the ds cDNA, may remove some important 5´ sequences.
Strategies to overcome the limitations in cDNA preparation
Strategies that can be employed to overcome the above limitations are listed as follows-
• A specially designed E. coli vector can be used to avoid incomplete copying of the RNA.
• The use of a single-strand-specific nuclease can be avoided by using the enzyme terminal deoxynucleotidyl transferase to add a poly-C tail to the 3´-end of the single-stranded cDNA produced by copying the mRNA. A complementary oligonucleotide (poly-G) is then used as a primer for synthesis of the complementary strand, yielding ds-cDNA without a hairpin loop and enhancing full-length cDNA production.
Applications of cDNA libraries/cloning
• Discovery of novel genes.
• in vitro study of gene function by cloning full-length cDNA.
• Determination of alternative splicing in various cell types/tissues.
• Obtaining gene sequences from which non-coding regions such as introns have been removed.
• Expression of eukaryotic genes in prokaryotes: prokaryotes lack introns in their DNA and therefore have no machinery to splice them out of transcripts, so an intron-free cDNA is required. Gene expression may be required for detection of the clone, or the polypeptide product may itself be the primary objective of cloning.
Comparison of Genomic and cDNA Libraries
The cDNA library has revolutionized the field of molecular genetics and recombinant DNA technology. It consists of a population of bacterial transformants or phage lysates in which each mRNA isolated from an organism is represented as its cDNA insertion in a vector. cDNA libraries are used to express eukaryotic genes in prokaryotes. In addition, cDNAs are used to generate expressed sequence tags (ESTs) and for splice variant analysis.
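|Feature||Genomic library||cDNA library|
|Sequences present||Ideally, all genomic sequences||Only structural genes that are transcribed|
|Contents affected by: (a) Developmental stage||No||Yes|
|Contents affected by: (b) Cell type||No||Yes|
|Features of DNA insert(s) representing a gene: (a) Size||As present in genome||Ordinarily, much smaller|
|Features of DNA insert(s) representing a gene: (b) Introns||Present||Absent|
|Features of DNA insert(s) representing a gene: (c) 5´- and 3´- regulatory sequences||Present||Absent|
|As compared to the genome: (a) Enrichment of sequences||In amplified genomic libraries||For abundant mRNAs|
|As compared to the genome: (b) Redundancy in frequency||In amplified libraries||For rare mRNA species|
|As compared to the genome: (c) Variant forms of a gene||-||For such genes, whose RNA transcripts are alternatively spliced|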