Tuesday 16 December 2008

Primer 2: Genomes

A genome is the complete set of genetic material, the DNA or RNA, of an organism: all of its nucleotide bases (A, C, G and T in DNA, with U in place of T in RNA) in the order they appear in the organism. All organisms on Earth use the same DNA/RNA code, so genetic material is compatible between species - you could stick mouse DNA into a fly's genome, or even a human's, and the host cell could still read it and do something with it. Almost all cells in the human body have the full set of chromosomes in them - the whole genome.
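As a rough illustration, a genome can be treated as one long string of base letters. The sketch below is a minimal Python example, not a real bioinformatics tool. The function names are made up for this example, and it assumes upper-case letters only.

```python
DNA_BASES = set("ACGT")


def is_dna(seq: str) -> bool:
    """True if seq uses only the DNA letters A, C, G and T (upper case assumed)."""
    return set(seq) <= DNA_BASES


def dna_to_rna_letters(seq: str) -> str:
    """Spell a DNA sequence in RNA letters: RNA uses U wherever DNA uses T.

    This matches the RNA copy of the coding strand; it is not a model of
    the actual transcription machinery.
    """
    if not is_dna(seq):
        raise ValueError("expected a DNA sequence made of A, C, G and T")
    return seq.replace("T", "U")


def base_counts(seq: str) -> dict[str, int]:
    """Count how many times each base letter occurs."""
    return {base: seq.count(base) for base in sorted(set(seq))}


fragment = "ATCCCGGATGCTCTG"  # a short toy stretch, not a whole genome
print(dna_to_rna_letters(fragment))  # AUCCCGGAUGCUCUG
print(base_counts(fragment))  # {'A': 2, 'C': 5, 'G': 4, 'T': 4}
```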

Prokaryotic-celled species such as bacteria have a single long, usually circular, molecule of DNA floating freely in the cytoplasm of the cell. Viruses have an RNA or DNA genome packed inside their protein shell (the capsid, or head).

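Eukaryotic-celled species such as mammals keep their DNA inside a nucleus. The DNA is split across numerous chromosomes and wound around proteins called histones. Not all of the DNA codes for proteins - in humans only about 1.5% of the roughly 3 billion bases does! Genes are split into two kinds of segment. Exons are the pieces kept in the final message the cell uses to make a protein. Introns are non-coding stretches that are cut out (spliced) before the protein is made. Much of the non-coding DNA is often called "junk DNA", but some people think it still plays a role in the cell, even though they aren't exactly sure what.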
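To make the exon/intron idea concrete, here is a minimal Python sketch that joins a gene's exons and drops its introns. The gene sequence and exon positions are invented for illustration. Positions use Python's 0-based, end-exclusive slicing, which is an assumption of this sketch rather than any standard annotation format.

```python
def splice(gene: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments in order, dropping the introns between them."""
    return "".join(gene[start:end] for start, end in sorted(exons))


def exon_fraction(gene: str, exons: list[tuple[int, int]]) -> float:
    """Fraction of the gene's bases that lie in exons."""
    return sum(end - start for start, end in exons) / len(gene)


# Hypothetical toy gene: exon 1, then an intron, then exon 2.
exon1 = "ATGGCC"
intron = "GTAAGTTTTCAG"
exon2 = "TTTGGATAA"
gene = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(gene))]

print(splice(gene, exons))  # ATGGCCTTTGGATAA
print(round(exon_fraction(gene, exons), 3))  # 0.556
```

This fraction is per gene only. Across a whole genome the coding share is much lower, because introns are often far longer than exons and much of the DNA lies between genes altogether.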

The first genome ever to be fully sequenced belonged to a virus: bacteriophage MS2, a virus that infects bacteria, sequenced in 1976. It had an RNA genome of 3,569 bases. Viruses are generally not considered living organisms. Instead, they are structures made of protein, with some genetic material stored in the head. When a bacteriophage latches on to a cell, it gets its genome through the cell's outer layers. The host cell then unwittingly copies that genome and uses it to build more viruses. Viruses often kill the cell: it can burst open (lyse) or bud off packages that release the new virus copies. No-one is really sure where viruses came from or how they came to be.

The first genome of a free-living organism to be sequenced was that of the bacterium Haemophilus influenzae in 1995. It was done using the shotgun method, as a proof of concept. Its single chromosome is 1.83 million bases (megabases, Mb) long!

The shotgun method involves cutting many copies of the whole genome into random pieces. This can be done by mechanical shearing, or by partially cutting the DNA with a restriction enzyme (an enzyme that cuts DNA when it finds a specific base sequence). The bases of each piece are read, and then the pieces are stuck back together like a puzzle. Because each copy is cut in different places, pieces from different copies overlap, and those overlaps are what make the puzzle solvable. Computers do both the reading of the bases and the puzzle solving. For example, say we cut the genome and get lots of fragments that overlap each other - we line them up like so:

ATCCCGGATGCTCTG
-------ATGCTCTGAAAAAATTCCCCCC

The hyphens are just padding that shifts the second fragment along so that the overlapping bases line up. The last eight bases of the first fragment (ATGCTCTG) match the first eight bases of the second. So we assume both fragments come from the same place, and that the original DNA strand contains the sequence ATCCCGGATGCTCTGAAAAAATTCCCCCC. The computer does this lots of times with lots of fragments until everything lines up, and then it outputs the final assembled sequence.

cf. Eric D. Green (2001). Strategies for the systematic sequencing of complex genomes. Nature Reviews Genetics, Vol. 2, Aug 2001, p. 573. (here)
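The overlap-and-merge step can be sketched in Python as a greedy assembler, which repeatedly joins the pair of fragments with the longest suffix/prefix overlap. This is a deliberately simplified toy that assumes error-free fragments, all read from the same strand, and no troublesome repeats. It is not how production assemblers work. The function names, the 3-base minimum overlap and the extra fragment GTCAATCCCGGA are hypothetical, invented for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge fragments into contigs (stretches of assembled sequence)."""
    # Drop exact duplicates, then fragments wholly contained in another one.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more: leave separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)
    return frags


first = "ATCCCGGATGCTCTG"
second = "ATGCTCTGAAAAAATTCCCCCC"
print(overlap(first, second))  # 8: the shared bases ATGCTCTG
print(greedy_assemble([second, first]))
# ['ATCCCGGATGCTCTGAAAAAATTCCCCCC']

# A hypothetical third fragment overlapping the start of the first one:
print(greedy_assemble([second, "GTCAATCCCGGA", first]))
# ['GTCAATCCCGGATGCTCTGAAAAAATTCCCCCC']
```

Greedy merging is only one simple strategy. Repeated sequences, like the run of A's here, can fool it into collapsing or misjoining pieces. That is one reason real assemblers are far more elaborate.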

What's amazing is that the process is repeated over and over to resolve inconsistencies, fill gaps in the data and validate the original sequencing. Some parts of the human genome were sequenced up to 12 times over. In a draft sequence you can sometimes find an N (or sometimes X) in place of a base in the middle of a sequence. This means that base was unclear or its data was not collected successfully. It is a long and laborious endeavour, but well worth it, as it has led to the genomics revolution that we are only at the start of. You will soon see many discoveries made possible by the genomes of humans and of model organisms such as mice, zebrafish, nematode worms, Drosophila fruit flies and the Arabidopsis plant.
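The idea of sequencing regions many times over can be put into numbers with the classic Lander-Waterman estimate. Suppose n reads of length L land at random on a genome of G bases. The average coverage is then c = n × L / G, and the expected fraction of bases that no read covers is about e^(-c). The Python sketch below assumes reads fall uniformly at random. Real genomes have regions that are hard to clone or sequence, so actual gaps are worse than this. The read numbers in the example are hypothetical. The N-counting helper follows the usual FASTA convention of writing an unknown base as N.

```python
import math


def coverage(num_reads: int, read_len: int, genome_len: int) -> float:
    """Average number of times each base is read: c = n * L / G."""
    return num_reads * read_len / genome_len


def expected_unsequenced(c: float) -> float:
    """Lander-Waterman estimate of the fraction of bases that no read covers."""
    return math.exp(-c)


def unknown_base_fraction(seq: str) -> float:
    """Fraction of positions marked N (unknown base) in a draft sequence."""
    return seq.upper().count("N") / len(seq)


# Hypothetical run: 30,000 reads of 500 bases over a 1.83 Mb genome.
c = coverage(30_000, 500, 1_830_000)
print(f"coverage = {c:.1f}x")
for depth in (1, 4, 8, 12):
    print(f"{depth:>2}x: about {expected_unsequenced(depth):.4%} of bases never read")

print(unknown_base_fraction("ACGTNNACGT"))  # 0.2
```

Under this model, 12x coverage leaves fewer than one base in 100,000 unread, which shows why repeating the sequencing pays off.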

A working draft of the human genome was announced in 2000 and published in early 2001, and the finished sequence followed in 2003. It was discovered that the human genome had only between 25,000 and 30,000 genes. That was far fewer than anyone had predicted, especially since even simpler organisms like the rice plant have more than 30,000 genes (with a smaller genome). Finally, take a look at the table of species that have been sequenced and look at how huge the genome of the single-celled Amoeba is!!! (here)

A good resource is a set of slides (slightly inaccurate in places), which can be seen here.
