The genome comprises 4,411,529 base pairs, contains around 4,000 genes, and has a very high guanine + cytosine (G+C) content. M. tuberculosis differs radically from other bacteria in that a very large portion of its coding capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis. About 10% of the coding capacity is dedicated to two families of glycine-rich proteins, PE (named for its proline-glutamic acid motif) and PPE (proline-proline-glutamic acid motif), whose repetitive structure may represent a source of antigenic variation. Their functions are not known. M. tuberculosis has no plasmids but may harbor mycobacteriophages.
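To make the G+C figure concrete, the following is a minimal sketch of how the G+C content of a sequence can be computed. The function name `gc_content` and the short toy fragment are assumptions made for illustration only; the fragment is invented and is not taken from the M. tuberculosis genome.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C among the unambiguous bases (A, C, G, T).

    Ambiguity codes such as N are ignored rather than counted, which is one
    common convention; other tools may count them in the denominator.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return gc / total


# Hypothetical toy fragment, invented for illustration only.
toy_fragment = "GCCGGCACGTCGGCGATCGC"
print(f"G+C content: {gc_content(toy_fragment):.1%}")
```

Applied to a complete genome sequence read from a FASTA file, the same calculation gives the genome-wide G+C content.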
Until the genome was sequenced, there was scant information about the molecular basis of mycobacterial virulence. Only three virulence factors had been described before the completion of the genome sequence: catalase-peroxidase, which protects against reactive oxygen species produced by macrophages; mce, which encodes the macrophage-colonizing factor; and a sigma factor gene, sigA. Inactivation of these factors or genes attenuates virulence. Scrutiny of the genome sequence has revealed several other genes that may contribute to virulence. For example, a homologue of smpB, implicated in intracellular survival of Salmonella typhimurium, has been found in the M. tuberculosis genome. Other candidate virulence factors include phospholipases C, lipases, and esterases, which might attack cellular or vacuolar membranes. Some of the genes identified may be related to prolonged intracellular survival.
Because the genome sequence offers an integrated view of the disease and its pathogenic mechanisms, this knowledge should accelerate research against tuberculosis at an unprecedented pace.