What is Bioinformatics?
Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.
At the beginning of the "genomic revolution", a major concern of bioinformatics was the creation and maintenance of databases to store biological information, such as nucleotide and amino acid sequences. Developing such databases involved not only design issues but also complex interfaces through which researchers could both access existing data and submit new or revised data. Ultimately, however, all of this information must be combined to form a comprehensive picture of normal cellular activities so that researchers may study how these activities are altered in different disease states.
Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures. The actual process of analyzing and interpreting data is referred to as computational biology. Important sub-disciplines within bioinformatics and computational biology include:
The development and implementation of tools that enable efficient access to, and use and management of, various types of information.
The development of new algorithms (step-by-step computational procedures) and statistics with which to assess relationships among members of large data sets, such as methods to locate a gene within a sequence, predict protein structure and/or function, and cluster protein sequences into families of related sequences.
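Locating a gene within a sequence can be done in many ways. The sketch below takes the simplest one and assumes an intron-free, prokaryote-style gene. It scans the forward strand for open reading frames (ORFs). An ORF here is an ATG start codon followed, in the same reading frame, by a stop codon (TAA, TAG or TGA). Real gene finders also scan the reverse strand and rely on statistical models. The function name `find_orfs` and the `min_codons` cutoff are illustrative assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) pairs of forward-strand ORFs, 0-based, end exclusive.

    An ORF runs from an ATG up to and including the first in-frame stop codon.
    Only the forward strand is scanned (a simplifying assumption).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG in this frame, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)], i.e. ATGAAATTTTAG
```

Scanning each of the three frames separately keeps every codon boundary aligned. This is why the ATG at position 2 is paired only with the TAG at position 11.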
Bioinformatics is concerned with the application of computational techniques and information technology to biology.
Why is bioinformatics important?
The greatest challenge facing the molecular biology community today is to make sense of the wealth of data produced by the genome sequencing projects. Sequence generation and the subsequent storage and analysis of sequences are computer-dependent tasks. Research into the molecular biology of an organism is now carried out at several levels, including the genome, transcriptome, proteome and metabolome.
The first challenge is the intelligent and efficient storage of this mass of data. It is also necessary to provide easy and reliable access to these data, and incisive computational tools must be developed to allow the extraction of meaningful biological information. Recent years have seen an explosive growth in biological data.
Large sequencing projects are producing ever-increasing quantities of nucleotide sequence. The contents of nucleotide databases are doubling in size approximately every 14 months; at the time of writing, the latest GenBank release (release 102) exceeded one billion base pairs. Not only is the volume of sequence data increasing rapidly, but the number of characterized genes from many organisms and the number of known protein structures also double about every two years.
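The doubling figure can be turned into a rough projection. The sketch assumes the growth rate stays constant, which is an idealization. Under that assumption, a database of size S0 that doubles every T months reaches S0 · 2^(t/T) after t months. The inputs are the 14-month doubling time and the one-billion-base-pair figure from the text. The function name `projected_size` is hypothetical.

```python
def projected_size(initial_bp: float, months: float, doubling_months: float = 14.0) -> float:
    """Project database size in base pairs, assuming steady exponential growth."""
    return initial_bp * 2 ** (months / doubling_months)


# Starting from one billion base pairs, 42 months is three doubling periods.
print(f"{projected_size(1e9, 42):,.0f}")  # 8,000,000,000
```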
To cope with this great quantity of data, a new scientific discipline has emerged: bioinformatics, biocomputing or computational biology.
How to become a bioinformatics expert?
Bioinformatics combines the tools and techniques of mathematics, computer science and biology in order to understand the biological significance of a variety of data. So if you would like to enter this new scientific field, you should enjoy these classic disciplines. Because the field is so new, almost everyone in it did something else before. Some biologists moved into bioinformatics by learning programming, while others entered via the reverse route, moving from computing into biology.
Evolutionary Biology
New insight into the molecular basis of a disease may come from investigating the function of homologs of a disease gene in model organisms. In this case, homology refers to two genes sharing a common evolutionary history. Scientists also use the term homology, or homologous, to simply mean similar, regardless of the evolutionary relationship. Equally exciting is the potential for uncovering evolutionary relationships and patterns between different forms of life. With the aid of nucleotide and protein sequences, it should be possible to find the ancestral ties between different organisms. Thus far, experience has taught us that closely related organisms have similar sequences and that more distantly related organisms have more dissimilar sequences.
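The observation that closely related organisms have more similar sequences is usually quantified with a distance measure. This minimal sketch assumes the two sequences are already aligned to the same length, with "-" marking gaps. It computes the p-distance: the fraction of compared sites at which the sequences differ, skipping gapped columns. A value of 0 means identical sequences, and larger values mean more divergent ones. The function name and the gap-handling rule are illustrative choices.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Fraction of differing sites between two pre-aligned sequences.

    Columns where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


print(p_distance("ACGTACGTAC", "ACGTTCGTAC"))  # 0.1 (1 difference in 10 sites)
```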
Proteins that show significant sequence conservation, indicating a clear evolutionary relationship, are said to belong to the same protein family. By studying protein folds (distinct protein building blocks) and families, scientists are able to reconstruct the evolutionary relationship between two species and to estimate how long ago the two organisms diverged from their last common ancestor.
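Divergence times are commonly estimated with a molecular clock. This sketch rests on three assumptions:

- Each lineage changes at a constant substitution rate r per site per year.
- A Poisson correction accounts for repeated changes at the same site: d = -ln(1 - p), where p is the observed proportion of differing amino acid sites.
- Time since the split is T = d / (2r), because both lineages accumulate changes independently after divergence.

Real analyses use richer substitution models and fossil or other calibration points. The rate below is a placeholder, not a measured value, and the function names are hypothetical.

```python
import math


def poisson_distance(p: float) -> float:
    """Correct an observed proportion of differing sites, p, for multiple hits."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p)


def divergence_time(p: float, rate_per_site_per_year: float) -> float:
    """Years since the common ancestor under a strict molecular clock.

    Both lineages evolve independently after the split, hence the factor of 2.
    """
    return poisson_distance(p) / (2.0 * rate_per_site_per_year)


# Hypothetical input: 20% of aligned sites differ, placeholder rate of 1e-9.
years = divergence_time(0.2, 1e-9)
print(f"{years / 1e6:.0f} million years")  # 112 million years
```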
The process of evolution has resulted in the production of DNA sequences that encode proteins with specific functions. In the absence of a protein structure that has been determined by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy, researchers can try to predict the three-dimensional structure using protein or molecular modeling. This method, often called homology or comparative modeling, uses experimentally determined protein structures (templates) to predict the structure of another protein that has a similar amino acid sequence (the target).
Although molecular modeling may not be as accurate at determining a protein's structure as experimental methods, it is still extremely helpful in proposing and testing various biological hypotheses. Molecular modeling also provides a starting point for researchers wishing to confirm a structure through X-ray crystallography or NMR spectroscopy. Because the different genome projects are producing more sequences and because novel protein folds and families are being determined, protein modeling will become an increasingly important tool for scientists working to understand normal and disease-related processes in living organisms. Protein modeling typically proceeds in four steps:
Identify the proteins with known three-dimensional structures that are related to the target sequence
Align the related three-dimensional structures with the target sequence and determine those structures that will be used as templates
Construct a model for the target sequence based on its alignment with the template structure(s)
Evaluate the model against a variety of criteria to determine if it is satisfactory
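Steps 1 and 2 can be illustrated with a toy template search, under several assumptions:

- The template library is small and hypothetical; the IDs are made up.
- Each template is scored against the target with Needleman-Wunsch global alignment, using a simple scheme: match +1, mismatch -1, gap -1. Production pipelines typically use BLAST or profile methods with substitution matrices such as BLOSUM62.
- Templates whose identity meets a cutoff are kept, best first. The 30% cutoff is a commonly quoted rule of thumb, treated here as an assumption.

Steps 3 and 4, building and evaluating the model, are outside the scope of this sketch.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a simple linear gap penalty."""

    def sub(i: int, j: int) -> int:
        # Substitution score for a[i-1] vs b[j-1] (1-based DP indices).
        return match if a[i - 1] == b[j - 1] else mismatch

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def identity_fraction(aln_a: str, aln_b: str) -> float:
    """Identical, ungapped columns divided by alignment length (one common definition)."""
    if not aln_a:
        return 0.0
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return same / len(aln_a)


def select_templates(target: str, templates: dict[str, str],
                     min_identity: float = 0.30) -> list[tuple[str, float]]:
    """Steps 1-2: keep templates at or above the identity cutoff, best first."""
    hits = []
    for template_id, template_seq in templates.items():
        ident = identity_fraction(*global_align(target, template_seq))
        if ident >= min_identity:
            hits.append((template_id, ident))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


library = {"tmplA": "MKTAYIAKQR", "tmplB": "MKSAYIAKQR", "tmplC": "GGGGGGGGGG"}
print(select_templates("MKTAYIAKQR", library))  # [('tmplA', 1.0), ('tmplB', 0.9)]
```

Identity has several definitions in the literature, for example dividing by the shorter sequence rather than by alignment length. The choice used here is one option among several.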
Genomic maps serve as a scaffold for orienting sequence information. A few years ago, a researcher wanting to localize a gene, or nucleotide sequence, was forced to manually map the genomic region of interest, a time-consuming and often painstaking process. Today, thanks to new technologies and the influx of sequence data, a number of high-quality, genome-wide maps are available to the scientific community for use in their research.
Computerized maps make gene hunting faster, cheaper, and more practical for almost any scientist. In a nutshell, scientists would first use a genetic map to assign a gene to a relatively small area of a chromosome. They would then use a physical map to examine the region of interest close up, to determine the gene's precise location. In light of these advances, a researcher's burden has shifted from mapping a genome or genomic region of interest to navigating a vast number of Web sites and databases.
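The two-stage search can be sketched as a simple lookup. The sketch assumes a hypothetical marker table in which each marker has two positions on the same chromosome: a genetic position in centimorgans (cM) and a physical position in base pairs. A gene placed at some cM position by linkage analysis is bracketed by its flanking markers. The base-pair coordinates of those markers then give the physical region to examine closely. Marker names and coordinates are invented for illustration.

```python
from bisect import bisect_right
from typing import NamedTuple


class Marker(NamedTuple):
    name: str
    cm: float  # genetic map position, centimorgans
    bp: int    # physical map position, base pairs


def flanking_markers(markers: list[Marker], gene_cm: float) -> tuple[Marker, Marker]:
    """Return the markers that bracket a gene's genetic position.

    Markers must be sorted by cM; the gene must lie at or after the first
    marker and before the last one.
    """
    positions = [m.cm for m in markers]
    idx = bisect_right(positions, gene_cm)
    if idx == 0 or idx == len(markers):
        raise ValueError("gene position lies outside the mapped markers")
    return markers[idx - 1], markers[idx]


chromosome = [Marker("mkA", 0.0, 1_200_000),
              Marker("mkB", 10.5, 6_800_000),
              Marker("mkC", 21.0, 13_500_000)]
left, right = flanking_markers(chromosome, 12.5)
print(left.name, right.name, right.bp - left.bp)  # mkB mkC 6700000
```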
A genetic algorithm (GA) is an algorithm used to find approximate solutions to hard problems by applying the principles of evolutionary biology to computer science. Genetic algorithms use biologically derived techniques such as inheritance, mutation, natural selection, and recombination. Genetic algorithms are a particular class of evolutionary algorithms.
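The ingredients named above can be shown on a toy problem. They are selection, recombination (crossover), mutation, and inheritance across generations. This minimal sketch uses "OneMax" (maximize the number of 1s in a bit string), chosen because its optimum is known. The population size, mutation rate, tournament selection and elitism are illustrative assumptions, not recommended settings.

```python
import random


def fitness(bits: list[int]) -> int:
    """OneMax fitness: the number of 1s in the bit string."""
    return sum(bits)


def tournament(pop: list[list[int]], rng: random.Random, k: int = 3) -> list[int]:
    """Selection: the fittest of k randomly chosen individuals."""
    return max(rng.sample(pop, k), key=fitness)


def crossover(a: list[int], b: list[int], rng: random.Random) -> list[int]:
    """Single-point recombination of two parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits: list[int], rng: random.Random, rate: float) -> list[int]:
    """Flip each bit independently with probability `rate`."""
    return [1 - x if rng.random() < rate else x for x in bits]


def run_ga(length: int = 20, pop_size: int = 30, generations: int = 40,
           mutation_rate: float = 0.02, seed: int = 0) -> tuple[list[int], list[int]]:
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        best = max(pop, key=fitness)
        history.append(fitness(best))
        # Elitism: the best individual is inherited unchanged by the next generation.
        children = [best]
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng, mutation_rate))
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = run_ga()
print(fitness(best), "of", len(best))
```

Elitism guarantees that the best fitness never decreases from one generation to the next. The GA is still approximate: it is not guaranteed to reach the optimum within the generation budget.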
Data mining is the process of sorting through large amounts of data and picking out relevant information. It is commonly used by business intelligence organizations and financial analysts, but it is increasingly used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods. It has been described as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" and "the science of extracting useful information from large data sets or databases". In the context of Enterprise Resource Planning, data mining is the statistical and logical analysis of large sets of transaction data, looking for patterns that can aid decision making.
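In sequence analysis, one of the simplest forms of picking out relevant information is looking for short patterns that occur more often than others. The sketch counts every substring of length k (a k-mer) and reports the most frequent ones. It uses raw counts only, with no statistical test for over-representation, and the function name is hypothetical.

```python
from collections import Counter


def most_frequent_kmers(seq: str, k: int) -> tuple[list[str], int]:
    """Return the most frequent k-mers (sorted alphabetically) and their count."""
    if k <= 0 or k > len(seq):
        return [], 0
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    top = max(counts.values())
    return sorted(kmer for kmer, n in counts.items() if n == top), top


print(most_frequent_kmers("ACGTACGTAC", 4))  # (['ACGT', 'CGTA', 'GTAC'], 2)
```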
This section includes some simple rules of thumb to apply when performing common bioinformatics tasks.
Facts About Genome Sequencing
What is DNA sequencing?
Whose genome was sequenced in the Human Genome Project?
Who sequenced the human genome?
How is DNA sequencing done?
What is the difference between draft sequence and finished sequence?
How does this differ from the finished human genome announced by HGP researchers in 2003?
What genomes have been sequenced completely?
What happens now that the human genome sequence is completed?
What nonhuman genome sequencing projects are supported by the U.S. Department of Energy?
General Note on Databases
The interdisciplinary nature of bioinformatics requires the use of a variety of discipline-specific databases. The following are those used most frequently, but the researcher should also consult the “Databases for Related Subjects” list to identify other databases suitable for exploring topics in bioinformatics. Databases outside the Library's collection are also commonly used, including the public-access and commercial databanks and genomic files listed separately below.
Journals of Bioinformatics