Overview of Bioinformatics

Tram Ho

General introduction to bioinformatics

What is bioinformatics?

Bioinformatics is an interdisciplinary field that combines the biological and computational sciences. Although the term ‘Bioinformatics’ is not precisely defined, we can say that the field involves the computational management of all types of molecular biology information. Most bioinformatics work involves the analysis of biological data. Bioinformatics uses techniques from applied mathematics, informatics, statistics, computer science, artificial intelligence, chemistry and biochemistry to solve biological problems.

Bioinformatics and computational biology

A commonly used alternative term for bioinformatics is computational biology. The boundary between Bioinformatics and Computational Biology is thin, but the difference can be stated as follows:

  • Computational Biology focuses on the development and application of data analysis, theory, mathematical modeling and computational simulation methods to study biological, behavioral, and social systems.
  • Bioinformatics focuses on the research, development or application of computational tools and methods that expand the use of biological, medical, behavioral or health data, including tools to collect, store, organize, analyze, or visualize such data.

The following image from a discussion on ResearchGate gives us a more intuitive view of these two areas.

Samuel, Johnson. (2020). Re: What are the differences between Bioinformatics and Computational Biology? . Retrieved from: https://www.researchgate.net/post/What_are_the_differences_between_Bioinformatics_and_Computational_Biology/5f51aad578613d7eea78eadc/citation/download.


Development history of Bioinformatics

Bioinformatics is not a traditional field of study. The term bioinformatics came into widespread use in the 1990s. Originally, the field handled the management and analysis of data related to DNA, RNA and protein sequences. Because biological data has been generated at unprecedented speed, its management and interpretation have always required powerful computing tools, and bioinformatics now covers many other types of biological data as well. Leaving aside the milestones of biology, computer science and statistics themselves, we can list the key milestones of modern bioinformatics as follows:

  • 1962 – Pauling’s theory of molecular evolution is published
  • 1965 – Margaret Dayhoff's Atlas of Protein Sequence and Structure is published. This compilation of known protein sequences laid the groundwork for the Protein Information Resource, one of the first protein sequence databases that could be accessed by computer.
  • 1970 – The Needleman-Wunsch algorithm is published. It is used in bioinformatics to align protein or nucleotide sequences.
  • 1977 – DNA sequencing appears, along with software for analyzing it (the Staden Package)
  • 1981 – The concept of a sequence motif (Doolittle) appears. A sequence motif is a recurring pattern of nucleotides or amino acids that has, or is presumed to have, some biological function.
  • 1982 – The phage lambda genome is published
  • 1983 – A method for searching sequence databases is published: the Wilbur-Lipman algorithm
  • 1985 – FASTP / FASTN (fast sequence similarity searching) published
  • 1987 – Sequence profiles
  • 1987 – The databases EMBL, GenBank and Swiss-Prot appear
  • 1988 – The US National Center for Biotechnology Information (NCBI) is established
  • 1988 – Distributed network of EMBnet databases appeared
  • 1990 – BLAST: fast sequence similarity searching
  • 1991 – EST: expressed sequence tag sequencing
  • 1993 – The Sanger Centre, Hinxton, UK is established
  • 1994 – The European Bioinformatics Institute (EMBL-EBI), Hinxton, UK is established
  • 1995 – First bacterial genome published
  • 1996 – The yeast genome was published
  • 1997 – PSI-BLAST (Position-Specific Iterative Basic Local Alignment Search Tool) appears
  • 1998 – The worm genome (the first genome of a multicellular organism) is published
  • 2000s to present – Studies on the human and rice genomes published.
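
To make the 1970 entry in the timeline more concrete, here is a minimal sketch of Needleman-Wunsch global alignment. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration; real tools use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    The scoring parameters are illustrative defaults, not canonical values.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up, or left move.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


alignment_score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(alignment_score)  # 2
print(top)              # ACGT
print(bottom)           # A-GT
```

The table costs time and memory proportional to the product of the two sequence lengths, which is one reason the faster heuristic search tools later in the timeline were developed.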

In recent times, with the rapid development of computational science and the fast growth of computing power, bioinformatics researchers have made remarkable progress in processing big data in biology. One of the most reliable places to search for bioinformatics publications is PubMed, a free search engine that primarily accesses the MEDLINE database of references and abstracts on life science and biomedical topics.

The Basics of Bioinformatics

The Wikipedia article on Bioinformatics lists key areas of research including genomics and genetics, evolutionary biology, gene function analysis, modeling of biological systems and image analysis, along with their smaller research subdivisions. This article does not repeat all of those areas. It focuses on two main parts, Genomics and Proteomics, plus Transcriptomics, an area that some articles list next to those two but that the Wikipedia article does not mention.


Genomics

The focus of genomics is the genome, which is the complete set of DNA in an organism, including all of its genes. Genes carry the information needed to make all the proteins an organism requires. These proteins determine the shape of the organism, how well its body metabolizes food or fights infections, and sometimes even how it behaves. DNA is made up of four similar chemicals (called bases and abbreviated A, T, C, and G) that are repeated millions or billions of times across the entire genome. The specific order of A, T, C and G is extremely important: it underpins the diversity of life and even determines whether an organism is human or another species such as yeast, rice or fruit fly. Each of these species has its own genome and is itself the focus of a genome project. Because all organisms are related through similarities in their DNA sequences, insights gained from non-human genomes often lead to new knowledge of human biology.
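
The point that the order of the bases matters, and not just how many of each there are, can be shown with a short sketch. The function names and the example sequences below are made up for illustration; GC content is included only as a common summary statistic.

```python
from collections import Counter


def base_counts(seq):
    """Count how often each of the four bases occurs in a DNA string."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ATCG"}


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    counts = base_counts(seq)
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total if total else 0.0


def same_composition(seq_a, seq_b):
    """True when two sequences contain the same number of each base."""
    return base_counts(seq_a) == base_counts(seq_b)


# Two hypothetical sequences: identical composition, different order.
first = "ATGCGC"
second = "CGCGTA"

print(base_counts(first))               # {'A': 1, 'T': 1, 'C': 2, 'G': 2}
print(same_composition(first, second))  # True
print(first == second)                  # False
```

Both strings have the same base counts, yet they are different sequences, which is why genome analysis works on the ordered sequence and not on composition alone.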

One of the possible applications of genomics is illustrated by the following figure:


Using genomic information to develop new treatments: an example of drug-protein interaction. This image was created by the NHS HEE Genomics Education Program. For further information and resources, please visit their website: www.genomicseducation.hee.nhs.uk


Proteomics

Proteomics focuses on determining a protein's three-dimensional structure, its interactions and its function. Published studies typically focus on structure prediction and protein-protein interactions. Functional analyses include gene expression profiling, prediction of protein-protein interactions, prediction of subcellular localization, reconstruction of metabolic pathways, and simulation.

Proteomics focuses on understanding:

  • When and where proteins are expressed;
  • Rates of protein production and breakdown, and abundance in a steady state;
  • How proteins are modified;
  • Movement of proteins between subcellular compartments;
  • Involvement of proteins in metabolic pathways;
  • How proteins interact with each other.

Proteomics can provide important biological information for many biological problems, such as:

  • Which proteins interact with a particular protein of interest?
  • Which proteins are localized in a given subcellular compartment?
  • Which proteins are involved in a given biological process?
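
The three questions above map naturally onto simple lookups over annotated protein records. The sketch below is only an illustration: the record layout, the helper names, and the proteins P1 to P4 with their annotations are all hypothetical placeholders, not real data.

```python
from dataclasses import dataclass, field


@dataclass
class ProteinRecord:
    """Hypothetical annotation record for one protein."""
    name: str
    compartment: str
    processes: set = field(default_factory=set)
    partners: set = field(default_factory=set)


def build_index(records):
    """Index records by protein name for quick lookup."""
    return {record.name: record for record in records}


def interactors_of(index, name):
    """Proteins that interact with `name`, whichever side recorded the link."""
    listed = set(index[name].partners)
    reverse = {r.name for r in index.values() if name in r.partners}
    return (listed | reverse) - {name}


def localized_in(index, compartment):
    """Proteins annotated to a given subcellular compartment."""
    return {r.name for r in index.values() if r.compartment == compartment}


def involved_in(index, process):
    """Proteins annotated to a given biological process."""
    return {r.name for r in index.values() if process in r.processes}


# Made-up toy data, purely for demonstration.
toy_records = [
    ProteinRecord("P1", "cytoplasm", {"glycolysis"}, {"P2"}),
    ProteinRecord("P2", "nucleus", {"transcription"}, {"P3"}),
    ProteinRecord("P3", "nucleus", {"transcription", "dna repair"}),
    ProteinRecord("P4", "mitochondrion", {"respiration"}),
]
toy_index = build_index(toy_records)

print(sorted(interactors_of(toy_index, "P2")))          # ['P1', 'P3']
print(sorted(localized_in(toy_index, "nucleus")))       # ['P2', 'P3']
print(sorted(involved_in(toy_index, "transcription")))  # ['P2', 'P3']
```

In practice these annotations come from experiments and curated databases, and the hard part is producing and validating the data, not querying it.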

The following picture, from the article "What is proteomics?", gives us an overview of the research areas of proteomics.

From the figure above we can see that proteomic experiments typically collect data on three protein properties in a sample: location, abundance/turnover and post-translational modifications. Depending on the experimental design, researchers may be interested in these data directly or may use them to derive additional information, such as a protein's possible interaction partners, or to judge whether a protein is active from its post-translational modifications.

Reference: the article "What is proteomics?", posted on the website of the European Bioinformatics Institute (EMBL-EBI).


Transcriptomics

Transcriptomics is the study of the transcriptome, the complete set of RNA transcripts produced by the genome under specific circumstances or in a specific cell, using high-throughput methods such as microarray analysis. Comparing transcriptomes makes it possible to identify genes that are differentially expressed in distinct cell populations or in response to different treatments.
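
As a rough illustration of comparing two transcriptomes, the sketch below flags genes whose expression changes by at least a chosen log2 fold change between two conditions. The gene names, counts, pseudocount and threshold are all assumptions made up for this example. Real analyses also normalize for sequencing depth and apply statistical tests across replicates, which this sketch omits.

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2 ratio of treated to control; the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def differentially_expressed(control_counts, treated_counts, threshold=1.0):
    """Return {gene: log2 fold change} for genes at or beyond the threshold.

    Only genes present in both conditions are compared.
    """
    changed = {}
    for gene in control_counts.keys() & treated_counts.keys():
        lfc = log2_fold_change(control_counts[gene], treated_counts[gene])
        if abs(lfc) >= threshold:
            changed[gene] = lfc
    return changed


# Hypothetical expression counts for three made-up genes.
control = {"geneA": 99, "geneB": 50, "geneC": 199}
treated = {"geneA": 399, "geneB": 52, "geneC": 49}

for gene, lfc in sorted(differentially_expressed(control, treated).items()):
    direction = "up" if lfc > 0 else "down"
    print(gene, direction, round(lfc, 2))
```

Here geneA comes out as up-regulated and geneC as down-regulated, while geneB changes too little to pass the threshold.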

The main directions of the Transcriptomics study:

  • Describe the different states of cells (i.e., growth stages), tissues or cell cycle stages by their expression patterns;
  • Explore the molecular mechanisms underlying a phenotype;
  • Identify biological markers that distinguish a diseased state from a healthy state;
  • Distinguish the stages or subtypes of a disease (e.g., the stage of a cancer);
  • Establish causal relationships between genetic variations and gene expression patterns to elucidate the etiology of diseases.

Reference: Part 3, "Transcriptomics", in Bioinformatics for Biomedical Science and Clinical Applications, Woodhead Publishing Series in Biomedicine, pages 49-82, by Kung-Hao Liang.

Transcriptomics is still a new field in development so we cannot enumerate its full potential. However, we can still list some of its important uses as follows:

  • Plant breeding and crossbreeding.
  • Stem cell and cancer research.
  • Research on embryogenesis and in vitro fertilization.
  • Study of tissue-specific gene expression.
  • Characterization of non-coding RNAs.
  • Study of retrotransposons, transposable elements (TEs) that are incorporated into the genome through reverse transcription and can disrupt gene function and induce mutations or epigenetic changes.
  • Detection of transgenes present in a genome.
  • Detection of pathogenic and virulent strains present in samples, using RNA-seq technology.

The list above is taken from the article "Transcriptomics technologies" by Rohan Lowe, Neil Shirley, Mark Bleackley, Stephen Dolan and Thomas Shafee.


Conclusion

This short article has presented the basics of Bioinformatics: its core concepts, its history, how it differs from related fields, and its major research areas. As technology develops, new fields such as Bioinformatics can grow much faster than before and are gradually getting more attention. For example, our school opened 3 Bioinformatics classes this year, whereas for many years so few people registered that no class could be opened (and people now enroll to gain knowledge, not for high scores). Bioinformatics is an interdisciplinary field, so to work in it we need to equip ourselves with a large amount of knowledge about biology as well as big data and statistics. Because it is a young field, it will take everyone some effort to get used to. To keep our knowledge up to date, we can refer to documents and publications from reputable institutions such as the European Bioinformatics Institute and the National Center for Biotechnology Information, as well as sites like PubMed or Nature Research. That concludes the article. Thank you to everyone who took the time to read it.


Source : Viblo