Clinical and Biomolecular Ontologies for E-Health

Mario Ceresa
DOI: 10.4018/978-1-60566-002-8.ch011

Abstract

This chapter focuses on biomedical knowledge representation and its use in biomedicine. It first reviews the most relevant existing bioinformatics resources and explains why they need to be better integrated. It then describes the main problems that machines encounter in processing factual biomedical knowledge; what terminologies, classifications, and ontologies are; and why they can help in better organizing and exploiting the bioinformatics resources available online. The authors hope that a concise perspective of the field and a list of selected resources, annotated with their scope and usability, may help interested readers quickly understand the main principles of knowledge representation in biomedicine and its high relevance for modern biomedical research and e-health.

Introduction

In the current post-genomic era, molecular medicine is gaining relevance in both health care research and practice. The availability of the complete sequence of the human genome, together with new nanotechnology approaches in molecular biology, makes it possible to study thousands of genes and their expression levels quickly and simultaneously. Advancements in information technologies and biomedical informatics are providing tools and techniques to manage the amount of data produced, and are making many databanks of biomolecular information, and many methods for their analysis, more easily accessible. As biomolecular and biomedical informatics progress, a growing number of healthcare sites offer genetic tests at relatively low cost. In the near future, biomolecular tests and screenings are expected to revolutionize the diagnosis of inherited diseases, much as imaging tests based on different techniques (e.g., computed tomography (CT), magnetic resonance (MR), positron emission tomography (PET), single photon emission computed tomography (SPECT), and ultrasound (US)) have transformed the diagnostic practice for several illnesses and the diagnostic services offered by healthcare providers.

Although genetic tests can now be performed easily and routinely thanks to automatic or semiautomatic procedures, managing and interpreting the data they produce, in particular the results of the more advanced biomolecular exams, still presents a number of issues. These tests generate a great amount of data that need to be efficiently stored and statistically analyzed in order to identify, among all the genes or proteins studied in each test, those significantly altered in the tested conditions. Moreover, to correctly interpret such test results, the known structural, functional, and phenotypic information about the identified genes and protein products needs to be further analyzed. Such information, which includes the presence of specific sequence characteristics and protein domains; cytogenetic localization; expression in different cellular tissues and organs; and involvement in particular biological processes, molecular functions, biochemical pathways, genetic diseases, or phenotypes, is increasingly available within numerous distributed databanks, most of which are also easily accessible through Web interfaces. However, some issues hamper its effective and comprehensive use for the simultaneous analysis of the several hundred relevant genes and proteins identified in each biomolecular test. These difficulties include the scattering of the required information among many heterogeneous databanks; the way most databanks provide this information (i.e., within unstructured HTML pages, one page per gene or protein entry containing all the information the databank holds about that entry); and the continuing lack of common terminologies and bio-ontologies to describe the biomolecular structural and functional characteristics of genes and their protein products and their phenotypic manifestations.

Key Terms in this Chapter

Terminology: A collection of names of the entities involved in a domain. It simply states which are the principal terms used in the domain, without any further information. Although this is a quite simplistic approach, it is extremely useful because it helps computer programs recognize the relevant terms and concentrate only on them. It can sometimes be difficult to distinguish a terminology from a controlled vocabulary: the former is just a list of the terms used in a domain, while the latter guarantees that its terms are precise, accurate, and unequivocal.

Biomedical Informatics: The discipline that studies biomedical information and knowledge, focusing in particular on their structure, acquisition, integration, management, and optimal use. It adopts and applies results from a variety of other disciplines including Information Science, Computer Science, Cognitive Science, Statistics and Biometrics, Mathematics, Artificial Intelligence, Operations Research, and basic and clinical Health Sciences.

Proteomics: The study of the complete set of proteins (amino acid sequences) of an organism, translated from its different transcripts (mRNA sequences transcribed from a nucleotide sequence).

Biomolecular or Genomic Databank: A structured repository of biomolecular, genomic, or proteomic data, often integrated with related biological, medical, clinical, or experimental information. It generally also provides interfaces and tools for browsing, querying, and sometimes analyzing the data it contains.

Genomics: The systematic identification and study of genomes, each of which comprises the entire genetic material of a living organism.

Bioinformatics: A joint branch of biology and informatics concerned with the development of techniques for the collection and manipulation of biological data, and the use of such data to make biological discoveries or predictions. It encompasses all computational methods and theories applicable to molecular biology and the computer-based techniques for solving biological problems, including the manipulation of models and datasets.

Classification: A collection of terms organized in categories. It therefore includes only the is-a relationship between terms. This enables machines to group bottom-level terms together under their higher-level ancestors, e.g., grouping all “lipid metabolism” related terms under the upper term “metabolism.”
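The grouping described in this entry can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `IS_A` table and the helper names `ancestors` and `group_under` are hypothetical, the terms are a toy example rather than an excerpt of any real classification, and each term is assumed to have at most one is-a parent.

```python
# Toy classification (hypothetical data): each term maps to its single is-a parent.
IS_A = {
    "fatty acid oxidation": "lipid metabolism",
    "lipid metabolism": "metabolism",
    "glycolysis": "carbohydrate metabolism",
    "carbohydrate metabolism": "metabolism",
    "cell adhesion": "cellular process",
}


def ancestors(term, is_a=IS_A):
    """Return the is-a ancestors of a term, nearest ancestor first."""
    chain = []
    seen = {term}
    while term in is_a:
        term = is_a[term]
        if term in seen:  # guard against a malformed, cyclic table
            raise ValueError("cycle in is-a hierarchy at %r" % term)
        seen.add(term)
        chain.append(term)
    return chain


def group_under(terms, ancestor, is_a=IS_A):
    """Keep only the terms that have the given ancestor somewhere above them."""
    return [t for t in terms if ancestor in ancestors(t, is_a)]


observed = ["fatty acid oxidation", "glycolysis", "cell adhesion"]
metabolic_terms = group_under(observed, "metabolism")
```

Here `metabolic_terms` keeps the two metabolism-related terms and drops "cell adhesion", which sits under a different top-level category in this toy table.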

Semantic Network: A graph structure useful for representing the knowledge of a domain. It is composed of a set of objects, the graph nodes, which represent the concepts of the domain, and of relations among such objects, the graph arcs, which represent the domain knowledge. A semantic network is also a reasoning tool, because it makes it possible to find relations between concepts that are not directly related to each other. To this end, it is enough “to follow the arrows” of the arcs that leave the nodes under consideration and find the node at which the paths meet.
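The idea of "following the arrows" until two paths meet can be sketched as a breadth-first search from each of the two starting concepts, followed by an intersection of the nodes reached. The sketch below is a minimal illustration, not a method taken from the chapter: the `EDGES` table, the relation names, and the functions `reachable` and `meeting_nodes` are all assumptions made for the example, and the network content is a toy, not curated data.

```python
from collections import deque

# Toy semantic network (hypothetical data): node -> list of (relation, target) arcs.
EDGES = {
    "BRCA1": [("involved_in", "DNA repair")],
    "TP53": [("involved_in", "DNA repair"), ("involved_in", "apoptosis")],
    "DNA repair": [("is_a", "DNA metabolic process")],
    "apoptosis": [("is_a", "cell death")],
}


def reachable(start, edges=EDGES):
    """Breadth-first search along outgoing arcs; returns node -> number of hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _relation, target in edges.get(node, []):
            if target not in dist:
                dist[target] = dist[node] + 1
                queue.append(target)
    return dist


def meeting_nodes(a, b, edges=EDGES):
    """Nodes reachable from both a and b, closest meeting point first."""
    dist_a, dist_b = reachable(a, edges), reachable(b, edges)
    common = set(dist_a) & set(dist_b)
    # Sort by total path length, then by name to keep the order deterministic.
    return sorted(common, key=lambda n: (dist_a[n] + dist_b[n], n))


shared = meeting_nodes("BRCA1", "TP53")
```

In this toy network the two gene nodes have no arc between them, yet `shared` lists the concepts at which their outgoing paths meet, with the nearest one first.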

Controlled Vocabulary: A collection of precise and universally understandable terms that define and identify the concepts of a domain in a unique and unequivocal way, e.g., the anatomical terminology. Such a vocabulary is said to be controlled because it is defined and kept up to date by people, the curators, who are experts in the domain the vocabulary refers to. Controlled vocabularies are very useful in extended and complex domains, such as medicine and biology, where distinct concepts must be identified with high precision in order to codify, analyze, and communicate the domain knowledge. Although they are similar to terminologies, a terminology does not guarantee that its terms are precise, accurate, and unequivocal; it is simply a list of the terms used in a specific domain.

Ontology: A semantic structure useful for standardizing and rigorously defining the terminology used in a domain and for describing the knowledge of the domain. It is composed of a controlled vocabulary, which describes the concepts of the domain, and a semantic network, which describes the relations among such concepts. Each concept is connected to other concepts of the domain through semantic relations that specify the knowledge of the domain. A general concept can be described by several terms, which can be synonyms or can be characteristic of the different domains in which the concept exists. For this reason, ontologies tend to have a hierarchical structure, with generic concepts/terms at the higher levels of the hierarchy and specific concepts/terms at the lower levels, connected by different types of relations.
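The two components named in this entry, a controlled vocabulary and a semantic network, can be put together in a small data structure. The following is a rough sketch under loud assumptions: the `Concept` and `Ontology` classes, the `EX:` identifiers, and the relation names are invented for illustration and do not correspond to any real ontology or library.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One controlled-vocabulary entry: a unique id, a preferred term, synonyms."""
    concept_id: str
    preferred_term: str
    synonyms: set = field(default_factory=set)


@dataclass
class Ontology:
    concepts: dict = field(default_factory=dict)   # concept_id -> Concept
    relations: list = field(default_factory=list)  # (subject_id, relation, object_id)

    def add_concept(self, concept):
        self.concepts[concept.concept_id] = concept

    def relate(self, subject_id, relation, object_id):
        """Add a typed arc; both ends must already be in the vocabulary."""
        for cid in (subject_id, object_id):
            if cid not in self.concepts:
                raise KeyError("unknown concept: %s" % cid)
        self.relations.append((subject_id, relation, object_id))

    def resolve(self, term):
        """Map a preferred term or synonym (case-insensitive) to its concept id."""
        needle = term.casefold()
        for concept in self.concepts.values():
            names = {concept.preferred_term, *concept.synonyms}
            if needle in {name.casefold() for name in names}:
                return concept.concept_id
        return None

    def targets(self, concept_id, relation="is_a"):
        """Concept ids linked from concept_id by the given relation type."""
        return [o for s, r, o in self.relations if s == concept_id and r == relation]


# Hypothetical example content, for illustration only.
onto = Ontology()
onto.add_concept(Concept("EX:1", "myocardial infarction", {"heart attack"}))
onto.add_concept(Concept("EX:2", "heart disease", {"cardiac disease"}))
onto.add_concept(Concept("EX:3", "heart"))
onto.relate("EX:1", "is_a", "EX:2")
onto.relate("EX:1", "has_location", "EX:3")

concept_id = onto.resolve("Heart Attack")
broader = onto.targets(concept_id, "is_a")
```

Resolving a synonym to a single concept identifier before following relations is what lets different terms for the same concept share the same position in the hierarchy.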
