1. Introduction
A scientific experiment is defined as a series of interconnected operations (Goble et al., 2010), which can be executed using one or more workflows. A scientific workflow is a model or template that represents a sequence of scientific activities implemented by tools in order to reach a certain objective (Deelman et al., 2009). The wide adoption of scientific workflows as a mechanism to aggregate existing services has revolutionized the way scientists conduct their experiments, since workflows allow them to gather evidence for or against a hypothesis, or to demonstrate a known fact (Belhajjame et al., 2011).
According to Nardi (2009), users of scientific workflows usually work in a specific field of research and do not always have adequate training in computer science. Often, they begin an application by copying an existing workflow and then adjusting it to their needs. In this vein, another important issue is the loss of the researcher's knowledge about the experiment (Marinho et al., 2012), which occurs when tasks are delegated to computers that usually perform isolated actions without documentation. Thus, to represent and support the development of a scientific experiment, it is necessary to register the associated workflows and their variations, since they can be modified during the research (Mattoso et al., 2010).
One way of storing this data is to use provenance models (Buneman et al., 2001), which store the data produced by scientific workflows (Sirqueira et al., 2016). The use of provenance data allows the scientist to compose new workflows based on the reuse of data from previous ones. However, provenance data used in isolation does not allow adequate control of the experiment and its associated workflows, making it difficult to manage the experiment as a whole. According to Hasan et al. (2007), it is necessary to use independent tools to manage the experiment and analyze its data, considering that Scientific Workflow Management Systems (SWMS) do not have this functionality. An SWMS considers only the researcher responsible for the workflow (Pereira et al., 2009), providing no support for collaboration, distribution, or reuse. This additional data, i.e., workflow versions, associated workflows, related experiments, and results, is important for the publication of the experiment.
In this context, the objective of this work is to address configuration management of scientific workflows throughout the experiment life cycle, based on the maintenance, evolution, and reuse of the experiment's data, in order to improve the experimentation process and its use in other related contexts. Since each phase of the scientific experiment cycle presents specific tasks, and each modification to the execution of a task generates a new version of the workflow (Sirqueira et al., 2016), we consider this control essential for the proper execution and management of a scientific experiment. This article details the E-SECO ProVersion approach, which extends the E-SECO ecosystem (Freitas et al., 2015) to control and manage scientific workflows related to a given experiment, using provenance data and ontologies. In this vein, the research question can be defined as: Is the E-SECO ProVersion architecture capable of deriving maintenance and evolution information from experiments and related workflows?
Figure 1 details the experimentation life cycle of the E-SECO ProVersion approach. Configuration management is performed by the “Configuration Management” module, which encompasses the whole process.
Figure 1. Experiment life cycle in E-SECO ProVersion approach