Machine Learning for Accurate Software Development Cost Estimation in Economically and Technically Limited Environments

Mohammad Alauthman, Ahmad al-Qerem, Someah Alangari, Ali Mohd Ali, Ahmad Nabo, Amjad Aldweesh, Issam Jebreen, Ammar Almomani, Brij B. Gupta
DOI: 10.4018/IJSSCI.331753

Abstract

Cost estimation for software development is crucial for project planning and management. Several regression models have been developed to predict software development costs using historical datasets of previous projects. The accuracy of such estimates depends heavily on the relevance and quality of the cost estimation dataset and on its suitability to the software development environment. The currently available cost estimation datasets are limited to North American and European environments, leaving a gap in the representation of other economically and technically constrained software industries. In this article, the authors evaluate the performance of regression models using the SEERA dataset, which represents these constrained environments. This study provides insights into selecting regression models for cost estimation in software development. It highlights the importance of choosing a model appropriate to the specific software development model and dataset used in the estimation process. Eight regression models (elastic net, lasso regression, linear regression, neural network, RANSACRegressor, random forest, ridge regression, and SVM) were evaluated for cost estimation across different software models, and correlation coefficients and accuracy indicators were reported for each. The results showed that SVM and random forest delivered superior performance. However, the elastic net, lasso regression, linear regression, neural network, and RANSACRegressor models also demonstrated exemplary performance in cost estimation.

1. Introduction

Cost estimation is a critical aspect of software development (Rankovic, Rankovic, Ivanovic, & Lazic, 2021; Rankovic, Rankovic, Ivanovic, & Lazic, 2021; Mukherjee & Malu, 2014), as it helps in predicting the resources required for the project and ensuring that the project is completed within budget and on time (Pandey et al., 2020). However, estimating the cost and effort for different software development models can be challenging due to their unique characteristics and requirements (Boehm, 2017; Kumar et al., 2020).

Several essential features must be considered when estimating the cost and effort for different software development models. One of the most crucial factors is the size of the project, which refers to the number of software components or functions that need to be developed. The larger the project, the more effort and resources it will require, ultimately impacting the cost estimation (Saavedra Martínez et al., 2020; Mahmood et al., 2021). The project's complexity is another critical feature affecting cost and effort estimation. The complexity of the software model can vary based on various factors, such as the number of interrelated components, the number of decision points, and the level of customization required. Developing more complex software models will require more effort and resources, resulting in higher costs (Mahmood et al., 2021).

The development team's expertise is another vital factor when estimating the cost and effort for different software models. The level of experience, knowledge, and skills of the team will significantly impact the development time and cost. A team with more experience and knowledge can develop a project more efficiently, resulting in lower costs (Nassif et al., 2019).

The development process also plays a crucial role in cost and effort estimation. The development process can be iterative or sequential, and each approach has advantages and disadvantages. The sequential approach, also known as the Waterfall model, is more structured, which helps ensure that each development phase is completed before the next begins. In contrast, the iterative approach, exemplified by the Agile model, is more flexible and adaptable, allowing for changes throughout the development process.

Finally, the software development environment also affects cost and effort estimation. The environment can include hardware and software tools, such as integrated development environments, version control systems, and testing tools necessary to complete the project. The cost and availability of these tools and resources will impact the cost estimation for the project. In summary, several important features need to be considered when estimating the cost and effort for different software models: the project's size and complexity, the development team's expertise, the development process, and the software development environment. An accurate model for estimating cost and effort helps ensure the project is completed within budget and on time, providing significant benefits to the development team and the organization.

We observed first-hand the challenges that local software development teams faced due to limited resources and infrastructure constraints. Accurately estimating costs was critical for project planning and management under these conditions. However, existing cost estimation techniques and datasets did not adequately account for the realities of working in such constrained environments. We were motivated to address this research gap and help improve cost estimation practices for software teams operating under similar limitations.

Machine learning plays an essential role in determining the critical features that affect the cost and effort estimation of different software models (Safari & Erfani, 2020; Holtkamp et al., 2015; Casado-Lumbreras et al., 2014). With the help of machine learning algorithms, large and complex datasets can be analyzed to identify patterns and relationships between cost and effort variables and the factors that affect them. By utilizing machine learning techniques, such as regression analysis, decision trees, and neural networks, it is possible to identify the most significant variables that impact software development cost and effort.
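As a rough illustration of how the influence of individual variables on effort might be screened, the following minimal sketch ranks three features by the absolute value of their Pearson correlation with effort. It is not the authors' method: the project records are hypothetical values invented for this example (they are not taken from SEERA), the feature names and the `rank_features` helper are assumptions, and a simple univariate correlation stands in for the regression, decision tree, and neural network techniques mentioned above.

```python
from math import sqrt

# Hypothetical project records, invented for illustration (NOT from SEERA):
# (size in function points, team experience in years,
#  complexity on a 1-5 scale, effort in person-months)
PROJECTS = [
    (120, 6, 3, 14.0),
    (300, 3, 4, 30.0),
    (80, 7, 2, 10.5),
    (450, 2, 5, 47.0),
    (200, 8, 2, 17.0),
    (150, 5, 3, 16.5),
    (380, 4, 4, 41.0),
    (260, 5, 3, 24.0),
]
FEATURES = ["size", "experience", "complexity"]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rank_features(rows, names):
    """Return (feature, r) pairs sorted by |r| against effort, strongest first.

    Effort is assumed to be the last element of every row.
    """
    effort = [row[-1] for row in rows]
    scores = {
        name: pearson([row[i] for row in rows], effort)
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    for name, r in rank_features(PROJECTS, FEATURES):
        print(f"{name:<12} r = {r:+.3f}")
```

A univariate correlation like this ignores interactions between variables, so it is at best a first screening step; model-based importance measures would be needed to separate, for example, the effect of team experience from the effect of project size.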
