
Data Collection

The datasets used in this research are collected from two previous studies: Leonard (1988) [6] and Moselhi et al. (2005) [8]. Combining these two datasets yields 123 data points, which can be considered sufficient for the loss-of-productivity quantification domain. Table 1 shows the distribution of the combined dataset.


 

Table 1: Distribution of Combined Dataset

Type of Projects   No. of COs   Value of Original Contract   Value of Change Orders   Original Est. Hrs.   Actual Hrs.   CO Hrs.
Electrical         37           $  91,984,837                $ 42,530,607             1,395,330            2,324,107     447,425
Mechanical         54           $ 168,183,744                $ 15,518,911             1,815,085            2,878,130     427,145
Architectural      5            $   6,410,000                $    914,273                95,280              128,787      17,116
Mech./Elec.        5            $  30,552,000                $  6,452,000               883,430            1,190,742     143,650
Civil              22           $  42,538,755                $  9,323,214               691,136            1,161,878     190,958
Grand Total        123          $ 339,669,337                $ 74,739,006             4,880,263            7,683,645   1,226,294

Research Methodology

The developed nonlinear regression model involves several steps. The first step is data preprocessing and enhancement; the refined data are then fed into the developed nonlinear regression model. The last step is to compare the results of the developed model with those of existing models against a case study, and to report them. Figure 5 shows the general overview of the developed model.
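The three steps above can be sketched as a minimal pipeline skeleton. The function names and stub bodies below are placeholders for illustration, not the paper's actual implementation:

```python
def preprocess_and_enhance(dataset):
    # Step 1 (placeholder): align scales and enhance the records.
    return [list(row) for row in dataset]

def fit_nonlinear_regression(refined):
    # Step 2 (placeholder): fit the developed nonlinear model.
    return {"model": "nonlinear-regression", "n_points": len(refined)}

def compare_against_case_study(model, baselines=("Leonard", "Moselhi")):
    # Step 3 (placeholder): report results against existing models.
    results = {name: None for name in baselines}
    results["developed"] = model
    return results

report = compare_against_case_study(
    fit_nonlinear_regression(preprocess_and_enhance([[1, 2], [3, 4]])))
print(sorted(report))
```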

Figure 5: General Overview of Developed Model

 

Data
Preprocessing and Enhancement

The combined dataset has 14 unique parameters of diverse types and scales, namely: type of impact, type of work, original duration, actual duration, extended duration, original estimated hours, earned hours, actual hours, number of change orders, frequency, change hours, schedule performance index, average size, and percentage of change orders. The values associated with these parameters are not directly comparable because they are not on a common scale. The alignment process therefore starts by rescaling the large values in the dataset, such as the actual and original estimated hours. The pseudocode for the aligning process is as follows.

 

Table 2: Algorithm for Pseudocode for Aligning
the Given Dataset

input = dataset;
double ratio = 100;
double aspect_ratio = 1.25;
(m, n) = input.size();
for (int j = 0; j < n; j++)
    if (max(item[:, j]) > ratio)
        foreach (tuple i)
            item[i, j] = item[i, j] / max(item[:, j]);
 

In Table 2, the aspect ratio is set to 1.25. This value is obtained by a grid search and depends on the given input dataset; if the input changes, this value should be updated accordingly.

As a second step for enhancing the dataset, an augmented apriori-like algorithm is used to maximize the margin around the features, especially those whose values are very close to each other. This algorithm first finds the local and global extrema for scaling the records up. It then assumes that arrows are drawn from the origin to the records with respect to these extrema; the extrema act as knots that bias the arrows nonlinearly. In other words, the values are mapped to another space in which all the records are represented by arrows and knots. Finding the maximum margins between these arrows then becomes easier, by solving the Jacobian matrix. After these computations, hanger values are generated whose tensor product with the original records maximizes their Cartesian distance, which in turn helps the regression algorithm tune its parameters. Specifically, for records with percentage values, such as the extended duration feature, the corresponding centroid is computed over all the values along that feature. For this purpose, a Gaussian distribution approximation is used to find the best statistical expectation (ideally set to zero) and a proper standard deviation (SD); a mean of 0.25 and an SD of 1.24 are obtained. If each row of the dataset is assumed to be a 14-D vector in a non-Cartesian space, its basis vectors can be found using an algebraic factorization such as Cholesky. The rank of this factorization gives the degree of the nonlinear six-degree-of-freedom (6-DOF) system to be solved by the Jacobian. Finally, any inconsistent row can be replaced by the mean of the rows, via the approximate 6-DOF polynomial. For the current dataset, this technique is applied to tuples 57, 64, 87, and 110.
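One concrete piece of the enhancement above, fitting a Gaussian approximation to a percentage-valued feature and standardizing it around the fitted mean and SD, might be sketched as follows. The sample values are invented for illustration; the text reports a fitted mean of 0.25 and an SD of 1.24 for the real dataset:

```python
import numpy as np

def standardize_feature(values):
    """Fit a Gaussian approximation (sample mean and SD) to one
    feature and map it to zero mean and unit variance, as done for
    percentage-valued features such as extended duration."""
    values = np.asarray(values, dtype=float)
    mu = values.mean()            # statistical expectation
    sigma = values.std(ddof=0)    # standard deviation
    return (values - mu) / sigma, mu, sigma

# Invented percentage values for illustration only.
z, mu, sigma = standardize_feature([0.10, 0.25, 0.40])
print(mu, sigma)
```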

 

Nonlinear Regression

There are several ways to find a polynomial curve that represents the data as smoothly as possible. Although linear regression is a fast and accurate method for a balanced and normalized dataset such as the one created in the previous section, its performance varies from dataset to dataset. For the processed dataset, the simple rule is:

Equation 1: ŷ = β₀ + β₁x

where ŷ denotes the hypothesized line that we would like to fit and x is the given input. Based on the achieved results, the RMSE of this algorithm was about 21.32%, which is quite high. The next step after linear regression was its nonlinear counterpart. The common approach for handling nonlinear regression is to approximate it by a piecewise linear function: since in nonlinear regression the fitted function is no longer a single line, the nonlinearity is implemented with several linear functions. In our implementation, this approach results in an RMSE of 17.34%.
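The piecewise-linear idea can be illustrated with a small sketch: fit an independent least-squares line on each segment of the input range and compare the RMSE against a single global line. The data and the two-segment split below are invented for illustration; the 21.32% and 17.34% figures above come from the actual dataset:

```python
import numpy as np

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def piecewise_linear_fit(x, y, n_segments=2):
    """Approximate a nonlinear trend with several linear pieces:
    split the sorted x-range into segments and fit an ordinary
    least-squares line on each segment separately."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    y_hat = np.empty_like(y)
    for seg in np.array_split(np.arange(len(x)), n_segments):
        slope, intercept = np.polyfit(x[seg], y[seg], 1)
        y_hat[seg] = slope * x[seg] + intercept
    return x, y, y_hat

# Invented nonlinear data: loss grows quadratically with change.
x = np.linspace(0.0, 1.0, 40)
y = x ** 2

# Single global line vs. two linear pieces.
g_slope, g_int = np.polyfit(x, y, 1)
single = rmse(y, g_slope * x + g_int)
_, y_sorted, y_hat = piecewise_linear_fit(x, y, n_segments=2)
double = rmse(y_sorted, y_hat)
print(single, double)   # the piecewise fit has the lower error
```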

The nonlinear regression can also be articulated by formulating the nonlinearity directly with a set of nonlinear functions. First, the dataset is partitioned into seven 2×2 patches (corresponding to the 14 features in the dataset), and a nonlinear sigmoid-like function is assigned to each of them. Formula (2) depicts this nonlinear function.