[kplato] Some code for KPTProject::getDuration()

John D Lamb kplato@kde.org
Sun, 19 Aug 2001 19:03:52 +0100


Some comments:

> Hmm? Doing a rm *.o and an make install takes a whopping 30 seconds
> here, is it so slow for you?

No. it's faster. I wanted some simple test code to show how getDuration
could work and illustrate some of the potential problems.

> Well anyway, it does not compile for me, so I was stuck reading code.
> (the error: undefined reference to `cout' tells all ;)

Curious: it compiles for me. I don't really believe you couldn't guess
the right #include directive. The cout is there to test the code.

> Have you used QT? The reason I use it is because it solves all sorts of
> problems the STL has, not the least being unicode.
> If liking is the problem, use a stripped-down version of QT, its only
> 400Kb

I've used QT, mainly for adapting programs to run with a simple
one-dialog GUI. I like it, but I'm no expert.

Actually, I don't see unicode as a problem because it's not used for
calculating durations etc and so could be put in a derived class. It's
QList and QDateTime that are more difficult to avoid. Although I'd
prefer the calculations to be fully QT-free, I can't think of any
practical way of doing it and I know QT well enough to see that there
are huge advantages in using these classes. On the other hand, it's easy
(if occasionally a little tedious) to use these QT classes with the STL
stuff for
doing calculations. I can implement PERT/CPM, resource allocation etc
without STL, but it is ideally suited to the kind of structure (trees,
digraphs) of the project nodes and for writing robust, efficient
algorithms quickly and accurately. In short STL is good for PERT/CPM
because it uses an algorithm rather than a formula. 

There are other areas where we should expect STL to work well. Even the
apparently simple task of checking that a node dependency make sense
(i.e. if we have P before Q and Q before R, we need to warn the user of
specifying R before P) is much easier in STL: we need to check that R is
not going to be reachable from itself and identify the nodes of any
paths from R to P. You can do this without STL, but that's rather like
trying to write a GUI in X when you could use QT.


> > Second, I haven't yet understood how kplato
> > will deal with activities that have already finished in a project.
> 
> A node has a starttime and an endtime.
> If not started, starttime is 0, if not ended endtime is 0

I think there is rather more to it than that. Consider nodes A, B, C, D
with time dependencies A before B and C and B and C before D and fixed
durations 5, 10, 5 and 5 days respectively. The relative start times
would be (A) 0, (B) 5, (C) 5, (D) 15 assuming all nodes start as early
as possible (more later). My question is if (say) Activity A started on
November 1 and today is November 3, how do we represent start times? At
very least we need to know the current date and which nodes have
started/completed.

> > Third, functions like getDuration will have to call the basic PERT/CPM
> > routines many times if we want a Monte Carlo simulation for the likely
> > outcome of a project.
> 
> But a PERT/CPM duration call will always return the same for a leave node,
> right?

getDuration() won't return the same value each time if we use random
numbers, even for leaf nodes. Maybe this is a problem that could be
deferred till later. If we use subprojects then there is an efficiency
issue. PERT/CPM on the project needs the duration (as a constant value)
of each node at least twice; it's not efficient to use a function call
for this for subprojects. One solution is to store the fixed values as
hidden data in the node.

> If I read your code well, the only thing PERT does is put all tasks
> head to tail and then count the duration of the task.
> It naturally accounts for things like dependencies, risk,
> availability of resources etc.

I'm not sure that this isn't an oversimplification. The PERT/CPM code
(implicitly) finds a critical path. Then the duration is just the some
of the critical path durations. (Actually there's a subtle problem here
- the critical path may not be unique - but the code already deals with
this correctly. Dependencies are dealt with, risk and resources are not
(yet).
 
> What I think we should do is the following
> - only leaf nodes can have an actual duration (the rest only ask the children
>   how long they take to complete)

I think that's right

> - A leaf node can simply return the duration it has set as expectedDuration.

Ok.

> - A leaf node can simply return the duration it has set as randomDuration.

Ok, but we need to be a little careful how we use the randomDuration.
PERT/CPM can only work safely with one value at a time.

> - A leaf node has resources and associated risks as well as start-end-time
>   dependencies which define the distribution which has to be used in
>   parent nodes to calculate the duration.
> - a non-leaf node has to return an expectedDuration based on PERT.

Ok.
 
> but also:
> - a leaf node either has a start time set (which can be 0) or has a start
>   dependency on another node. The last one can be calculated by asking
>   the dependent parent its end time and using the lag of the kptRelation.
>   This means that a non-leaf node can _not_ have a start dependency, right?

Probably these should be earliest start and latest finish. In the
example above, Node C had duration 5 but earliest start 5 and latest
finish 15. It could start any time between 5 and 10 days. The choice
might depend on availability of resources. PERT/CPM gives the earliest
start and latest finish, but also the total float and free float - maybe
these could go in a struct.

If a non-leaf node represents a subproject, it ought to have start
dependencies just like the leaf nodes in the project.

> - for a leaf node: end time = start time + duration ;)

earliest end = earliest start + duration. The actual time chosen to
start an activity need not be the earliest possible time.

> - a non-leaf node can tell its start time by asking all children their
>   start times and returning the smallest one.

True, but there is a more efficient implementation if the non-leaf is a
subproject (I don't know if it could be anything else) because PERT/CPM
applied to a project would effectively have to calculate an earliest
start for the subproject just to calculate everything else.

> for non-leaf nodes the expectedDuration is calculated as:
> - taking the startime of this leaf (as defined a number of points back)
>   taking all the children nodes and asking them their end time
>   (starttime + duration) and taking the largest end time (last one)
>   end time - start time gives expected duration.

Again true, but probably not the most efficient way.

There is another subtle issue with expected durations if the duration is
not fixed: the expected duration of the project is in general greater
than the sum of the expected durations of the critical path nodes. As an
example, if the duration of the node C is not fixed at 5 days but
exponentially distributed with mean 5 days, then the expected duration
of the project would become nearly 21 days.

This is one reason for using a pessimistic calculation for how long an
activity might take.

The alternative method is to use Monte-Carlo. The general problem is not
mathematically tractable.

> 
> As you see I made so many points it gets boring. This is not by excident but
> by design, this is also the design in kptnode, and almost each of the posted
> points above is one method call.
> Therefor the above has to be implemented to do the PERT calculation correctly.
> I am willing to do that, but first I would like to hear opinions if the above
> is correct.
> 
> ps. resource allocation is still missing from the above..

I think it would make sense to deal with resource allocation later.

> 
> Thanx John, it was a really good learning program, just not for prime time
> usage ;)

I'm glad you spotted that one ;-)

Where do you want to go next?
JDL