Multi-Model CMMI® Appraisals – Factors to Consider

In the last few months, I have frequently been asked the question, “Should we do our DEV and SVC appraisal as a single multi-model appraisal?” This question is usually posed by large IT organizations in India. These organizations have already been appraised at ML5 of the DEV model (maybe more than once), and they are now on the verge of their first SVC appraisal in 2012. I guess the issue of multi-model appraisals will become more important in another 1-2 years, when the next round of DEV and SVC appraisals is due for many organizations.

Well, the answer is “it depends”:-).

In this note we will try to understand the factors to consider (elaboration of what “it depends” on), so that you can take them into account when you face the situation. This note has been put together with a large dose of inputs from D Sankararaman, Mukul Madan, and V Seshadri. These were validated by Channaveer Patil and Dan He. However, they are not responsible for any errors that may have crept into this note.

Multi-model appraisals are covered in detail in Appendix G of the SCAMPI℠ A v1.3 Method Definition Document (MDD) downloadable from here.

One appraisal team’s experience of a multi-model appraisal (SCAMPI℠ v1.2, completed in 2010) at TCS is shared in a SEPG℠ 2011 presentation by Ron Radice et al., available here.

Disclaimer – this note is not definitive, nor is it an “official” position paper of any organization or lead appraiser. However, it may be considered as one of the inputs while evaluating the option of a multi-model appraisal.

The current queries for multi-model appraisals are typically arising from organizations wanting to do DEV+SVC together, and hence we will use that situation as an example in this note. However, multi-model appraisals could comprise any combination of two or more of DEV, SVC, ACQ and People CMM®, and the factors discussed in this note apply to the other situations as well.

Here are the factors to consider.

Organizational Disruption. If you are a big organization, you could have either two (or more) long organizational disruptions, or one mega-ultra-long disruption. The choice is yours :-) .

Number of ATMs. In a multi-model appraisal you are likely to need fewer ATMs trained on the models. If the appraisals are done separately, you will probably keep a gap of a few months between them. During that interval, the ATMs may resign, retire, go on leave, be allocated to some other useful work (assuming that they are still capable of doing some other useful work :-) ), or just refuse to be ATMs again (“not another appraisal as ATM!”). So instead of training a bunch of 10-12 people on the models, you may have to train a higher number if you are doing the appraisals separately.

LA/ ATL requirements. For a multi-model appraisal, you will need to engage a lead appraiser appropriately certified as SCAMPI℠-A LA for all the models (constellations, actually) covered in the appraisal. Therefore, the choice of LAs for multi-model appraisals may be significantly narrower, especially if your appraisal is “high-maturity” (ML 4 or ML 5).

LA Willingness. The calendar time for the on-site activities of a multi-model appraisal is definitely going to be much higher than for a single-model appraisal (this is also discussed as a separate factor later on). LAs may not be willing to stay away from their families, pets and home city for such a long time. Or they may demand a fat sum as hardship allowance :-) .

Sampling of Projects (or Workgroups). This does not change whether you are doing a single multi-model appraisal or two separate appraisals. If there were X projects selected for DEV and Y workgroups selected for SVC, then in the multi-model appraisal, the number of instances would be X+Y. Sampling will be done as if they were different appraisals.

Overall Effort. This is one area where there is a lot of misunderstanding. Note that the sample size remains the same (multi or otherwise). Hence, the effort for artefact collection remains similar, the effort for artefact review by the appraisal team is also similar and so is the effort for interviews and discussions. There could be some (a tiny bit) effort reduction in a multi-model appraisal due to the following:

  • Single batch ATM training (instead of possible two batches). However, one batch of ATM training can have a max of 12 participants, so with backup ATMs you may have to run two batches anyways, even for a multi-model appraisal.
  • Sponsor meeting (assuming the same sponsor for both the appraisals)
  • Opening meeting can be a single one instead of two
  • Some economies of scale (not a lot) on artefact collection, artefact review, interviews and preliminary findings for “Oh” areas – organizational PAs like OPD, OPF, etc. However, organizational PAs will have to be investigated explicitly from both the DEV and the SVC contexts. So the saving would be more in terms of being familiar with the terminology, document architecture and names/ faces of people running the “Oh” processes, assuming that the people are the same in the DEV and SVC contexts.
  • General effort saved for the LA and ATMs due to familiarity with the layout of the office, the security procedure, the parking lot, the cafeteria food, the washrooms, the office furniture, the room freshener, the air-conditioning, etc. (this factor may be invalid, if the appraisal team has to constantly move across buildings and cities anyway).

The project-level (or work-group level) process areas will have to be investigated for each instance separately (either in the DEV or the SVC context). Since the sample size is going to be determined the same way (whether it is two separate appraisals or a multi-model one), the effort to investigate instance-level data is going to be the same. This includes the effort for the preliminary findings (or equivalent).

With the above micro-savings, the overall appraisal effort savings (LA + ATMs) is likely to be in the range of 15%-20% (i.e., the total effort for a multi-model appraisal is likely to be around 15-20% less than the total effort for separate appraisals).

Calendar Time. With the large one-time effort for the multi-model appraisal, the calendar time for the onsite period is also likely to be higher (than that of a single-model appraisal), because there is a limit to the number of ATMs that an LA can handle. Hence the ATMs will need to be away from their day jobs for a longer period. The long-drawn absence of the ATMs from their day jobs can be disruptive. One reported multi-model appraisal had an onsite period close to the upper limit of 90 days (see here).

ATM / LA Fatigue. This is where the multi-model appraisal (for high maturity, large organizational scope) becomes untenable. As the on-site period starts crossing three weeks, the fatigue becomes obvious. In organizations that use standard processes, and have done this over many years, one can expect the documents to be similar. The responses during the interview sessions will also be similar.

For the ATMs, after the glamour of being ATMs, the novelty of PIIDs and Process Area Worksheets, and the thrill of FI-LI-PI-NI-(and NY, of course) wears out, it is an extremely boring, mind-numbing and dull exercise.

(Digression: This may be one of the reasons that LAs have become good storytellers and general entertainers. I know of a LA who sings to keep the ATMs entertained, another one tells jokes on a non-stop basis. Some LAs have started blogging. SEI may have to initiate a study to understand whether LAs have a higher tendency to ….whatever :-) End of Digression).

After around three weeks, productivity, alertness, and eye for detail fall steeply for the ATMs as well as the LA. The other issue is the stress on the ATMs of maintaining confidentiality. They cannot talk to their friends and colleagues, or smile at passing acquaintances, because they may be asked that dreaded question “and how are we today?” Those who have been ATMs know about this; other readers may ask their friends/ colleagues who have been ATMs to confirm this :-) .

Target Levels/ Results. For the purpose of the results (ML/ CL), the multi-model appraisal will deliver two results, two different sets of ratings. Your target rating could be different for the two models and the appraisal result rating will also be different for the two models. This is the same as doing two separate appraisals.

Novelty / Publicity Value. “Will we be the first to do a multi-model appraisal?”; “No?”; “Okay, can we be the first to do it in this country?”, “How about this city?” and so on….

Well, if your organization is looking to announce itself as the first in something, we can surely work out some combination of conditions that you will be the first in. Not just the first, but maybe the only one. Ever.

===========================================================================

Having said all that, are there any conditions where it may be worth considering a multi-model appraisal? Yes, if you have the following it may definitely be worth considering:

  • Low number of Process Areas (ML2 kind of stuff) in both the models, and
  • Small organization (number of sampled instances is likely to be low)

Under these circumstances, the number of ATMs for each separate appraisal would be low (say 4-5), so one can increase the ATM team size to 8-10 and run the multi-model appraisal in the same number of days as a single separate one. So the organizational disruption time and LA cost can be much lower. ATM training can also be done in a single batch (max batch size is 12).

Finally, the issue is a complex one, and let us conclude by saying once again that “it depends” and that you should consult multiple LAs and take their opinions before coming to any firm conclusion.


Thanks a lot to D Sankararaman, Mukul Madan, V Seshadri, Channaveer Patil, and Dan He.

Please feel free to share your views, experiences or queries, using the “comments” feature available at the top of this article/ post.

Notes:

Nothing Official About It! – The views presented above are in no manner reflective of the official views of any organization, community, group, or association.

® CMMI and CMM are registered in the U.S. Patent and Trademark Office by Carnegie Mellon University.
℠ SCAMPI and SEPG are service marks of Carnegie Mellon University.
LA – a short form of SCAMPI℠ Lead Appraiser. (It is not a term of endearment like “da”, “pa”, “ma”, or “po” used in different parts of the world :-) ).
ATM – Appraisal Team Member (not an Automated Teller Machine :-) )


What to Expect in the new version of CMMI® for DEV Version 1.3

The first day (17th August 2010) of the SEPG Asia-Pacific 2010 conference covered the changes expected to the CMMI® models as part of the release of V1.3. The tutorial was conducted by Mike Phillips of the SEI and was attended by a large group of professionals (mostly from the IT industry).

Here are the key points that I have gathered and my reactions to some of the changes in the DEV model. The detailed presentation can be downloaded from here [Thanks Mike :-) ]

A summary of the changes is available in a presentation titled CMMI v1.3 – What’s New on slideshare.

Changes to the Generic Goals and Generic Practices (DEV Model)

1)    Generic goals 4 and 5 have been removed from the models. So, generic goals stop at GG3 for all process areas. [Reaction: Good. The material in GG4 and GG5 was very scanty, and could not be used to implement CL4 and 5 practices]

2)    No significant changes in the intent of generic practices, other than a few changes in the verbiage for a few generic practices (GP 2.6, 2.9 and 3.2) [Reaction: It would have been nice if some generic practices were merged, for example GP2.8 and GP2.10, to reduce the number of generic practices. Maybe in version 1.4 or later :-) ]

3)    In the model book (or technical report), the generic goals and generic practices are described just once at the start of the document and are not repeated for each process area [Reaction: Slimmer book to carry, fewer trees to be chopped, nice touch]

Changes to the Maturity Level 2 Process Areas (DEV Model)

1)         Requirements Management (REQM) has been shifted to the Project Management category of PAs [Reaction: Makes no difference, except that there are no engineering PAs at maturity level 2. Imagine a maturity level 2 development company saying "we have great management and support practices, but our engineering practices may not be...."]

2)         Supplier Agreement Management (SAM) has been simplified. Two contentious practices of SG2 (erstwhile SP2.2 & 2.3), have been converted to sub-practices of other specific practices [Reaction: These two practices were often a source of grief to many organizations in their appraisal. Fantastic!]

Changes to the Maturity Level 3 Process Areas (DEV Model)

1)         The optional IPPD addition (one goal in OPD and one goal in IPM) has now been converted into specific practices in OPD and IPM (one additional practice each) [Reaction: This is a pity, because IPPD has great value. I would have liked to see more emphasis on IPPD with greater clarity, instead of IPPD becoming 2 practices in the whole of CMMI®]

2)         No other changes, other than changes in the language to bring in more clarity.

Changes to High Maturity Process Areas (DEV Model)

1)         OID has been renamed as Organizational Performance Management (OPM). A new goal has been added to align process improvements to business objectives and process performance data. [Reaction: Was always required. Though the change looks big, most high maturity organizations would already be implementing the requirement of this new goal. However, the new name "Organizational Performance Management" is an overkill and misleading]

2)         Quantitative Project Management (QPM) has been made tighter and the requirements are more explicit. No significant change in the intent of the process area.

3)         Causal Analysis and Resolution (CAR) and Organizational Process Performance (OPP) have undergone some changes in the verbiage, though nothing significant in intent.

Version 1.3 of DEV (along with SVC and ACQ) will be released in November 2010.

SCAMPI℠-A appraisals using version 1.2 of the model can be conducted for a period of 12 months after the release of version 1.3. Organizations aiming for an appraisal in the later part of 2011 should consider switching to version 1.3 right away.

The SCAMPI℠-A methodology is also undergoing an upgrade, which will be released slightly later. So, for some time, organizations could be appraised against CMMI® version 1.3 using the current SCAMPI℠-A version 1.2.

Will try and post about SVC and the expected changes to the appraisal methodology sometime soon.

What is (Project) Success in a High Maturity Organization?

Project success is measured by comparing the actual performance with what was budgeted, planned and committed – typically with respect to parameters of cost, schedule and quality. Projects that meet all parameters are considered completely successful, and those that meet some parameters are considered less successful. Projects that fail in most/ all parameters are labeled as failures. Of course, sophisticated systems may even use the extent to which they missed the objectives (near miss or missed by a mile/ kilometer) as a factor in determining the degree of success or failure.

Is this really how a high maturity (HM) organization (in terms of the CMMI® framework) should evaluate project success? I believe that the refinement in process and project management maturity should be used to fine-tune how we evaluate success.

A HM organization is “aware” that all processes have variations inherent in them. It “knows” that projects (that are composed of the processes) have a probability of achieving success in their objectives, but success is not guaranteed. The role of project management (esp. QPM) is to continually evaluate the probability of success and maximize the conditions to improve that probability.

When a single project goes through its life, those probabilities will play out. Which means that even if the probability of completing the project within its budget was 90%, a single project can overshoot the budget. Of course, if we run similar projects millions of times, only 10% of the projects will overshoot the budget; but we have only one project here. In such an “aware” organization, is the use of “actual budget compliance” a right way to measure success? If so, how is this organization different from a non-HM organization?

I believe that in a HM organization, project success should not be measured by after-the-fact results, but by the rigor and continual alignment of the project to maximize the probability of success. So, in a HM organization, a project is successful, if and only if:

*    The project, at start-up, consciously makes choices (composes the defined process, aligns plans) that maximize the probability of meeting its multiple objectives

*    The project continually evaluates the probability of meeting the objectives and revises its choices to maximize its probability of success

Now, in such an organization, the “best project” award may be given to a project which in the conventional sense has actually failed :-) – such an organization would be truly acting on the belief- “if we implement the process, the results will eventually follow”.

Your comments?

What comes first – SPC or a stable process?

An interesting topic, which has been discussed very often. In every discussion, people agree on what is right and what needs to be implemented. But in actual implementation the principles are forgotten. Therefore it is good to re-align ourselves to the basics time and again.

What is often seen in actual implementation of SPC (ineffective and incorrect implementation):

1)    A process is documented and used

2)    Data related to the process is collected

3)    When we need to do sub-process control (because we are aiming for High Maturity rating), an SPC chart is prepared.

4)    Data points which are outliers are thrown out (root cause analysis is not possible, because the outlier data belongs to a distant past, and the causes are lost in the mists of time)

5)    Control limits are recalculated

6)    Steps 4) and 5) are repeated till all (remaining) points demonstrate process stability

7)    The SPC parameters (center line, UCL/ UNPL, LCL/ LNPL) are declared as baselines and used for sub-process control. The fact that the limits are too wide or that a lot of data points were thrown out (without changing anything in the process) is ignored.

What we have in the above scenario is a maturity level 2/ 3 organization using maturity level 4 tools. Usage of tools alone does not increase maturity. We cannot create a stable process through the use of SPC, we can only confirm the stability of the process through SPC and get signals when the process is out of control or shows changes in trends.
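The “trim and recalculate” loop in steps 4) to 6) can be sketched in a few lines of Python. This is only an illustration: it assumes an XmR (individuals) chart with the conventional limits of center line ± 2.66 × average moving range, and the data values and function names are hypothetical.

```python
from statistics import mean

def xmr_limits(values):
    """Return (center, lnpl, unpl) for an individuals (XmR) chart."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    spread = 2.66 * mean(moving_ranges)  # conventional XmR scaling constant
    return center, center - spread, center + spread

def trim_until_stable(values):
    """The anti-pattern: drop out-of-limit points and recalculate until none remain."""
    kept = list(values)
    while len(kept) > 2:
        _, lnpl, unpl = xmr_limits(kept)
        inside = [v for v in kept if lnpl <= v <= unpl]
        if len(inside) == len(kept):
            break  # every remaining point is now "in control"
        kept = inside
    return kept

# Hypothetical review effort (hours per review), for illustration only.
data = [4.0, 4.5, 3.8, 4.2, 4.1, 3.9, 4.4, 12.0,
        4.0, 4.3, 3.7, 4.2, 4.1, 3.9, 4.6, 4.0]

kept = trim_until_stable(data)
print(f"points discarded without any root cause analysis: {len(data) - len(kept)}")
print("limits before trimming:", xmr_limits(data))
print("limits after trimming: ", xmr_limits(kept))
```

The chart looks stable after trimming, yet nothing in the process has changed and nobody knows why the discarded point occurred. The code only confirms (or fails to confirm) stability; it cannot create it.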

The More Effective Implementation of SPC:

1)    A process is documented and used. As the process is used, variations in the interpretation of the documented process are qualitatively analyzed. Actions are taken to augment the process definition, training and orientation till the interpretation and the qualitative understanding of the process is consistent.

2)    Process compliance audits (PPQA audits) on the implementation of the process identify more actions that need to be implemented to fine-tune the definition, training and orientation related to the process.

3)    Once the audits show consistent compliance, data related to the process performance are collected. Integrity of the data is checked and the data collection process is streamlined and consolidated- till the collected data demonstrates the required credibility

4)    Now we start looking at the data somewhat quantitatively (without using full SPC) – does the trend chart show stability? Is it showing too much dispersion/ variation? Based on the findings, the definition, training and orientation related to the process is refined further

5)    This is the point at which we start using SPC charts to confirm process stability. Each instance of instability is analyzed. Corrective and preventive actions are identified to further standardize the process, based on analysis of past instability. Once we are sure that the causes of those instances have been removed, we can remove the points from the analysis.

6)    We are still left with points which show instability, and our CAR analysis tells us that some of the causes are truly extremely rare events. These are then removed from the data pool. Now all the remaining points are a part of the process. If the process still shows instability, then we can do further analysis – are these really part of a single process? Beneath the surface, are there two or more processes, and we need to separate out the data (e.g., the process may behave differently in the “performance appraisal season”? :-) )

Having followed all the above steps, we now have a basis (and hence baseline) for an effective implementation of SPC.

Remember: We cannot create a stable process through the use of SPC, we can only confirm the stability of the process through SPC.

Size Does Matter! (for baselines and sub-process control) -Continued

Let us take the example of examination/ test centers that run an exam every day, throughout the year. Data for the past year shows that 30% of the candidates pass the exam and 70% fail the exam, all over India.

The Bangalore test center handles around 1000 candidates per month, whereas the Mysore center handles around 100 per month. Over the last one year, both centers have shown the same 30 pass: 70 fail ratio.

For the month of June 2010, one center has reported 38% pass and another has reported 29% pass. Which center (Bangalore or Mysore) is more likely (has a higher probability) to have reported 38%?

Well, Mysore is more likely to have the higher deviation from the average (+8%) than Bangalore (-1%), because Mysore, handling fewer candidates, has fewer opportunities to “average out”. An easy way to figure this out is to take the case of a center that handles only 1 candidate. This center can have either a 0% or a 100% pass percentage; a -30% or +70% deviation from the average.
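For readers who like to see the numbers, here is a small Python sketch of the same argument. It assumes each candidate passes independently with probability 0.30 (a binomial model, which is my simplification, not something the exam data guarantees); the function name is hypothetical.

```python
from math import comb, sqrt

def prob_at_least(n, k, p):
    """Exact binomial probability of k or more passes among n candidates."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p = 0.30  # long-run pass rate at both centers

# Chance of a monthly pass rate of 38% or more at each center.
mysore = prob_at_least(100, 38, p)       # about 100 candidates per month
bangalore = prob_at_least(1000, 380, p)  # about 1000 candidates per month

# Standard error of the monthly pass percentage: sqrt(p * (1 - p) / n).
se_mysore = sqrt(p * (1 - p) / 100)
se_bangalore = sqrt(p * (1 - p) / 1000)

print(f"Mysore:    P(>= 38%) = {mysore:.4f}, standard error = {se_mysore:.3f}")
print(f"Bangalore: P(>= 38%) = {bangalore:.2e}, standard error = {se_bangalore:.3f}")
```

Under these assumptions a 38% month is unusual but quite possible for Mysore, and practically impossible for Bangalore, because the spread of the monthly percentage shrinks with the square root of the number of candidates.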

Let us now get back to the process performance baselines that we create and the way we do sub-process control. Here are some things that we need to keep in mind while creating, publishing and using baselines:

1) The baseline (mean and standard deviation) for a sub-process parameter (like coding productivity) will be different depending on whether we consider the coding phase of each project as a data point, or each program coded in each project as a data point. The standard deviation in the first case (large base) is likely to be smaller than in the second case (small base).

2) When we publish performance baseline data, we need to qualify it with the level of detail at which it applies.

3) When we use the baseline data to do sub-process control, it needs to be applied at the same level of detail. So, to do sub-process control on program-level coding productivity, we need to use the baseline that was created using programs as data points (not projects as data points).

4) Baselines need to be created using similar situations of the base data. For example, we cannot combine the coding productivity on large programs with the productivity on small programs. Even if the average/ mean remains the same, the standard deviation will be higher when we take data from a smaller base as against a larger base.

The above points are not just “nits” but have an impact on the usefulness of baselines and sub-process control. Incorrect usage of baselines leads to incorrect displays of process instability / stability.
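Points 1) to 3) can be illustrated with a short Python sketch. The productivity figures below are hypothetical and chosen only to show the effect: the same raw data gives a much tighter baseline when each project is one data point than when each program is one data point.

```python
from statistics import mean, stdev

# Hypothetical coding productivity (LOC per person-day) for individual programs,
# grouped by project. Values are illustrative only.
programs_by_project = {
    "P1": [18, 25, 31, 22, 27],
    "P2": [30, 19, 24, 28, 21],
    "P3": [23, 33, 17, 26, 29],
    "P4": [20, 27, 32, 24, 18],
}

# Program-level baseline: every program is a data point.
program_points = [x for xs in programs_by_project.values() for x in xs]

# Project-level baseline: every project's average is a data point.
project_points = [mean(xs) for xs in programs_by_project.values()]

print(f"program-level: mean={mean(program_points):.1f}, sd={stdev(program_points):.1f}")
print(f"project-level: mean={mean(project_points):.1f}, sd={stdev(project_points):.1f}")
```

If the narrow project-level limits were used to control individual programs, many perfectly ordinary programs would be flagged as out of control; the reverse mistake would hide real signals at the project level.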

Size Does Matter! (for baselines and sub-process control)

Here is a small brain-teaser.

Let us take the example of examination/ test centers that run an exam every day of the year. Analysis of the past one-year data shows that 30% of the candidates pass the exam and 70% fail the exam, all over India.

The Bangalore test center handles around 1000 candidates per month, whereas the Mysore center handles around 100 per month. Over the last one year, both centers have shown the same 30 pass: 70 fail ratio.

For the month of June 2010, one center has reported 38% pass and another has reported 29% pass. Which center (Bangalore or Mysore) is more likely (has a higher probability) to have reported 38%? Why do you think so?

See my post dated August 3, 2010 for the answer and implications.

Generating Lots of Data through Monte Carlo (a misuse?!?)

I have seen the metrics groups of organizations generating “enough” data for creating process performance baselines, from very few available data points, using Monte Carlo simulation.

Here is the method they use: Ten data points are available; using the pattern of the ten data points, they generate a thousand (or maybe a million) data points using Monte Carlo simulation. Now they feel that they have enough data points to generate a baseline.

But in reality the baseline has been generated using 10 data points. The 1000 data points only give a feeling of having lots of data and this is clearly a misuse of Monte Carlo simulation.
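A quick Python sketch shows why. It assumes the “pattern” is captured by fitting a normal distribution to the ten points (one common way of doing this; the organizations I saw may have used other fits), and the ten values themselves are hypothetical.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is repeatable

# Ten hypothetical observed data points (e.g. defect density per KLOC).
observed = [2.1, 3.4, 2.8, 4.0, 3.1, 2.5, 3.9, 2.2, 3.6, 3.0]

# "Generate" 1000 points by sampling from a normal fitted to the ten points.
mu, sigma = mean(observed), stdev(observed)
simulated = [random.gauss(mu, sigma) for _ in range(1000)]

print(f"observed  (n=10):   mean={mu:.2f}, sd={sigma:.2f}")
print(f"simulated (n=1000): mean={mean(simulated):.2f}, sd={stdev(simulated):.2f}")

# The real uncertainty about the process mean still comes from n=10.
honest_se = sigma / len(observed) ** 0.5
illusory_se = stdev(simulated) / len(simulated) ** 0.5
print(f"standard error from real data:      {honest_se:.3f}")
print(f"standard error the simulation hints: {illusory_se:.3f}")
```

The simulated data simply echoes the mean and standard deviation of the original ten points. It looks ten times more precise, but no new information about the process has been added.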

Normal Distribution is Actually Rare

When we use statistical analysis tools and techniques, the underlying assumption is often that the process/ sub-process displays “normal” behavior. Even if the limited data that we have shows non-normal behavior, we assume that the reason is the lack of data, and we approximate the distribution to normal.

This assumption and subsequent analysis, conclusions and decisions are therefore inaccurate, especially if we are combining “assumed” normal behavior across multiple processes, viz Process Performance Modeling.

“Normal” behavior is very rare in real life. For example, you travel from your home to office, let us say usually in 1 hour. The least time you have ever done the trip is in 30 mins. If the distribution was normal, the worst time should have been 1 hour 30 mins (symmetrical on both sides). You will find that on some days that you were delayed, the time could have been 2 or even 3 hours!

Another way of saying that real life does not behave in a “normal” way, is “there is a limit on how well you can do, but no limit on how badly you can screw up!”
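The commute example can be checked with a few lines of Python. The travel times below are hypothetical, made up to match the story (usually about an hour, never under 30 minutes, occasionally 2 or 3 hours).

```python
from statistics import mean, median

# Hypothetical commute times in minutes: bounded below, long tail above.
commute = [30, 45, 50, 55, 55, 60, 60, 60, 65, 70, 75, 90, 120, 180]

m = mean(commute)
sd = (sum((x - m) ** 2 for x in commute) / len(commute)) ** 0.5

# Skewness is 0 for a symmetric (e.g. normal) distribution; positive means a long right tail.
skewness = sum((x - m) ** 3 for x in commute) / (len(commute) * sd ** 3)

best_gap = m - min(commute)   # how much better than average the best day was
worst_gap = max(commute) - m  # how much worse than average the worst day was

print(f"mean={m:.1f} min, median={median(commute):.1f} min, skewness={skewness:.2f}")
print(f"best day beats the mean by {best_gap:.1f} min; worst day exceeds it by {worst_gap:.1f} min")
```

A mean well above the median and a clearly positive skewness are simple warning signs that limits computed on the assumption of normality will be misleading.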

There is more on this in the books “Fooled by Randomness” and “Black Swan” by Nassim Taleb — must-reads for anyone involved in high maturity CMMI® implementation.
