Revisiting the Definition of Done

It's good to get into the habit of revisiting, and to some degree challenging, the status quo as part of the continuous improvement process promoted by lean and agile methodologies. Sometimes this can be difficult, or even scary, depending on the kinds of changes being introduced, but these are often the things that elevate development teams to new heights of productivity and effectiveness. We went through one of these changes recently, so I thought I'd share the experience and the results of revisiting and redefining one team's definition of done.

After months of preparation, our development organization recently split its single team of six developers and two QA analysts into two separate teams of three developers and one QA analyst each, in an attempt to scale the development process. However, shortly after that was announced, one of the testers was pulled onto a separate project for various reasons, which left one of the teams without a dedicated tester. At first this situation concerned some in the organization, including me, but after giving it some thought, I started to see it as an opportunity rather than a setback: this was the perfect chance to try a small experiment regarding how software is written in our shop. Rather than re-merging the two teams into one, we continued with the two-team approach. In order to succeed, the team without the dedicated tester would now have to adopt some of the agile development practices that had been neglected for various reasons.

The Old Normal

Historically, the development group's definition of done for a given story consisted primarily of the code being written and the unit tests passing (and sometimes not even that, as unit tests were often left to the very end of the entire development cycle). Once those two items were finished, the developers moved on to the next story, and the story just coded became the QA analyst's problem. This "definition of done" resulted in a few things, none of them good:

  • a large queue of incomplete, un-shippable work, as the QA analysts could not keep up with the pace of the developers
  • inefficiencies whenever a bug was found in testing: the developers had already started the next story, so they had to context-switch back to the previous one, meanwhile adding to the queue of incomplete work
  • code that could never be promoted to our UAT environment until all the incomplete work was cleared out, which delayed the product owner and customers from interacting with new features, often leaving them only a week or so to try out 40 new features and changes

For the reasons above, the entire group had been encouraged to move toward a classic agile develop-test cycle rather than the mini-waterfall that had evolved out of traditional QA practices carried over from the organization's previous waterfall process. Under that mini-waterfall, a typical cycle allotted four weeks for coding and four weeks for testing in an overlapping manner: coding started in week 1, testing of the first week's code began in week 2, and the last week of coding ended in week 4, with testing ending in week 5 and the developers spending the tail end writing any necessary documentation and circling back to finish up unit tests.
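To make the overlap concrete, the mini-waterfall schedule described above looked roughly like this (a rough reconstruction from the description; an "X" marks an active week):

| Activity | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 |
|----------|--------|--------|--------|--------|--------|
| Coding   | X      | X      | X      | X      |        |
| Testing  |        | X      | X      | X      | X      |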

Once it was communicated that we were still moving forward with the two-team approach, the three-person team, which we'll call Team 1, had to rethink how they were going to code and test their own work. They essentially had to redefine their definition of done.

The New Normal

Team 1 changed their definition of done for a story to consist of:

  • coding complete
  • all unit tests passing
  • a new branching strategy with a branch per story, and code merged back into the main line in source control
  • deployment to the QA environment
  • all identified user acceptance tests executed and passing in the QA environment
  • any necessary documentation/wikis written

With this new definition of done, Team 1 effectively eliminated the three problems that resulted from the previous definition and essentially gained an extra week of development, since they were no longer constrained by the bottleneck of the mini-waterfall model.
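As a rough sketch of the branch-per-story flow (the story ID and branch names here are hypothetical, and the exact commands depend on the source control tool in use):

```
# Create a short-lived branch for the story (story ID is hypothetical)
git checkout -b story/STORY-123 main

# ...code, keep the unit tests passing, commit as you go...
git commit -am "Implement STORY-123"

# Merge back into the main line once the story meets the definition of done
git checkout main
git merge --no-ff story/STORY-123

# Then deploy the main line to the QA environment and run the
# user acceptance tests there before calling the story done
```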

The team consisting of four members, let's call them Team 2, continued with the previous definition of done as it related to coding and QA/testing, since the team dynamic between development and QA remained unchanged; if anything, the QA bottleneck was eased, as the QA analyst only had to test the work of three developers rather than all six. Our experiment now had a control group, Team 2, and an experimental group, Team 1; Team 2's results provided a baseline against which to compare the new practices Team 1 had adopted.

The Results

The measure of productivity for an agile software development team is velocity, measured in our case in story points. Prior to splitting into two teams, the group's collective average velocity was roughly 60 points per sprint, or 240 story points per release (each production release consists of four sprints). Assuming the teams were split evenly, one could reasonably expect each team's velocity to be about 30 points per sprint. Taking into account that Team 1 would be operating with 25% less bandwidth, its expected velocity works out to 22.5 points per sprint (30 × 75%) from a purely mathematical perspective. Story point estimates were arrived at by consensus of both teams, so the story point data was not skewed when comparing the two teams' velocities.
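For clarity, here is the back-of-the-envelope math behind those expectations (a simple sketch using the figures above):

```python
# Baseline figures from before the split
group_velocity = 60                    # story points per sprint, whole group
sprints_per_release = 4
release_velocity = group_velocity * sprints_per_release  # 240 points/release

# An even split gives each team half of the group's sprint velocity
even_split = group_velocity / 2        # 30 points/sprint per team

# Team 1 kept 3 of 4 members, so it operates at 75% bandwidth
team1_expected = even_split * 0.75     # 22.5 points/sprint
```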

With Team 1 now operating in a manner more consistent with lean/agile concepts and practices, no longer building up queues of incomplete work because they did not move on to the next story until the current one was considered "done" (coded, unit tested, deployed to QA, functionally tested, and documented), they could technically gain another full week. In our first release cycle with the two teams, each team's velocity broke down as follows:

Note: for comparison purposes, the extra week that Team 1 gained was evenly distributed across each of the four sprints.

The total velocity for Team 1 and Team 2 was 151 and 176 story points, respectively. Despite having 25% less bandwidth, Team 1 was only 14% less productive in terms of story points. This was also in spite of Team 1 delivering 0 points in sprint 1, which was primarily attributed to the amount and scale of the changes being introduced to that team, a large portion of the first sprint being dedicated to answering support questions from production, and adapting to their new definition of done.

It is reasonable to project that in the next release Team 1 could increase their velocity and come much closer to Team 2 for the overall release, as they will not have to adjust to so many intrusive changes. Excluding the first sprint, in which Team 1 was adapting to those changes, they were actually more productive than Team 2, completing 151 story points to Team 2's 129 over the last three sprints:


Team 1 was 17% more productive than Team 2 in the last three sprints
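The percentages above fall straight out of the sprint totals (a quick check of the arithmetic):

```python
team1_total, team2_total = 151, 176          # all four sprints
release_gap = 1 - team1_total / team2_total  # ~0.14: Team 1 ~14% less productive

# Team 1 delivered 0 points in sprint 1, so its last three sprints total 151
team1_last3, team2_last3 = 151, 129
last3_gain = team1_last3 / team2_last3 - 1   # ~0.17: Team 1 ~17% more productive
```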

This analysis is not meant to slight Team 2, as both teams improved velocity compared to the adjusted average velocity coming into the release, as evidenced by the table below. The small-team approach proved very beneficial to both teams, and the data lends credibility to the teams' hypothesis that productivity would increase with smaller, independent teams. Comparing the projected sprint velocity to the actual velocity, the move to two smaller teams alone (without taking into account any of the process adjustments made by Team 1) appears to have boosted velocity by 46%.

| Team   | Team Size | Projected Sprint Velocity/Productivity | Actual Sprint Velocity/Productivity | Difference | Difference Adjusted for Smaller-Team Efficiencies |
|--------|-----------|----------------------------------------|-------------------------------------|------------|---------------------------------------------------|
| Team 1 | 3         | 22.5                                   | 37.75                               | 168%       | 22%                                                |
| Team 2 | 4         | 30                                     | 44                                  | 146%       | 0%                                                 |

In the context of this more focused experiment, the results proved that there is room for improvement in how the software development and testing process is executed. With Team 1 forced to adapt and redefine their definition of done in order to succeed, this has also been an example of what can be accomplished when a team is not afraid to take manageable risks by questioning the status quo and trying new things.

While probably not 100% accurate, let's assume for the sake of argument that Team 2's 46% gain in productivity was entirely due to the efficiencies of having a smaller team, as there is no other data to suggest otherwise. Adjusting the difference to exclude those smaller-team efficiencies, the data can be interpreted as showing a 22% increase in productivity from adopting the new process changes, despite the large amount of time and effort expended on production support in the first sprint. Should the 46% gain not be entirely due to the smaller team, with a portion stemming from some other source, then Team 1 saw even higher gains from the process changes introduced.
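To make that adjustment explicit, here is a sketch of the arithmetic using the table's per-sprint averages:

```python
# Per-sprint averages over the release (release totals divided by 4 sprints)
team1_actual, team2_actual = 151 / 4, 176 / 4    # 37.75 and 44 points/sprint
team1_projected, team2_projected = 22.5, 30.0

team1_ratio = team1_actual / team1_projected     # ~1.68, i.e. 168% of projection
team2_ratio = team2_actual / team2_projected     # ~1.47, i.e. 146% of projection

# Attribute Team 2's ~46% gain entirely to the smaller-team effect,
# then subtract it from Team 1's gain to isolate the process changes
process_gain = team1_ratio - team2_ratio         # ~0.21, the table's ~22% with rounding
```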

The Conclusion

This experiment in determining how large an impact the definition of done has on a software project was constrained by real-world, practical business needs, so it could not be conducted in a vacuum by having the two teams build and test identical features. Taking those practical constraints into consideration, and holding all else equal outside the process changes introduced in order to adapt to the new environment, we were able to infer from the data in our ALM tool that there is room for improvement, and perhaps an overdue re-evaluation of how software is developed and tested within our organization.

The circumstances we happened to fall into lent themselves well to such an experiment by providing a control group and an experimental group. The cohesiveness of the experimental group seemed to grow with each sprint, which resulted in higher velocity/productivity gains at reduced cost; it is a testament to what can be accomplished when the manner in which we perform our work is re-evaluated and an effort is made to continuously improve the day-to-day process.

In this particular case, Team 1 was forced to adapt, challenge the status quo, and mold a new definition of done, along with a new way of getting there. The independent, two-team model brought transitional challenges for both teams, but Team 1 faced additional challenges due to not having a dedicated QA analyst. Much of Team 1's success in adapting to these circumstances while maintaining a high level of productivity comes down to the fact that the team worked together and was open and willing to try agile best practices that took them out of their comfort zone in an effort to boost their productivity and efficiency. As a sidebar, it should be noted that Team 1 lacked the senior members (in terms of title) that Team 2 possessed, but working as a team, together with the changes they adopted, more than made up for what Team 1 lacked in titles.

In general, there are most likely efficiencies to be gained by taking the next step toward adopting some of the practices advocated by the agile community that we have been hesitant about, or afraid of, for various reasons (the practices may seem counter-intuitive on the surface, general fear of change, general distrust, etc.).

