Part One looked at Velocity, what it is, how it gets abused and what the typical result of that is – and therefore the need for an alternative.
Part Two then considered what "agility" means, with three overlapping principles that we want to try and find some measures for.
Now we want to look at each of those three principles, and consider potentially useful measures we might try.
At its simplest, I think a measure for the first principle, Value Early and Often, could combine two parts. The "Value Early" part would be the elapsed time from when a team decides to focus on a problem area to when they have developed a good enough MVP to test. Does it take 9 months to do an MVP? Or is it closer to 3 months? Let's call this Time to Market, or TTM.
TTM = Time to Market — Elapsed time in weeks from team starting to explore a problem space to when the first MVP is being used by Customers/Users.
If the MVP takes 9 months, then TTM = 39
If the MVP takes 3 months, then TTM = 13
We also need something to cover the "and Often" part. I'd suggest something like Release Frequency as a half-decent measure of this. So, what is the elapsed time between releases? Is it quarterly? Monthly? Weekly? Daily? Multiple times a day? As a general rule, the shorter the better.
RF = Release Frequency: elapsed time in weeks between releases to customers/users.
If Quarterly Releases, then RF = 13
If Monthly Releases, then RF = 4.3
If Daily (weekday) Releases, then RF = 0.2
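If you have a list of release dates, RF can be computed rather than estimated. This is a minimal sketch, assuming calendar dates and a simple mean of the gaps between releases; the function name `release_frequency_weeks` and the sample dates are hypothetical, not taken from any particular tool.

```python
from datetime import date


def release_frequency_weeks(release_dates):
    """Mean elapsed time, in weeks, between consecutive releases."""
    if len(release_dates) < 2:
        raise ValueError("need at least two releases to measure a gap")
    ordered = sorted(release_dates)
    # The gaps between consecutive releases sum to (last - first).
    total_days = (ordered[-1] - ordered[0]).days
    gap_count = len(ordered) - 1
    return total_days / (7 * gap_count)


# Hypothetical release dates, six weeks apart.
six_weekly = [date(2024, 1, 8), date(2024, 2, 19), date(2024, 4, 1)]
print(release_frequency_weeks(six_weekly))

# Hypothetical daily weekday releases: Monday to Friday, then the next Monday.
weekdays = [date(2024, 1, d) for d in (8, 9, 10, 11, 12, 15)]
print(release_frequency_weeks(weekdays))
```

Because the weekend gap is included in the average, weekday-only daily releases come out at about 0.2 weeks, in line with the figure above.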
I'm not sure how to combine these two parameters, but adding them together doesn't seem logical. I'm gonna multiply them for now. So…
Value Early and Often, VEO = TTM x RF
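As a quick check of the arithmetic, here is a small sketch of the VEO calculation using the example values above. The helper names and the 52/12 weeks-per-month conversion are assumptions made for illustration.

```python
def months_to_weeks(months):
    """Convert months to weeks, assuming 52 weeks per 12 months."""
    return months * 52 / 12


def veo_score(ttm_weeks, rf_weeks):
    """Value Early and Often: TTM x RF, both in weeks. Lower is better."""
    return ttm_weeks * rf_weeks


# A 9-month MVP with quarterly releases.
slow = veo_score(months_to_weeks(9), months_to_weeks(3))
# A 3-month MVP with monthly releases.
fast = veo_score(months_to_weeks(3), months_to_weeks(1))
print(f"slow: {slow:.0f}, fast: {fast:.0f}")
```

The first case works out at 507 and the second at roughly 56, so shortening both TTM and RF compounds quickly when the two are multiplied.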
So, let's imagine how the conversation between a Senior Exec and, say, a Delivery Manager might go with this:
Exec: "So, what's our Value Early and Often Score?"
Delivery Manager: "Well, we've seen an improvement over the last quarter. Our TTM has gone from 39 weeks down to 26 weeks. This is mostly from doubling our Release Frequency, which was previously Quarterly: we've gone from a release to production every 13 weeks to one every 6 weeks. There's been a huge effort on Continuous Delivery and on making releases cheaper to make this possible."
Exec: "Sounds good – what's the overall Score now?"
DM: "From 507 down to 156. A 70% improvement!"
Exec: "That's amazing. We should celebrate that achievement. Do you need anything from me to get it down even further?"
DM: "Not really. But we have already plucked a lot of the low hanging fruit. To get it down further we really need to invest in improving the quality and coverage of our Unit Testing. For that, the teams have suggested that we increase capacity allocated to this to 20%. We're shooting for TTM of 13 weeks and RF of 4 weeks, which would bring us down to 52."
Exec: "If that's what they suggest, we should look at what impact that might have on various roadmaps. I'm happy to signal to the Product Management community that this is important – but the decision is really up to the individual teams"
With me so far? Let's look at the 2nd of the three principles and see where we end up…
End-to-End Flow may seem like a repeat of TTM, but I'm assuming that the MVP is a bigger batch. What I'm interested in is how long it takes for an individual Feature (or "story", if those represent something of value to the user/customer). We're looking now not at the batch, but at how quickly one item in the batch goes from backlog to done.
For this, we can use the fairly standard definition of Cycletime, but ideally we would measure it End-to-End. The thing to avoid here is measuring how long each part takes. If teams are building car doors, but not integrating those doors with the rest of the pieces needed to deliver an increment or iteration or information that is valuable, it's not really "end to end" in my view.
For many orgs, you can get a fairly decent dataset on this by looking at Jira Control Charts. Or, just timestamp when an item is pulled from the backlog to when it's "done". It's quite important that this also include the time spent in the last mile of development, from "code complete" to when it's fully integrated and considered good enough to ship.
CT = End to End Cycletime. Elapsed Time in days from pulling a "Ready for Dev" story or feature into WIP through to "Done Done" i.e. to production-level, ready to ship Quality.
If you're doing Scrum with 2-week sprints, CT should be less than 14 days.
We could make this more complicated by looking at Mean Time to Recovery and a load of other useful metrics, but for now let's just keep it simple.
End-to-End Flow, E2EF = CT
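If you are timestamping stories yourself rather than reading a Jira Control Chart, the calculation might look something like this. It is a minimal sketch, assuming each story has a "pulled" date and a "done" date; the function names, the sample dates, and the choice of a nearest-rank percentile are hypothetical illustrations.

```python
from datetime import date


def cycle_time_days(pulled, done):
    """Elapsed calendar days from pulling a story into WIP to Done Done."""
    return (done - pulled).days


def percentile_nearest_rank(values, pct):
    """Smallest value that at least pct% of the values do not exceed.

    pct is a whole number from 1 to 100.
    """
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating point.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Hypothetical (pulled, done) dates for five stories.
stories = [
    (date(2024, 3, 1), date(2024, 3, 10)),
    (date(2024, 3, 1), date(2024, 3, 15)),
    (date(2024, 3, 4), date(2024, 3, 22)),
    (date(2024, 3, 5), date(2024, 3, 25)),
    (date(2024, 3, 6), date(2024, 3, 30)),
]
cycle_times = [cycle_time_days(pulled, done) for pulled, done in stories]
print("CT for 70% of stories:", percentile_nearest_rank(cycle_times, 70), "days")
```

Reporting a percentile rather than a mean keeps one or two very slow stories from hiding the trend for the bulk of the work.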
Again, let's imagine how the conversation between a Senior Exec and, say, a Delivery Manager might go with this:
Exec: "So, what's our End-to-End Flow Score?"
Delivery Manager: "Well, to make the more frequent releases possible we've had to improve our Continuous Integration setup, shaving off half of the "last mile" to get to a production-like environment. We also added two UX designers where we had queues building up. From this and other improvements that have come out of team-level retrospectives, our E2E story cycletime has gone from a little over 4 weeks down to under 3 weeks for 70% of stories."
Exec: "Sounds good – so a drop from 28 down to 21 days?"
DM: "Yeah – there's of course some variation in that, but that's the trend for ~70% of stories."
Exec: "Understood. A 25% improvement is pretty good. What's next on this?"
DM: "The teams think they can maybe get this down to under a fortnight. The bottleneck for most teams has shifted from downstream to upstream – so we're starting to look at our Definitions of Ready to see if we can tighten that up to smooth the flow through the teams but without shifting more work upstream."
Exec: "Perfect. Again, let me know if there's anything I can do to support that."
So now we have covered two of the three principles in some way shape or form. I'd argue this last one is perhaps the most important though, so stay with me…
This one, Fast Feedback Loops, is trickier. We're in the realms of Signal to Noise Ratios (SNR), False Positives and False Negatives, Test Pyramids and Broken Windows Theory. Some feedback loops contain almost no information whatsoever.
Good quality feedback loops are also nested, so if a quality problem makes it through an earlier feedback loop without being picked up, hopefully one of the many broader outside loops will catch it before a user or customer is affected.
With all those caveats, how might we objectively measure fast feedback loops? We also want it to be relatively simple – we're competing with "velocity" on the simplicity scale after all. So, what's a half-decent starting point?
How about if we choose three fairly common feedback loops and use the cycletime for each of those, measured from when we first start working on something to when we get some sort of feedback relating to quality that tells us whether we are likely heading in the right direction?
We of course already have a couple of key feedback loops covered in the above measures (TTM is the speed of feedback from users/customers, and CT is the speed of feedback for an individual feature or story). What feedback loops nested inside those two might we objectively measure?
Here's a "starter for ten" set of three that might be worth trying:
FL^3 = the cycletime of three nested feedback loops, each measured from Pull of Story to:
a) UT: local Unit Tests run (< 1 day?)
b) SIT: System Integration Tests (< 5 days?) and
c) DSR: Demo/System Review (< 14 days?)
Again, I'm not sure how we might combine these three, but let's say we multiply them together? Measured in Days? (Really not sure about this!) I'm also going to leave out, for simplicity's sake, any measure of how often the team reflect on their own ways of working (AKA Retrospectives) as a core feedback loop for continuous improvement for the team itself rather than the Product they are working on. If we were to include it, it might be a fourth loop? Seems too easy to game that one though…
Fast Feedback Loops, FFL = UT x SIT x DSR
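To make the multiplication concrete, here is a small sketch that holds the three loop times together and derives the score. The class and field names are hypothetical, and the sample values are simply the suggested upper bounds from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackLoops:
    """Days from pulling a story to each kind of feedback."""

    ut_days: float   # local Unit Tests
    sit_days: float  # System Integration Tests
    dsr_days: float  # Demo/System Review

    @property
    def ffl(self):
        """Fast Feedback Loops score: UT x SIT x DSR. Lower is better."""
        return self.ut_days * self.sit_days * self.dsr_days


# The suggested upper bounds: 1 day, 5 days, 14 days.
starter = FeedbackLoops(ut_days=1, sit_days=5, dsr_days=14)
print(starter.ffl)
```

Keeping the three inputs alongside the product matters here, since the same score can come from very different combinations of loop times.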
So how might the conversation go on this topic?
Exec: "So, the Value Early and Often and E2E Flow Scores are showing improvement. What about our Fast Feedback Loops Score?"
Delivery Manager: "Glad you asked. I've already mentioned the investments we've made in speeding up Continuous Integration. The feedback loop from Start to SIT Completed is down from 4 weeks to less than 2 weeks on average. Unit Tests are now running 50% faster too, so that's down from nightly to half a day from when we start a story. It's not reflected in the Score, but we've put a huge effort into refactoring broken tests and improving the SNR for the tests that we do have. We can now go from a broken build to a fix much faster, since we have better logging of bugs when they arise. Time to Demo hasn't changed – we're still doing these once a fortnight."
Exec: "OK – so what's the Fast Feedback Loops score?"
DM: "From 392 (= 1 x 28 x 14) down to 98 (= 0.5 x 14 x 14)"
Exec: "That's great. This is so important – what are the next steps on Fast Feedback Loops?"
DM: "Some of the teams feel they are ready to experiment with weekly Demos. We're trying to convince some stakeholders of the importance of early feedback, but there's some grumbling about this taking too much time."
Exec: "OK. I'll have a chat at my next weekly with my peers and explain the importance of fast feedback. With the improvements in Release Frequency and Time To Market, I'm sure we've earned the right to ask for them to cooperate – at least until they see the benefits for themselves."
I could probably write a whole book on the importance of Fast Feedback loops with a high Signal to Noise Ratio, and different ways to measure quality in these terms. Until I do, go and check out Steve Smith's book "Measuring Continuous Delivery".
Ideally, we wouldn't combine these. I'd say each measure is worthy of attention separately, and combining them not only combines Apples and Oranges (not to mention potentially very different units) together in strange ways, but it dilutes and oversimplifies. If I was forced to combine, again, I'd probably multiply. If you wanted a higher number to represent "more agility" then you'd just take the inverse, something like this…
"Value, Flow, Feedback" Score = 1,000,000 / [VEO x E2EF x FFL]
Using the examples above (which may seem low to some of you – but you have to meet people where they are!) the overall score would look something like this:
VFF(Before) = 1,000,000 / [507 x 28 x 392] = 1,000,000 / 5,564,832 = 0.18 (Yeah, pretty low, right?)
VFF(After) = 1,000,000 / [156 x 21 x 98] = 1,000,000 / 321,048 = 3.1 (Better!)
VFF(Goal?) = 1,000,000 / [52 x 14 x 49] = 1,000,000 / 35,672 = 28 (Much Better! The 49 is presumably 0.5 x 14 x 7, with weekly Demos.)
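The three scores above can be reproduced with a few lines of code. This is only a sketch of the formula as written; the function name `vff_score` and the `scale` parameter are hypothetical.

```python
def vff_score(veo, e2ef, ffl, scale=1_000_000):
    """Inverse of the product, so a higher number means "more agility"."""
    product = veo * e2ef * ffl
    if product <= 0:
        raise ValueError("all three measures must be positive")
    return scale / product


# (VEO, E2EF, FFL) for each scenario in the worked example.
scenarios = {
    "Before": (507, 28, 392),
    "After": (156, 21, 98),
    "Goal": (52, 14, 49),
}
for label, (veo, e2ef, ffl) in scenarios.items():
    print(f"VFF({label}) = {vff_score(veo, e2ef, ffl):.2f}")
```

Because the three measures are multiplied, a modest improvement in each one moves the combined score a long way, which is part of why the individual measures deserve attention on their own.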
How could teams use this? Well, for starters, baseline where you are today. For each measure, what improvements could you make? This way you could have an objective, consistent measure – and a way to communicate where you have come from, where you are today and what your goal is. I'd argue this would be better than the dozens of different qualitative Agile Assessments I've seen – which are typically based on some judgement about whether a team has adopted a particular set of agile practices, regardless of whether those are working or not.
I could probably write another whole blog post on how each of these could be, and is likely to be, misused and abused. Not least of these is the likely application of targets to each of them. To which Goodhart's Law, named after the economist Charles Goodhart, replies: "When a measure becomes a target, it ceases to be a good measure."
Would "VFF Score" be worse than "Velocity" though? Probably not.
Thoughts? @ me! Joshua J. Arnold.