Efficiency

Cycle Time Correlations

Improving flow is a key goal for nearly all organisations. More often than not, the primary driver for this is speed, commonly referred to as Cycle Time. As organisations try to reduce this and improve their time to (potential) value, what factors correlate with speed? This blog, inspired by the tool DoubleLoop, looks at the correlations Cycle Time has with other flow-based data…

The correlation of metrics

A few months ago I came across a tool called DoubleLoop. It is unique in that it allows you to plot your strategy in terms of your bets, the work breakdown and key metrics, all onto one page. The beauty of it is that you can see the linkage to the work you do with the metrics that matter, as well as how well (or not so well) different measures correlate with each other.

For product-led organisations, this opens up a whole heap of different options around visualising bets and their impact. The ability to see causal relationships between measures is a fantastic invitation to a conversation around measuring outcomes.

Looking at the tool with a flow lens also got me curious: what might these correlations look like from a flow perspective? We’re all familiar with things such as Little’s Law, but what about the other practices we can adopt or the experiences we have as work flows through our system?

As speed (cycle time) is so often what people care about, what if we could see which measures/practices have the strongest relationship with this? If we want to improve our time to (potential) value, what should we be focusing on?

Speed ≠ value and correlation ≠ causation

Before looking at the data, a couple of acknowledgements about what some of you reading this may well be pointing out.

The first is that speed does not equate to value, which is a fair point, albeit one I don’t believe to be completely true. We know from the work of others that right-sizing trumps prioritisation frameworks (specifically Cost of Delay Divided by Duration — CD3) when it comes to value delivery.

Given right-sizing is partly influenced by duration (in terms of calendar days), and the research above, you could easily argue that speed does impact value. That being said, the data analysed in this blog looked at work items at User Story/Product Backlog Item level, where it is difficult to quantify the ‘value’ any single item brings.

A harder point to disagree with is the notion that correlation does not equal causation. Just as the biomass power generated in the Philippines correlates with Google searches for ‘avocado toast’ without there being any real link between the two, a strong correlation on its own proves nothing.

However, when working with our teams, we often make inferences about things they should be doing as part of visually managing their work. Some of these are undoubtedly linked: for example, how long an item has been in progress is obviously going to have a strong relationship with how long it took to complete. Others are more up for debate, such as: do we need to be regularly updating work items? Should we be going granular with our board design/workflows? The aim of this blog is to try to challenge some of that thinking, backed by data.

For those curious, a total of 15,421 work items completed by 70 teams since June 1st 2024 were used as input to this research. Given this size, there may be other causal relationships at play (team size, length of time together, etc.) that are not included in this analysis.
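The exact tooling used for the analysis isn’t shown here; conceptually, though, each measure is simply correlated against Cycle Time. A minimal sketch of that calculation, assuming a pandas DataFrame with one row per completed work item and illustrative column names (Pearson correlation, the pandas default, is assumed):

```python
import pandas as pd

# Hypothetical dataset: one row per completed work item.
# Column names and values are illustrative, not the actual fields used in the study.
items = pd.DataFrame({
    "cycle_time_days": [3, 8, 2, 15, 6],
    "lead_time_days":  [10, 12, 30, 22, 9],
    "days_to_start":   [7, 4, 28, 7, 3],
    "comment_count":   [1, 5, 0, 9, 2],
    "update_count":    [12, 30, 8, 45, 20],
    "times_blocked":   [0, 1, 0, 2, 0],
    "flow_efficiency": [0.60, 0.35, 0.80, 0.20, 0.50],
})

# Pearson correlation of every measure against Cycle Time (pandas uses Pearson by default)
correlations = (
    items.corr(numeric_only=True)["cycle_time_days"]
         .drop("cycle_time_days")
         .sort_values(ascending=False)
)
print(correlations)
```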

Without further delay, let’s start looking at the different factors that may influence Cycle Time…

Days since an item was started (Work Item Age)

One of the most obvious factors that plays into Cycle Time is how long an item has been in-progress, otherwise known as Work Item Age.

Clearly this has the strongest correlation with Cycle Time: while an item is in progress its age accumulates day by day, and once it moves to ‘done’ that final age is its Cycle Time, so there should never be a difference between the two values.

The results reflect that, with a correlation coefficient of 1.000, a perfect positive correlation and about as strong as you will ever see. This means that, above everything else, we should always be focusing on Work Item Age if we’re trying to improve speed.
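To illustrate why the two can never diverge, here is a tiny sketch with made-up dates:

```python
from datetime import date

started = date(2024, 6, 3)                      # illustrative dates only

# While the item is still in progress, its age keeps growing...
def work_item_age(started: date, as_of: date) -> int:
    return (as_of - started).days

print(work_item_age(started, date(2024, 6, 10)))   # 7 days and counting

# ...and the day it is finished, that age becomes its Cycle Time
done = date(2024, 6, 12)
cycle_time = (done - started).days                 # 9
print(cycle_time == work_item_age(started, as_of=done))   # True, by definition
```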

Elapsed time since a work item was created (Lead Time)

The next thing to consider is how long it’s been since an item was created. Often referred to as ‘Lead Time’, this will often be different to Cycle Time as there may be queues before work actually starts on an item.

This is useful to validate our own biases. For example, I have often made the case to teams that anything older than three months on the backlog probably should just be deleted, as YAGNI.

This had a correlation coefficient of 0.713 with Cycle Time, which is a very strong correlation. This is largely to be expected, as longer cycle times will invariably mean longer lead times, given Cycle Time (more often than not) makes up a large proportion of that metric.

Time taken to start an item

A closely related metric to this is the time (in days) it took us to start work on an item. There are two schools of thought to challenge here. One is the view that “we just need to get started”; the other is that the longer you leave an item, the less likely it is to complete quickly (as you may have forgotten what it is about).

This one surprised me; I expected a somewhat stronger relationship than the 0.166 correlation coefficient we see. There is some relationship, but it is weak: how quickly you do (or don’t!) start work on an item is not going to have much impact on your cycle time.

The number of comments on a work item

The number of comments made on a work item is the next measure to look at. The idea with this measure would be that more comments likely mean items take longer, due to there being ambiguity around the work item, blockers/delays, feedback, etc.

Interestingly, in this dataset there was minimal correlation, with a correlation coefficient of 0.147. This suggests there is a slight tendency for work items with more comments to have a longer cycle time, but we can see that after 12 or so comments this no longer seems to hold. It could be that by this point clarification has been reached or issues have been resolved. Of course, past this value there are far fewer items with that many comments.

The number of updates made to a work item

How often a work item is updated is the next measure to consider. The rationale for this being teams are often focused on ensuring work items are ‘up to date’ and trying to avoid them going stale on the board:

An update is any change made to an item, which of course means that automations could skew the results. With the data used, it was very hard to tell automated updates from genuine ones, which is a shortcoming of this measure. There were some extreme outliers with more than 120 updates, which were easy to filter out; beyond those, however, there was no way to easily determine which updates were automated vs. genuine (and I was not going to check all 15,421 work items!).

Interestingly, here we see a somewhat stronger correlation than before, at 0.261. This sits at the weak-to-moderate end of the scale. Of course, this does not mean that simply automating updates to work items will improve flow!

The number of board columns a team has

The next measure to consider is the number of board columns a team has. The reason for looking at this is that there are different schools of thought around how ‘granular’ you should go with your board design. Some argue that To Do | Doing | Done is all that is needed. Others would say viewing by specialism helps see bottlenecks, and some would even say that more high-level views (e.g. Options | Identifying the problem | Solving the problem | Learning) encourage greater collaboration.

The results show that it really doesn’t matter what you do. The weak correlation of 0.046 shows that board columns play no meaningful part in relation to speed.

Flow Efficiency

Flow efficiency is an adaptation of the lean metric of process efficiency. This is where, for a particular work item, we measure the percentage of active time (time spent actually working on the item) against the total time (active time + waiting time) that it took for the item to complete.

This one was probably the most surprising. A correlation coefficient of -0.343 suggests a moderate negative correlation: as Flow Efficiency increases, Cycle Time tends to decrease. Whilst not very strong, the relationship between the two is certainly meaningful.

The number of times a work item was blocked

The final measure was looking at how often a work item was blocked. The thinking with this one would be if work is frequently getting blocked then surely this will increase the cycle time.

It’s worth noting a shortcoming here: this measures how often an item was blocked, not how long it was blocked for. So, for example, if an item was blocked once but that block lasted nearly the entire cycle time, it would still only register as being blocked once. Similarly, this is obviously dependent on teams marking work as blocked when it actually is blocked (and/or having a clear definition of blocked).

Here we have the weakest of all the correlations, 0.021. This really surprised me, as I would have thought blocker frequency would impact cycle time, but the results suggest otherwise.

Summary

So what does this look like when we bring it all together? Copying the same style as DoubleLoop, we can start to see which of our measures have the strongest and weakest relationship with Cycle Time:
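For reference, here are the coefficients reported throughout this post gathered in one place, with a rough sketch of plotting them as a simple bar chart (the actual visual follows DoubleLoop’s style rather than this code):

```python
import matplotlib.pyplot as plt

# Correlation with Cycle Time, as reported above
coefficients = {
    "Work Item Age": 1.000,
    "Lead Time": 0.713,
    "Number of updates": 0.261,
    "Time taken to start": 0.166,
    "Number of comments": 0.147,
    "Board columns": 0.046,
    "Times blocked": 0.021,
    "Flow Efficiency": -0.343,
}

# Sort so the strongest relationships appear at the top of the chart
measures, values = zip(*sorted(coefficients.items(), key=lambda kv: kv[1]))
plt.barh(measures, values)
plt.xlabel("Correlation with Cycle Time")
plt.tight_layout()
plt.show()
```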

What does this mean for you and your teams?

Well, it’s clear that Work Item Age is the key metric to focus on, given just how closely it correlates with Cycle Time. If you’re trying to improve (reduce) Cycle Time without looking at Work Item Age, you really are wasting your efforts.

After that, you want to consider how long something has been on the backlog (i.e. how long it has been since it was created). Keeping work items regularly updated is the next thing you can be doing to reduce cycle time. Following this, striking a balance on how quickly work items are started and keeping an eye on the comment count would be worth considering.

The number of board columns a team has and how often work is marked as blocked seem to have no bearing on cycle time. So don’t worry too much about how simple or complex your kanban board is, or about focusing retros on the items blocked most often. That being said, a shortcoming of this data is that it captures how often work was blocked, not for how long.

Finally, stop caring so much about flow efficiency! Optimising flow efficiency is more than likely not going to make work flow faster, no matter what your favourite thought leader might say.

Our survey says… uncovering the real numbers behind flow efficiency

Flow Efficiency is a metric that is lauded as being a true measure of agility yet it has never had any clear data supporting it, until now. This blog looks at over 60 teams here in ASOS and what numbers they see from a flow efficiency perspective…

A mockup of a family fortunes scoreboard for flow efficiency

Not too long ago, I risked the wrath of the lean-agile world by documenting the many flaws of flow efficiency. It certainly got plenty of engagement, as it is currently my second most-read article on Medium. On the whole the response was positive (minus a few rude replies on LinkedIn!), which does make me question why flow efficiency still gets traction through things like the Flow Framework and the Scaled Agile Framework (SAFe). I put it down to something akin to this:

A comic mocking a consultancy changing their mind on selling SAFe

Source: No more SAFe

For those who missed the last post and are wondering what it is, flow efficiency is an adaptation of a metric from the lean world known as process efficiency. This is where, for a particular work item, we measure the percentage of active time (time spent actually working on the item) against the total time (active time + waiting time) that it took for the item to complete.

For example, if we were to take a team’s Kanban board, it may look something like this:

An example kanban board

Source: Flow Efficiency: Powering the Current of Your Work

Flow efficiency is therefore calculated like so:

An explanation of the flow efficiency calculation of active time divided by total time multiplied by 100
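Expressed as code, the calculation is nothing more than this (a minimal sketch with made-up numbers):

```python
def flow_efficiency(active_days: float, waiting_days: float) -> float:
    """Active time as a percentage of total elapsed (active + waiting) time."""
    return active_days / (active_days + waiting_days) * 100

# e.g. an item actively worked on for 6 days out of 14 elapsed days
print(round(flow_efficiency(active_days=6, waiting_days=8)))   # 43
```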

One of my main issues with Flow Efficiency is the way it is lauded as the ‘thing’ to measure. There are plenty of anecdotal references to it, yet zero evidence and/or data to back up the claims. Here’s some of the top results on Google:

None of these statements have any data to support the claims they make. Rather than bemoan this further, I thought, in line with the previous blog on Monte Carlo forecasting accuracy and with the abundance of teams we have in ASOS Tech, let’s actually look at what the data says…

Gathering data

At ASOS, I am very fortunate as a coach that those before me such as Helen Meek, Duncan Walker and Ian Davies, all invested time in educating teams about agility and flow. When randomly selecting teams for this blog, I found that none of the teams were missing “queues” in their workflow, which is often the initial stumbling block for measuring flow efficiency.

Flow Efficiency is one of the many metrics we have available to our ASOS Tech teams, to use when it comes to measuring their flow of work and as an objective lens in their efforts towards continuous improvement. Teams choose a given time period and select which steps in their workflow are their ‘work’ states (i.e. when work on an item is actually taking place):

An animation showing how teams configure their flow efficiency metric

Once they have done this, a chart will then plot the Flow Efficiency for each respective item against the completed date, as well as showing the average for all the completed items in that period. The chart uses a coloured scale to highlight those items with the lowest flow efficiency (in orange) through to those with the highest (in blue):

An example of a flow efficiency chart showing 29% for a completed item
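Under the hood, the per-item number is just that same calculation applied to the time an item spends in each state, with only the configured ‘work’ states counting as active. A simplified sketch, with illustrative state names and durations rather than a real team’s workflow:

```python
# The workflow states a team has flagged as 'work' (illustrative names)
WORK_STATES = {"Developing", "Testing"}

def item_flow_efficiency(days_in_state: dict[str, float]) -> float:
    """Flow efficiency for one item, given the days it spent in each state."""
    active = sum(days for state, days in days_in_state.items() if state in WORK_STATES)
    total = sum(days_in_state.values())
    return active / total * 100

example_item = {
    "Ready": 5, "Developing": 3, "Waiting for test": 6, "Testing": 2, "Ready for release": 4,
}
print(round(item_flow_efficiency(example_item)))   # 25
```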

For this article, I did this process for 63 teams, playing around with the date slicer for periods over the last twelve months and finding the lowest and highest average flow efficiency values for a given period:

An animation showing the date range showing different flow efficiency calculations

This was then recorded in my dataset like so:

A table showing the minimum and maximum flow efficiency measures for a team

All teams use a blend of different practices and frameworks: some use Scrum/ScrumBut, others use more Kanban/continuous flow, or blend these with things like eXtreme Programming. Some teams work on the front-end/customer-facing parts of the journey (e.g. Saved Items), others on back-end systems (e.g. the tools we use for stock management and fulfilment).

Now that I’ve explained the data capture process — let’s look at the results!

Our survey says…

With so many teams to visualise, it’s difficult to find a way to show this that satisfies everyone. I went with a dumbbell chart, as this shows the range from lowest to highest flow efficiency per team as well as the average across all teams:

A summary dumbbell chart of the results showing a range of 9–68%
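If you wanted to recreate this kind of view for your own teams, a rough matplotlib sketch (using made-up numbers rather than the real 63-team dataset) would look something like this:

```python
import matplotlib.pyplot as plt

# Illustrative data only: lowest and highest average flow efficiency seen per team
teams   = ["Team A", "Team B", "Team C", "Team D"]
lowest  = [12, 25, 18, 40]
highest = [30, 48, 35, 62]

fig, ax = plt.subplots()
for y, (low, high) in enumerate(zip(lowest, highest)):
    ax.plot([low, high], [y, y], color="grey", zorder=1)   # the bar of the dumbbell
    ax.scatter([low, high], [y, y], zorder=2)              # the two ends

ax.set_yticks(range(len(teams)))
ax.set_yticklabels(teams)
ax.set_xlabel("Average flow efficiency (%)")
plt.tight_layout()
plt.show()
```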

Some key findings being:

  • We actually now have concrete data around flow efficiency! We can see that with this study flow efficiency values from 9–68% have been recorded

  • The average flow efficiency is 35%

  • Flow efficiency has variability — any time you hear someone say they have seen flow efficiency values typically of n% (i.e. a single number), treat this with caution/scepticism as it should always be communicated as a range. We can see that all teams had variation in their values, with 38% of the teams in this data (25 of 63 teams) actually having a flow efficiency difference of >10%

  • If we were to take some of those original quotes and categorise our flow efficiency based on what is ‘typically’ seen into groups of Low (<5%), Medium (5–15%), High (15–40%) and Very High (>40%), then the teams would look something like this:

A chart showing the distribution of results

The problem with this is that, whilst we do everything we can to reduce dependencies between teams and leverage microservices, there is no way nearly all teams have either high or very high flow efficiency.

I believe a better categorisation to move to would be Low (<20%), Medium (20–40%), High (40–60%) and Very High (>60%), which would look like this:

A chart showing an updated distribution of the results
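Applying this proposed banding to your own numbers is a trivial bit of code (thresholds as proposed above):

```python
def categorise(flow_efficiency_pct: float) -> str:
    """Bucket a flow efficiency percentage using the proposed bands."""
    if flow_efficiency_pct < 20:
        return "Low"
    if flow_efficiency_pct < 40:
        return "Medium"
    if flow_efficiency_pct < 60:
        return "High"
    return "Very High"

print(categorise(35))   # 'Medium' (the average across these 63 teams)
```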

But, but…these numbers are too high?!

Perhaps you are reading this with your own experiences and/or share the same views as those sources referenced at the beginning of the blog. There is no way I can disagree with your lived experience, but I do think that when talking about numbers, people need to be prepared to bring more to the conversation than anecdotal reference points. Are these numbers a reflection of the “true” flow efficiency of those items? Definitely not! There are all the nuances of work items not being updated in real time, work sitting in active states on evenings/weekends when it clearly isn’t being worked on (I hope!), work actually being blocked but not marked as blocked, etc., all of which I explained in the previous article.

Let’s take a second and look at what it would mean practically if you wanted to get a more ‘accurate’ flow efficiency number. Assume a team has a workflow like so:

An example kanban board

Take, for example, 30 work items completed by this team in a one-month period. We assume that 60% of those items went through all columns (7 transitions) and the remaining 40% skipped some columns (2 to 6 transitions), allowing for some variability in how items move through our workflow:

An example table of data showing column transitions

Then we assume that, like all teams at ASOS (or teams using tools like Azure DevOps or Jira), they comment on items. This could be to let people know something has progressed, clarify part of the acceptance criteria, ask a question, etc. Let’s say that this can happen anywhere from 0–4 times per item:

An example table of data showing column transitions and comment count

Not only that but we also mark an item when it is blocked and also then unblocked:

An example table of data showing column transitions, comment count, times blocked and unblocked

Note: randomised using Excel

Now, if we just look at that alone, that means 348 updates to work items in a one-month period. If we then wanted to add in when work is waiting, we would need to account for an item waiting a few (2) or many (6, conservatively!) times, as well as sometimes (but not always) adding a comment so that people know why it is waiting:

An example table of data showing column transitions, comment count, times blocked and unblocked, wait time and waiting reason

Random values again calculated via Excel :)

We can already see that with these conservative guesses we’re adding nearly 200 more updates to just 30 work items. Once you start to do this at scale, whether that be for more items and/or more teams, or over a longer period, you can see just how much additional work this means for teams interacting with their tool of choice. Combine this with the cost of context switching (i.e. stepping out of being ‘in the work’ to log in the tool that you are ‘waiting’) and you can see why tracking flow efficiency to greater accuracy is a fool’s errand.
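For anyone wanting to sanity-check that back-of-the-envelope arithmetic, here is a rough simulation using broadly the same assumptions; the ranges for blocks and waiting comments are my guesses, so exact totals will vary:

```python
import random

random.seed(42)
ITEMS = 30

updates = 0          # column moves, comments, blocked/unblocked flags
waiting_updates = 0  # extra updates if we also tracked waiting

for i in range(ITEMS):
    # 60% of items pass through all 7 column transitions, the rest skip some (2-6)
    transitions = 7 if i < ITEMS * 0.6 else random.randint(2, 6)
    comments = random.randint(0, 4)
    blocks = random.randint(1, 3)   # assumed; each block adds a 'blocked' and an 'unblocked' update
    updates += transitions + comments + blocks * 2

    # Tracking waiting: 2-6 'waiting' moves per item, with an explanatory comment about half the time
    waits = random.randint(2, 6)
    waiting_updates += waits + sum(random.random() < 0.5 for _ in range(waits))

print(updates)           # somewhere in the region of the ~348 quoted above
print(waiting_updates)   # roughly another 180-200 on top
```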

Summary

My hope is that we can now start to have a proper conversation around flow efficiency numbers, whether that be what ‘typical’ values teams see or what a ‘high’ flow efficiency looks like. To my knowledge this is the first attempt at something like this in our industry… and it should not be the last! In addition, I wanted to demonstrate what it would truly mean if you asked a team to place more focus on getting an ‘accurate’ flow efficiency value.

For those wondering if this has changed my opinion on flow efficiency: as you can probably tell, it has not. What I would say is that I have cooled a little on flaw #1 (teams not modelling queues in their workflow), given nearly all of these teams had multiple queue states modelled (after good training/coaching). I still stand by the other flaws that come with it, and I hope folks who are familiarising themselves with the metric now have this information as a common reference point when determining its appropriateness for them and their context.

What are the alternatives? Certainly Work Item Age would be a good starting point, as items ageing beyond your historical cycle time might point to an inefficient way of working. Another approach could be looking at blocked work metrics and the insights those bring.
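As a rough sketch of that first alternative, flagging in-progress items whose age has passed a historical cycle time percentile (the 85th percentile threshold and item IDs below are purely illustrative):

```python
import pandas as pd

# Illustrative data: historical cycle times and the ages of items currently in progress
historical_cycle_times = pd.Series([2, 3, 3, 5, 6, 8, 9, 13, 16, 21])
in_progress_ages = pd.Series({"PBI-101": 4, "PBI-102": 11, "PBI-103": 19})

# Flag anything ageing beyond the 85th percentile of historical cycle time
threshold = historical_cycle_times.quantile(0.85)
at_risk = in_progress_ages[in_progress_ages > threshold]
print(f"Ageing beyond {threshold:.0f} days: {list(at_risk.index)}")
```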

What are your thoughts on the findings? Surprised? Sceptical? 

Let me know in the replies what you think…