Comments on: Why Your Delivery Predictions Will Always Be Wrong if You Keep Mapping Story Points to Hours

By: Bruno

Bruno — Fri, 20 Jun 2025 11:58:21 +0000

There’s even more to that …

In the meantime, I’ve looked at this for more than two dozen teams and teams-of-teams, and I didn’t find a single one with a high correlation. From that, I conclude that it’s unlikely to be a matter of proficiency in estimating.

In contrast to the lack of correlation between Cycle-Time and estimated SP, Velocity (i.e. Throughput measured in SP) and Throughput measured in work items per unit of time are typically highly correlated; I found just one counterexample. From that, I conclude that if estimation is done for reasons of capacity, i.e. to answer the question “How much can be done until X?”, counting work items is equally effective but more efficient.

Well, nothing new for the Kanban community, but it’s good to see it backed up by real-world evidence.

By: Sebastian

Sebastian — Thu, 19 Jun 2025 20:53:12 +0000

In reply to Matthew Lewis.

Just look at the percentiles of your work items’ cycle times to predict the elapsed time of a future item. Your work items have a certain distribution of sizes and cycle times.
If, e.g., 85% of your finished items took 20 days or less, then there’s an 85% chance that a future item will stay in your workflow for 20 days or less. Size doesn’t actually matter in this case; it’s implicitly in your historical data.
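A minimal sketch of that percentile lookup, using the nearest-rank definition (interpolating definitions, such as the default of `statistics.quantiles`, can give slightly different values). The function name and the cycle times are hypothetical.

```python
import math

def percentile_nearest_rank(values, pct):
    """Smallest observed value such that at least pct% of observations are <= it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]

# Hypothetical cycle times (days) of 20 finished items.
cycle_days = [3, 5, 8, 2, 13, 20, 7, 4, 9, 11, 6, 15, 1, 18, 10, 5, 12, 25, 8, 14]
p85 = percentile_nearest_rank(cycle_days, 85)
print(f"85% of past items finished within {p85} days")
```

Here 17 of the 20 items took 15 days or less, so the sketch reports 15 days as the 85th percentile.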

If you want to forecast when a batch of tickets will be done, use throughput data and Monte Carlo simulations. They give you a probability and a range, e.g., 40 items will be done in 14 days or fewer with 85% confidence.
What you’re looking at might be mistaken for “noise”, but it’s not. It implicitly shows your company’s constraints.
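One common way to run such a "when will N items be done" simulation is to replay randomly chosen historical days until the backlog is empty, many times over, and read off a percentile of the resulting durations. The sketch below assumes daily throughput sampling and a future that resembles the recent past; the function name, seed, and history are hypothetical.

```python
import math
import random

def forecast_days(daily_throughput, backlog, trials=10_000, confidence=85, seed=42):
    """Monte Carlo 'when' forecast: number of days within which `backlog` items
    are finished in `confidence`% of simulated futures."""
    if not any(daily_throughput):
        raise ValueError("history needs at least one day with finished items")
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    outcomes = []
    for _ in range(trials):
        done, days = 0, 0
        while done < backlog:
            # Each simulated day replays a randomly chosen historical day.
            done += rng.choice(daily_throughput)
            days += 1
        outcomes.append(days)
    outcomes.sort()
    # Nearest-rank percentile of the simulated completion times.
    rank = max(1, math.ceil(confidence * trials / 100))
    return outcomes[rank - 1]

# Hypothetical history: items finished on each of the last 10 working days.
history = [0, 2, 3, 1, 4, 0, 3, 2, 5, 1]
days = forecast_days(history, backlog=40)
print(f"40 items: done within {days} days with 85% confidence")
```

Days with zero finished items stay in the history on purpose: they are exactly the blockers and waiting time the estimates never see.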

And by the way: another problem with SP is that they most likely violate linearity. If you estimate one item at 2 SP and another at 8 SP, you expect the 8 SP item to be “4 times as big”; otherwise, all velocity calculations are doomed. In practice, an 8 SP item might be “5 times as big” or “3 times as big”. Then adding up SP in your next sprint planning won’t work out, because summing a 2 SP item and some 8 SP items will give you a false velocity.
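To make the linearity point concrete, here is a small arithmetic sketch using the ratios mentioned above (3x, 4x, 5x). The sprint plan and the `workload` helper are hypothetical, and real backlogs rarely contain only two sizes.

```python
# Hypothetical sprint: one 2 SP item and three 8 SP items (26 SP in total).
plan = [2, 8, 8, 8]

def workload(plan, size_of_8):
    """Workload in '2 SP item' units, given how many 2 SP items an
    8 SP item really equals (linearity says exactly 4)."""
    units = {2: 1, 8: size_of_8}
    return sum(units[sp] for sp in plan)

assumed = workload(plan, 4)  # 13 units if points were linear
for ratio in (3, 4, 5):
    actual = workload(plan, ratio)
    print(f"8 SP = {ratio}x a 2 SP item: {actual} units, "
          f"{actual / assumed:.0%} of what the points suggest")
```

The same 26 SP plan turns out to be anywhere from 77% to 123% of the workload the points imply, so a velocity built on those sums is off by the same margin.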

As the article shows, there’s no correlation between SP and elapsed time, and thus linearity is violated.

Cheers

By: Sebastian

Sebastian — Thu, 19 Jun 2025 20:16:09 +0000

Great article.
I ran a similar experiment with our company data some months ago. Our teams estimate in hours and skip the “SP-to-hour transformation”. The result was very much the same.
The correlation between elapsed time and estimated hours was close to zero, meaning that the estimates were more or less useless for answering “When will it be done?” or “How much will be done in the next sprint?”.
As a matter of fact, our burndown charts almost always show horizontal lines (they show “remaining effort in hours”, which is an estimate, too).
With extracted throughput data and a little Monte Carlo magic, I was able to show that A) our sprints were always overbooked (by a factor of at least 4 or 5) and B) in 85% of all forecasts, the predicted range of finished work was correct (no surprise: the confidence level was set to 85%).
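The overbooking check can be sketched as a "how many items by the end of the sprint" Monte Carlo forecast compared against the sprint commitment. This is a hypothetical illustration, not necessarily how the analysis above was run: it assumes a 10-working-day sprint, daily throughput sampling, and invented history and commitment values.

```python
import math
import random

def forecast_items(daily_throughput, days, trials=10_000, confidence=85, seed=7):
    """Monte Carlo 'how many' forecast: the item count reached or exceeded in
    at least `confidence`% of simulated sprints of `days` working days."""
    rng = random.Random(seed)
    totals = sorted(
        sum(rng.choice(daily_throughput) for _ in range(days))
        for _ in range(trials)
    )
    # "At least N items with 85% confidence" sits at the 15th percentile of the totals.
    rank = max(1, math.ceil((100 - confidence) * trials / 100))
    return totals[rank - 1]

# Hypothetical inputs: items finished per day over the last 10 days, and a sprint commitment.
history = [0, 1, 2, 0, 3, 1, 2, 1, 0, 2]
committed = 30
likely = forecast_items(history, days=10)
if likely > 0:
    print(f"Forecast: at least {likely} items (85%); commitment is {committed / likely:.1f}x that")
else:
    print("Forecast: zero items at 85% confidence; any commitment is overbooked")
```

Checking claim B) would mean rerunning this for each past sprint using only the history available before it, then counting how often the actual result met the forecast.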
I presented my results to a group of decision makers (people leads, POs, directors), and they reacted with astonishment and disbelief. They couldn’t accept that our estimates are worthless. The most stubborn reaction was “then we have to get more accurate”. Dear, oh dear!
Historical data is so much better for forecasting, since it implicitly contains all the company’s constraints: waiting times, blockers, rework, loops, bottlenecks, just to name a few.
From my perspective, any estimation is waste that needs to be reduced as much as possible. You probably cannot avoid it entirely, but it’s probably good enough to decide whether or not a story or task can be finished within a day or two (plus, it’s sometimes useful for building a common understanding among the developers and testers).
The latest thing I heard from our top management was that all backlogs needed to be estimated upfront: all epics, all stories, all tasks, all current bugs. As a Scrum Master, I made it clear that this is not agile at all, and that if they insist on it, they cannot count on my support for this insanity.
So, my message is clear: trust your own data, make evidence-based decisions, skip estimating (as much as possible), and simply focus on the most valuable outcome.
Cheers

By: Sonya Siderova

Sonya Siderova — Thu, 19 Jun 2025 15:03:42 +0000

In reply to Matthew Lewis.

It’s not really about the size of the story. It’s about implementing (and following!) the rules that keep delivery times consistent across tasks of various sizes.

Here are a few resources that dig into that.

https://getnave.com/blog/managing-items-of-different-sizes-in-kanban/
https://getnave.com/blog/explicit-policies/

Hope this helps! 😊

By: Matthew Lewis

Matthew Lewis — Thu, 19 Jun 2025 14:44:02 +0000

I really appreciate this article. It makes a clear, strong case that story point estimation causes more harm than good. The dark spot I see in the discussion is that while Kanban and flow systems can support a much better and clearer method for estimating, most teams (especially embedded software, feature, and “cost-center” teams) that are expected to answer estimation questions don’t have a great way of standardizing what a unit of “work” is. A “ticket” is just as useless as a “point” for that sort of metric. When asked “When will that thing be done?”, if one looks into the past data and sees nothing but noise, with tickets and task sizes that have nothing in common, it’s tough to do any better than guess from the gut. I’d love to see more guidance in that problem space.

By: Sonya Siderova

Sonya Siderova — Thu, 08 Sep 2022 07:11:44 +0000

In reply to Saira. Saira, I'm glad you find it useful!

By: Saira

Saira — Wed, 07 Sep 2022 09:21:24 +0000

In reply to Peter Kretzman. Thank you for the detailed analysis. Your comments make more sense.

By: Sonya Siderova

Sonya Siderova — Sat, 20 Mar 2021 17:14:56 +0000

In reply to Peter Kretzman.

Hi Peter,

Let me address your specific points one by one.

“Use of a graph to triumphantly “prove” a much stronger conclusion than just “don’t convert story points to hours”” – That’s a surprising conclusion. I don’t see how the article leads to it. The statement “you should never resort to story point estimation again” links to a separate piece, which is a totally different conversation that has nothing to do with converting story points into hours.

“The team behind the graph “are extremely good to predict how much time they will need to finish the work”” – Yes, they are extremely good at predicting the EFFORT of the work, which in turn has nothing to do with the actual delivery time of the work (as the variability in their delivery times shows). I hope this makes it clearer. The effort for a work item, regardless of how it is measured, doesn’t correlate with delivery times. There is more to it, as you already stated.

“Although I’d emphasize that analysis is the very foundation of worthwhile estimation practices, not somehow separate or “differentiated” as you said” – In the article I pointed you to in my first comment, we show that delivery predictions (either to plan your releases or to come up with a delivery date) can be made using Monte Carlo simulations based on the throughput of the system. That process is highly accurate and effective when the delivery workflow is stable and predictable, and it can be kept entirely separate from the analysis of the work (if the delivery workflow is not predictable, no method can produce reliable results quantifying the risks you’re managing).

“I’d emphasize that showing a graph as you did represents a core and incorrect assumption that they SHOULD map, and concluding erroneously that if they don’t, story points must therefore be worthless” – That conclusion is even more surprising than your first one, and if anyone reading this post comes to the same one, I’d point this out explicitly: the main purpose of the chart is to serve as evidence that story points should not be mapped to hours, since there is no correlation between them. Adding your data to the chart is meant to be a one-time effort, the results of which will hopefully call the practice of mapping story points to hours into question.

I’d like to close this conversation by acknowledging that I really appreciate your contribution, Peter. Your input is valuable!

By: Peter Kretzman

Peter Kretzman — Fri, 19 Mar 2021 17:51:43 +0000

In reply to Sonya Siderova.

Thanks for replying, Sonya. I do appreciate your thoughtful approach to all this. I’m a little confused though, because it doesn’t seem to me that you addressed my specific points. We do, though, have some common ground.

At core here, in my view, is a misunderstanding by many that story points somehow represent a time estimate. They don’t. They’re not a schedule; they’re input (among many other factors) to constructing a schedule. They’re an abstract assessment of size/complexity/risk/capacity drain for the team overall, and thus we shouldn’t expect them to map very cleanly, particularly *as individual items*, to the actual time consumed, for the various reasons I mentioned. So yes, as your article pointed out, it makes no sense to convert story points to hours, and as stated, I concur.

My issue, as stated in my previous note, is the use of a graph to triumphantly “prove” a much stronger conclusion than just “don’t convert story points to hours”: to wit, “you should never resort to story point estimation again”. As I said, there are many reasons (e.g., wait time) that cycle time can diverge from correlating to story points, BUT I emphasized that those root causes should be the focus for improvements, rather than deciding to completely ditch the notion of incorporating estimated item size into one’s analysis and discussions. (I’m puzzled, too, by your insistence that the team behind the graph “are extremely good to predict how much time they will need to finish the work”, because the whole point you were making by presenting the graph is that their predictions *didn’t* correlate well to actual cycle time.)

But anyway, common ground:
– we agree that analysis and risk management matters most (although I’d emphasize that analysis is the very foundation of worthwhile estimation practices, not somehow separate or “differentiated” as you said)
– we agree that no one should be mapping story points directly to hours, although I’d emphasize that showing a graph as you did represents a core and incorrect assumption that they SHOULD map, and concluding erroneously that if they don’t, story points must therefore be worthless. There are, however, many valid reasons to use story points, and I’ve covered some of them in my own two-part post, at http://www.peterkretzman.com/2018/10/24/quocknipucks-or-why-story-points-make-sense-part-1/.

By: Sonya Siderova

Sonya Siderova — Thu, 18 Mar 2021 20:44:50 +0000

In reply to Peter Kretzman.

Hi Peter,

First of all, thank you very much for spending the time and effort to give your 2 cents on the matter! I really appreciate it!

We’re not followers of the #NoEstimates movement. We only work with numbers and make conclusions based on the evidence we discover.

Let me provide some more context that will hopefully cover your points. Whenever you uncover hidden information, you need to act on it. The sole goal of exposing the information is to reveal an opportunity for improvement. The example we outlined in this article is of a new team made up of the top specialists in the technology this company is developing. They are extremely good to predict how much time they will need to finish the work.

The next step was to perform an analysis of their delivery system. It turned out that in 60% of the cases, the work was blocked by external dependencies. Additionally, their work generated plenty of waiting time due to a bottleneck in their workflow: the QA specialists struggled to handle all the work produced by the developers, so work got stuck in the middle of the process.

The bottom line is that their delivery times were drastically impacted by the waiting time in their system. That’s not something that can be predicted by estimating the effort of the work, regardless of its granularity. Taking this outcome further, it became obvious that the practice of estimating the work by converting story points to hours (which is the topic of this article) was flawed.

When it comes to making delivery predictions, I firmly believe that the main goal is to manage risks and, more specifically, to mitigate them as much as possible.

The realm of probability is all about managing risks when we make delivery commitments. We need to be able to quantify the risk we’re dealing with to make reliable predictions. Neither story point estimation nor estimation in hours or days can provide certainty that we will achieve our goals. Probabilistic forecasting is the way to go, and we’ve added our 2 cents on that matter in this article: https://getnave.com/blog/release-planning/

Finally, we’ve made a clear distinction between analyzing the work (which is a must) and estimating the work. The goal of this article is to bring more awareness to the matter. Did you know that, on average, 20 people per day search for information about how to map story points to hours? I strongly believe that each and every one of these people needs a different perspective to be able to call that antipattern into question!