Understanding Maintenance Metrics

The “mean time to repair” and “mean time before failure” metrics can be used to track asset performance as well as maintenance teams and individuals.

David Greenfield, editor in chief

Sept. 9, 2021

Two common asset management metrics that have grown in importance, even as new analytic technologies such as predictive maintenance have emerged, are “mean time to repair” (MTTR) and “mean time before failure” (MTBF). Though these metrics have been widely used for years, they don’t always get the attention they deserve, which has led to some confusion among end users about how to calculate them properly and apply the results.

Automation World connected with Sam Russem, senior director of smart manufacturing solutions at Grantek, a system integration and engineering services company, to learn more about MTTR and MTBF for a recent episode of the “Automation World Gets Your Questions Answered” podcast series.

Defining the terms
Starting our discussion with a focus on MTTR, Russem said, “It is essentially the average time it takes to get an asset back up and running once it's gone down. So, if your case packer goes down at noon, and you get it back up and running at 12:30, that took 30 minutes to repair. That's your time to repair. Then, when you average those downtime periods together over a certain time span, that’s your mean time to repair over that period of time.”
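
The averaging Russem describes can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: the `mean_time_to_repair` function and the sample downtime log are hypothetical, with the first event mirroring the case packer example above.

```python
from datetime import datetime, timedelta

def mean_time_to_repair(events):
    """Average repair duration over (went_down, back_up) datetime pairs."""
    if not events:
        raise ValueError("need at least one downtime event")
    total = sum((up - down for down, up in events), timedelta())
    return total / len(events)

# Hypothetical downtime log for one asset: (went down, back up).
events = [
    (datetime(2021, 9, 1, 12, 0), datetime(2021, 9, 1, 12, 30)),  # 30 minutes
    (datetime(2021, 9, 3, 8, 15), datetime(2021, 9, 3, 8, 25)),   # 10 minutes
    (datetime(2021, 9, 7, 14, 0), datetime(2021, 9, 7, 14, 20)),  # 20 minutes
]

mttr = mean_time_to_repair(events)
print(mttr)  # 0:20:00
```

Changing the time span is just a matter of which events are passed in, so the same function can report MTTR per shift, per week, or per month.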

In terms of MTTR’s application, Russem explained that, if the MTTR on most of your assets is in the 10- to 20-minute range, but one key piece of equipment has an MTTR measured in hours or days, that insight can help you prioritize your maintenance schedule and keep production running.

Another common use of MTTR is to evaluate maintenance teams, or even individual technicians. “Maintenance always wants to be driving MTTR down because that means they're getting things back up and running quicker in the event of a downtime. But if you look into the data, and you realize that a certain maintenance technician consistently has a lower MTTR than another technician, this can be a training opportunity to spread that worker’s knowledge around. It can also tell you a lot about how your maintenance team has performed.”

Russem said that MTBF is much like MTTR in that both are derived from the timing of specific downtime events for a piece of equipment. Where MTTR averages how long each outage lasts, MTBF averages how long the equipment runs between outages. The other primary difference is that MTBF is “much more of an indicator of your machine and asset performance than it is a personnel assessment,” he said.

“It’s a really interesting metric to examine over time because, if you have an asset with an MTBF of 30 days and that figure drops to 28 days, then 25 and continues to fail, that indicates the asset is due for maintenance or some other type of remediation to get it to fail less frequently.”
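
A rough Python sketch of this idea follows. It assumes one common convention, in which MTBF is the asset's operating time within an observation window divided by the number of failures in that window. The function names, the window, and the sample events are hypothetical; the trend check simply flags a series that keeps dropping, like the 30, 28, 25 day example above.

```python
from datetime import datetime, timedelta

def mean_time_before_failure(window_start, window_end, events):
    """Operating (up) time in the window divided by the number of failures.

    events: (went_down, back_up) datetime pairs that fall inside the window.
    """
    if not events:
        raise ValueError("no failures in the window, so MTBF is undefined")
    downtime = sum((up - down for down, up in events), timedelta())
    uptime = (window_end - window_start) - downtime
    return uptime / len(events)

def is_steadily_declining(mtbf_history):
    """True if there are at least two readings and each is lower than the last."""
    pairs = zip(mtbf_history, mtbf_history[1:])
    return len(mtbf_history) >= 2 and all(later < earlier for earlier, later in pairs)

# Hypothetical 10-day window with two 12-hour outages.
events = [
    (datetime(2021, 9, 3, 0, 0), datetime(2021, 9, 3, 12, 0)),
    (datetime(2021, 9, 8, 0, 0), datetime(2021, 9, 8, 12, 0)),
]
mtbf = mean_time_before_failure(datetime(2021, 9, 1), datetime(2021, 9, 11), events)
print(mtbf)                                 # 4 days, 12:00:00
print(is_steadily_declining([30, 28, 25]))  # True
```

A steadily declining series is only a prompt to investigate; how many consecutive drops should trigger action is a judgment call for each plant.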

Maintenance benefits
Beyond the application of MTTR and MTBF to evaluate maintenance teams and asset performance, these metrics can also be used to track repair rates not just by asset, but by specific fault code on that asset. “This can help further prioritize maintenance activities and organize the maintenance team’s schedule,” said Russem.
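
As a minimal sketch of tracking repair times by fault code, the Python below groups repair durations by asset and fault and averages each group. The asset names, fault codes, and durations are invented for illustration, and `mttr_by_fault_code` is a hypothetical helper rather than a feature of any particular CMMS or OEE product.

```python
from collections import defaultdict

def mttr_by_fault_code(records):
    """Average repair minutes for each (asset, fault_code) pair."""
    durations = defaultdict(list)
    for asset, fault_code, repair_minutes in records:
        durations[(asset, fault_code)].append(repair_minutes)
    return {key: sum(vals) / len(vals) for key, vals in durations.items()}

# Hypothetical repair log: (asset, fault code, minutes to repair).
records = [
    ("case_packer_1", "INFEED_JAM", 14),
    ("case_packer_1", "INFEED_JAM", 16),
    ("case_packer_1", "GLUE_LOW", 5),
    ("palletizer_2", "SERVO_FAULT", 90),
]

result = mttr_by_fault_code(records)

# List the slowest repairs first so they can be prioritized.
for (asset, code), minutes in sorted(result.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{asset} {code}: {minutes:.0f} min")
# palletizer_2 SERVO_FAULT: 90 min
# case_packer_1 INFEED_JAM: 15 min
# case_packer_1 GLUE_LOW: 5 min
```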

“If a specific machine is showing an in-feed jam fault, and we know it usually takes 15 minutes to repair, that can really help your maintenance team organize their scheduling,” he said. “On the operations side, think about how MTTR ties into operationally focused metrics. For example, if you are improving your MTTR, you should see that reflected in a higher asset availability and a higher OEE (overall equipment effectiveness) score.”
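
One way to see that link is through two widely used textbook relationships, which are assumptions here rather than formulas given in the interview: availability approximated as MTBF / (MTBF + MTTR), and OEE as availability × performance × quality. The numbers below are invented for illustration.

```python
def availability(mtbf_hours, mttr_hours):
    """Approximate share of time the asset is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def oee(availability_ratio, performance_ratio, quality_ratio):
    """Overall equipment effectiveness as the product of its three factors."""
    return availability_ratio * performance_ratio * quality_ratio

# Hypothetical asset: same MTBF, but repairs get faster (2 h down to 0.5 h).
before = availability(mtbf_hours=48.0, mttr_hours=2.0)
after = availability(mtbf_hours=48.0, mttr_hours=0.5)

print(f"availability: {before:.3f} -> {after:.3f}")  # availability: 0.960 -> 0.990
print(f"OEE: {oee(before, 0.9, 0.95):.3f} -> {oee(after, 0.9, 0.95):.3f}")  # OEE: 0.821 -> 0.846
```

With performance and quality held constant, the shorter repair time alone lifts both availability and OEE, which is the effect Russem describes.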

MTBF can also be used to identify opportunities for planned downtime. “For example, if a machine is failing every two weeks, and there's a planned downtime for preventative maintenance on that machine, that can indicate opportunities to bring those MTBF numbers up a little bit higher and keep things running longer,” said Russem. “On the operations side, lower MTBF numbers mean that production assets are failing more frequently and hurting overall productivity. In other words, there are some situations where operators could directly influence MTBF time; for example, if you have an operator who knows how to tweak machine parameters to increase throughput. But that may be decreasing MTBF by making the line go down more often and causing overall productivity to decrease. That’s why MTBF is one of those numbers that can help you understand and balance the overall effectiveness of a line and its influence on overall throughput.”

Metric calculation
Fortunately, MTBF and MTTR are fairly simple to calculate. As Russem noted, all you really need to calculate these metrics is to know when your assets are down and when they're back up.

“As long as you have those pieces of information, you can calculate MTTR and MTBF, which is why there are many pieces of software, like OEE and CMMS (computerized maintenance management systems), that can do these calculations based on maintenance and operator logs,” said Russem. “But it doesn't mean, if you don't have one of those systems, that you can't get this information yourself. As long as you have that asset availability information, you’re just one Excel sheet and a quick calculation away from figuring out and tracking these numbers.”
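
The same spreadsheet-style calculation can be sketched in Python. The example assumes a log exported as CSV with hypothetical `went_down` and `back_up` columns, and it measures MTBF as the average running time between the end of one repair and the start of the next failure, which is one of several conventions in use.

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical export of an asset's downtime log.
LOG_CSV = """asset,went_down,back_up
case_packer_1,2021-09-01 12:00,2021-09-01 12:30
case_packer_1,2021-09-08 12:30,2021-09-08 13:00
case_packer_1,2021-09-15 13:00,2021-09-15 13:30
"""
FMT = "%Y-%m-%d %H:%M"

def load_events(csv_text):
    """Parse CSV rows into chronologically sorted (went_down, back_up) pairs."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return sorted(
        (datetime.strptime(r["went_down"], FMT), datetime.strptime(r["back_up"], FMT))
        for r in rows
    )

def mttr_and_mtbf(events):
    """Return (MTTR, MTBF); MTBF is None when there are fewer than two failures."""
    if not events:
        raise ValueError("need at least one downtime event")
    repairs = [up - down for down, up in events]
    mttr = sum(repairs, timedelta()) / len(repairs)
    # Running time from the end of each repair to the start of the next failure.
    gaps = [next_down - up for (_, up), (next_down, _) in zip(events, events[1:])]
    mtbf = sum(gaps, timedelta()) / len(gaps) if gaps else None
    return mttr, mtbf

mttr, mtbf = mttr_and_mtbf(load_events(LOG_CSV))
print(mttr)  # 0:30:00
print(mtbf)  # 7 days, 0:00:00
```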

Essentially, any software connected to your machine data or receiving operator or maintenance log information is capable of helping you obtain these metrics.

Using CMMS and OEE software to calculate MTTR and MTBF can also help you zero in on more difficult-to-determine operational insights. Russem noted that MTTR and MTBF can become more interesting when numbers [from different software systems] don't match up. For example, your OEE system could be pulling data directly from PLCs, whereas your CMMS is likely tied to maintenance logs. Then, when operators or maintenance personnel log data into one of these systems, discrepancies between how those two different systems are reporting can highlight information about how well your operations and maintenance are in sync, or not.
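
A simple way to surface such a discrepancy is to compute the same metric from each source and compare the results. The sketch below is illustrative only: the repair durations, the `compare_mttr` helper, and the five-minute tolerance are all hypothetical, and real systems would first need their events matched to the same assets and time span.

```python
def compare_mttr(plc_minutes, cmms_minutes, tolerance_minutes=5.0):
    """Compare MTTR computed from two sources of repair durations (in minutes)."""
    if not plc_minutes or not cmms_minutes:
        raise ValueError("both sources need at least one repair duration")
    plc_mttr = sum(plc_minutes) / len(plc_minutes)
    cmms_mttr = sum(cmms_minutes) / len(cmms_minutes)
    gap = cmms_mttr - plc_mttr
    return {
        "plc_mttr": plc_mttr,
        "cmms_mttr": cmms_mttr,
        "gap": gap,
        "in_sync": abs(gap) <= tolerance_minutes,
    }

# Hypothetical durations for the same three outages, as seen by each system.
plc_minutes = [30, 10, 20]    # machine-state data: stop to restart
cmms_minutes = [45, 25, 35]   # maintenance log: work order opened to closed

report = compare_mttr(plc_minutes, cmms_minutes)
print(report)
# {'plc_mttr': 20.0, 'cmms_mttr': 35.0, 'gap': 15.0, 'in_sync': False}
```

In this made-up case, the maintenance log shows repairs taking 15 minutes longer on average than the machine data does, which is the kind of gap worth a conversation between operations and maintenance.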

About the Author

David Greenfield, editor in chief

David Greenfield joined Automation World in June 2011. Bringing a wealth of industry knowledge and media experience to his position, David’s contributions can be found in AW’s print and online editions and custom projects. Earlier in his career, David was Editorial Director of Design News at UBM Electronics, and prior to joining UBM, he was Editorial Director of Control Engineering at Reed Business Information, where he also worked on Manufacturing Business Technology as Publisher.