Retrace Your Steps
By Mike Cowles, Senior Project Engineer
As a father of two boys, one of the most common things I do is help find the things they have lost around the house, whether it is toys, the TV remote, game controllers, or anything else special to them that they cannot find on their own. Many times we've torn the whole house apart looking for the smallest toy, only to find it in a drawer that they all of a sudden remember leaving it in. I've even searched for my own keys or wallet when they had been in my pocket the whole time. I remember losing things as a kid and asking my parents if they knew where they were or if they could help me find them. My parents often told me what I now tell my kids: "Retrace your steps." Retracing your steps usually involves answering three questions.
What did you do today?
When do you remember last seeing what you lost?
Where are the places that lost things tend to show up?
I have found myself using the same steps in my almost 20 years of troubleshooting lost or wrong data in plant monitoring systems across a few companies, following the data from when it leaves the PLC to when it shows up on the data display boards (large ceiling-mounted boards whose data can be viewed from the floor), reports, pages, or any other outputs that the plant operators see. For example, when someone points out that a data display board has missing or wrong data, I answer the same three questions as above, though posed a little differently.
What is the software path that the data goes through to its output destination?
What was the last point in that path when the data was correct?
What are the common issues in that place that cause the data to behave like it is behaving now?
I go through each of these questions in more detail below.
1. Know the path (What did you do today?)
The first step in finding something that is lost is to think back on what you did that day before you lost the item. Where did you go with it? Similarly, when looking for lost or incorrect data, it helps to know what path through the system that data takes to get to its destination. This is where a good flowchart or diagram comes in, showing all the different places the data goes through to get to the output. On many older systems, the data can go from the PLC to a data collection PC server with one or two OPC servers, then get saved to a database that stores all the raw data. The data may then get transferred to another database where curated data is binned or calculated together so it is easier to show on data display boards or reports. Then there might be additional software to put that data into the correct places on the display. Sometimes the data goes through 5 or more different pieces of software before it gets to its destination. When training somebody new, the first thing I often do is draw up a flowchart of all the possible paths of the data and then refer back to it whenever I train them on each individual piece. Knowing the data path ahead of time makes it a lot easier to find where the data breaks down.
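For readers who prefer to keep the flowchart next to their scripts, here is a minimal sketch of the same idea in Python. The stage names are only an illustration based on the older-system path described above, and `render_path` and `downstream_of` are hypothetical helper names, not part of any monitoring product.

```python
# Hypothetical stage names modeled on the older-system path described above.
# Replace them with the actual stages in your own plant.
DATA_PATH = [
    "PLC",
    "OPC server",
    "Raw database",
    "Curated reporting database",
    "Display software",
    "Data display board",
]


def render_path(stages):
    """Return a one-line text flowchart of the stages, in order."""
    return " -> ".join(stages)


def downstream_of(stages, stage):
    """Return every stage that receives the data after the given stage."""
    return stages[stages.index(stage) + 1:]


print(render_path(DATA_PATH))
print(downstream_of(DATA_PATH, "Raw database"))
```

Keeping the path as an ordered list means the same list can drive both the diagram and any later checks on each stage.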
2. Check the checkpoints (When do you remember last seeing what you lost?)
Once you remember everything you did that day, the next step in finding your lost item is to remember the last time you saw it. Where were you? What were you doing at that time? When looking for lost or incorrect data, this means checking the checkpoints along the data path from step 1. When training new people to troubleshoot, knowing the data path is step 1, and knowing how to check the data at each point in that path is the next step in learning where the data broke down. For example, at the OPC level, I would typically look up the data items in question in an OPC client and see whether they are what I expect. If they are what I expect, the issue is after that point in the data path. If not, the issue is either at that point or before it. Any report that looks directly at the raw data in a database, before calculations are done for reporting, is another good spot to check. Log files can also show whether the data is still good at a given point. By checking each of these checkpoints, you can vastly narrow down where the issue might be. This is the process of elimination, a form of deductive reasoning: by eliminating all other possibilities, the one that is left must be where the issue is.
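The checkpoint walk can be sketched in a few lines of Python. This is only an illustration under some assumptions: the stage names, the expected value, and the readings are made up, and in practice each reading would come from whatever tool you use at that checkpoint (an OPC client, a raw data report, a log file). The function name `locate_breakdown` is hypothetical.

```python
def locate_breakdown(path, expected, readings):
    """Walk the data path in order and compare each reading to the expected value.

    Returns (last_good, first_bad): the last stage where the value was still
    correct and the first stage where it was wrong or missing. first_bad is
    None when every checkpoint matches.
    """
    last_good = None
    for stage in path:
        if readings.get(stage) != expected:
            return last_good, stage
        last_good = stage
    return last_good, None


# Illustrative values only: a part count of 42 leaves the PLC,
# but the display board shows 0.
path = ["PLC", "OPC server", "Raw database", "Reporting database", "Display board"]
readings = {
    "PLC": 42,
    "OPC server": 42,
    "Raw database": 42,
    "Reporting database": 0,
    "Display board": 0,
}

last_good, first_bad = locate_breakdown(path, 42, readings)
print(f"Last good checkpoint: {last_good}; first bad checkpoint: {first_bad}")
```

With these sample readings the problem lies in the reporting database or in the transfer that feeds it from the raw database, so everything upstream can be ruled out.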
3. Verify common bottlenecks (Where are the places that lost things tend to show up?)
Once you realize where you were and what you were doing when you lost the item, you can check the common places where lost items tend to show up around that area. Similarly with plant monitoring, once we find where the data stopped being correct, we can check the most common reasons why the correct data is not getting past that point. When training someone new, I tend to focus on the areas where people commonly make mistakes that keep data from getting past a given point. One area of focus is matching the linking words between programs. For example, data might be referred to by one word in the program that sends it, but if the receiving program has a typo that changes the word to something different, the data has nowhere to go and gets stuck. Sometimes even little things like extra spaces or hidden characters can stop data from going from one link to another. When checking the links, I often paste both words into a text file where every character is the same width so that any differences are easy to spot.
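The same comparison can be done with a small script when the difference is a character you cannot see. The following is a minimal Python sketch, not a feature of any particular product; the tag name `Line1.Count` and the helper names are hypothetical.

```python
import unicodedata


def first_difference(sent, received):
    """Return the index of the first position where the names differ, or None."""
    for i, (a, b) in enumerate(zip(sent, received)):
        if a != b:
            return i
    if len(sent) != len(received):
        # One name is a prefix of the other, e.g. a trailing space.
        return min(len(sent), len(received))
    return None


def describe_chars(name):
    """List each character with its code point so hidden ones become visible."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}" for ch in name]


# Hypothetical tag names: the receiving side has a trailing space.
sent = "Line1.Count"
received = "Line1.Count "

index = first_difference(sent, received)
if index is None:
    print("Names match exactly")
else:
    print(f"Names differ at position {index}")
    print("Sent:    ", repr(sent), describe_chars(sent[index:]))
    print("Received:", repr(received), describe_chars(received[index:]))
```

Printing with `repr` puts quotes around each name, which makes leading and trailing spaces stand out, and the code point listing names characters such as a zero width space that a text editor would not show.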
Proper use of the three steps above can greatly aid in troubleshooting incorrect or lost data in plant monitoring systems. While newer IIoT systems are starting to offer more integrated, all-in-one solutions that will also help with troubleshooting, these techniques should still offer a good baseline for solving the issues that come up day to day in the plant monitoring systems many plants still run today.