This is all pretty intuitive so far. Obviously, to test the email notifications, you have to test all of the email options you have enabled. But some of the options we can’t test ourselves. You and I can’t generate the offers and promotions emails to validate that the setting is working properly. That’s where the old adage comes into play: “Lord, give me the wisdom to change what I can change and the wine to endure what I can’t change.” And now the fun begins. The notification for low battery is enabled when you subscribe to device health notifications. For you and me, testing this is actually a bit time-consuming. Even using the maximum power consumption setting on a Ring Camera, it takes days to get the battery down to the threshold required to generate the email notification. Once we hit the threshold, the email should be sent. That’s fine for a consumer, but a tester also has to make sure that no notification is sent when device health is not subscribed to.
For a low-quality test, that’s enough, but let’s take it up a notch. What happens if I only charge the battery to 1% over the threshold? Will the notification still work? There is logic that goes into triggering and “resetting” the status, and that introduces the opportunity for error. Do we really need to test that? And why 1% over the threshold and not 2%, or 3.14159%? The reason for 1% is that a common programming error occurs when operators such as equal to, less than or equal to, and greater than or equal to are used incorrectly. We’re going to use 30% as the trigger point. The obvious logic is to reset the notification once the battery level is greater than 30%. But should a programmer instead reset the notification at greater than 31%, then recharging to 31% does not reset the trigger, and the battery will run out without any low-battery notification emails ever being sent. Logic errors that simple can be made, and more importantly, they are made, so testing for those types of errors is essential.

A friend of mine (Hi Hugh) once crashed the entire entertainment system on an airplane due to a very similar type of logic error in the programming of one of the games in that system. I want you to think about that for a moment. Testing for what could be, rather than what should be, enabled my friend to (inadvertently) crash an entire in-flight entertainment system. It was probably a bug in the overall system that allowed one game, being played at one seat in the airplane, to bring down the entire entertainment system!
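To make that comparison-operator trap concrete, here is a minimal sketch of the trigger-and-reset logic, assuming a 30% threshold. The class and names are my own placeholders, not Ring’s actual code.

```python
# Minimal sketch of low-battery notification logic; hypothetical names,
# not Ring's actual implementation.

LOW_BATTERY_THRESHOLD = 30  # notify when the battery drops to 30% or below

class LowBatteryNotifier:
    def __init__(self):
        self.already_notified = False

    def on_battery_reading(self, percent: int) -> bool:
        """Return True if a low-battery email should be sent for this reading."""
        if percent <= LOW_BATTERY_THRESHOLD and not self.already_notified:
            self.already_notified = True
            return True
        # Correct reset: any reading above 30% re-arms the notifier.
        # The buggy variant described above would test
        # `percent > LOW_BATTERY_THRESHOLD + 1`, so recharging to exactly 31%
        # would never re-arm it and the next discharge would produce no email.
        if percent > LOW_BATTERY_THRESHOLD:
            self.already_notified = False
        return False

notifier = LowBatteryNotifier()
assert notifier.on_battery_reading(30) is True   # drops to the threshold: email sent
assert notifier.on_battery_reading(31) is False  # recharged to 31%: re-armed, no email
assert notifier.on_battery_reading(29) is True   # drops again: email sent again
```

The point is not this particular snippet; it is that a one-character difference in a comparison silently removes the notification, and only a test that recharges to exactly 1% above the threshold will catch it.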
Back to Ring. Ideally, multiple voltage thresholds would be tested. But the more complex the test, the more expensive it is to test. Because we are talking about a security product, significantly more testing is required.
Great! It worked. Now, how about the other options? Do they work? Wait!!! Didn’t I say that I’m testing email notifications? Well, I’ve barely started to test email notifications. There are seven types of notifications here. Does the email trigger if device health is the only notification that is enabled? The truth is that there are 2⁷ (128) possible combinations of the seven selections. That means that to test all of the combinations, I would first have to enable email notifications alone and then test with one other notification type at a time enabled, then two other notification types, and so on. For example, I’d have device health notifications turned on, along with educational tutorial notifications. Then, I’d add customer survey notifications, and so on.
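To see where the 2⁷ figure comes from, here is a small sketch that enumerates every on/off combination of seven notification types. The names are my own placeholders; only a few of them are actual labels from the app.

```python
from itertools import product

# Placeholder names for the seven notification types; the real labels in the
# app may differ.
NOTIFICATION_TYPES = [
    "device_health", "educational_tutorials", "customer_surveys",
    "offers_and_promotions", "new_features", "neighbors", "account_activity",
]

# Every on/off combination of the seven types: 2**7 = 128 test cases.
test_cases = [dict(zip(NOTIFICATION_TYPES, bits))
              for bits in product([False, True], repeat=len(NOTIFICATION_TYPES))]

print(len(test_cases))   # 128
print(test_cases[64])    # device_health enabled, everything else disabled
```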
Nice easy test, but there’s more. For critical systems testing, you’re going to toggle device health off and then back on before going through all of the combinations of subscriptions again. Yep, toggling settings on and off can create errors. I’ve seen this type of logic error in testing real security products.
This is only the low-hanging fruit. We are not done yet. Now we have to test the mobile and SMS subscription settings. Say what? Didn’t I just say I was testing email notifications? Yes, but without testing, I cannot know whether the email notifications are affected by toggling the notification settings for mobile and SMS. And at that, all possible combinations of settings for SMS alone, mobile alone, and mobile and SMS together must be tested. Then I have to do another test for the trigger threshold, and yet another test for charging to 1% above the threshold. Ultimately, we are creating a test matrix that includes thousands of test cases just to see if the low-battery email notifications are reliably functioning. Guess what you have to do to ensure that SMS notifications are working? Mobile notifications? Finally, we’re finished with the easy part.
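To put a rough number behind “thousands,” here is a back-of-the-envelope calculation. The counts per dimension are illustrative assumptions, not figures measured from the actual app.

```python
# Illustrative estimate only; the per-dimension counts are assumptions.
email_combinations = 2 ** 7   # 128 on/off combinations of the email notification types
channel_states = 2 ** 2       # mobile and SMS each independently on or off
toggle_sequences = 3          # e.g. leave on, toggle off->on once, toggle off->on twice
threshold_variants = 2        # trigger at the threshold, recharge to 1% above it

total = email_combinations * channel_states * toggle_sequences * threshold_variants
print(total)  # 3072 cases, before even testing the SMS and mobile channels themselves
```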
Notice that we have only tested one part of the app. Completely unrelated settings in other parts of the app have the potential to affect the reliability of notifications. Perhaps the hardest part of this process is actually creating the test matrix. Identifying every possible test case is only a portion of the battle. You also have to weigh what you can afford to test, which low-risk test scenarios you can eliminate, and which of the most complex test cases are worth including. Complexity = time and money.
Testing is expensive. Even with automation that can speed up testing by hundreds to thousands of times, writing the automation requires time. Additionally, significant human intervention may still be required. You might think that you can emulate battery charge states and charge and discharge conditions to save time, but a battery is not the same as an electronic power supply that performs AC to DC rectification. Using a fully charged real battery and adjusting the voltage with a potentiometer can save a lot of time, and can even be automated, but in the real world a potentiometer is not part of the circuit. This is not to say that a test using a potentiometer is invalid, but rather that it is a mistake to assume automation and emulation techniques work properly if they have not been validated. Only after validating them against real-world behavior can you substitute emulation and be confident in the fidelity of the test, based upon empirical evidence. In reality, extremely few companies, if any, are going to test every possible permutation of sequences in an app, unless there are only a few permutations to test.
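Below is one way that “validate before you substitute” step might look in practice. This is a sketch under my own assumptions, with illustrative numbers, not SecureIQLab’s actual harness.

```python
# Sketch: accept the emulated (potentiometer-based) battery rig only if the
# voltage at which it triggers the low-battery notification tracks what a
# real battery discharge produces. All numbers below are illustrative.

TOLERANCE_VOLTS = 0.05

def emulation_is_valid(real_trigger_volts: list[float],
                       emulated_trigger_volts: list[float]) -> bool:
    """Compare average trigger voltages from real and emulated runs."""
    real_avg = sum(real_trigger_volts) / len(real_trigger_volts)
    emulated_avg = sum(emulated_trigger_volts) / len(emulated_trigger_volts)
    return abs(real_avg - emulated_avg) <= TOLERANCE_VOLTS

# A handful of slow, real-battery runs establishes the baseline; only after
# the emulated rig agrees with it does automation get to use the shortcut.
real_runs = [3.61, 3.59, 3.60]       # illustrative measurements
emulated_runs = [3.62, 3.60, 3.61]   # illustrative measurements
print(emulation_is_valid(real_runs, emulated_runs))  # True
```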
For you and me, all we care about is whether the email notification setting works. To make sure that the notifications work reliably, testers must do a whole lot more than simply toggle email on and off.
At SecureIQLab, we validate cloud security products. A web application firewall is arguably much more complex than a Ring security system. With potentially hundreds of rules, notifications, connectivity states, interoperability requirements, and more, there is more to test than test labs can afford, in terms of both time and money. Vendors can’t test everything either.
With several dozen years of experience behind us, we design our test matrices based upon real-world experience and a qualified analysis of what needs to be tested.
OK, you thought I was about to break one of the laws of the universe. Security blogs must include AI, or the universe will apparently unravel. So, without further ado, some no-hype AI commentary. AI and big data can help reduce the complexity of testing by identifying how disparate conditions can combine to cause failures. These types of systems can also identify which tests aren’t necessary, based upon a wealth of data. But guess what? Each software update has the potential to break what previously worked.
I’ll conclude by sharing a real example of what can happen with an incomplete test matrix.
Way back in the days of the floppy disk, I was testing an antivirus product for inclusion in a conference paper titled “Testing for Broken Antivirus Software.” It was a product that I didn’t use, but the conference organizers asked me to include the product in the presentation. I decided to go after the low-hanging fruit first. I knew of a product that had failed to detect a boot sector virus on a floppy disk that contained no files.
That’s really easy to test for. I took a blank floppy disk, infected it with a boot sector virus, and fired up the virus scanner. Guess what happened next? Yeah, a message came up telling me to insert a floppy disk into the drive. Doh! I forgot to put the floppy disk into the drive. We call things like that “Stupid Human Tricks.” The trick is stupid, not the human. I put the floppy disk into the drive and clicked OK. I bet you can guess what happened. Drum roll, please… No detection. But a test requires repeatability to be validated. I closed the app, opened it, and scanned again. This time, the virus was detected.
The error message telling me to insert the floppy had changed the logic flow and broke detection. Remember I mentioned repeatability? I tested both cases again. Yay!!! I was able to reliably reproduce the error.
This should shed light on the challenges involved in creating a comprehensive test matrix. In my example, a vital test case was missed by the vendor. Without that failure, I would have had to work harder to break the product. What I found was a very serious bug that almost certainly generated a new test case in the vendor’s test matrix.
Stupid human tricks are why you can’t completely replace all testing with automation. They can also uncover serious bugs that were not discovered due to an incomplete test matrix. The vendor had and has extremely smart people working for them, but we all make mistakes. Catching those mistakes is what testing is all about. It’s how we testers contribute to improving security products for the greater good.
Randy Abrams
SecureIQLab
Senior Security Analyst Emeritus