Tuesday, 18 January 2011

Integration Strikes Again

When I get into the client's office this morning, the first item is a conference call to look at how well the integration of two of their systems (via a third) is going.  This follows a call last Thursday, and another is already scheduled for tomorrow.  The original plan was to hand this over at the end of December, for testing, to the unit I am setting up.  I had that plan changed in early December to one where the construction team would actually Integrate it (see article here) before handing it over for test. We also defined a set of basic integration tests we wanted the construction team to demonstrate at the time of handover.  Four weeks were allowed for the integration prior to the handover.

We are just over halfway through the Integration period.  The score at the end of yesterday: none of the demonstration tests attempted, one type of message and item of data successfully sent from system A to system B, and no messages yet sent from B to A.  The alarm bells are ringing, hence the start of a series of frequent calls to recover things.

Why has this happened again?  All of the usual causes alluded to in the article, or in the more detailed analysis given in the notes here, which also explain the concepts of a more effective Integration approach.  The way this exercise was likely to go was foreseen and forewarned.  The re-plan that gave the construction team four more weeks to get it working has helped; it has avoided the disruption and waste of trying to test something that simply does not work, but we have yet to see how long it will actually take to get to the point where the basic integration tests planned for the handover pass.

Let's see what this morning's call brings.

Tuesday, 28 December 2010

Well, has the test failed or hasn’t it?

When should you classify a test as Failed?  This sounds like such a simple question, and you may think the answer is obvious; however, there are factors that mean a well-thought-out approach can bring significant benefits to the test manager.

Introduction

Generally one of the states used in test reporting is Failed.  A common assumption, one that is generally sound, is that failed tests mean you have problems.  Given typical practice, a less well-founded extension of this is that failed tests indicate the system has problems doing the things that those tests were testing. Years of attempting to understand what is really going on inside projects show that this is the point at which the complexity of the real world overwhelms the abstract model of tests and test failures.

Think about the simple abstract model.  Tests have actions and, hopefully, expected outcomes.  If the action is done correctly and the outcome does not match the expectation then the test has Failed. Simple or what?  This model is applied on a regular basis all over the world so what is the issue?  Issues come in many forms and will be illustrated here using three examples.

Example One - Environmental Problem

Our system allows external users to submit transactions through a web portal.  There is a change to the way these submissions are presented to internal users on the backend system: if the submission has an attachment, this is flagged to the user.  One type of transaction has three modes; two tests are passed and the third is failed.  Over a number of days a common understanding builds up across both the test and development teams that the change works for two of the three modes and does not work for the third.  Only when we dig into the detail, to decide whether or not to release with the issue, do we discover that transactions for the third mode fail to submit at the portal.  No one had managed to get this transaction in; the handling of it in the backend had not been tried.

The real problem was a test environment configuration issue that derailed this test. The test was marked as Failed and the story began to develop that the third mode did not work.  This test had not Failed; it was blocked, unable to progress and discharge its purpose.

Example Two - Incorrect Search Results

To test that billing accurately consolidates associated accounts, these associations have to be created and then the accounts billed. To associate accounts, one account is selected as the master and a search facility is used to obtain the list of accounts that can be associated; selections are then made from the list.  After this, billing can be tested.  When the search is done it returns the wrong accounts and association attempts fail.  Has the test failed?

If the test is classified as failed this tends to (well, should) indicate that when you bill associated accounts the bill is wrong.  So marking tests like this as failed sends the wrong message.  The test can't be completed, and a fault has been observed that can't be ignored, but this fault is not to do with the thing being tested.

Example Three - Missing Input Box

A test navigates through a sequence of common HCI areas.  On one page it is observed that one of the expected input boxes is missing.  This doesn't bother us as the test doesn't use it.  Everything works well for the test.  Has it Passed?

The most meaningful outcome for the test is that it Passed; but that leaves the defect that was observed floating around, so shouldn't it be marked as Failed to ensure it is re-tested?

An Alternative Model of Failure

Those were just three examples. There are many similar variations; so what rules should be used to decide whether to claim Failure?  Generally a test should have a purpose and should include explicit checks that assess whether the thing tested by that purpose has or has not worked correctly.  An expected result after an action may be such a check; alternatively a check may require more complex collection and analysis of data.  Checks should relate to the purpose of the test.  Only if a check is found to be false should the test be marked as Failed.  If all the checks are ok then the test is not Failed even if it reveals a defect.
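
To make this concrete, here is a minimal sketch of the model in Python. The names used (Verdict, StepOutcome, classify) are illustrative assumptions rather than anything from a real test management tool; the point is simply that the Failed verdict is reserved for a false purpose check, while a failed utility step leaves the test blocked.

from __future__ import annotations
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    PASSED = "Passed"
    FAILED = "Failed"      # an explicit purpose check was false
    BLOCKED = "Blocked"    # a utility step went wrong; the purpose was never assessed

@dataclass
class StepOutcome:
    description: str
    as_expected: bool
    is_check: bool         # True only if this step assesses the purpose of the test

def classify(steps: list[StepOutcome]) -> Verdict:
    # Mark the test Failed only when an explicit purpose check is false.
    if any(s.is_check and not s.as_expected for s in steps):
        return Verdict.FAILED
    # A failed utility step means the purpose could not be assessed at all.
    if any(not s.is_check and not s.as_expected for s in steps):
        return Verdict.BLOCKED
    return Verdict.PASSED

# Example One revisited: the portal rejects the submission, so the backend
# check is never reached - the test comes out Blocked, not Failed.
example_one = [StepOutcome("submit mode-three transaction at portal",
                           as_expected=False, is_check=False)]
print(classify(example_one))   # Verdict.BLOCKED

Applied to the three examples above, this classification gives Blocked for Example One, Blocked for Example Two and Passed for Example Three, which is a far more precise statement of where the system actually stands.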

The role of Expected Results

So are all expected results checks?  Often there are expected results at every step, from logging in through navigation to finally leaving the system.  Given this, the position is a very, very strong no.  Many expected results in tests serve a utility purpose: they verify that a step has been done as required, and they often say little about the thing the test is actually needed to prove.  If you don't get the expected result then there is a problem somewhere; a problem with the test, with the way it is executed or with the system; however, it does not necessarily mean that there is a problem with the thing being tested. Only when there is a definite problem with that should the test claim a Failure.

Orphaned Defects

That leaves defects that are triggered when running tests but that don't mean the test has Failed.  We could end up with no tests Failed, perhaps even all Passed, and a stack of defects; this is counter-intuitive, so what is going on?  Actually the discipline of refusing to fail tests unless an explicit check fails provides very useful feedback (a sketch of how to spot these orphans follows the list below). The statistical discrepancy can indicate:

(a) That the tests do not have adequate checks; they are revealing errors in the thing being tested that can be seen, but nothing in the test itself says to check for them.  Time to improve the test and then mark it as Failed. Improving the test is required to make the defect detection delivered by the tests consistent; we should only depend on explicitly defined error detection.

(b) That we are finding errors in things that are not being tested, since no test is failing as a result of the defect.  For control purposes, add tests that do Fail because of these defects.  Also, is this indicating a major hole in regression testing or in the testing of the changes?  If so, is action required?

(c) That there are environmental problems disrupting test activities.
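
As a rough illustration of this feedback loop, the sketch below flags defects that no Failed test accounts for; the record structures and field names are assumptions made up for the example, not any particular tool's schema.

from dataclasses import dataclass, field

@dataclass
class TestRecord:
    name: str
    verdict: str                    # "Passed", "Failed" or "Blocked"

@dataclass
class Defect:
    reference: str
    failing_tests: list = field(default_factory=list)  # tests this defect caused to Fail

def orphaned_defects(defects, tests):
    # Defects with no corresponding Failed test point at missing checks (a),
    # gaps in coverage (b) or environmental trouble (c).
    failed = {t.name for t in tests if t.verdict == "Failed"}
    return [d for d in defects if not failed.intersection(d.failing_tests)]

tests = [TestRecord("associate accounts and bill", "Blocked")]
defects = [Defect("DEF-102")]       # the wrong-search-results fault from Example Two
for d in orphaned_defects(defects, tests):
    print(d.reference, "- no Failed test; review checks, coverage and environment")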

Conclusion

Adopting an approach that governs, indeed restricts, when a test can be marked as Failed to circumstances where an explicit check has shown an issue provides more precise status on the system and improved feedback on the quality of the testing.  Furthermore, it reduces the discrepancy between the picture painted by test results and the actual state of the release, and the management time required to resolve that discrepancy.

Friday, 26 November 2010

Integration; the puzzle at the heart of the project.

We have recently started working with a new client on changes to their testing and delivery practice. The aim is to increase the throughput of development and at the same time accelerate delivery and maintain quality.  This has been running for a few weeks now and enough time has elapsed for us to start hearing stories about previous projects: what went well and what was problematic.

Today we had a planning session for a project that involves the connection and interoperation of two systems.  In this session it became clear that their experiences of this type of endeavour were very similar to ones we have seen elsewhere.  Connecting systems is always more complex than expected; there is always a lot that is not adequately prepared, plenty that goes wrong, and it always takes far longer than anyone thought.

On the plus side it was reassuring to hear their head of development recounting similar experiences and holding a position similar to my own on what has to be done next time if there is to be any chance of avoiding the same fate.  There was a common understanding of the need for someone to be accountable for getting things to work.  There was similar alignment over the need to use virtual teams, the importance of preparation, the risk from environmental problems and the need for hands-on technical capability and determination.

It was some years ago that we identified integration as one of the principal issues affecting projects both large and small.  A distinguishing aspect of our thinking is the major distinction we make between the act of getting it working and the act of testing whether it is working.  We always try to get clients to think of the discipline of Integration (see Integration Papers) as something that stands apart from testing; even from testing called Integration Testing.

Tuesday, 16 November 2010

To release or not to release, that is the question.

Here are two interesting propositions.  Number one: test managers should focus on getting as quickly as possible to a state where it is obvious that further testing offers little benefit compared with finding out how the system survives in the wild.  Number two: it is easier to make the decision to release a system when delaying the release to permit further testing is not likely to put you in any better position than the one you are already in.   The interplay of these two propositions is discussed below.

For a number of years I was part of the programme leadership team that governed the development and release of a very large and very critical telecoms OSS system.  This system was so large, so complex and so important that release decisions were never simple. We would spend a lot of time converging on a good deployment position; one that realised the maximum benefits from the release whilst containing the risks. 

As you might expect sometimes making a decision was hard; things were not clear and it could go either way.  These decisions often involved long debates based on uncertain information. We found that ways of thinking evolved that made decisions easier.  One of the most powerful tools that evolved was a very simple question – “If we delay the release another two weeks and carry on testing then will we be in any better position to make a decision?”.

When the answer to that question was “no” we knew it was time to take a deep breath, go for it, and deal with any consequences that arose (and we became quite effective at dealing with those occasions when there were consequences you would not want to experience).  This question worked well in that environment because the cost of not deploying was high; it was a high-intensity delivery environment with a heavy emphasis on deploying and moving on to the next release.  That said, the question is a tool that can be used in many environments.

Returning to test managers and their aims.  If a key part of the decision to release a system is a question of the form “Can any more testing be of benefit?” then test managers should plan, and manage execution, to reach the position where the answer is “No” as soon as possible.  In doing this they accelerate delivery of the system.  The sooner the answer can be “more testing is a waste of time”, the sooner the benefits of the system will be seen.

Epilogue

Just to be clear.  It is very easy to get the answer “more testing is a waste of time” if the testing is simplistic and ineffective, or worse, simplistic, ineffective and executed ineffectively.  This approach is not recommended.  Rather, do well-thought-out, highly effective testing and do it quickly.  You and your colleagues on the development side should hold similar opinions as to when the optimum point has been reached.  If there is a caveat that goes something like “but we would spend more time testing if the testing were better” then there is some need for improvement.

Sunday, 7 November 2010

Testing; the discipline that lives in Flatland

Flatland: A Romance of Many Dimensions is a novella set in a world whose citizens are only aware of two dimensions; the third one is a secret.  After many years of observing the way that organisations approach software testing I have an ever-strengthening belief that testing is hindered by a failure to recognise dimensions along which layered approaches should be used.  Testing is a discipline where anonymous, uniform, interchangeable tests exist and managers think in two dimensions: effort and schedule.  These Flatland-style limitations lead to testing that is both ineffective and inefficient.

So after that philosophical introduction, what am I really getting at?  There are a number of things about the way testing is generally approached, resourced and executed that lack a layered approach (layering denoting a dimension) and that suffer as a result.  In this post I will describe the main ones that are repeatedly found in organisations we work with.  Later I hope to make time to explore each in more detail.  The four recurring themes are:
  1. People. There are testers and, well, there are testers; that is it.  Compare this with enterprise-level development organisations, where we see architects, lead end-to-end designers, platform architects, platform designers, lead developers and developers.  This is not necessarily anything to do with the line or task management structures; this is people with different levels of skill and experience who are matched to the different challenges to be faced when delivering the work.  Compare again testing, where organisations generally think in terms of a flat, interchangeable population of testers.  A source of problems or not; what do you think?
  2. Single step test set creation.  At one point there is nothing other than a need to have some tests, usually to have them ready very quickly; then there are several hundred test cases, often described as a sequence of activities to be executed.  Any idea how we got from A to B; any idea whether B is anywhere near the right place, never mind whether it is optimal; any chance of figuring it out retrospectively? No; not a chance.  It's like starting off with a high level wish for a system and coding like mad for two weeks and expecting to get something of value (actually, come to think of it, isn't there something called Agile...).    Seriously, an effective test set is a complex optimised construct; complex constructs generally do not get to be coherent and optimised without a layered process of decomposition and design.  In most places test set design lacks any layered systematic approach and has no transparency; it depends on the ability and the on-the-day performance of the individual tester. Then once it is done it is done; you can't review and inspect quality into something that is not in the right place to start off with.
  3. Tiers of testing. Many places and projects have separate testing activities; for example system testing, end-to-end testing, customer experience testing, business testing and acceptance testing. How often is the theoretical distinction clear; how often does the reality match the theory?   Take a look and in many cases you will see that the tests are all similar in style and coverage. There is a tendency to converge on testing that the system does what it says it does, and to do this in the areas and ways that are easy to test.  This can lead to a drive to merge the testing into one homogenous mass to save time and cost; given that the tests have already become indistinguishable, it is a drive that is hard to resist.  Distinct tiered testing has a high value, but the lack of clear recognition of what makes the tiers different is the start of the road to failure.
  4. The focus of tests.  When you see a test can you tell what sort of errors it is trying to find?  Is it designed to find reliability problems, to ensure user anomalies are handled, to ensure a user always knows what is going on, or to check that a sale is reflected correctly in the accounting system?  A different focus requires a different type of test.  Yet generally there are just tests and more tests: no concept of a specific focus for a particular group of tests, little concept of different types of test to serve different purposes.  Testers lack clear guidance on what the tests they are designing need to do and so produce generic tests that deliver generic test results.
These four themes demonstrate a common lack of sophistication in the way that testing is approached.  A view of testing as a set of uniform activities to be exercised by standardised people in a single-step process is the downfall of many testing efforts.  It is a Flatland approach, and testing practices need to invade and spread out along these other dimensions for testing to become more effective and valued.  Hopefully I will be able to provide some ideas on how to escape from Flatland at a later date.