Friday, 30 March 2012

Skimming the Surface

We have just hit something that occurs time and time again; that is testing the interface without looking at what is going on below the surface.  It is often in the depths that the really important stuff occurs and yet this is where we often find no one is looking.

The source of this frustration was a set of user account maintenance tests.  The first one was a test that deleted a user's account.  It had steps to do so and an implicit check that the system said it was doing what you asked.  Quite how good a test it was of the request functionality I am not really sure and not really concerned about.  What is concerning is the rather fundamental fact that there was no follow-up activity to check that the account was actually deleted.  Fine, the system (hopefully) said it was doing what was asked of it, but not checking that the account had actually gone is a fundamental flaw in the test approach.
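As a minimal sketch of the difference, the Python below contrasts the surface check with the follow-up check. The `AccountStore` and `BrokenAccountStore` classes and the `check_delete_account` function are hypothetical stand-ins invented for illustration; they are not the system or the tests described above.

```python
class AccountStore:
    """Hypothetical stand-in for the system under test."""

    def __init__(self):
        self._accounts = {}

    def create(self, user_id, name):
        self._accounts[user_id] = name

    def request_delete(self, user_id):
        # Remove the account and return the message the interface would show.
        self._accounts.pop(user_id, None)
        return "Account deleted"

    def exists(self, user_id):
        return user_id in self._accounts


class BrokenAccountStore(AccountStore):
    """Hypothetical faulty system: claims success but removes nothing."""

    def request_delete(self, user_id):
        return "Account deleted"


def check_delete_account(store):
    """Return (surface_ok, actually_gone) for a delete-account test."""
    store.create("u1", "Alice")
    message = store.request_delete("u1")
    surface_ok = message == "Account deleted"   # what the interface said
    actually_gone = not store.exists("u1")      # what happened underneath
    return surface_ok, actually_gone
```

Against the faulty store the surface check still reports success; only the follow-up check shows that the account is still there.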

The issue highlights the need for a systematic test design practice and this in turn demands professional test designers and an appropriate design method.  Ad hoc test writing (just write down the steps and annotate with the expected results) is one of the reasons tests come out like this.

Saturday, 19 February 2011

Requirements can work at many levels; so decide in advance which levels are required.

Despite the trend in software development to think of requirements as old hat, many people still work with requirements and always will. This fact makes the observations that follow relevant and likely to remain so.

An interesting thing about requirements is that people generally think about them as a single homogeneous layer. Conceptually in people's minds there is not much distinguishing one level of requirements from another. Requirements sit in a single layer above which there is not much, whilst below them sits either functional definition or a direct entry into design and implementation. It is rare to see a clear recognition of the widely different breadths, domains and concerns that different requirements can address. Hence almost inevitably there is little thought given to what form of requirements is appropriate and how they might be partitioned. The consequence of this is often a muddled set of requirements; some addressing high level concepts and some minute system details. To illustrate this take a look at two examples:
  • Requirement A - Operational practice shall ensure no information associated with transactions processed for a client organisation nor knowledge of the existence of individual transactions being handled for that organisation is available to the staff of any other client organisation.
  • Requirement B - Users of the system only have visibility of transactions owned by operational entities that their user account is mapped into.
Both are driven by the same underlying need for confidentiality. The first is a true business practice requirement. It can be used to judge the operation of a system, a family of systems, how communication and reporting must work and what people can see when they walk into a room. It is agnostic to the nature of any system used in the activities of the operation. The second is really a system requirement. Furthermore it is a dependent system requirement that only makes sense in the context of some decisions that have already been made about how the system will work.

It is fair to say that if someone were writing a requirement set for this space then it is possible that they might address this topic in Requirement-A style, in Requirement-B style or might include both. Both have validity and it is not a problem to have requirements at either end of the spectrum or at points in between. However when they are all mixed in together this leads to difficulties maintaining and working from requirements.

It is better to explicitly separate out the different layers of requirements, to handle each layer separately and, where requirements are found in the wrong place, to move them to the correct place. This may sound like lots of extra work but it is not; rather it is being organised and disciplined. The reward for being so is more efficient and effective communication of critical knowledge across teams.

A fairly generic, though not universal, layering would be:
  • Business Operational Performance Requirements
  • Business Operational Practice Requirements
  • Business Practice and Process Requirements
  • Intrinsic System Requirements
  • Dependent System Requirements
Organising requirements around this or some similar layering model provides for better, more maintainable requirements. It also allows, well actually encourages, staggering of requirements generation and eliminates the waterfall bias of a monolithic requirements view. This layering model allows good requirements engineering practices to be integrated into modern iterative / incremental delivery practices.

Tuesday, 18 January 2011

Integration Strikes Again

When I get into the client's office this morning I have to do a conference call to look at how well the integration of two of their systems (via a third) is going.  This follows a call last Thursday and there is one already scheduled for tomorrow.  The original plan was to hand this over to the unit I am setting up at the end of December to be tested.  I had that plan changed in early December to one where the construction team would actually Integrate it (see article here) before handing it over for test. We also defined a set of basic integration tests we wanted the construction team to demonstrate at the time of handover.  Four weeks were allowed for the integration prior to the handover.

We are just over half way through the Integration period.  The score at the end of yesterday was zero of the demonstration tests attempted, one type of message and item of data successfully sent from system A to system B and no messages yet sent from B to A.  The alarm bells are ringing and hence the start of a series of frequent calls to recover things.

Why has this happened again?  All of the usual causes alluded to in the article, or in the more detailed analysis given in the notes here, which also explain the concepts of a more effective Integration approach.  The way this exercise was likely to go was foreseen and forewarned.  The re-plan that gave the construction team four more weeks to get it working has helped; it has avoided disruption and waste from trying to test something that simply does not work, but we have yet to see how long it will actually take to get to the point where the basic integration tests planned for the handover pass.

Let's see what this morning's call brings.

Tuesday, 28 December 2010

Well has the test failed or hasn’t it?

When should you classify a test as Failed?  This sounds such a simple question and you may think the answer is obvious; however there are some factors that mean a well thought out approach can have significant benefits to the test manager.

Introduction

Generally one of the states used in test reporting is Failed.  A common assumption, one that is generally sound, is that failed tests mean you have problems.  Given typical practice a less well founded extension of this goes that failed tests indicate the system has problems doing the things that those tests were testing. Years of attempting to understand what is really going on inside projects show that this is the point at which the complexity of the real world overwhelms the abstract model of tests and test failures.

Think about the simple abstract model.  Tests have actions and, hopefully, expected outcomes.  If the action is done correctly and the outcome does not match the expectation then the test has Failed. Simple or what?  This model is applied on a regular basis all over the world so what is the issue?  Issues come in many forms and will be illustrated here using three examples.

Example One - Environmental Problem

Our system allows external users to submit transactions through a web-portal.  There is a change to the way these submissions are to be presented to internal users on the backend system.  If the submission has an attachment this is flagged to the user.  One type of transaction has three modes; two tests are passed and the third is failed.  Over a number of days a common understanding across both the test and development team builds up that the change works for two of the three modes and does not work for the third.  Only when we dig into the detail to decide whether to release with the issue or not do we discover that transactions for the third mode fail to submit at the portal.  No one had managed to get this transaction in; the handling of it in the backend had not been tried.

The real problem was a test environment configuration issue that derailed this test. The test was marked as Failed and the story began to develop that the third mode did not work.  This test had not Failed; it was blocked and unable to progress and discharge its purpose.

Example Two - Incorrect Search Results

To test that billing accurately consolidates associated accounts these associations have to be created and then the accounts billed. To associate accounts one account is selected as the master and then a search facility is used to obtain the list of accounts that can be associated; selections are then made from the list.  After this billing can be tested.  When the search is done it returns the wrong accounts and association attempts fail.  Has the test failed?

If the test is classified as failed this tends to (well should) indicate that when you bill associated accounts then the bill is wrong.  So marking tests like this as failed sends the wrong message.  The test can't be completed and a fault has been observed and can't be ignored, but this fault is not to do with the thing being tested.

Example Three - Missing Input Box

A test navigates through a sequence of common HCI areas.  On one page it is observed that one of the expected input boxes is missing.  This doesn't bother us as the test doesn't use it.  Everything works well for the test.  Has it Passed?

The most meaningful outcome for the test is that it Passed; but then that leaves the defect that was observed floating around so shouldn't it be marked as failed to ensure it is re-tested?

An Alternative model of Failure.

Those were just three examples. There are many similar variations; so what rules should be used to decide whether to claim Failure?  Generally a test should have a purpose and should include explicit checks that assess whether the thing tested by that purpose has or has not worked correctly.  An expected result after an action may be such a check; alternatively a check may require more complex collection and analysis of data.  Checks should relate to the purpose of the test.  Only if a check is found to be false should the test be marked as Failed.  If all the checks are ok then the test is not Failed even if it reveals a defect.

The role of Expected Results

So are all expected results checks?  Often there are expected results at every step; from logging in through navigation to finally leaving the system.  Given this the position is a very, very strong no.  Many expected results in tests serve a utility purpose.  They verify some step has been done as required; they often say little about the thing the test is actually needed to prove.  If you don't get the expected result then it means there is a problem somewhere; a problem with the test, with the way it is executed or with the system; however it does not necessarily mean that there is a problem with the thing being tested. Only when there is a definite problem with that should the test claim to be a Failure.
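The rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Outcome`, `StepResult` and `classify` names are hypothetical, and "Blocked" is used for the case, as in Example One, where a utility step goes wrong and the test cannot discharge its purpose.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    PASSED = "Passed"
    FAILED = "Failed"
    BLOCKED = "Blocked"


@dataclass
class StepResult:
    description: str
    ok: bool        # did the expected result appear?
    is_check: bool  # True only if this step assesses the purpose of the test


def classify(steps):
    """Classify a test run from the steps that were actually executed."""
    # Only a false explicit check makes the test Failed.
    if any(s.is_check and not s.ok for s in steps):
        return Outcome.FAILED
    # A utility step going wrong stops the test but proves nothing
    # about the thing being tested.
    if any(not s.is_check and not s.ok for s in steps):
        return Outcome.BLOCKED
    return Outcome.PASSED
```

With this rule a run whose only problem is a failed portal submission is reported as Blocked, not Failed; a defect should still be raised for whatever went wrong.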

Orphaned Defects

That leaves defects that are triggered when running tests but that don't mean the test has Failed.  We could end up with no tests Failed, perhaps even all Passed, and a stack of defects; this is counterintuitive so what is going on?  Actually the discipline of refusing to fail tests unless an explicit check fails provides very useful feedback.  The statistical discrepancy can indicate:

(a) That the tests do not have adequate checks; they are revealing errors in the thing being tested that can be seen but nothing in the test itself says check for that.  Time to improve the test and then mark it as Failed. Improving the test is required to make the defect detection delivered by the tests consistent; we should only depend on explicitly defined error detection.

(b) That we are finding errors in things that are not being tested as no test is failing as a result of the defect.  For control purposes add tests that do Fail because of the defects.  Also is this indicating a major hole in regression or testing of the changes?  If so is action required?

(c) That there are environmental problems disrupting test activities.

Conclusion

Adopting an approach that governs, actually restricts, when a test can be marked as Failed to circumstances where an explicit check has shown an issue provides more precise status on the system and improved feedback on the quality of the testing.  Furthermore this reduces the discrepancy between the picture painted by test results and the actual state of the release and the management time required to resolve this.

Wednesday, 15 December 2010

Maintaining Focus

If you want testing to be effective and want it to be manageable in the wider sense of the word (understood by others, amenable to peer and expert review and controllable) then everything has to be focussed.  Each constituent part of the effort needs a clear purpose and this has to extend down to quite a fine grained level.  Macro level building blocks such as Functional Test, Performance Test and Deployment Test don’t do it.  What is required is to break the work into a set of well defined heterogeneous testing tasks each one focussing on certain risks.

This approach originated when a guy called Stuart Gent and I were working through the challenge of shaping and scoping the testing for a major telecommunications system programme.  We had a team of twelve analysts simply trying to understand what needed to be tested.  We had already divided the work into twelve workstreams but recognised we needed something more.  We also had the experience of not using an adequate analysis approach on preceding releases of the system. These were far smaller and less complex than this one but we had learnt the dangers of inadequate independent analysis, of tending to follow the development centric requirements, specifications and designs, of testing what was highlighted by these and of missing less obvious but vitally important aspects.

Out of this challenge the concept of Focus Area based test management emerged.  The name isn’t ideal but it serves its purpose.  The fundamental approach is that test activity should be divided up into a number of packages each being a Focus Area.  Each has a tight well defined remit.  There can be quite a few Focus Areas on large projects; we are not talking about single digits.  Inventories exceeding a hundred, possibly approaching two hundred, have been known.

A key thing is that a focus area is coherent and people can understand what it aims to cover and what it does not cover.  This enables far clearer assessment of whether a group of tests is adequate; because the focus is clear it is a tractable intellectual challenge to judge whether the tests do the job; divide and conquer.  Looking from the other end of the telescope how well are the overall risks of the system covered? If you have one thousand test cases with no way of telling what they really do, other than reading them, then you haven’t got a chance of finding the gaps.  If you have forty three well defined Focus Areas around which the tests are structured then you are in a much better shape.

What makes up a Focus Area definition?  This is something that flexes and depends on how formal you want to be but there are some basic things that should always be present:
(a)     The aspects of the system’s behaviour to be covered.
(b)     Distinct from this the conditions and scenarios that behaviour is being exercised under.
(c)     The sorts of malfunctions in this behaviour that we are trying to make sure aren’t there or at least that we need to catch before they get into the wild.
(d)     Any particular threats to be exercised.
(e)     Whether we are after hard faults or ones that don’t always manifest themselves even when the things we are doing to try and make a fault happen appear the same.

Look at how this works.  If you don’t apply a Focus Area approach and ask a team to create tests for some system then what is it that you are actually doing?  Well putting this situation into our basic Focus Area form you are saying:

“(a) Test all aspects of the system’s behaviour. (b) Do this under arbitrary conditions and usage scenarios.  (c) Whilst you are at it look for anything that could possibly go wrong.  (d) We aren’t telling you what particular things have a high probability of breaking it. (e) We are not highlighting whether things that may manifest themselves as reliability issues need to be caught.”

That is a lot of ground to cover both in area and types of terrain.  Thinking will be difficult as there are lots of different concerns all mixed in together.  Our experience is that you will tend to get homogeneous testing using a small number of patterns that focuses on primary behaviour.  Much of the terrain will not get tackled; particularly the stuff that is harder to traverse.  Also, as discussed above, it is very difficult to review a set of tests covering such wide concerns and when you do you will probably find gaps all over the place.

Alternatively perhaps experienced people should define a number of Focus Areas to shape the work.  An example high level brief for a focus area might be:

“(a) Test the generation of keep the customer informed messages sent to the customer during order handling. (b) Test this for straightforward orders and for orders that the customer amends or cancels (don’t cover internal order fulfilment situations as they are covered elsewhere).  (c) Testing should check for the occurrence of the message and the accuracy of the dynamic content of the message.  Testing should check for spurious messages.  Static content and presentation need not be checked.  The latency of the message issue mechanism is outside the scope of this package. (d) Particular concerns are orders for multiple products and orders where the customer has amended contact rules after placing the order.  The impact of load on operation is outside the scope of this package.  (e) It is accepted that this package should provide reliable detection of consistent failures and will not be implemented to detect issues that manifest themselves as reliability failures.”
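One way to keep such briefs consistent is to record them in a simple structure with one field per element (a) to (e). The sketch below is an assumption about how that might look; the `FocusArea` class, its field names and the `missing_parts` helper are hypothetical and are not part of the original approach.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FocusArea:
    name: str
    behaviour: str            # (a) aspects of behaviour to be covered
    conditions: List[str]     # (b) conditions and scenarios exercised
    malfunctions: List[str]   # (c) sorts of malfunction to look for
    threats: List[str]        # (d) particular threats to be exercised
    fault_types: str          # (e) hard faults only, or intermittent ones too


def missing_parts(area):
    """Return the labels of any elements (a) to (e) left empty."""
    parts = {
        "a": area.behaviour,
        "b": area.conditions,
        "c": area.malfunctions,
        "d": area.threats,
        "e": area.fault_types,
    }
    return [label for label, value in parts.items() if not value]


# The example brief above, condensed into the structure.
kci_messages = FocusArea(
    name="Keep the customer informed messages",
    behaviour="Generation of messages sent to the customer during order handling",
    conditions=["straightforward orders", "orders amended by the customer",
                "orders cancelled by the customer"],
    malfunctions=["missing message", "inaccurate dynamic content",
                  "spurious message"],
    threats=["orders for multiple products",
             "contact rules amended after placing the order"],
    fault_types="consistent failures only",
)
```

A brief with empty elements is then easy to spot before any tests are written against it.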

A definition like this helps to focus the mind of the test designers; it should help to shape the pattern of testing so as to most effectively cover the ground.  It should ensure there are fewer gaps around its target and it should make reviewing more effective.  The overall set of well thought out focus areas allows the Test Architect to shape the overall coverage delivered by the testing exercise.

Personally I would never consider even reviewing a set of tests without first having my Focus Areas to hand.

Friday, 3 December 2010

The return of an old friend.

I have just encountered an old friend of mine; one that I see most places I go.  My friend is that recurring defect - the different date format bug.  In its most common and insidious form it is a mix of DD/MM/YYYY and MM/DD/YYYY representations of dates as strings.  Date format clashes of any sort cause defects but this is the worst one because in many cases it appears to work, waiting to create problems in future or corrupting data that passes through it.

How come by appearing to work for certain days it manages to slip through the net?  Dates presented in the DD/MM/YYYY format up to the 12th of the month will happily get converted into meaningful, though incorrect, dates by something that is looking for MM/DD/YYYY.  So the 11th of October 2010 starts off in the first format as 11/10/2010 and then gets analysed by something looking for MM/DD/YYYY and is interpreted as the 10th of November 2010.  If this is simply validation then the data entered is let through and no one is the wiser; but wait until the 13th.  However if the outcome of the incorrect interpretation of the date is stored in this form then we get the wrong date passed on for further processing.

Generally the presence of the issue can only be revealed when values of the day in the month part of the date that are greater than twelve are used.  For example the 13th of October 2010 in the first format is 13/10/2010.  If you look at it as being in the form of MM/DD/YYYY then we have MM=13 which is obviously, at least to the human brain, invalid.  I caveat the last point because though in many cases presenting this date will trigger some behaviour that reveals the fault it cannot always be guaranteed that this will be the case.
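The effect is easy to reproduce. The sketch below uses Python's standard `datetime.strptime`; the `reparse` helper is a hypothetical name introduced purely for illustration. Here the misreading of the 13th surfaces as a parse error, though as noted above not every system will react so visibly.

```python
from datetime import datetime


def reparse(text, written_as="%d/%m/%Y", read_as="%m/%d/%Y"):
    """Return (intended, interpreted) dates for a date string.

    `interpreted` is None when the reader's format rejects the string.
    """
    intended = datetime.strptime(text, written_as).date()
    try:
        interpreted = datetime.strptime(text, read_as).date()
    except ValueError:
        # The reader cannot make sense of the string, e.g. month 13.
        interpreted = None
    return intended, interpreted


# 11/10/2010 is silently read as the 10th of November 2010.
print(reparse("11/10/2010"))
# 13/10/2010 cannot be read as MM/DD/YYYY, so the clash is exposed.
print(reparse("13/10/2010"))
```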

Why this post? It is because seeing the same problem again today has reminded me that this problem is like the common cold; it is all around us and is not going to go away.  Despite all the progress in software engineering technology none of it seems to tackle this type of issue.  Perhaps it is deemed to be too unimportant to worry about and deal with. After all once found it is an 'easy fix'. Actually it may be quick to change but the change often has the potential for massive downstream ramifications.  So perhaps not tackling this is a mistake; I would say so given the many developer hours I have watched being burnt on figuring out what is going on and the million pound per week project I saw extended by weeks through a myriad of issues of this sort.

What can testers do to help in this area?  Well they can start by remembering to test every date value and every date input control with dates that have their day part greater than twelve.  Keep a short list of key dates to use and make certain their use is comprehensive.  Thirteen may turn out to be your lucky number.
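A small helper can make the habit routine. The sketch below is an assumption about how a team might do this; the `KEY_DATES` values and both function names are hypothetical examples, not a recommended standard list.

```python
from datetime import date

# Hypothetical short list of key dates; every day part is greater than twelve.
KEY_DATES = [
    date(2010, 10, 13),
    date(2010, 12, 31),
    date(2012, 2, 29),
]


def can_reveal_format_clash(d):
    """A date can expose a DD/MM versus MM/DD swap only if its day exceeds 12."""
    return d.day > 12


def weak_test_dates(dates):
    """Return the dates in a test data set that cannot expose a swap."""
    return [d for d in dates if not can_reveal_format_clash(d)]
```

Running `weak_test_dates` over the dates used in an existing test pack shows which tests would pass even with the formats mixed up.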

Friday, 26 November 2010

Integration; the puzzle at the heart of the project.

We have recently started working with a new client on changes to their testing and delivery practice. The aim is to increase the throughput of development and at the same time accelerate delivery and maintain quality.  This has been running for a few weeks now and enough time has elapsed for us to start hearing stories about previous projects and what went well and what was problematic.

Today we had a planning session for a project that involves the connection and interoperation of two systems.  In this session it became clear that their experiences of this type of endeavour were very similar to ones we have seen elsewhere.  Connecting systems is always more complex than expected, there is always lots of stuff that is not adequately prepared, lots of things that go wrong and it always takes far longer than anyone thought.

On the plus side it was reassuring to hear their head of development recounting similar experiences and holding a similar position to my own on what has to be done next time if there is to be any chance of avoiding the same fate.  There was a common understanding of the need for someone being accountable for getting things to work.  There was similar alignment over the need to use virtual teams, the importance of preparation, the risk from environmental problems and the need for hands on technical capability and determination.

It was some years ago that we identified integration as one of the number one issues affecting projects both large and small.  A distinguishing aspect of our thinking is the major distinction we make between the act of getting it working and the act of testing whether it is working.  We always try and get clients to think of the discipline of Integration (see Integration Papers) as something that stands apart from testing; even from testing called Integration Testing.