Friday 26 November 2010

Integration; the puzzle at the heart of the project.

We have recently started working with a new client on changes to their testing and delivery practice. The aim is to increase the throughput of development while accelerating delivery and maintaining quality.  This has been running for a few weeks now and enough time has elapsed for us to start hearing stories about previous projects and what went well and what was problematic.

Today we had a planning session for a project that involves the connection and interoperation of two systems.  In this session it became clear that their experiences of this type of endeavour were very similar to ones we have seen elsewhere.  Connecting systems is always more complex than expected: there is always a great deal that has not been adequately prepared, plenty goes wrong, and it always takes far longer than anyone thought.

On the plus side it was reassuring to hear their head of development recounting similar experiences and holding a position similar to my own on what has to be done next time if there is to be any chance of avoiding the same fate.  There was a common understanding of the need for someone to be accountable for getting things to work.  There was similar alignment over the need to use virtual teams, the importance of preparation, the risk from environmental problems and the need for hands-on technical capability and determination.

It was some years ago that we identified integration as one of the foremost issues affecting projects both large and small.  A distinguishing aspect of our thinking is the major distinction we make between the act of getting it working and the act of testing whether it is working.  We always try to get clients to think of the discipline of Integration (see Integration Papers) as something that stands apart from testing; even from testing called Integration Testing.

Saturday 20 November 2010

Testing is easy; isn’t it?

I heard a comment recently; it went something along the lines of “if they can’t deliver testing to us then they won’t be able to do anything”.  Was I surprised to hear this coming from a senior test manager?  Well actually no; I wasn’t surprised.  It illustrates that even people with many years in senior testing posts can fail to understand what first-class testing is, how different it is from run-of-the-mill work and how complex and difficult it is to do first-class testing well and at speed. This was not the first time I have come across this view and I doubt it will be the last.

Perhaps one day there will be a more general recognition of the downside of viewing testing as something that can always be done on the cheap and as one of the easiest things to give to the lowest bidder.  Until that day it seems testing will always be the first target for cost cutting.  However I think I may have a very long wait for any change of attitude.  After all, if senior test managers hold the view that testing is far easier to do than development, then what chance is there of a change in the wider development space, never mind in the views of finance and procurement teams?

Tuesday 16 November 2010

To release or not to release, that is the question.

Here are two interesting propositions.  Number one: test managers should focus on getting as quickly as possible to a state where it is obvious that further testing offers little benefit compared with finding out how the system survives in the wild.  Number two: it is easier to make the decision to release a system when delaying the release to permit further testing is not likely to put you in any better position than you are already in.  The interplay of these two propositions is discussed below.

For a number of years I was part of the programme leadership team that governed the development and release of a very large and very critical telecoms OSS system.  This system was so large, so complex and so important that release decisions were never simple. We would spend a lot of time converging on a good deployment position: one that realised the maximum benefits from the release whilst containing the risks.

As you might expect, sometimes making a decision was hard; things were not clear and it could go either way.  These decisions often involved long debates based on uncertain information. We found that ways of thinking evolved that made decisions easier.  One of the most powerful tools to emerge was a very simple question – “If we delay the release another two weeks and carry on testing, will we be in any better position to make a decision?”.

When the answer to that question was “no” we knew it was time to take a deep breath, go for it and deal with any consequences that arose (and we became quite effective at dealing with those occasions when there were consequences you would not want to experience).  This question worked well in that environment because the cost of not deploying was high; it was a high-intensity delivery environment with a heavy emphasis on deploying and moving on to the next release.  That said, the question is a tool that can be used in many environments.

Returning to test managers and their aims.  If a key part of the decision to release a system is a question of the form “Can any more testing be of benefit?” then test managers should plan to reach a position where the answer is “No” as soon as possible, and should manage execution to get there.  In doing this they accelerate delivery of the system.  The sooner the answer can be “more testing is a waste of time”, the sooner the benefits of the system will be seen.

Epilogue

Just to be clear: it is very easy to get the answer “more testing is a waste of time” if the testing is simplistic and ineffective, or worse, simplistic and ineffective testing executed badly.  This approach is not recommended.  Rather, do well-thought-out, highly effective testing and do it quickly.  You and your colleagues on the development side should hold similar opinions as to when the optimum point has been reached.  If there is a caveat that goes something like “but we would spend more time testing if the testing were better” then there is some need for improvement.

Saturday 13 November 2010

Performance by request.

After doing a fair bit of performance testing and troubleshooting we have seen the effects of performance only receiving attention at the end of the project. We encounter teams making herculean efforts to wring acceptable performance out of systems; we encounter systems that do not reach and never will reach acceptable levels; we encounter cancellations.

Few organisations spend much time and effort worrying about performance at the start of a project. Many spend an awful lot of time and money at the end dealing with the consequences. This pattern is not limited to naive first offenders; there are major organisations, ones that most people would expect to have sophisticated performance risk controls, that fall foul of this problem. It would be safe to say that, in general, the software industry doesn’t do performance engineering; it does performance mend-and-make-do.


What makes this madness is that simple techniques can make things a lot better; there is no need to turn to rocket science. These techniques may not be up to delivering the performance certainty required by an air traffic control system but they can certainly reduce risk for your average web application. Some thought and a little effort can provide a major reduction in performance risk. The first trick is to ask for what you want.

Ask and you might receive.

This may sound obvious, but if it is so obvious then why is it not done? The people who need the system have to ask the people supplying the system to deliver a certain level of performance. Once that has been done you can look them in the eyes and let them try to provide evidence that will convince you it will be achieved. This is founded on the adage "if you don’t ask then you don’t get": if you don’t ask for something, what is the chance you will get it?

For this to work well two things are necessary. Firstly, the people doing the asking have to understand what they need and have to express it in an organised way. Secondly, they have to be sensible and avoid asking for the impossible; if you ask for the impossible you won’t get it, you won’t be taken seriously, and you may end up with something worse than you could have had.

How to describe what you need.

Has anyone seen a performance requirement of the form "all response times must be less than 3 seconds"? How much difference do you think that makes to the way developers approach the implementation of individual features? Not a jot; it has no real influence on the end game whatsoever. How can this be done better? Three techniques provide the right framework.

(1) Recognise that the amount of time a user can wait for a response without it becoming a usability or throughput issue depends upon what the user is doing and what they are waiting for. Reflect these different needs as separate performance requirements with different and appropriate targets for each. Differentiation of types of responses is essential.
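
As a minimal sketch of what this differentiation might look like, the fragment below (Python) gives each hypothetical class of interaction its own target; the class names and figures are illustrative assumptions, not recommendations.

    # Separate response-time targets per class of interaction.
    # All names and numbers are illustrative assumptions.
    RESPONSE_TARGETS_SECONDS = {
        "screen_navigation": 1.0,   # user is mid-flow; waiting here hurts usability
        "simple_lookup": 2.0,       # a short pause is tolerable
        "report_generation": 30.0,  # the user expects to wait for this
    }

    for interaction, target in RESPONSE_TARGETS_SECONDS.items():
        print(f"{interaction}: responses expected within {target}s")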

(2) Accept that, generally, real systems go slower when busy. With no one on, it may be lightning fast; on a normal day it may be quick; during the busiest period of the year it will almost inevitably be slower. Think about the different loads it will be used under and set distinct targets for each one. The limits may be close together, or it may be that at your busiest time you relax them; whichever it is, it is good to be explicit about it.
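
Extending the sketch above, each class of interaction can then carry a distinct target per load band, making peak-time expectations explicit rather than assumed; the bands and figures are again illustrative assumptions.

    # Distinct targets per interaction class and per load band (seconds).
    # Bands and figures are illustrative assumptions.
    RESPONSE_TARGETS_SECONDS = {
        "screen_navigation":  {"quiet": 0.5, "normal": 1.0, "peak": 2.0},
        "simple_lookup":      {"quiet": 1.0, "normal": 2.0, "peak": 4.0},
        # An aspect that must remain constant under all conditions:
        "payment_submission": {"quiet": 3.0, "normal": 3.0, "peak": 3.0},
    }

    # e.g. the peak-load target for screen navigation:
    print(RESPONSE_TARGETS_SECONDS["screen_navigation"]["peak"])  # 2.0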

This discipline avoids there being a covert interpretation that your targets are for ‘normal’ load conditions and an unstated assumption that in more extreme periods slower responses are acceptable. It can also point architectural design in the right direction. Trade-offs become possible, particularly when some aspects must remain constant under all conditions whilst others can slow down under heavier loads.

(3) Don’t use a simple limit; this can have strange side effects. You might pick the number that reflects the speed you want in the vast majority of cases but specify it as a maximum. Its origins are likely to mean that it is too challenging to achieve in all cases, and if this is glaringly obvious the requirement is discredited. Alternatively you might pick the worst acceptable duration; now you have not constrained the middle ground, and suppose all responses come in just under this limit? Targets should be percentile distributions; not single upper limits nor single percentile limits.
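
A percentile distribution profile might be expressed and checked along the lines of the sketch below; the profile points are illustrative assumptions and the nearest-rank method is just one reasonable way to compute a percentile.

    import math

    # An illustrative profile: 50% of responses within 1s, 95% within 2s,
    # 99.9% within 5s. The figures are assumptions, not recommendations.
    PROFILE = [(50, 1.0), (95, 2.0), (99.9, 5.0)]

    def percentile(samples, pct):
        """Nearest-rank percentile of a non-empty list of response times."""
        ordered = sorted(samples)
        rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
        return ordered[rank - 1]

    def meets_profile(samples, profile=PROFILE):
        """True only if every point of the distribution is acceptable."""
        return all(percentile(samples, pct) <= limit for pct, limit in profile)

    print(meets_profile([0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.3, 1.6, 1.9]))  # True

Checked this way, a set of measurements that all cluster just under a worst-case maximum would fail the 50% and 95% points; that is exactly the middle ground a single upper limit leaves unconstrained.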

In summary: identify things, or classes of things, with different response requirements; have distinct targets for different load periods; and use percentile distribution profiles, as sketched above, to define each target.

Remaining realistic.

The second trick is to ask for things that you stand a chance of getting. Base your requirements on what the technology you are mandating, or are willing to pay for, is able to deliver. Web technology has its strengths, but it will not deliver 95% of interface updates for zone selection in under 0.5 seconds whatever the circumstances. The targets set have to be achievable or they will be ignored.

Reflect on what is plausible given the technology and the environment it is used in. What does this mean if you have an activity that must complete in a time that is unrealistic? It means you have to step back and reassess the concept. Redesign the interaction and the task structure to reduce the time criticality. Alternatively, ask whether you have chosen the right technology.

Targets have to be achievable; unachievable ones will either be ignored or will consume lots of resources and attention and then fail. Where the tasking and interaction design mandates targets that cannot be met you need to redesign or reassess the technology options.

Concluding

One of the biggest mistakes possible is to fail to put enough thought and care into specifying performance requirements. If you don’t ask or if your request is nonsense then you risk getting something far removed from what you need. When you do decide to ask properly you have to really understand your need, define it in the right structure and ensure that what you ask for is possible. Once this framework is in place developers have something to work to and you have a firm basis for performance assurance activities.

Sunday 7 November 2010

Testing; the discipline that lives in Flatland

Flatland: A Romance of Many Dimensions is a novella set in a world whose citizens are only aware of two dimensions; the third one is a secret.  After many years of observing the way that organisations approach software testing I have an ever strengthening belief that testing is hindered by a failure to recognise dimensions along which layered approaches should be used.  Testing is a discipline where anonymous, uniform, interchangeable tests exist and managers think in two dimensions, these being effort and schedule.  These Flatland-style limitations lead to testing that is both ineffective and inefficient.

So after that philosophical introduction, what am I really getting at?  There are a number of things about the way testing is generally approached, resourced and executed that lack a layered approach (layering denoting a dimension) and that suffer as a result.  In this post I will describe the main ones that are repeatedly found in organisations we work with.  Later I hope to make time to explore each in more detail.  The four recurring themes are:
  1. People. There are testers and, well, there are testers; that is it.  Compare this with enterprise-level development organisations where we see architects, lead end-to-end designers, platform architects, platform designers, lead developers and developers.  This is not necessarily anything to do with the line or task management structures; this is people with different levels of skill and experience who are matched to the different challenges to be faced when delivering the work.  Compare again testing, where organisations generally think in terms of a flat interchangeable population of testers.  A source of problems or not; what do you think?
  2. Single step test set creation.  At one point there is nothing other than a need to have some tests, usually to have them ready very quickly; then there are several hundred test cases, often described as a sequence of activities to be executed.  Any idea how we got from A to B; any idea whether B is anywhere near the right place, never mind whether it is optimal; any chance of figuring it out retrospectively? No; not a chance.  It’s like starting off with a high level wish for a system and coding like mad for two weeks and expecting to get something of value (actually, come to think of it, isn't there something called Agile...).  Seriously, an effective test set is a complex optimised construct; complex constructs generally do not get to be coherent and optimised without a layered process of decomposition and design.  In most places test set design lacks any layered systematic approach and has no transparency; it depends on the ability and the on-the-day performance of the individual tester. Then once it is done it is done; you can't review and inspect quality into something that is not in the right place to start off with.
  3. Tiers of testing. Many places and projects have separate testing activities; for example system testing, end-to-end testing, customer experience testing, business testing and acceptance testing. How often is the theoretical distinction clear; how often does the reality match the theory?  Take a look and in many cases you will see that the tests are all similar in style and coverage. There is a tendency to converge on testing that the system does what it says it does, and to do this in the areas and ways that are easy to test.  This can lead to a drive to merge the testing into one homogeneous mass to save time and cost; given that the tests had already become indistinguishable, it is a drive that is hard to resist.  Distinct tiered testing has a high value, but the lack of clear recognition of what makes the tiers different is the start of the road to failure.
  4. The focus of tests.  When you see a test can you tell what sort of errors it is trying to find?  Is it designed to find reliability problems, to ensure user anomalies are handled, to ensure a user always knows what is going on, or to check that a sale is reflected correctly in the accounting system?  A different focus requires a different type of test.  Yet generally there are just tests and more tests.  No concept of a specific focus for a particular group of tests, little concept of different types of test to serve different purposes.  Testers lack clear guidance on what the tests they are designing need to do and so produce generic tests that deliver generic test results.
These four themes demonstrate a common lack of sophistication in the way that testing is approached.  A view of testing as a set of uniform activities to be exercised by standardised people in a single step process is the downfall of many testing activities.  It is a Flatland approach, and testing practices need to invade and spread out along these other dimensions for testing to become more effective and valued.  Hopefully I will be able to provide some ideas on how to escape from Flatland at a later date.