EXPLORING UNCERTAINTY !!!: 2010

What is an API?

An API (Application Programming Interface) is a collection of software functions and procedures, called API calls, that can be executed by other software applications.

What is API Testing?

API testing is mostly used for systems that expose a collection of APIs that need to be tested. The system could be system software, application software or libraries.

API testing differs from other testing types because a GUI is rarely involved. Even without a GUI, you still need to set up the initial environment, invoke the API with the required set of parameters and, finally, analyze the result.

Setting up the initial environment becomes complex precisely because no GUI is involved: with an API, you need some other way to make sure that the system is ready for testing.

This setup can be further divided into test environment setup and application setup. Configuring the database or starting the server belongs to the test environment setup. On the other hand, creating an object before calling a non-static member of its class falls under the application-specific setup.

Setting the initial condition in API testing also involves creating the conditions under which the API will be called. An API may be called directly, or it may be called as a result of some event or in response to some exception.
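To make the two kinds of setup concrete, here is a minimal sketch in Python's unittest, assuming a hypothetical Inventory class backed by an in-memory SQLite database. setUpClass plays the role of the test environment setup (the database is created and configured once), while setUp performs the application setup by creating the object before any of its non-static methods are called.

```python
import sqlite3
import unittest


class Inventory:
    """Hypothetical API under test: its methods need an instance first."""

    def __init__(self, conn):
        self.conn = conn

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]


class InventoryApiTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Test environment setup: make sure the database exists and is configured.
        cls.conn = sqlite3.connect(":memory:")
        cls.conn.execute("CREATE TABLE items (name TEXT)")

    @classmethod
    def tearDownClass(cls):
        cls.conn.close()

    def setUp(self):
        # Application setup: create the object before calling its non-static methods.
        self.inventory = Inventory(self.conn)

    def test_count_on_empty_table(self):
        self.assertEqual(self.inventory.count(), 0)
```

Saved as a module, this can be run with python -m unittest; a real suite would start an actual server or database instead of an in-memory one.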

Test Cases for API Testing:

Test cases for API testing are organized around the kind of output the API produces:

• Return value based on input condition

This is relatively simple to test, as the input can be defined and the results can be validated.

Example: It is very easy to write test cases for an API like int add(int a, int b). You can pass different combinations of a and b and validate the results against known values.

• Does not return anything

When there is no return value, the behavior of the API on the system has to be checked.

Example: A test case for a delete(ListElement) function will probably need to validate the size of the list or the absence of the element from the list.

• Trigger some other API/event/interrupt

If an API triggers some event or raises some interrupt, those events and interrupt listeners should be tracked. The test suite should call the appropriate API, and the assertions should be made on the interrupts and listeners.

• Update data structure

This category is similar to the category of APIs that do not return anything: updating a data structure has some effect on the system, and that effect should be validated.

• Modify certain resources

If an API call modifies some resource, for example by updating a database, changing the registry or killing some processes, the change should be validated by accessing the respective resource directly.
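As a minimal sketch of these categories, here are tests in Python against hypothetical add, delete, Button and rename_user APIs that stand in for the real ones. Each test checks one kind of output: a return value, the effect of a call that returns nothing (the delete case also covers an updated data structure), an event caught by a listener the test registers itself, and a database resource validated by reading it directly.

```python
import sqlite3


# Hypothetical APIs standing in for the real ones, one per output category.
def add(a, b):
    return a + b


def delete(items, element):
    items.remove(element)  # returns nothing: the effect is on the list itself


class Button:
    """Hypothetical API that notifies registered listeners when clicked."""

    def __init__(self):
        self._listeners = []

    def add_listener(self, callback):
        self._listeners.append(callback)

    def click(self):
        for callback in self._listeners:
            callback("clicked")


def rename_user(conn, user_id, new_name):
    conn.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
    conn.commit()


def test_return_value():
    # Known input combinations validated against known results.
    for a, b, expected in [(1, 2, 3), (0, 0, 0), (-5, 5, 0), (-3, -4, -7)]:
        assert add(a, b) == expected


def test_no_return_value():
    items = ["a", "b", "c"]
    delete(items, "b")
    # Nothing is returned: check the size of the list and the absence of the element.
    assert len(items) == 2
    assert "b" not in items


def test_triggers_event():
    received = []
    button = Button()
    button.add_listener(received.append)  # the test registers its own listener
    button.click()
    assert received == ["clicked"]


def test_modifies_resource():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
    rename_user(conn, 1, "bob")
    # Validate by reading the resource directly rather than through the API.
    row = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()
    assert row == ("bob",)
    conn.close()
```

The functions follow the test_ naming convention, so a runner such as pytest can collect them; they can also simply be called one after the other.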

API Testing Approach

An approach to testing a product that contains an API:

Step I: Understand that API testing is a testing activity that requires some coding and is usually beyond the scope of what developers are expected to do. The testing team should own this activity.

Step II: Traditional testing techniques such as equivalence class partitioning and boundary value analysis also apply to API testing, so even if you are not very comfortable with coding, you can still design good API tests.

Step III: It is almost impossible to test every scenario in which your API can be used. Hence, focus on the most likely scenarios, and also apply techniques such as Soap Opera Testing and Forced Error Testing, with different data types and sizes, to maximize the test coverage.
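A minimal sketch of Steps II and III, assuming a hypothetical set_volume API that accepts an integer from 0 to 100: the boundary values are split into valid and invalid equivalence classes, then forced errors are provoked with other data types and sizes.

```python
def set_volume(level):
    """Hypothetical API under test: accepts an int from 0 to 100 inclusive."""
    if isinstance(level, bool) or not isinstance(level, int):
        raise TypeError("level must be an int")
    if not 0 <= level <= 100:
        raise ValueError("level out of range")
    return level


def boundary_values(low, high):
    """Values on and just around each edge of the valid range [low, high]."""
    return [low - 1, low, low + 1, high - 1, high, high + 1]


def expect_error(func, arg, error_type):
    """True if func(arg) raises error_type; other exceptions propagate."""
    try:
        func(arg)
    except error_type:
        return True
    return False


# Step II: split the boundary values into valid and invalid equivalence classes.
candidates = boundary_values(0, 100)
valid = [v for v in candidates if 0 <= v <= 100]
invalid = [v for v in candidates if not 0 <= v <= 100]

for v in valid:
    assert set_volume(v) == v
for v in invalid:
    assert expect_error(set_volume, v, ValueError)

# Step III: forced errors with other data types and sizes.
for wrong_type in ["50", 50.0, None, [50], True]:
    assert expect_error(set_volume, wrong_type, TypeError)
for huge in [10**100, -(10**100)]:
    assert expect_error(set_volume, huge, ValueError)
```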

The main challenges of API testing fall into the following categories:

• Parameter selection

• Parameter combination

• Call sequencing
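To give an idea of why parameter combination and call sequencing are hard, the sketch below counts the full combinations for a hypothetical API with four parameters (each already reduced to a few representative values), and the possible orderings of four hypothetical calls.

```python
from itertools import permutations, product

# Hypothetical parameter domains for one API call, each already reduced
# to a few representative values (e.g. by equivalence partitioning).
domains = {
    "user": ["admin", "guest", "anonymous"],
    "format": ["json", "xml"],
    "page_size": [0, 1, 100, 101],
    "sort": ["asc", "desc", None],
}

# Parameter combination: every combination of the representative values.
combinations = [dict(zip(domains, values)) for values in product(*domains.values())]
print(len(combinations))  # 3 * 2 * 4 * 3 = 72

# Call sequencing: every possible order of four hypothetical calls.
calls = ["open", "write", "read", "close"]
orders = list(permutations(calls))
print(len(orders))  # 4! = 24
```

The first number grows multiplicatively and the second factorially as parameters and calls are added, which is why testing every scenario quickly becomes impossible.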

API Framework

The framework is more or less self-explanatory. The purpose of the config file is to hold all the configurable components and their values for a particular test run. It follows that the automated test cases should be represented in a ‘parse-able’ format in the config file, and that the script should be highly ‘configurable’.

In API testing, it is not necessary to test every API in every test run (the number of APIs that are tested will decrease as testing progresses). Hence, the config file should have sections that list which APIs are “activated” for a particular run, and the test cases should be picked up based on this list.
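A minimal sketch of such a config file, assuming an INI-style layout read with Python's configparser. The section names [apis] and [tests] and all the test case names are hypothetical; only the APIs marked as activated contribute their test cases to the run.

```python
import configparser

# Hypothetical config layout: [apis] says which APIs are activated for this run,
# [tests] maps each API to its test cases.
CONFIG_TEXT = """
[run]
name = nightly

[apis]
add = yes
delete = no
rename_user = yes

[tests]
add = test_add_positive, test_add_negative
delete = test_delete_existing
rename_user = test_rename_updates_db
"""


def select_test_cases(config_text):
    config = configparser.ConfigParser()
    config.read_string(config_text)
    selected = []
    for api in config["apis"]:
        if config.getboolean("apis", api):  # only "activated" APIs are tested
            names = config.get("tests", api, fallback="")
            selected.extend(n.strip() for n in names.split(",") if n.strip())
    return selected


print(select_test_cases(CONFIG_TEXT))
# ['test_add_positive', 'test_add_negative', 'test_rename_updates_db']
```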

Some thoughts on stress testing web applications with JMeter

In this post about stress testing web applications with JMeter, I will mainly write about running the test plans, recording the results and interpreting them.

When do I stop?

One of the main questions you have to ask yourself when you start stress testing a web application is: when do I stop? This question is not as easy as it seems: the answer depends on your initial objectives and on “scientific” criteria that let you decide when you have met those objectives. Ultimately, it comes down to measuring and interpreting the “results” of your stress tests.

Before going any further, we should spend some time on the measurable outcomes of a stress test. There are mainly 2 interesting measures that you can record when you run a stress test on a web application:

• The throughput is the number of requests per unit of time (seconds, minutes, hours) that are sent to your server during the test.
• The response time is the elapsed time from the moment a given request is sent to the server until the moment the last bit of information has returned to the client.

The throughput is the real load processed by your server during a run, but it does not tell you anything about the performance of your server during that same run. The response time tells you how fast your server is handling a given load. This is why you need both measures to get a real idea of your server’s performance during a run.
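Both measures can be derived from the raw samples. The sketch below assumes each sample is a hypothetical (start_ms, elapsed_ms) pair and computes the throughput following the definition in JMeter's glossary (number of samples divided by the time from the start of the first sample to the end of the last one), along with the mean response time.

```python
def summarize(samples):
    """Throughput and mean response time from (start_ms, elapsed_ms) pairs.

    Throughput = number of samples / time from the start of the first
    sample to the end of the last one, in requests per second.
    """
    if not samples:
        raise ValueError("no samples")
    first_start = min(start for start, _ in samples)
    last_end = max(start + elapsed for start, elapsed in samples)
    duration_s = (last_end - first_start) / 1000.0
    if duration_s <= 0:
        raise ValueError("samples span no time")
    throughput = len(samples) / duration_s
    mean_response_ms = sum(elapsed for _, elapsed in samples) / len(samples)
    return throughput, mean_response_ms


# Hypothetical samples: three requests over two seconds.
print(summarize([(0, 200), (500, 300), (1000, 1000)]))  # (1.5, 500.0)
```

Because the duration includes the gaps between samples, the throughput reflects the load actually applied to the server rather than the speed of individual requests.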

We are now much closer to finding an answer to our initial question: you can stop stress testing your application when, for a measured throughput, the measured response time is “too high”. This is the right answer in an ideal world where information systems behave deterministically … another way to answer our question could be: you can stop stress testing your application when your system crashes / collapses / starts to behave unexpectedly …

However, I will stick to the first answer for a while, as it raises another interesting question: what is a “high” response time for a web application (or any application or information system used by real people)? In short, usability studies make it possible to define response time limits at which the user’s interaction with an information system radically changes. These limits are closely tied to human nature: psychology as well as brain performance.

0.1 second is about the limit for having the user feel that the system is reacting instantaneously, meaning that no special feedback is necessary except to display the result.

1.0 second is about the limit for the user’s flow of thought to stay uninterrupted, even though the user will notice the delay. Normally, no special feedback is necessary during delays of more than 0.1 but less than 1.0 second, but the user does lose the feeling of operating directly on the data.

10 seconds is about the limit for keeping the user’s attention focused on the dialogue. For longer delays, users will want to perform other tasks while waiting for the computer to finish, so they should be given feedback indicating when the computer expects to be done. Feedback during the delay is especially important if the response time is likely to be highly variable, since users will then not know what to expect.

Using these limits gives a precise end point to the stress tests of a system: it helps us define, in collaboration with our client (or users), what an acceptable response time is. For example, the last time I ran stress tests for a client, we agreed that the acceptable upper limit for the response times of his system was 7 seconds: he wanted to know how many concurrent users his system would handle.
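A small sketch of how the three limits and an agreed upper limit (such as the 7 seconds above) could be applied to the results of successive runs. The runs argument, a list of (concurrent users, mean response time in seconds) pairs, is a hypothetical format, and a percentile of the response times could be used instead of the mean.

```python
def perceived_delay(seconds):
    """Classify a response time against the three usability limits quoted above."""
    if seconds <= 0.1:
        return "instantaneous"
    if seconds <= 1.0:
        return "flow of thought uninterrupted"
    if seconds <= 10.0:
        return "attention kept on the dialogue"
    return "attention lost: feedback needed"


def max_users_within_limit(runs, limit_s):
    """Highest load whose mean response time stays within limit_s.

    runs: hypothetical (concurrent_users, mean_response_s) pairs, one per
    stress test run. Returns None if no run meets the limit.
    """
    acceptable = [users for users, response_s in runs if response_s <= limit_s]
    return max(acceptable, default=None)
```

Calling max_users_within_limit(runs, 7.0) on the measured runs would then give the answer to the client’s question, within the resolution of the loads actually tested.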

The remaining problem is how to measure / estimate the throughput and response times of our system using JMeter: some simple statistics and mathematics are needed here.

Run your test plan and record the meaningful measures …

First of all, JMeter provides several different “listeners” that record these 2 variables in various ways (graphs, tables, trees, files). I would say that most of these “listeners” are useless or, to put it differently, that one of them is a must-have in order to have all the necessary information in hand: the Summary Report.

In order to understand this report and to implement scenarios efficiently, we must keep the following things in mind:

• JMeter records response times and throughput for each “sampler” of each “thread group” defined in your test plan.
• In the Summary Report, one line is displayed for each distinct “sampler” name: you can group or differentiate samplers in the report just by playing with their names.
• Each “sampler” is executed many times: the Summary Report gives the mean response time (with its standard deviation) and the throughput of each named “sampler”.
• Global values, computed over all samplers, are also given in the Summary Report.
• The Summary Report lets you store the measures of each run in a “csv” file, so you can analyse and interpret the results in a spreadsheet program.

Other listeners are also useful, particularly at the beginning, when building and testing your scenarios:

• The View Results Tree is very handy when “debugging” a scenario, as it lets you monitor all the HTTP requests and responses exchanged with the server. The drawback is that it consumes too much memory to be used in a large stress test.
• The View Results in Table listener is also useful in the early stages of the stress test implementation, as it gives a good, fast overview of the execution of a test plan. However, this listener also consumes too much memory to be used in a large stress test.
• I have also found some very interesting JMeter plugins in a Google Code project. One of them, “Active Threads Over Time”, helped me a lot when tuning the ramp-up of the load by playing with the “ramp up” and “number of threads” parameters of the thread group.
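JMeter’s documentation describes the ramp-up period as spreading thread starts evenly: with 10 threads and a ramp-up of 100 seconds, each thread starts 10 seconds after the previous one. The sketch below models the resulting active-thread curve, assuming (hypothetically) that each thread then runs for a fixed hold_s seconds; it is only a rough stand-in for the “Active Threads Over Time” graph, not the plugin itself.

```python
def start_times(num_threads, ramp_up_s):
    """Start offset (s) of each thread, spread evenly over the ramp-up period."""
    delay = ramp_up_s / num_threads
    return [i * delay for i in range(num_threads)]


def active_threads(t, num_threads, ramp_up_s, hold_s):
    """Threads running at time t, assuming each one runs hold_s seconds after it starts."""
    return sum(1 for start in start_times(num_threads, ramp_up_s) if start <= t < start + hold_s)


# Rough active-threads profile for 10 threads, a 100 s ramp-up
# and a hypothetical 60 s of work per thread.
for t in range(0, 181, 20):
    print(t, active_threads(t, 10, 100, 60))
```

Playing with num_threads and ramp_up_s in such a model shows how quickly the target load is reached and whether early threads finish before the last ones start.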

One more thing to keep in mind when performing stress tests is the performance bottleneck of the computer running the tests itself:

• When stress testing large production systems, it is very common to reach the limits of the computer running the tests before reaching the limits of the tested server.
• When the computer running the tests reaches its limits (memory, number of threads, CPU …), all the measures recorded by the stress testing tool are wrong, or at least biased.
• There are two ways to deal with this problem: (1) optimize your scenarios and the way you run them, or (2) set up a distributed infrastructure.

(1) In the JMeter manual, you will find the following advice in section 16.6 of the Best Practices page:

Some suggestions on reducing resource usage.
Use non-GUI mode: jmeter -n -t test.jmx -l test.jtl

Use as few Listeners as possible; if using the -l flag as above they can all be deleted or disabled.

Rather than using lots of similar samplers, use the same sampler in a loop, and use variables (CSV Data Set) to vary the sample.

Don’t use functional mode

Use CSV output rather than XML

Only save the data that you need

Use as few Assertions as possible

If your test needs large amounts of data – particularly if it needs to be randomised – create the test data in a file that can be read with CSV Dataset. This avoids wasting resources at run-time.
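When running in non-GUI mode with -l test.jtl, the Summary Report statistics can be recomputed afterwards from the results file. The sketch below assumes the file is in CSV format with a header row containing at least the timeStamp, elapsed and label columns (whether JMeter writes such a header depends on its result-saving settings), and computes, per label, the sample count, the mean and standard deviation of the response time, and the throughput.

```python
import csv
import math
from collections import defaultdict


def summary_report(jtl_lines):
    """Per-label statistics from JMeter results saved as CSV (e.g. test.jtl).

    Assumes a header row with at least the timeStamp (epoch ms),
    elapsed (ms) and label columns.
    """
    by_label = defaultdict(list)
    for row in csv.DictReader(jtl_lines):
        by_label[row["label"]].append((int(row["timeStamp"]), int(row["elapsed"])))

    report = {}
    for label, samples in by_label.items():
        n = len(samples)
        mean_ms = sum(elapsed for _, elapsed in samples) / n
        # Population standard deviation (1/n) of the response times.
        std_ms = math.sqrt(sum((elapsed - mean_ms) ** 2 for _, elapsed in samples) / n)
        # Throughput: samples / (end of last sample - start of first sample).
        start = min(t for t, _ in samples)
        end = max(t + elapsed for t, elapsed in samples)
        throughput = n / ((end - start) / 1000.0) if end > start else float("nan")
        report[label] = {
            "samples": n,
            "mean_ms": mean_ms,
            "std_ms": std_ms,
            "throughput_per_s": throughput,
        }
    return report
```

For example, with open("test.jtl", newline="") as f, calling summary_report(f) returns one dictionary of statistics per sampler label, which can then be stored for every run.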

(2) In the JMeter manual, the Remote Testing page gives precise instructions for setting up a distributed testing environment, along with a PDF describing how it all works architecture-wise. In my experience, it is all very easy to set up and it gives excellent results: in the end, it comes down to running the “jmeter-server” script on each slave and declaring the slave hosts in the master’s configuration file (the remote_hosts property in jmeter.properties). The only 2 or 3 small problems I came across with distributed testing are:

• Do not forget to give enough memory to your JMeter slaves and master (set -Xms and -Xmx in the JVM heap options of the jmeter startup script): the default values are very low.
• If you use external resources such as a CSV Data Set, you must have them in the same location on all your slave installations (a full path is needed in your scenario).
• Beware of multiple thread groups and schedulers: they leak huge amounts of memory on the slaves.

Last but not least, you should never perform your stress tests against a server or infrastructure that was just started. Servers usually need a warm-up before they reach their full speed: this is particularly true for the Java platform where you surely don’t want to measure class loading time, JSP compilation time or native compilation time.
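If a separate warm-up phase cannot be run first, a complementary option is to discard the samples recorded during an initial warm-up window before computing any statistics. A minimal sketch, assuming samples are (start_ms, elapsed_ms) pairs and warm_up_s is a hypothetical duration to choose for each system:

```python
def drop_warm_up(samples, warm_up_s):
    """Keep only the samples that started after the warm-up window.

    samples: (start_ms, elapsed_ms) pairs; warm_up_s is measured from the
    start of the first sample.
    """
    if not samples:
        return []
    t0 = min(start for start, _ in samples)
    cutoff = t0 + warm_up_s * 1000
    return [(start, elapsed) for start, elapsed in samples if start >= cutoff]
```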

Interpret the results …

In order to interpret the results of stress tests, it is important to understand some basic elements of statistics:

(1) The mean value (μ)

The following equation shows how the mean value (μ) is calculated:

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$

The mean value of a given measure is what is commonly referred to as the average value of this measure. An important thing to understand is that the mean value can be very misleading as it does not show you how close (or far) your values are from the average. An example is always better than a long explanation.

Let’s assume that we are measuring response times in milliseconds in 2 different stress tests:

Stress Test 1:

x₁=100
x₂=110
x₃=90
x₄=900
x₅=890
x₆=910

gives you μ = 1/6 * (100 + 110 + 90 + 900 + 890 + 910) = 500 ms

Stress Test 2:

x₁=490
x₂=510
x₃=535
x₄=465
x₅=590
x₆=410

gives you μ = 1/6 * (490 + 510 + 535 + 465 + 590 + 410) = 500 ms

In both cases the mean value (μ) is the same. However, if you look closely at the response times, you will see that in the first case the values are “far” from the mean value, whereas in the second case they are “close” to it. This example makes it obvious that a measure of this distance to the mean value is needed before drawing any kind of conclusion from the mean value.

(2) The standard deviation (σ)

The following equation shows how the standard deviation (σ) is calculated:

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}$$

The standard deviation (σ) measures the typical distance (more precisely, the root mean square distance) of the values from their average (μ). In other words, it gives us a good idea of the dispersion or variability of the measures around their mean value. Let’s go back to our example and calculate the standard deviation of each of our theoretical stress tests:

Stress Test 1:

σ = sqrt( 1/6 * ( (100-500)^2 + (110-500)^2 + (90-500)^2 + (900-500)^2 + (890-500)^2 + (910-500)^2 ) ) ≈ 400 ms

Stress Test 2:

σ = sqrt( 1/6 * ( (490-500)^2 + (510-500)^2 + (535-500)^2 + (465-500)^2 + (590-500)^2 + (410-500)^2 ) ) ≈ 56 ms

The 2 values of the standard deviation calculated above are very different:

• In the first case, the standard deviation is high compared to the mean value, which shows us that our measures are very variable (mostly far from the mean value) and that the mean value is not very significant.
• In the second case, the standard deviation is low compared to the mean value, which shows us that our measures are not dispersed (mostly close to the mean value) and that the mean value is significant.

(3) The sample size and the quality of the measure

Another interesting question is whether our calculated mean value is a good estimate of the “real” mean value. In other words, when we calculate the mean response time during a test case, do we have a good estimate of the “real” mean response time of the same scenario repeated indefinitely? In probability theory, the Central Limit Theorem states conditions under which the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed.

The measures of response times and throughput obtained during stress tests usually satisfy the conditions of the Central Limit Theorem, as we typically have a large number of independent, random measures with a finite mean value and standard deviation (calculated by JMeter). We can thus assume that the mean values of the response time and the throughput are approximately normally distributed.

This allows us to calculate a Confidence Interval for these mean values. The Confidence Interval gives us a measure of the quality of our mean values, as it quantifies the variability of our mean value (as an interval) for a predefined probability. You can, for example, calculate your Confidence Interval at 95%, which tells you that there is a 95% probability that the calculated interval contains the “real” mean value. Conversely, you can calculate the probability associated with a given interval around your mean value (see the examples below).

The following equation shows how the Confidence Interval (CI) is calculated:

$$CI = \left[\mu - Z\frac{\sigma}{\sqrt{n}},\ \mu + Z\frac{\sigma}{\sqrt{n}}\right]$$

where:

• μ is the calculated mean value of our sample,
• σ is the calculated standard deviation of our sample,
• n is the number of measures in our sample,
• and Z is the value such that the area under the “bell-shaped curve” of the standard normal distribution between -Z and Z equals the chosen confidence C (equivalently, the area between 0 and Z equals C/2).

The following table gives values of Z for various values of the confidence C:

C	Z
0.80	1.281551565545
0.90	1.644853626951
0.95	1.959963984540
0.98	2.326347874041
0.99	2.575829303549
0.995	2.807033768344
0.998	3.090232306168
0.999	3.290526731492
0.9999	3.890591886413
0.99999	4.417173413469

Source: http://en.wikipedia.org/wiki/Normal_distribution

If we go back to our previous examples, we can calculate the confidence intervals of our mean values at 95%:

CI₁ = [500 - 1.96*400/sqrt(6); 500 + 1.96*400/sqrt(6)] ≈ [180; 820]

CI₂ = [500 - 1.96*56/sqrt(6); 500 + 1.96*56/sqrt(6)] ≈ [455; 545]

This means that, in each case, there is a 95% probability that the calculated confidence interval contains the “real” mean response time.

We can also calculate the probability that the “real” mean value lies in the interval [490, 510], i.e. within 10 ms of the calculated mean:

10 = Z₁ * 400 / sqrt(6) => Z₁ = 10 * sqrt(6) / 400 => Z₁ ≈ 0.06 => C₁ ≈ 5%

10 = Z₂ * 56 / sqrt(6) => Z₂ = 10 * sqrt(6) / 56 => Z₂ ≈ 0.44 => C₂ ≈ 34%

Notes:

These are just given as examples of how to calculate the confidence interval … the conditions are not met for the Central Limit Theorem with such a small sample.

The last 2 examples were calculated using standard normal distribution tables.
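The calculations of this section can be reproduced with Python’s statistics module (NormalDist requires Python 3.8 or later). The sketch below uses the population standard deviation (1/n) and the normal-approximation interval μ ± Z·σ/√n; as noted above, with only 6 values per test it illustrates the arithmetic rather than giving a statistically sound interval.

```python
import math
from statistics import NormalDist, mean, pstdev


def confidence_interval(values, confidence):
    """Normal-approximation interval mu +/- Z * sigma / sqrt(n)."""
    n = len(values)
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (1/n)
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # two-sided Z for confidence C
    half_width = z * sigma / math.sqrt(n)
    return mu - half_width, mu + half_width


def confidence_of_half_width(values, half_width):
    """Confidence level C of the interval [mu - half_width, mu + half_width]."""
    z = half_width * math.sqrt(len(values)) / pstdev(values)
    return 2 * NormalDist().cdf(z) - 1


test_1 = [100, 110, 90, 900, 890, 910]
test_2 = [490, 510, 535, 465, 590, 410]

for name, values in (("Stress Test 1", test_1), ("Stress Test 2", test_2)):
    low, high = confidence_interval(values, 0.95)
    print(f"{name}: mu={mean(values):.0f} sigma={pstdev(values):.0f} "
          f"CI95=[{low:.0f}; {high:.0f}] C([490; 510])={confidence_of_half_width(values, 10):.0%}")
```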

Conclusion

In conclusion, the best way to interpret our stress test results is to use the Summary Report provided by JMeter and to store it in a “csv” file for every run. In this report we find the mean response time, the standard deviation of the response time and the throughput, for every named sampler and globally for the run.

Based on the explanations above, I recommend the following methodology:

• If we have a high number of samples (which is usually the case in stress tests) and a low standard deviation, then we can safely conclude that we have a good estimate of the mean value of both the response time and the throughput of our system, and that the “real” values will be close to the calculated mean values.
• If we have a high number of samples (which is usually the case in stress tests) and a high standard deviation, we probably have a good estimate of the mean value but should still consider estimating a confidence interval. In any case, high variability calls for a technical investigation, as variability of response times and throughput is obviously related to instability of the system under test.
• If we have a low number of samples and a high standard deviation, then we almost certainly have a very bad estimate of the mean value, which means that we are measuring the wrong thing, the wrong way.
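The methodology can be written down as a simple decision rule. In this sketch, min_samples and max_cv (the coefficient of variation σ/μ, used here to judge whether the standard deviation is “low” compared to the mean) are hypothetical thresholds that you would agree on for your own system; the case of few samples with a low standard deviation, which the list above does not cover, is reported separately.

```python
def assess(n, mean, std, min_samples=1000, max_cv=0.2):
    """Rough reading of one run following the methodology above.

    min_samples and max_cv are hypothetical thresholds, not JMeter settings.
    """
    many = n >= min_samples
    stable = std / mean <= max_cv  # "low" standard deviation relative to the mean
    if many and stable:
        return "good estimate of the mean"
    if many:
        return "estimate a confidence interval and investigate the variability"
    if not stable:
        return "bad estimate: measuring the wrong thing, the wrong way"
    return "not covered above: collect more samples"
```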

Monitor your systems while you run the tests …

It is often useful to monitor the system (and its various components) while you are stressing it. Various tools may be used, and they vary from one platform to another. On the Java platform, you can use the excellent “jvisualvm” tool provided with the latest versions of the JDK, which interacts with the various monitoring hooks integrated in the JVM.

Monitoring Java web applications is a subject in itself … I may try to share my thoughts on it some time … in another post.


Wednesday, August 25, 2010

API Testing Techniques !

Monday, August 23, 2010

Testers: Site Migration Checklist !

Sunday, August 22, 2010

Some thoughts on stress testing web applications with JMeter !



Evil Tech!