
At the end of 2018, Microsoft stated in a preview release note that Visual Studio 2019 would be the last version of Visual Studio with load testing features; only a few months later, they formally announced that the Azure load testing service would be deprecated on March 31st, 2020.

This announcement came as a bombshell to the community. Many members have been using the tools and service for years, and some have even asked Microsoft to open source its load testing tool.

For those who follow this market, the news was not a complete surprise; we think open source tools and other vendors now provide better performance testing alternatives, and this is in line with Microsoft’s stated reason for shutting down its service.

When I look around at other offerings in this space (open source as well as commercial offerings that sometimes include consulting services) I honestly feel they are now better suited to meet the needs of our customers.

Jamie Cool, Director of Program Management, Azure DevOps

Following the announcement, a large number of users from the Azure and Visual Studio load testing community have been asking questions about our load testing platform in particular. This post presents how to use k6 and LoadImpact as a replacement for the Azure load testing service.

k6 and LoadImpact

Before digging into how our load testing platform compares to the Azure service, let’s start with an introduction to the main components of our platform: k6 and the LoadImpact cloud service.

k6: an open source load testing tool; a modern alternative to JMeter. Key benefits include:

  • Test scripts are written in JavaScript.
  • An easy-to-use CLI and a robust set of built-in APIs.
  • Automation friendly, to easily facilitate the inclusion of your performance tests into CI/CD pipelines.

You can install k6 and start running performance tests on your machine for free. By default, k6 prints the results of your test to the console. Additionally, it can output its data to several alternative backends for better visualization and analysis of your test results.
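
To give an idea of what that looks like, here is a minimal sketch of a k6 test; the target URL and the 5 VU / 30 second settings are only illustrative assumptions:

import http from "k6/http";
import { check, sleep } from "k6";

export let options = {
  vus: 5,          // number of virtual users (illustrative)
  duration: "30s"  // total test duration (illustrative)
};

export default function() {
  let res = http.get("https://test.loadimpact.com/");
  check(res, {
    "status is 200": (r) => r.status === 200
  });
  sleep(1);
}

Saving this as script.js and running k6 run script.js prints a summary of the collected metrics to the console.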

LoadImpact: our cloud service built with k6 at its core. We’ve designed our commercial offering to:

  • Run and scale your tests in the cloud with no infrastructure setup.
  • Provide a User-Friendly Interface to organize, configure, and run your tests.
  • Manage your team members and permissions.
  • Provide a convenient and powerful interface for the visualization of your test result.
  • Leverage intelligent algorithms to identify performance issues.
  • Analyze and compare the performance of your application over time.

We have briefly covered the components of our Load Testing platform: k6, the open source tool, and LoadImpact, the cloud service. Next, we are going to describe how to run the same type of Azure load tests with our cloud service or k6.

Azure load testing service

The Azure load testing service has the following options to create and run your load tests:

URL load test

A URL test is one of the simplest ways to create a load test; you can add multiple URLs and, for each one, select its HTTP method and add headers and query string values. The URL load test accesses each of these URLs with the specified parameters multiple times, depending on your load test settings.

 

 

The LoadImpact cloud service provides the same mechanism for configuring your load tests with the Request Builder interface. These tests are configured through the LoadImpact web app, using an interface that should feel familiar to users of Azure’s offering.

 

You can read more about the different options on the Request Builder documentation.

The Request Builder is an excellent tool for running a load test without writing any JavaScript code - we generate it all for you! It can also be very helpful for onboarding new users into the k6 scripting API. By clicking the script button at the top right corner, you can see the k6 script generated from your test configuration.

 

HTTP Archive load test

The HTTP Archive format, aka HAR, is a format for logging a user’s interaction with a website.

Recording and downloading a HAR file is a straightforward process. Here is a guide to generating a HAR file from a browser session.

 

 

Note that in Chrome, the Preserve log option keeps the network request log from being cleared on page changes. After we download the HAR file, we are ready to start creating our load test.

In Azure, you have to select `New HTTP Archive based test`, and the web interface will convert the HAR file content into individual HTTP requests in the HTTP URL interface.

 

 

LoadImpact also provides the ability to create a load test from a HAR file. The following image shows the view from which you can upload the HAR file, configure other test settings, and run your load test.

 

 

For users preferring CLI tools, or those without a LoadImpact account, we have also open sourced a har-to-k6 tool to convert HAR files to k6 scripts.

# install the converter
npm install -g har-to-k6

# convert HAR file
har-to-k6 my-user-session.har -o loadtest.js

Now, you could use k6 to run a local or cloud test of the converted k6 script.

# local execution
k6 run loadtest.js

# cloud execution
k6 cloud loadtest.js

JMeter test

Azure allows you to run JMeter tests on its load testing cloud service. This is done by selecting New Apache JMeter test, uploading your .jmx file, and running your test from the web interface.

 

JMeter is the most popular open source load testing tool on the market. Its first release was in 1998, and it has been the standard tool for performance testing ever since.

We thank the JMeter community for providing a great load testing tool and paving the way for performance testing in the software industry, but we built k6 because we believe an easy-to-use alternative with a best-in-class developer experience is necessary in today’s world.

For users who have implemented their performance tests in JMeter, we have built a tool to convert JMeter tests into k6 scripts.

# install the converter
npm install -g jmeter-to-k6

# convert JMeter load test
jmeter-to-k6 loadtest.jmx -o loadtest

Now, you could use k6 to run a local or cloud test of the converted k6 script.

# local execution
k6 run ~/loadtest/test.js


# cloud execution
k6 cloud ~/loadtest/test.js

 

Visual Studio web performance test

Visual Studio web performance tests (.webtest files) are a Microsoft technology for performing web tests that simulate many users accessing a server at the same time. While not designed for API testing, users have also used them to run API tests.

A web test project provides many options to configure your load test. Comparing both technologies in depth deserves its own dedicated blog post. The following sections quickly describe how some webtest options can be configured in k6.

 

Add artificial human interaction pauses in your scenario

You can use the sleep function to simulate think-time by suspending the test execution for a period of time.

import { sleep } from "k6";
import http from "k6/http";

export default function() {
  http.get("https://loadimpact.com");
  sleep(Math.random() * 30);
  http.get("https://loadimpact.com/features");
};

 

Specify the number of virtual users for your scenario

You can specify the number of Virtual Users and duration of your load test in your k6 script, via the CLI or using the web interface.

# specify VUs and duration via the CLI
k6 run --vus 10 --duration 30s script.js

// or in the script options
export let options = {
    vus: 10,
    duration: "3m"
};

 

Configure test iteration settings for your scenario

You can configure the number of iterations a VU makes in a similar way to duration and VUs.

export let options = {
    vus: 10,
    iterations: 100
};

You can read more about all the k6 options.

 

Configure the probability of a virtual user running a test in the scenario

There is work in progress to provide the ability to configure the probability of different scenarios. The current workaround is to use conditional statements in your k6 script, as sketched below.
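
As a rough sketch of that workaround, Math.random() can be used to split iterations between scenarios with a given probability; the 30/70 split and the URLs below are only illustrative assumptions:

import http from "k6/http";
import { sleep } from "k6";

export default function() {
  if (Math.random() < 0.3) {
    // ~30% of iterations run the hypothetical "checkout" scenario
    http.get("https://test.loadimpact.com/checkout");
  } else {
    // the remaining ~70% run the "browse" scenario
    http.get("https://test.loadimpact.com/");
  }
  sleep(1);
}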

 

Configure the desired network mix for your scenario

k6 does not yet provide the ability to simulate network conditions. Some users have been using chaos engineering tools while running their load tests to replicate these types of network conditions.

 

Select the appropriate Web browser mix for your scenario

Set the User Agent header on the HTTP requests of your k6 script to simulate a particular browser generating the requests.

let headers = { "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0" };

http.get("https//test.loadimpact.com", { headers: headers });

Additionaly, the batch option configures the maximum number of simultaneous/parallel connections in total that an http.batch() call in a VU can make.

export let options = {
    batch: 15
};

 

Specify a threshold rule using counters for your load test

Checks provide an API to define assertions in your load tests. Thresholds allow configuring performance expectations in your load test to make your test pass or fail.

You could define thresholds to enforce performance goals, for example (see the sketch after this list):

  • 99th percentile response time must be below 300 ms.
  • Minimum response time must be below 100 ms
  • The response content must have been OK more than 95% of the time.
  • ...
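
As a sketch, the goals above could be expressed as thresholds on the built-in http_req_duration and checks metrics; the numbers are simply the ones from the list:

export let options = {
  thresholds: {
    // 99th percentile below 300 ms and minimum response time below 100 ms
    "http_req_duration": ["p(99)<300", "min<100"],
    // the response content checks must pass more than 95% of the time
    "checks": ["rate>0.95"]
  }
};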

 

Use context parameters

You can use Environment variables to pass parameters to your load tests.

k6 run -e MY_HOSTNAME=test.loadimpact.com script.js
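
Inside the script, the parameter is then available on the __ENV object; a minimal sketch:

import http from "k6/http";

export default function() {
  // MY_HOSTNAME is the value passed on the command line with -e
  http.get(`http://${__ENV.MY_HOSTNAME}/`);
}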

 

We don't aim to cover an in-depth comparison of both technologies in this post; instead, we decided to write a quick overview of how k6 supports some of the options included in the webtest documentation. For further questions on this or any other topic, we encourage you to reach out to the LoadImpact support team or the k6 community.

Test results

The web interfaces of the Azure DevOps service and Visual Studio allow you to analyze the results of load tests. You can visualize the test summary, configuration, errors and more, analyze individual load metrics, graphs and trends, create reports, compare tests, and much more.

 

k6 provides the --out option to output your load test results to different backends. This option gives you multiple choices for visualizing your test results.

# json
k6 run --out json=my_test_result.json script.js

# InfluxDB/Grafana
k6 run --out influxdb=http://localhost:8086/k6 script.js

# Apache Kafka
k6 run --out kafka=brokers=broker_host:8000,topic=k6 script.js

# statsd
k6 run --out statsd script.js

# datadog
k6 run --out datadog script.js

 

Our cloud service also provides an interface for the visualization of your load test results. The test result interface has been designed to:

  • provide a convenient and powerful interface for the visualization of your test results.
  • leverage intelligent algorithms to identify performance issues.
  • analyze and compare the performance of your application over time.

You can read more about all the sections of the test result interface in this article.

 

Azure Pipelines

Both locally executed k6 tests and cloud tests can be run from your Azure Pipelines process. Running load tests as part of your CI pipeline ensures that you'll catch any performance regressions that occur when your application or infrastructure changes.

If you ever thought about including your performance tests in your CI/CD Pipelines, read our step-by-step guide to integrating k6 and LoadImpact with Azure Pipelines.

Conclusion

We have presented k6 and the LoadImpact cloud service as an alternative to the retiring Azure load testing service. This post intends to provide a high-level overview of both load testing platforms and show how you can run the same types of load tests on our platform.

If you are evaluating load testing platforms, the best way to do so is to run your own performance tests on them. LoadImpact offers a free trial of our cloud service, which we encourage you to explore.

If you have any feedback about the Azure DevOps service or any load testing experience, don’t hesitate to leave a comment, contact our support team, or reach out to the k6 community.


This article will show how you can use k6 to store the results of your load tests into Datadog.

Datadog is a monitoring and analytics platform that can help you to get full visibility of the performance of your applications. We ❤️ Datadog at LoadImpact and use it to monitor the different services of our Load Testing platform.

k6 is our open source load testing tool for testing the performance of your applications. One of the attractive features of k6 is that you can output the result of your load tests to different sources; for example:
  • stdout/console
  • JSON
  • InfluxDB
  • Kafka
  • LoadImpact

And now, you can send data from your k6 load test into Datadog for further analysis and correlation with Datadog analytics/metrics.

Requirements
  • k6 v0.24.0 (or above) is installed. If you do not have it installed, please refer to the official k6 installation page. You can verify your current k6 version by running k6 version.
Setup Datadog Agent

According to the Datadog documentation: "The easiest way to get your custom application metrics into Datadog is to send them to DogStatsD, a metrics aggregation service bundled with the Datadog Agent". That means we need to set up a Datadog Agent in order to use the DogStatsD service. For the sake of simplicity, we will run Datadog Agent v6 (the latest at the moment) as a Docker container. Here is the official Datadog Docker Agent page.

To run the container, execute the following command in a shell:

docker run \
            -e DD_API_KEY=<YOUR_DATADOG_API_KEY> \
            -e DD_DOGSTATSD_NON_LOCAL_TRAFFIC=1 \
            -p 8125:8125/udp \
            datadog/agent:latest


Here we add the environment variable -e DD_DOGSTATSD_NON_LOCAL_TRAFFIC=1, which is required to send custom metrics, and -e DD_API_KEY=<YOUR_DATADOG_API_KEY> for authorization. With -p 8125:8125/udp we bind UDP port 8125 of the container to UDP port 8125 of our host so that k6 is able to send metrics to DogStatsD. That’s it; from now on, the Datadog Agent container will listen on port 8125/UDP until we stop it (by hitting Ctrl+C in the shell where we started the container).

Run k6 script

The next step is to send our k6 metrics to the already running Datadog agent.

Let’s create a simple k6 script.js:

import http from "k6/http";
import { check, sleep } from "k6";

export let options = {
    stages: [
        { duration: "300s", target: 50 }
    ],
    thresholds: {
        "http_req_duration": ["p(95)<3"]
    }
};
export default function() {
    let res = http.get("https://test.loadimpact.com/");
    check(res, {
        "is welcome header present": (r) => r.body.indexOf("Welcome to the LoadImpact") !== -1
    });
    sleep(5);
}


Let’s run this script locally by executing the following command in a shell:

k6 run script.js





To output k6 metrics to Datadog, we need to add the --out datadog option and define some environment variables (their descriptions are below):

K6_DATADOG_ADDR=localhost:8125 \
K6_DATADOG_NAMESPACE=custom_metric. \
k6 run --out datadog script.js


Here we specify the following environment variables to configure k6:

  • K6_DATADOG_ADDR=localhost:8125 configures at which address Datadog agent is listening. Default value is localhost:8125 too, so we can omit it in our case.
  • K6_DATADOG_NAMESPACE=custom_metric. configures a prefix before all the metric names. Default value is k6..
Other available environment variables:
  • K6_DATADOG_PUSH_INTERVAL configures how often data batches are sent. The default value is 1s.
  • K6_DATADOG_BUFFER_SIZE configures the buffer size. The default value is 20.
  • K6_DATADOG_TAG_BLACKLIST configures tags that should NOT be sent to Datadog. The default is '' (nothing). This is a comma-separated list, for example: K6_DATADOG_TAG_BLACKLIST="tag1, tag2".


View and analyze metrics in Datadog

While k6 is running, the metrics should already be available in Datadog. We can verify that by looking at the Datadog Metrics Explorer:


As seen in the picture above, our metrics are available with the custom_metric. prefix we specified in the k6 run command. We also see that Datadog tags are assigned (in our case we picked method:get and status:200); k6 sends k6 tags as Datadog metric tags. Since the metrics are available in Datadog, you can monitor and analyze them like any other Datadog metrics, creating monitors or dashboards.

StatsD

The Datadog agent uses an extension of the StatsD daemon to capture k6 metrics, aggregate them, and send them to the Datadog service. Because StatsD is a very popular open source project, we also took this as an opportunity to support a k6 integration with StatsD.

Now, k6 can also push metrics to StatsD and have the StatsD daemon forward your load test metrics to other backend services.

k6 run --out statsd script.js

Like the Datadog integration, there are also a few k6/StatsD options that can be configured.

Conclusion

If you followed along, you should now have a running Datadog agent and have used the environment variables to configure the Datadog output for your k6 load test. We ❤️ Datadog here at LoadImpact, and developing a new k6 feature to easily push metrics to Datadog was something we thought was important; this tutorial has shown how to do it in practice.

Many thanks to the contributors @ivoreis and @MStoykov for their work on the Datadog output feature of k6 v0.24.0.

And finally, if you want to pull metrics from Datadog into your k6 test, we recommend you read our guide: "How to configure Datadog alerts to fail your load test"


As always, we are looking forward to hearing about your load testing experience; if you have any feedback, don’t hesitate to contact our support team or any k6 channel: community forum, k6 github repository or Slack.


We’ve gotten a lot of feedback on Insights since we launched v4.0 of the product in June 2018. We took it to heart and started iterating, improving Insights over the last couple of months. Today we’ve released a significant UX and UI refactor so a blog post to explain what’s changed is warranted.

Since this post compares the new to the old, a screenshot from that first public release of Insights is appropriate:

We’ll go through each major change below but here’s a quick rundown on what’s changed since the public launch in June last year:

  • Machine-assisted analysis ("Performance Alerts") is now in focus

  • We broke up the “Breakdown” tab and introduced separate tabs for all the information we had crammed into the “Breakdown” tab.

    • New “Thresholds” tab
    • New “Checks” tab
    • New “HTTP” tab (old “URL table” tab)
    • New “WebSocket” tab
    • New “Analysis” tab (old “Metrics” tab)
    • New “Script” tab (old “Test script” tab)

Our focus is to make result analysis a quick affair. It should be obvious from a quick look at a test result whether it’s good or bad, passed or failed. If a test has failed, it should be apparent where to look in the data for more details.

Machine-assisted analysis in focus

We believe you shouldn’t need to be a performance testing expert to use our product, so we strive to build automation into the product where it makes sense. Analysis of test results is one such area. The basics of result interpretation are the same for all performance tests, so they are a natural fit for automation.

There are two different types of automation we apply: the simple type, which just surfaces straightforward things like high failure rates, overloaded load generators and URLs with too much variability; and the more difficult “smart” type, which looks at many different metrics at the same time, figuring out whether the target system has reached its concurrency limit or not.

Conceptually, the “smart” type of automation is straightforward, but in practice there’s a lot of variability in the data and many corner cases that need to be considered. We’ve spent several months developing, testing and refining our algorithms so that they can tell you whether the target system of a test successfully coped with the traffic or not, i.e. whether its concurrency limit was reached:

Success!

Failure.

This section of the result page is now always visible from test start to finish.

Breaking up the breakdown

The “Breakdown tree” tab in Insights is a reflection of the structure of a k6 script, and an intuitive way to read the results when analyzing them. Or so we thought; after plenty of feedback and analysis, we realized we had crammed too much information into too little screen real estate.

We needed to break up the breakdown :)

The result is 5 new tabs, replacing the “Breakdown tree” tab while retaining and moving what we’ve deemed the “good parts” to the new tabs.

You’ll notice that when we display charts in the various tabs there’s an “Add this graph to the analysis” button. This is to simplify the process of moving different metrics to the “Analysis” tab for comparison and correlation.

Thresholds tab

The first new tab is focused on thresholds. Thresholds are the mechanism by which test runs are passed or failed and thus vital to the automation of performance tests. Having them in their own tab makes it possible for us to more clearly highlight their importance, not least to new users of the product who haven’t discovered thresholds yet.

Recommendation: Make sure you follow the getting started methodology and run baseline tests before running larger tests. Then use the response times from a successful baseline test to set up thresholds for further testing.

Checks tab

Checks are like asserts, but they don’t halt execution like an assert would in a unit/functional/integration test suite. Instead, k6 records the boolean result of the check expression for summary presentation at the end of a test run and for setting up thresholds.

Load tests are non-functional tests, but this exposes the dual nature of k6. It can be used for load tests as well as functional tests and monitoring, and checks are equally useful for load testing as they are in the other cases.

Recommendation: Use checks to make sure response content is as expected, and then use them in combination with a check failure rate threshold to set pass/fail criteria, as sketched below.
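
A minimal sketch of that combination, assuming a hypothetical endpoint and a tolerated check failure rate of 10%:

import http from "k6/http";
import { check } from "k6";

export let options = {
  thresholds: {
    // fail the test run if more than 10% of checks fail (illustrative limit)
    "checks": ["rate>0.9"]
  }
};

export default function() {
  let res = http.get("https://test.loadimpact.com/");
  check(res, {
    "status is 200": (r) => r.status === 200,
    "body contains welcome text": (r) => r.body.indexOf("Welcome") !== -1
  });
}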

HTTP tab

The “URL table” is now the “HTTP” tab, merged with the graphing parts from the “Breakdown tree”. Like the “Checks” tab, the table with data can be viewed as a plain list or grouped according to the group() hierarchy of the test script.

Recommendation: To make result interpretation easier, avoid generating too many unique URLs by using URL grouping where appropriate to view/visualize the samples from several URLs as one logical URL; see the sketch below.
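
One way to do that in the script is to give requests against parameterized URLs a common name tag, so they are reported as one logical URL; the endpoint and tag name below are hypothetical:

import http from "k6/http";

export default function() {
  for (let id = 1; id <= 100; id++) {
    // all 100 requests are grouped under the single "PostsItemURL" entry in the results
    http.get(`https://test.loadimpact.com/posts/${id}`, {
      tags: { name: "PostsItemURL" }
    });
  }
}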

WebSocket tab

We’ve broken out the protocol-specific metrics into their own tabs. WebSocket metrics, like HTTP, now have their own tab, visualizing connection-oriented response times and sent and received messages.

Recommendation: As raw WebSocket connections don’t have a concept of request/response, path, method, response status etc., we can’t provide the same type of per-message metrics and data (nor machine-assisted analysis) that we can for HTTP out of the box. To get a similar level of detail you can use custom metrics to record the different “events” or “message types” that make sense to your application, as sketched below.
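
As a sketch of that approach, a custom Trend metric can record a per-message round-trip time; the echo endpoint and the message format are assumptions:

import ws from "k6/ws";
import { Trend } from "k6/metrics";

let msgRTT = new Trend("ws_msg_rtt", true);

export default function() {
  ws.connect("wss://echo.websocket.org", null, function(socket) {
    socket.on("open", function() {
      // send a message carrying its own send timestamp
      socket.send(JSON.stringify({ sentAt: Date.now() }));
    });
    socket.on("message", function(msg) {
      // record the round-trip time for this message type
      msgRTT.add(Date.now() - JSON.parse(msg).sentAt);
      socket.close();
    });
  });
}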

Analysis tab

The “Metrics” tab is now the “Analysis” tab, and it’s also seen some improvements. This is the new home for custom metrics, and every custom metric will, by default, be added as a smaller chart in this tab.

Script tab

The tab with the fewest changes. The primary change that’s happened here in the last 6 months is the switch to a dark mode theme for the script, a change we made across the app in all places where we display code.

What’s next?

Glad you asked! We’re working on some additional smaller tweaks to Insights and once those are done we’re starting the implementation of test comparison. When it lands it will be a big UX improvement that also aims to help reinforce the getting started methodology.


The latest release of k6 is...not v1.0, but an intermediate release preparing for some upcoming bigger changes that we want to get into k6 before we cut a v1.0 release. There are, however, some interesting additions and bug fixes in this release:

  • CLI: You can now specify a CLI flag --console-output (or K6_CONSOLE_OUTPUT environment variable) to redirect output from the console.log() family of APIs to a file. Thanks to @cheesedosa for this feature!
  • New results output: StatsD and Datadog. You can now output any metrics k6 collects to StatsD or Datadog by running k6 run --out statsd script.js or k6 run --out datadog script.js respectively.
    Thanks to @ivoreis for their work on this!
  • k6/crypto: random bytes method. This feature adds a method to return an array with a number of cryptographically random bytes.
    import crypto from "k6/crypto";
    
    export default function() {
      var bytes = crypto.randomBytes(42);
    }

    Thanks to @bookmoons for their work on this!

  • k6/crypto: add a binary output encoding to the crypto functions. Besides hex and base64, you can now also use binary as the encoding parameter for the k6 crypto hashing and HMAC functions (see the sketch after this list).
  • Error codes: we’ve unified the handling of error codes. k6 will now emit an error_code tag in the metrics output (and expose a property with the same name on http.Response) when there’s an error in making a request.
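
A small sketch of the new encoding parameter (the input string is arbitrary):

import crypto from "k6/crypto";

export default function() {
  // hex output, as before
  console.log(crypto.sha256("hello world!", "hex"));
  // binary output: the raw digest bytes instead of an encoded string
  let bytes = crypto.sha256("hello world!", "binary");
  console.log(bytes);
}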

See the release notes for full details of new additions and bug fixes.

We shared our detailed analysis of open-source load testing tools in a recent review. In that review, we also promised to share the behind-the-scenes benchmarks. In this article, we'll share those benchmarks so you can see how we arrived at our performance review for each open-source load testing tool.
Remember, too, we narrowed the list for this review to what we consider to be the most popular, open-source load testing tools. This list includes:
  • Jmeter
  • Gatling
  • Locust
  • The Grinder
  • Apachebench
  • Artillery
  • Tsung
  • Vegeta
  • Siege
  • Boom
  • Wrk
Open Source Load Testing Tool Review Benchmarks

System configurations

Here are the exact details of the systems and configuration we used. We performed the benchmark using two quad-core servers on the same physical LAN, with GE network interfaces.

Source (load generator) system:

  • Intel Core2 Q6600 Quad-core CPU @2.40Ghz
  • Ubuntu 14.04 LTS (GNU/Linux 3.13.0-93-generic x86_64)
  • 4GB RAM
  • Gigabit ethernet

Target (sink) system:

  • Intel Xeon X3330 Quad-core CPU @2.66Ghz
  • Ubuntu 12.04 LTS (GNU/Linux 3.2.0-36-generic x86_64)
  • 8GB RAM
  • Gigabit ethernet
  • Target web server: Nginx 1.1.19

System (OS) parameters affecting performance were not changed from default values on the target side. On the source side we did this:

System tuning on the Source (load generator) host:

  • We ran the tests with the following configured (we borrowed this list of performance tuning parameters from the Gatling documentation):
    net.core.somaxconn = 40000
    net.core.wmem_default = 8388608
    net.core.rmem_default = 8388608
    net.core.rmem_max = 134217728
    net.core.wmem_max = 134217728
    net.core.netdev_max_backlog = 300000
    net.ipv4.tcp_max_syn_backlog = 40000
    net.ipv4.tcp_sack = 1
    net.ipv4.tcp_window_scaling = 1
    net.ipv4.tcp_fin_timeout = 15
    net.ipv4.tcp_keepalive_intvl = 30
    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.tcp_moderate_rcvbuf = 1
    net.ipv4.tcp_mem = 134217728 134217728 134217728
    net.ipv4.tcp_rmem = 4096 277750 134217728
    net.ipv4.tcp_wmem = 4096 277750 134217728
    net.ipv4.ip_local_port_range=1025 65535

Note, though, that these performance tuning options had no visible effect on the test results. We ran tests with and without the tuning and couldn’t spot any differences in the results. The parameters are mostly trying to increase TCP memory buffers and allow the use of more concurrent connections. We only had max 1,000 connections active in these tests, and did not do a lot of disconnecting and reconnecting, so the tuning wasn't much help. A couple of the options, like TCP window scaling, were already set (they were defaults), so we didn’t really change them.

Tools Tested

Here are the exact versions of each open source load testing tool we benchmarked:

  • Siege 3.0.5
  • Artillery 1.5.0-15
  • Gatling 2.2.2
  • Jmeter 3.0
  • Tsung 1.6.0
  • Grinder 3.11
  • Apachebench 2.3
  • Locust 0.7.5
  • Wrk 4.0.2 (epoll)
  • Boom (Hey) snapshot 2016-09-16

A Note on Dockerization (Containers) and Performance:

We noted that running the benchmarks inside a Docker container meant that tool performance in terms of Requests Per Second (RPS) rates was reduced by roughly 40% as compared to running the tool natively on the source machine.

All tools seemed equally affected by this, so we did not try to optimize performance for the Dockerized tests. This discrepancy explains why a couple of the performance numbers quoted will not match what is in the benchmarking results table at the end (those results reflect only Dockerized tests).

For example, running Wrk natively produced 105,000-110,000 RPS, while running it inside a Docker container produced ~60,000 RPS. Apachebench similarly went from ~60,000 RPS to ~35,000 RPS.

The next thing to investigate in a future benchmark would be if/how response times are impacted by Dockerization. It does not seem like Dockerization noticeably increases minimum response time (i.e. adds a fixed delay), but it may be that response time variation increases.

How we ran the tests
  1. Exploratory testing with individual tools
    We began by running the different tools manually, using the many different parameters for concurrency/VUs, threads, etc. that the tools offered. There were two objectives we wanted to achieve. We wanted to see if we could completely saturate the target system and in that way find its maximum capacity to serve requests (i.e. how many requests per second (RPS) the target server could respond to). We also wanted to get to know the tools and find out which ones were roughly the best and worst performers, how well they used multiple CPU cores and finally what kind of RPS numbers we could expect from each.
  2. Comparing the tools
    Here we tried to run all tools with parameters as similar as possible. Using the exact same parameters for all tools is, unfortunately, not possible as the tools have different modes of operation and different options that don’t work the same way across all the tools. We ran the tools with varying concurrency levels and varying network delay between the source and target system, as listed below.
    1. Zero-latency, very low VU test: 20 VU @ ~0.1ms network RTT
    2. Low-latency, very low VU test: 20 VU @ ~10ms network RTT
    3. Medium-latency, low-VU test: 50 VU @ ~50ms network RTT
    4. Medium-latency, low/medium-VU test: 100 VU @ ~50ms network RTT
    5. Medium-latency, medium-VU test: 200 VU @ ~50ms network RTT
    6. Medium-latency, medium/high-VU test: 500 VU @ ~50ms network RTT
    7. Medium-latency, high-VU test: 1000 VU @ ~50ms network RTT

Note that the parameters of the initial two tests are likely not very realistic if you want to simulate end user traffic - there are few situations where end users will experience 0.1ms or even 10ms network delay to the backend systems, and using only 20 concurrent connections/VUs is also usually too low to be a realistic high-traffic scenario. But the parameters may be appropriate for load testing something like a micro-services component.

Exploratory Testing

First of all, we ran various tools manually, at different levels of concurrency and with the lowest possible network delay (i.e. ~0.1ms) to see how many requests per second (RPS) we could push through.

At first we loaded a small (~2KB) image file from the target system. We saw that Wrk and Apachebench managed to push through around 50,000 RPS but neither of them were using all the CPU on the load generator side, and the target system was also not CPU-bound. We then realized that 50,000 x 2KB is pretty much what a 1Gbps connection can do - we were saturating the network connection.

As we were more interested in the max RPS number, we didn’t want the network bandwidth to be the limiting factor. We stopped asking for the image file and instead requested a ~100-byte CSS file. Our 1Gbps connection should be able to transfer that file around 1 million times per second, and yes, now Wrk started to deliver around 100,000 RPS.

After some exploratory testing with different levels of concurrency for both Wrk and Apachebench, we saw that Apachebench managed to do 60-70,000 RPS using a concurrency level of 20 (-c 20), while Wrk managed to perform about 105-110,000 RPS using 12 threads (-t 12) and 96 connections (-c 96).

Apachebench seemed bound to one CPU (100% CPU usage = one CPU core) while Wrk was better at utilizing multiple CPUs (200% CPU usage = 2 cores). When Wrk was doing 105-110,000 RPS, the target system was at 400% CPU usage = 4 CPU cores fully saturated.


We assumed 110,000 RPS was the max RPS rate that the target system could serve. This is useful to know when you see another tool e.g. generating 20,000 RPS and not saturating the CPU on the source side - then you know that the tool is not able to fully utilize the CPU cores on the source system, for whatever reason, but the bottleneck is likely there and not on the target side (nor in the network), which we know is able to handle a lot more than 20,000 RPS.

Note that the above numbers were achieved running Wrk, Apachebench etc natively on the source machine. As explained elsewhere, running the load generators inside a Docker container limits their performance, so the final benchmark numbers will be a bit lower than what is mentioned here.

Testing with Similar Parameters

The most important parameter is how much concurrency the tool should simulate. Several factors mean that concurrent execution is vital to achieving the highest request per second (RPS) numbers.

Network (and server) delay means you can’t get an infinite amount of requests per second out of a single TCP connection because of request roundtrip times — at least not with HTTP/1.1.

So, it doesn’t matter if the machines both have enough CPU and network bandwidth between them to do 1 million RPS if the network delay between the servers is 10ms and you only use a single TCP connection + HTTP/1.1. The max RPS you will ever see will be 1/0.01 = 100 RPS.

In the benchmark we decided to perform a couple of tests with a concurrency of 20 and network delay of either 0.1ms or 10ms, to get a rough idea of the max RPS that the various tools could generate.

Then, we also performed a set of tests where we had a higher, fixed network delay of 50ms and instead varied the concurrency level from 50-1000 VU. This second batch of tests were aiming to find out how well the tools handled concurrency.

If we look at RPS rates, concurrency is very important. As stated above, a single TCP connection (concurrency: 1) means that we can do max 100 RPS if network delay is 10 ms.

A concurrency level of 20, however (20 concurrent TCP connections) means that our test should be able to perform 20 times as many RPS as it would when using only a single TCP connection. 20*100 = 2,000 RPS. It will also mean, in many cases, that the tools will be able to utilize more than one CPU core to generate the traffic.
So 20*100 = 2,000 RPS if we have a 10ms network delay. In our case, the actual network delay was about 0.1ms (i.e. 100us, microseconds) between source and target hosts in our lab setup, which means our theoretical max RPS rate per TCP connection would be somewhere around 1/0.0001 = 10,000 RPS. If we use 20 concurrent connections in this case we could support maybe 200,000 RPS in total.

In our first two benchmark tests we used both the actual, 0.1ms, network delay, and 10ms of artificial delay. We call these tests the “Zero-latency, very low VU test” and the “Low-latency, very low VU test” and the aim is primarily to see what the highest RPS rates are, that we can get from the different tools.

The “Low-latency, very low VU test” with 10ms delay is going to generate max 2,000 RPS, which should not be a huge strain on the hardware in the test. It is useful to run a test that does not max out CPU and other resources as such a test can tell us if a load generator provides stable results under normal conditions or if there is a lot of variation.

Or perhaps it consistently adds a certain amount of delay to all measurements. If the network delay is 10ms and tool A reports response times of just over 10ms while another tool B consistently reports response times of 50ms, we know that there is some inefficient code execution happening in tool B that causes it to add a lot of time to the measurements.

The “Zero-latency, very low VU test” with 0.1ms network delay, on the other hand, means the theoretical max RPS over our network and a concurrency level of 20 is 200,000 RPS. This is enough to max out the load generation capabilities of most of the tools (and in some cases max out the target system also), so testing with that network delay is interesting as it shows us what happens when you put some pressure on the load generator tool itself.

Concurrency

Some tools let you configure the number of “concurrent requests”, while others use the VU (virtual user) term, “connections” or “threads”. Here is a quick rundown:

  • Apachebench
    Supports “-c concurrency” which is “Number of multiple requests to perform at a time”
  • Boom/Hey
    Supports “-c concurrent” which is “Number of requests to run concurrently”
  • Wrk
    Supports “-t threads” = “Number of threads to use” and also “-c connections” = “Connections to keep open”. We set both options to the same value, to get wrk to use X threads that each get 1 single connection to use.
  • Artillery
    Supports “-r rate” = “New arrivals per second”. We use it but set it in the JSON config file rather than use the command-line parameter. Our configuration uses a loop that loads the URL a certain number of times per VU, so we set the arrival phase to be 1 second long, and the arrival rate to the concurrency level we want - the VUs will start in 1 second and then for as long as it takes.
  • Vegeta
    Vegeta is tricky. We initially thought we could control concurrency using the -connections parameter, but then we saw that Vegeta generated impossibly high RPS rates for the combination of concurrency level and network delay we had configured. It turned out that Vegeta’s -connections parameter is only a starting value. Vegeta uses more or less connections as it sees fit. The end result is that we cannot control concurrency in Vegeta at all. This means it takes a lot more work to run any useful benchmark for Vegeta, and for that reason we have skipped it in the benchmark. For those interested, our exploratory testing has shown Vegeta to be pretty “average” performance-wise. It appears to be slightly higher-performing than Gatling, but slightly lower-performing than Grinder, Tsung and Jmeter.
  • Siege
    Supports “-c concurrent” = “CONCURRENT users”
  • Tsung
    Similar to Artillery, Tsung lets you define “arrival phases” (but in XML) that are a certain length and during which a certain number of simulated users “arrive”. Just like with Artillery we define one such phase that is 1 second long, where all our users arrive. Then the users proceed to load the target URL a certain number of times until they are done.
  • Jmeter
    Same thing as Tsung: the XML config defines an arrival phase that is 1 second and during which all our users arrive, then they go through a loop a certain number of times, loading the target URL once per iteration, until they are done.
  • Gatling
    The Scala config specifies that Gatling should inject(atOnceUsers(X)) and each of those users execute a loop for the (time) duration of the test, loading the target URL once per loop iteration.
  • Locust
    Supports “-c NUM_CLIENTS” = “Number of concurrent clients”. Also has a “-r HATCH_RATE” which is “The rate per second in which clients are spawned”. We assume the -c option sets an upper limit on how many clients may be active at a time, but it is not 100% clear what is what here.
  • Grinder
    Grinder has a grinder.properties file where you can specify “grinder.processes” and “grinder.threads”. We set “grinder.processes = 1” and “grinder.threads = our concurrency level”

Note here that some tools allow you to separate the number of VU threads and the number of concurrent TCP connections used (e.g. Wrk lets you control this), which is a huge advantage as it lets you configure an optimal number of threads that can use CPU resources as efficiently as possible, and where each thread can control multiple TCP connections.

Most tools only have one setting for concurrency (and Vegeta has none at all) which means they may use too many OS (or whatever-) threads that cause a ton of context switching, lowering performance, or they will use too few TCP connections, which, together with roundtrip times, puts an upper limit on the RPS numbers you can generate.

Duration: 300 seconds

Most tools allow you to set a test duration, but not all. In some cases you set e.g. total number of requests, which means we had to adapt the number to what request rate the tool managed to achieve. The important thing is that we run each tool for at least a decided-upon minimum amount of time. If the tool allows us to configure a time, we set it to 300 seconds. If not, we configure the tool to generate a certain total number of requests that means it will run approximately 300 seconds.

Number of Requests: Varying

Again, this is something that not all tools allow us to control fully, but just like Duration it is not critical that all tools in the test perform the exact same number of requests. Instead we should think of it as a minimum number. If the tool allows us to configure a total number of requests to perform, we set that number to at least 500,000.

A Note on Test Length

A lot of the time, when doing performance testing, running a test for 300 seconds may not be enough to get stable and statistically significant results.

However, when you have a simple, controlled environment where you know pretty well what is going on (and where there is, literally, nothing going on apart from your tests) you can get quite stable results despite running very short tests.

We have repeated these tests multiple times and seen only very, very small variations in the results, so we are confident that the results are valid for our particular lab environment. You’re of course welcome to run your own tests in your environment, and compare the results with those we got.

We provide both a public Github repo with all the source/configuration, and a public Docker image you can run right away. Go to https://github.com/loadimpact/loadgentest for more information.

Open Source Load Testing Tool Review Benchmark Test Results

Note that result precision varies as some tools give more precision than others and we have used the results reported by the tools themselves. In the future, a better and more “fair” way of measuring results would be to sniff network traffic and measure response times from packet traces.

Results: The Zero-latency, very low VU test

The 0.1ms network delay meant requests were fast in this test. Total number of requests that each tool generated during approx 300 seconds was between ~500,000 and ~10 million, depending on the RPS rate of the tool in question.

We could get Average RTT from all tools (except Siege, which wasn’t precise enough to tell us anything at all about response times). Several tools do not report minimum RTT, Apachebench does not report maximum RTT, and reporting of 75th, 90th and 95th percentiles vary a bit.

The metrics that were possible to get out of most tools were Average RTT, Median (50th percentile) RTT and 99th percentile RTT. Plus volume metrics such as requests per second, of course.

Details: Zero-latency, very low VU test
  • Network RTT: ~0.1ms
  • Duration: ~300 seconds
  • Concurrency: 20
  • Max theoretical request rate: 200,000 RPS
  • Running inside a Docker container (approx. -40% RPS performance)

As we can see, Wrk is outstanding when it comes to generating traffic.

We can also see that, apart from Wrk, we have Apachebench, Boom and Jmeter, all three of which perform very well. Then Tsung, Grinder, Siege and Gatling perform “OK”, while Artillery and Locust are way behind in terms of traffic generation capability.

Let’s see what the RPS rates look like in the test where we use slightly higher network latency, in the Low-latency, very low VU test:

Results: Low-latency, very low VU test

Here we use 10ms of network delay, which meant request roundtrips were around 100 times slower than in the Zero-latency test. The total number of requests that each tool generated during approx. 300 seconds was between ~150,000 and ~600,000, depending on the RPS rate of the tool in question.

Details: Low-latency, very low VU test
  • Network RTT: ~10ms
  • Duration: ~300 seconds
  • Concurrency: 20
  • Max theoretical request rate: 2,000 RPS
  • Running inside a Docker container (approx. -40% RPS performance)

Here, we see that when we cap the max theoretical number of requests per second (via simulated network delay and limited concurrency) to a number below what the best tools can perform, they perform pretty similarly. Wrk and Apachebench again stand out as the fastest of the bunch, but the difference between them and the next five is very, very small.

An interesting change is the fact that the “worst performers” category now includes Siege also. Here is a chart that tells us why Siege seems capped at around 1,000 RPS in this test:

As mentioned earlier, we add 10ms of simulated network delay in this test. The total network delay, to be precise, is around 10.1ms...


Load Impact is proud to now offer a free premium subscription to qualifying Open Source projects.

If you are a core contributor or maintainer of an Open Source project, you may qualify to receive a free annual subscription of Load Impact's Cloud Execution and Insights products. The subscriptions offered are at Developer or Team level, whichever is more appropriate to your needs.

At Load Impact we develop and maintain our own open source projects. We also use various OSS tools and libraries. As strong believers in open source projects, we wanted to support them and the community even more. Providing these free subscriptions to our popular load testing and result analysis products allows those projects to get crucial testing done so they can focus on pushing high quality code, faster.

To apply for consideration for a free subscription, please tell us a little about your awesome project by submitting your information here. Even if you don't meet our requirements, we may still be able to work with you under another program!


REST APIs make up about 83% of all APIs currently in use. Performance testing of APIs is becoming more and more critical to ensure overall system performance. Let's take a look at how we can use the k6 open source load testing tool to performance test REST API endpoints.

But first, let's consider some possible reasons why we'd want to do that:

  1. To gauge the limits and capabilities of our API, and by extension, our infrastructure at large.
  2. To enable Continuous Integration and automation processes that will, in turn, drive the baseline quality of our API.
  3. To move towards Continuous Delivery and Canary Deployment processes.

You may want to read this k6 introductory article first, to get an idea of various concepts that we are going to mention here.

Assumptions and first steps

For this guide, our target system running the API runs locally on a very modest and restricted environment. So, the various parameters and result values are going to be significantly lower than those someone would anticipate in a real, production environment. Nevertheless, it will do just fine for the purposes of this guide since the assessment steps should remain the same no matter the infrastructure at hand.

A RESTful API service typically has numerous endpoints. No assumptions should be made about the performance characteristics of one endpoint, by testing another. This simple fact leads to the realization that every endpoint should be tested with different assumptions, metrics and thresholds. Starting with individual endpoints is a smart way to begin your API performance testing.

For the purposes of running small load tests on our REST API endpoints, we will use the command line interface (CLI) local execution mode of k6. For information on how to install k6 locally, read this article.

As you start testing production-like environments, you will likely need to make use of the automated load test results analysis provided by Load Impact Insights.

After testing various endpoints in isolation, you may start to move towards tests that emulate user behavior or that request the endpoints in a logical order. Larger performance tests may also require Load Impact Cloud Execution on the Load Impact cloud infrastructure.

Our stack consists mainly of Django and Django Rest Framework, which sit on top of a PostgreSQL 9.6 database. There's no caching involved so that our results are not skewed.

Our system requires Token-based authentication, so we have already equipped ourselves with a valid token.

Load Testing Our API

With the above in mind, we'll start load testing the v3/users endpoint of our API. This endpoint returns a JSON list of representations of an entity we call User. As a first step, we are going to perform some ad hoc load tests, to get a "feel" for this endpoint to determine some realistic baseline performance thresholds.

Performing GET requests

We first need to create a file named script.js and provide the following content:

import http from "k6/http";
import { check } from "k6";
import { Rate } from "k6/metrics";

export let errorRate = new Rate("errors");

export default function() {
  var url = "http://api.dev.loadimpact.com/v3/users";
  var params = {
    headers: {
      "Authorization": "Token ffc62b27db68502eebc6e90b7c1476d29c581f4d",
      "Content-Type": "application/json"
    }
  };
  check(http.get(url, params), {
    "status is 200": (r) => r.status == 200
  }) || errorRate.add(1);
};

The above script checks that every response to that API endpoint returns a status code of 200. Additionally, we record any failed requests so that we will get the percentage of successful operations in the final output.

Usually, we should start with somewhat modest loading (e.g. 2-5 Virtual Users), to get a grasp on the system's baseline performance and work upwards from that. But suppose we are new at this and we also feel a bit optimistic, so we reckon we should start a load test of the above script with 30 Virtual Users (VUs) for a duration of 30 seconds.

We execute the k6 test with the aforementioned parameters:

$ k6 run -d 30s -u 30 ./script.js

The below partial output of our load test indicates there is an error.

Figure 1: First load test run results show only 14% of requests get a response

We see that only 14% of our requests were successful. This is abysmally low!

OK, so what happened? Well, if we were to show the full output of the load test, we'd notice that we get a lot of warnings of the type:

WARN[0067] Request Failed error="Get http://api.dev.loadimpact.com/v3/users" : net/http: request canceled (Client.Timeout exceeded while awaiting headers)

We immediately understand that most requests timed out. This happened because the default timeout value is set to 60 seconds and the responses were exceeding this limit. We could increase the timeout by providing our own Params.timeout in our http.get call, as sketched below.
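
For reference, a sketch of how that could look; the 120000 ms (two minute) value is arbitrary:

import http from "k6/http";

export default function() {
  var params = {
    headers: { "Content-Type": "application/json" },
    timeout: 120000  // request timeout in milliseconds (here 120s instead of the default 60s)
  };
  http.get("http://api.dev.loadimpact.com/v3/users", params);
}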

But, we don't want to do that just yet. Suppose that we believe that 60 seconds is plenty of time for a complete response to the GET request. We'd like to figure out under what conditions our API can return proper and error-free responses for this endpoint.

But, first we need to understand something about our load test script. The way we wrote it, every virtual user (VU) performs the GET requests, in a continuous loop, as fast as it can. This creates an unbearable burden on our system, so we need to modify the test.

Consequently, we decide to add a sleep (aka think time) statement to our code. The necessary code changes are the following:

// ... omitted for brevity
// add "sleep" in the import statement
import { check, sleep } from "k6";

  // ... omitted for brevity

  check(http.get(url, params), {
    "status is 200": (r) => r.status == 200
  }) || errorRate.add(1);

  // We add it after each check(); sleep for a half second
  sleep(0.5);
};

This produces:

Figure 2: Load test results show 62% of requests pass

OK, things improved a lot, but still, 38% of our requests timed-out.

We proceed by increasing the sleep value for each VU to 1 second:

// ... omitted for brevity
sleep(1);

And we rerun the test, while keeping the same number of VUs:

$ k6 run -d 30s -u 20 ./script.js

This produces a more desirable outcome for our system:

Figure 3: Load test results show all GET requests finish with 200 status; 95% of requests served in under 283.43ms

Some things we notice from the above output:

  • All requests finished in a timely manner, with the correct status code
  • 95% of our users got served a response in under 283.43ms
  • In the 30 second test duration we served 539 responses, at a rate of ~18 requests per second (RPS)

Now we have a better idea of the capabilities of this endpoint when responding to GET requests, in our particular environment.

Performing POST requests

Our system has another endpoint, v3/organizations, that allows POST requests we use when we want to create a new Organization entity. We want to run performance tests on this endpoint.

import http from "k6/http";
import { check, sleep } from "k6";
import { Rate } from "k6/metrics";

export let errorRate = new Rate("errors");

export default function() {
  var url = "http://api.dev.loadimpact.com/v3/organizations";
  var params = {
    headers:  {
      "Authorization": "Token ffc62b27db68502eebc6e90b7c1476d29c581f4d",
      "Content-Type": "application/json"
    }
  };  

  var data = JSON.stringify({
    "name": `Organization Name ${__VU}: ${__ITER}`
  });
  check(http.post(url, data, params), {
    "status is 201": (r) => r.status == 201
  }) || errorRate.add(1);  

  sleep(1);
};

A few things to note here:

  1. We changed the http.get to http.post. There's a whole range of supported HTTP methods you can see here.
  2. We now expect a 201 status code, something quite common for endpoints that create resources.
  3. We introduced 2 magic variables, __VU and __ITER. We use them to generate unique dynamic data for our post data. Read more about them here.

Armed with experience from our previous test runs, we decide to keep the same VU and sleep time values when running the script:

$ k6 run -d 30s -u 20 ./script.js

And this produces the following results:

Figure 4: Load test results for POST requests to the v3/organizations endpoint

We notice from the results above that we managed to serve all POST requests successfully. We also notice there was an increase in the duration of our responses and a decrease in the total number of requests we could handle during a 30 second test duration. This is to be expected though, as writing to a database will always be a slower operation than reading from it.

Putting it all together

Now we can create a script that tests both endpoints, while at the same time providing some individual, baseline performance thresholds for them.

import http from "k6/http";
import { check, sleep } from "k6";
import { Trend, Rate } from "k6/metrics";

let listErrorRate = new Rate("List Users errors");
let createErrorRate = new Rate("Create Organization errors");
let ListTrend = new Trend("List Users");
let CreateTrend = new Trend("Create Organization");

export let options = {
  thresholds: {
    "List Users": ["p(95)<500"],
    "Create Organization": ["p(95)<800"],
  }
};

export default function() {
  let urlUsers = "http://api.dev.loadimpact.com/v3/users";
  let urlOrgs = "http://api.dev.loadimpact.com/v3/organizations";
  let params = {
    headers: {
      "Authorization": "Token ffc62b27db68502eebc6e90b7c1476d29c581f4d",
      "Content-Type": "application/json"
    }
  };

  // Data for the POST request
  let createOrgData = JSON.stringify({
    "name": `Organization Name ${__VU}: ${__ITER}`
  });
  let requests = {
    "List Users": {
      method: "GET",
      url: urlUsers,
      params: params
    },
    "Create Organization": {
      method: "POST",
      url: urlOrgs,
      params: params,
      body: createOrgData
    },
  };

  let responses = http.batch(requests);
  let listResp = responses["List Users"];
  let createResp = responses["Create Organization"];

  check(listResp, {
    "status is 200": (r) => r.status === 200
  }) || listErrorRate.add(1);

  ListTrend.add(listResp.timings.duration);

  check(createResp, {
    "status is 201": (r) => r.status === 201
  }) || createErrorRate.add(1);

  CreateTrend.add(createResp.timings.duration);

  sleep(1);
};

In the above example we notice the following:

  1. We created separate rates and trends for each endpoint.
  2. We defined custom thresholds via the options variable. We increased our thresholds because we don't want to be too close to our system's limit: the 95th percentile must be below 500ms for GET requests (Users) and 800ms for POST requests (Organizations).
  3. We introduce the batch() call, that allows us to perform multiple types of requests in parallel.

Because we are introducing more concurrent load on our system, we also decide to drop the number of VUs down to 15:

$ k6 run -d 30s -u 15 ./script.js

And here are the results:

Figure 5: Load test results for the test of both API endpoints

We observe that all requests were successfully processed. Additionally, we now have 2 extra rates ("Create Organization" and "List Users") with visual indications about their threshold status. More specifically, Create Organization succeeded but, List Users failed, because the 500ms p(95) threshold was exceeded.

The next logical step would be to take action on that failed threshold. Should we increase the threshold value, or should we try to make our API code more efficient? In any case, we now at least have all the necessary tools and knowledge to integrate load testing as part of our workflow. You could continue your journey by reading some of our CI/CD integrations guides.

References and further reading
