1,000,000 Per Ke?

Vladimir Makovsky, 2015-03-02

The ke is a traditional Chinese unit of decimal time. It is equal to 1 centiday, that is, 1/100 of a day: 864 seconds, or 14 minutes and 24 seconds. In this blog post, we perform a simple stress test to determine a safe usage limit for our MIA database. We want to know the maximum number of requests per second that MIA DB can safely serve. We simulate user requests coming from our Gitcharts analytics, which also tells us how MIA DB scales on modern processors. And, of course, we want to find out how many requests can be served per ke.

Testing scenario

In Gitcharts analytics, there is a summary report for the whole repository, plus 8 other reports that show statistics for the last X days. A slider in the application drives the parallel requests sent by the browser, at 4 requests per slider value:

/api/p/Project/r/summary
/api/p/Project/r/winners?days=Days
/api/p/Project/r/last_x_days?days=Days
/api/p/Project/r/files_changed?days=Days

where Days is 1 .. 30
      Project is one of AngularJS, Bootstrap, Django, Docker, Ember.js,
                 GCC, Git, Jekyll, Jenkins CI, jQuery, Linux,
                 MongoDB, PostgreSQL, Puppet, Rails, youtube-dl

The performance test simulates random changes of the slider value. Each test set has between 50 and 400 parallel clients, depending on the number of processor cores available. Each client randomly selects one of the 16 existing projects. We run the test set for a few minutes so that at least all combinations of days and projects are examined. After that, we divide the total time by the total number of requests. The application server with MIA DB is deployed on AWS. We ran each set at least 3 times and averaged the results of all attempts.
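To make the scenario more concrete, here is a minimal sketch of what one simulated client could generate for a single slider change. It is not our actual test harness: the function names are made up for illustration, and the way project names are encoded in the URL (percent-encoding via `urllib.parse.quote`) is an assumption. "Jenkins CI" is treated as a single project so that the list has 16 entries.

```python
import random
from urllib.parse import quote

# Project names as listed in the post ("Jenkins CI" counted as one project).
PROJECTS = [
    "AngularJS", "Bootstrap", "Django", "Docker", "Ember.js", "GCC", "Git",
    "Jekyll", "Jenkins CI", "jQuery", "Linux", "MongoDB", "PostgreSQL",
    "Puppet", "Rails", "youtube-dl",
]

# The four report endpoints requested for each slider value.
REPORTS = ["summary", "winners", "last_x_days", "files_changed"]


def requests_for_slider_change(rng):
    """Return the 4 request paths for one random project and slider value."""
    project = quote(rng.choice(PROJECTS))  # URL encoding is an assumption
    days = rng.randint(1, 30)              # slider range 1..30, inclusive
    paths = []
    for report in REPORTS:
        path = f"/api/p/{project}/r/{report}"
        if report != "summary":            # the summary report takes no days
            path += f"?days={days}"
        paths.append(path)
    return paths


def average_requests_per_second(total_requests, total_seconds):
    """Throughput of a test set: the inverse of the average time per request."""
    return total_requests / total_seconds


if __name__ == "__main__":
    for path in requests_for_slider_change(random.Random()):
        print(path)
```

A real load generator would send these paths to the application server from many parallel clients and count the completed requests; the sketch only covers how the request mix is chosen.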

AWS environment

The following table summarizes some results:

Instance type	Cores	Avg. Requests/s	On-demand price/hour
m1.small	1	180	$0.044
t2.micro	1	430	$0.013
c3.2xlarge	8	1942	$0.42
config: 3.16.0-4-amd64 (Debian), Erlang 17.3, gcc 4.9

Amazon suggests the C3 series for these use cases: "high performance front-end fleets, web-servers, ...", so we picked it. The M1 series was deprecated on Amazon this month. Even though the t2.micro is a burstable instance, it behaved quite well even in a series of tests, actually better than the older m1.small instance. As you can see, we reached 1942 requests per second on the c3.2xlarge instance.
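One way to read the table is as a rough cost per million requests. The sketch below only rearranges the numbers from the table above; it assumes the measured average throughput could be sustained for a full hour, which may not hold for a burstable instance such as t2.micro. The function name is ours, chosen for illustration.

```python
# (instance type, avg. requests/s, on-demand price in USD per hour),
# copied from the results table.
RESULTS = [
    ("m1.small", 180, 0.044),
    ("t2.micro", 430, 0.013),
    ("c3.2xlarge", 1942, 0.42),
]


def usd_per_million_requests(requests_per_second, usd_per_hour):
    """Hourly price divided by requests per hour, scaled to one million."""
    requests_per_hour = requests_per_second * 3600
    return usd_per_hour / requests_per_hour * 1_000_000


for name, rps, price in RESULTS:
    cost = usd_per_million_requests(rps, price)
    print(f"{name}: ${cost:.4f} per million requests")
```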

Gotcha

During testing on AWS, I initially overlooked one gotcha: the bottleneck can also be on your side. At first I ran the tests from my laptop against Amazon, but the bottleneck was my local network. I could not generate enough requests from my local network to Amazon.

Not only that. There are a lot of parameters involved in creating an OS image, and the throughput of the incoming network may vary a lot depending on them. For example, even when the hardware configuration is very similar, request throughput is quite different on AWS, MS Azure, and Google Compute Engine.

Local network

Though the numbers are quite high, a cloud environment with virtual machines is a bit unpredictable in terms of speed. The t2.micro instance in particular is good only for a very rough estimate. Performance differs across the days of the week and the times of day, so you have to take many measurements.

T410

Let's see how we perform in a more isolated environment: on a local network. First we ran a test on a Lenovo T410 laptop. The results are as follows:

	Options	Reqs/Sec	Difference
T0	logs on	733
T1	logs off	1005	T0 / T1 = 0.73
Lenovo T410, i5-540M 2.53 GHz, 3MB Cache, 4 threads, HDD, 3.16 (Debian)

Because logging always has some performance overhead, we tried switching it off completely. It's interesting to see how large the logging overhead is: it was over a quarter of the total time. However, on a different laptop on the local network and also on AWS c3.2xlarge, it was less than 10% of the total time. That means that if we change the way we do logging, we can gain speed. We have some thoughts on this.
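The "over a quarter" figure follows directly from the two throughput numbers. Time per request is the inverse of requests per second, so the share of time spent on logging is 1 - T0 / T1. A small sketch of that arithmetic (the function name is ours, for illustration):

```python
def overhead_fraction(rps_with_feature, rps_without_feature):
    """Share of per-request time attributable to the feature switched off.

    With t = 1 / rps, the share is (t_with - t_without) / t_with,
    which simplifies to 1 - rps_with / rps_without.
    """
    return 1 - rps_with_feature / rps_without_feature


# T0 (logs on) = 733 req/s, T1 (logs off) = 1005 req/s on the T410.
logging_overhead = overhead_fraction(733, 1005)
print(f"{logging_overhead:.1%}")  # prints 27.1%
```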

T430s, write optimized

Secondly, we ran the test on a somewhat better laptop. We also set the ext4 options data=writeback and nobarrier to increase the disk write speed ("ext4 write-optimized" in the table).

	Options	Reqs/Sec	Difference
T2	logs on, gc on, ext4 write-optimized	1242
T3	logs on, gc off, ext4 write-optimized	1658	T2 / T3 = 0.75
T4	logs off, gc on	1665	T2 / T4 = 0.75
T5	logs off, gc off	1820	T4 / T5 = 0.91
Lenovo T430s, i7-3520M 2.9 GHz, 4MB Cache, 4 threads, SSD, 3.16.0-4-amd64 (Debian)

The results can be seen in the table above. Garbage collection of old caches also has a performance overhead. To run garbage collection effectively, additional information has to be stored. In special cases, it makes sense to switch off the gathering of this information completely, so we measured its overhead as well. The gap between T2 and T4 is quite high. It may be caused by more context switches, but we don't yet know the reason behind it. Either way, it's another possible place for improvement.

1,000,000 per ke

We can see that request throughput is hard to pin down. We also see that MIA DB scales up, i.e. it can use multiple cores on an instance. A maximum throughput of 1200 requests per second on an ordinary laptop is a really good result. That's over a million requests per ke, over 100 million requests per day, or 3 billion requests per month.
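The conversions are simple multiplication. Here is a short sketch that scales a per-second rate to a ke, a day, and a month; the 30-day month is an assumption made for the estimate, and the names are ours.

```python
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
SECONDS_PER_KE = SECONDS_PER_DAY // 100   # 1 ke = 1/100 of a day = 864 s


def scale_throughput(requests_per_second):
    """Scale a per-second request rate to longer periods."""
    per_day = requests_per_second * SECONDS_PER_DAY
    return {
        "per_ke": requests_per_second * SECONDS_PER_KE,
        "per_day": per_day,
        "per_30_day_month": per_day * 30,  # assumes a 30-day month
    }


for period, count in scale_throughput(1200).items():
    print(f"{period}: {count:,}")
```

At 1200 requests per second this gives 1,036,800 requests per ke, 103,680,000 per day, and about 3.1 billion per 30-day month.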

Well, it's not that impressive from the angle of the Internet of Things and its numbers. You would still need a third of a petayear to generate your first brontobyte (1 brontobyte = 2^90 bytes ≈ 10^27 bytes) of log data :-). But let's be more realistic. If you had over 3 billion page views per month on your website, you would be safely among the top 500 most trafficked websites in the world: NBC News, which sits around spot 500 in the Alexa rank, has 1.2 billion page views per month.
