Throttling is a flow control feature that limits access to a resource to a certain number of requests. Once the upper limit (threshold) is reached, further access to the resource is rejected. A ban list can be used to record such rejected keys, so that within the same time window further access to the resource is denied immediately. Throttling provides privilege-based access control and shields resources against DDoS attacks.
In an API gateway, throttling is mainly used to limit API access based on subscription tier, e.g. the gold tier allows 1000 API calls per time window. Throttling can be single-tier (e.g. per user or per IP address) or multi-tier (e.g. per user per API endpoint). In both cases, a throttling rule can be viewed as a tuple (key, upper limit). In the multi-tier case, the key can be viewed as a path, e.g. user id + "/" + IP address.
Counter based implementation
The timeline is divided into a series of time windows, and a counter is maintained for each time window. When a new request comes in, the start of the current time window is retrieved and its counter is incremented. If the counter has reached the limit, the request is denied and the throttle key is added to the ban list. The counter is reset at the end of the current time window.
Refer to the following diagram
The time window approach is quite simple to implement: a map is enough to hold the throttling information, and the simplicity means little calculation overhead.
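As a minimal sketch of the counter based approach, the following Java class keeps one counter per throttle key and window start in a map. The class and method names are hypothetical, time is passed in explicitly to keep the example deterministic, the ban list is left out for brevity, and the sketch assumes that exactly "limit" requests are admitted per window.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Fixed-window counter throttle (illustrative sketch, hypothetical names). */
class FixedWindowThrottle {
    private final int limit;
    private final long windowMillis;
    // Maps "throttleKey@windowStart" to the request count of that window.
    private final ConcurrentHashMap<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    FixedWindowThrottle(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request arriving at nowMillis (>= 0) is admitted. */
    boolean allow(String throttleKey, long nowMillis) {
        long windowStart = nowMillis - (nowMillis % windowMillis);
        String counterKey = throttleKey + "@" + windowStart;
        AtomicInteger counter = counters.computeIfAbsent(counterKey, k -> new AtomicInteger());
        // Assumption: the first `limit` requests of a window pass, the rest are denied.
        return counter.incrementAndGet() <= limit;
    }

    /** Drops counters of windows that started before the current one (the "reset"). */
    void evictExpired(long nowMillis) {
        long currentStart = nowMillis - (nowMillis % windowMillis);
        counters.keySet().removeIf(
            k -> Long.parseLong(k.substring(k.lastIndexOf('@') + 1)) < currentStart);
    }
}
```

With a limit of 3 and a one-second window, three calls inside the same second are admitted, the fourth is denied, and the first call of the next second is admitted again.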
However, it is subject to a problem known as access spike. Consider the following scenario:
Throttling criteria: 1000 requests per minute
Current time window: 15:00 - 15:01
For the first 30 seconds of the current time window, there are no requests
For the last 30 seconds of the current time window, there are 999 requests. All requests pass throttling
Now the next time window starts: 15:01 - 15:02
For the first 30 seconds of this time window, there are 999 requests. All requests pass throttling
Now consider the one-minute span from 15:00:30 to 15:01:30: it accepts 1998 requests, almost double the throttling limit!
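The scenario above can be replayed in a few lines of Java. This is only a simulation sketch: the class name is hypothetical, and the even spacing of 30 ms between requests is an assumption made to place 999 requests in each half minute.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Replays the access spike scenario against a fixed one-minute window. */
class AccessSpikeDemo {
    /** Returns how many requests were accepted between 15:00:30 and 15:01:30. */
    static int acceptedInStraddlingMinute() {
        final int limit = 1000;
        final long windowMillis = 60_000;
        Map<Long, Integer> countPerWindow = new HashMap<>();
        List<Long> accepted = new ArrayList<>();

        // Times are milliseconds after 15:00:00.
        List<Long> arrivals = new ArrayList<>();
        for (int i = 0; i < 999; i++) arrivals.add(30_000L + i * 30L); // last 30 s of window 1
        for (int i = 0; i < 999; i++) arrivals.add(60_000L + i * 30L); // first 30 s of window 2

        for (long t : arrivals) {
            long windowStart = t - (t % windowMillis);
            int count = countPerWindow.merge(windowStart, 1, Integer::sum);
            if (count <= limit) accepted.add(t); // each window stays under its own limit
        }
        // Count what was accepted in the one-minute span that straddles both windows.
        return (int) accepted.stream().filter(t -> t >= 30_000 && t < 90_000).count();
    }
}
```

Calling AccessSpikeDemo.acceptedInStraddlingMinute() returns 1998, although no single fixed window ever exceeded 1000.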
Given the above explanation, the choice of time window is a fairly important aspect to consider:
If the time window is too large, the access spike problem becomes more pronounced.
If the time window is too small, the time spent on the throttling calculation becomes non-trivial compared to the time window, and accuracy is compromised. Currently we normally use per-second throttling, which is usually a good balance between accuracy and the access spike issue.
Queue based implementation
An alternative is the queue based approach. For each key, a sorted queue is maintained to record request times. When a new request comes in, we look back through the queue and count the request times that fall between (current time - time window size) and the current time to get the total number of requests. This count is compared against the throttling limit to allow or deny access.
At the backend, a separate housekeeping thread cleans up the queues by removing request times earlier than (current time - time window size).
Refer to the following diagram
Queue based implementation is not subject to the access spike mentioned above. But the calculation cost is higher due to the need to iterate through the queue. The queue also needs to be locked for multi-threaded access, which impacts performance and throughput.
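A rough Java sketch of the queue based approach follows. It deviates from the description in two labelled ways: old entries are dropped inline on each request instead of by a separate housekeeping thread, and the queue size is used as the count, which is equivalent once old entries are gone. Names are hypothetical, and timestamps are assumed to arrive in non-decreasing order per key.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;

/** Queue based throttle keeping one request-time log per key (illustrative sketch). */
class SlidingLogThrottle {
    private final int limit;
    private final long windowMillis;
    private final ConcurrentHashMap<String, Deque<Long>> logs = new ConcurrentHashMap<>();

    SlidingLogThrottle(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if fewer than `limit` requests were admitted in the last window. */
    boolean allow(String throttleKey, long nowMillis) {
        Deque<Long> log = logs.computeIfAbsent(throttleKey, k -> new ArrayDeque<>());
        synchronized (log) { // per-key lock: this is the contention cost mentioned above
            // Inline housekeeping: drop request times that fell out of the window.
            while (!log.isEmpty() && log.peekFirst() <= nowMillis - windowMillis) {
                log.pollFirst();
            }
            if (log.size() >= limit) {
                return false;
            }
            log.addLast(nowMillis);
            return true;
        }
    }
}
```

Because the window slides with each request, two bursts on either side of a minute boundary are counted together, which is why this variant avoids the access spike.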
Scalability consideration:
Within a single VM, scalability issues mainly arise from concurrency and lock contention between threads. Carefully chosen data structures can reduce lock usage and improve performance. For the counter based approach, a ConcurrentHashMap is used to implement the key-to-counter mapping, and Java's AtomicInteger is used to implement the counter. For the queue based approach, a concurrent skip list (ConcurrentSkipList) is used to implement the sorted queue.
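To illustrate how a skip list can serve as the sorted queue without an explicit lock, here is a small sketch built on java.util.concurrent.ConcurrentSkipListMap. Mapping each timestamp to a count (so that several requests in the same millisecond are not lost) is an assumption of this sketch, as are all names. Note that counting and then recording are two separate steps, so concurrent callers may overshoot the limit slightly.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Lock-free sorted request log for one throttle key (illustrative sketch). */
class SkipListRequestLog {
    // Request timestamp in ms -> number of requests recorded at that timestamp.
    private final ConcurrentSkipListMap<Long, AtomicInteger> log = new ConcurrentSkipListMap<>();

    /** Records one request at the given time. */
    void record(long timestampMillis) {
        log.computeIfAbsent(timestampMillis, t -> new AtomicInteger()).incrementAndGet();
    }

    /** Counts requests strictly newer than fromMillisExclusive. */
    int countSince(long fromMillisExclusive) {
        int total = 0;
        for (AtomicInteger c : log.tailMap(fromMillisExclusive, false).values()) {
            total += c.get();
        }
        return total;
    }

    /** Housekeeping: removes every entry at or before toMillisInclusive. */
    void evictUpTo(long toMillisInclusive) {
        log.headMap(toMillisInclusive, true).clear();
    }
}
```

A throttle check would call countSince(now - windowSize), compare the result with the limit, and call record(now) if the request is admitted.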
The simple problem becomes more intriguing in a cluster environment. API requests may be handled by different members of a cluster, and the total number of requests across all members should not exceed the throttling limit. Currently the WSO2 API gateway uses a peer-to-peer cluster synchronization approach, where throttling data on one cluster member is asynchronously replicated to the other members.
This approach is flawed for the following reasons:
A large number of messages is exchanged among cluster members: on receiving each throttling request, data is synchronized to all cluster members. Given an average of M throttling requests on each node and N cluster members, each node sends on the order of M*N messages.
To make the situation worse, throttling data is stored in the axis2 message context and the whole context is replicated. Under peak load, the number of throttle keys stored on each node will be huge and the throttle data will have a large memory footprint, which makes serialization/deserialization and transfer of the message across the network expensive.
Throttling accuracy is not guaranteed.
This is mainly due to the network transfer latency of throttling data. Consider the following scenario:
10 requests are allowed per minute
Node 1 has received 5 requests and node 2 has received 4 requests so far, and both nodes are synchronized, so the counters on node 1 and node 2 are both 9
Now node 1 receives its 6th request for the current time window
Node 1 replicates its throttling data to node 2
Node 2 receives its 5th request for the current time window. This happens before the latest state of node 1 has been replicated to node 2, so node 2 checks its local counter and allows the request
In total, 11 requests are accepted within the time window, which exceeds the throttling limit!
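The race in this scenario can be traced step by step with two local counter views. The sketch below is a deterministic walk-through rather than a real replication setup, and the class and variable names are made up for illustration.

```java
/** Walks through the replication-lag scenario with two per-node counter views. */
class ReplicationLagDemo {
    /** Returns the number of requests admitted cluster-wide in the scenario. */
    static int admittedRequests() {
        final int limit = 10;
        int node1View = 9; // both nodes are in sync: 5 + 4 requests seen so far
        int node2View = 9;
        int admitted = 9;

        // Node 1 receives its 6th request and admits it, since 9 < 10.
        if (node1View < limit) {
            node1View++;
            admitted++;
        }
        // The replication message to node 2 is still in flight, so node 2
        // decides on its stale view when its 5th request arrives.
        if (node2View < limit) {
            node2View++;
            admitted++;
        }
        // Replication lands afterwards; the views converge, but too late.
        return admitted;
    }
}
```

ReplicationLagDemo.admittedRequests() returns 11, one more than the limit of 10.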
The proposed solution is to use a centralized throttle server to handle access requests for the whole cluster. Compared with the synchronization approach, it only sends 1 message to the throttle server for each request, resulting in much less network overhead.
From an implementation perspective, we need a high-performance centralized key-value store: the key is the concatenation of the throttle key and the start time of the current time window, and the value is the counter. The ban list is also kept in the store.
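As a small illustration of the key layout, the helper below derives the store key from a throttle key and the current time. The ":" separator and the names are assumptions of this sketch; the text only states that the two parts are concatenated.

```java
/** Builds counter keys for the central store (illustrative sketch). */
class CounterKeys {
    /** Concatenates the throttle key with the start of the window containing nowMillis. */
    static String counterKey(String throttleKey, long nowMillis, long windowMillis) {
        long windowStart = nowMillis - (nowMillis % windowMillis);
        return throttleKey + ":" + windowStart; // separator is an assumption
    }
}
```

All requests for the same throttle key within one window map to the same store key, and the first request of the next window automatically starts a new counter.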
We chose memcached and Redis as candidate key-value stores.
Memcached has better read performance and slightly better write performance compared to Redis, especially for highly concurrent access. Redis provides cluster synchronization support and more flexible data structures. So in our case, both a Redis throttle and a memcached throttle are implemented, but the memcached throttle is preferred. For various comparisons between memcached and Redis, refer to
http://blog.sina.com.cn/s/blog_72995dcc01018qkf.html
http://iyunlin.com/thread/200319
CAS (compare and swap) is another feature necessary to ensure the accuracy of the counter. A node that receives a throttle request retrieves the counter from the server and compares it with the throttle limit. Before the counter is incremented and written back to the server, it may have been updated by another node. CAS allows us to detect such contention and avoid writing a stale value. In this case, the client is responsible for retrieving the counter again and retrying. Luckily, both Redis and memcached support CAS-style operations (memcached through its gets/cas commands, Redis through WATCH with MULTI/EXEC).
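The read, compare, write and retry cycle might look like the following sketch. CasStore is a hypothetical interface loosely modelled on memcached's gets/cas pair, not a real client API, and InMemoryCasStore is a stand-in so that the example runs without a server.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical store interface, loosely modelled on memcached's gets/cas. */
interface CasStore {
    /** Returns {value, version}; a missing key reads as value 0, version 0. */
    long[] gets(String key);
    /** Writes newValue only if the key's version still equals expectedVersion. */
    boolean cas(String key, long newValue, long expectedVersion);
}

/** In-memory stand-in for the central key-value store. */
class InMemoryCasStore implements CasStore {
    private final ConcurrentHashMap<String, long[]> data = new ConcurrentHashMap<>();

    public long[] gets(String key) {
        long[] entry = data.get(key);
        return entry == null ? new long[] {0, 0} : entry.clone();
    }

    public synchronized boolean cas(String key, long newValue, long expectedVersion) {
        long[] current = data.get(key);
        long currentVersion = current == null ? 0 : current[1];
        if (currentVersion != expectedVersion) {
            return false; // someone else wrote in between: caller must re-read
        }
        data.put(key, new long[] {newValue, currentVersion + 1});
        return true;
    }
}

/** Client-side throttle check with a CAS retry loop. */
class CasThrottleClient {
    static boolean tryAcquire(CasStore store, String counterKey, int limit) {
        while (true) {
            long[] read = store.gets(counterKey);
            if (read[0] >= limit) {
                return false; // limit reached, deny
            }
            if (store.cas(counterKey, read[0] + 1, read[1])) {
                return true; // our increment was written
            }
            // Stale read detected: loop to fetch the counter again and retry.
        }
    }
}
```

The retry loop is where the extra round trips come from under contention, since every failed cas costs another gets and another cas.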
Counter reset at the end of a time window is handled by key expiration. Each key’s expiration time is set to the time window length, so there is no need to explicitly remove the key from the store at the end of each time window. Note that the smallest expiration time for memcached is 1 second, which implies that we cannot do accurate throttling at millisecond level.
A single centralized throttle server may become a bottleneck under heavy load if all throttling requests are handled by it. Ideally, the centralized throttle can be a cluster too, and throttle requests can be distributed among its members. So which server should handle a particular throttle request? We use hash partitioning: calculate a hash value of the throttle key (the murmur hash algorithm is used) and take it modulo the number of throttle servers. The remainder is the index of the server. Alternatively, consistent hashing can be used. This ensures that the same throttle key always hits the same server, so we don't have to worry about distributing the key-value store among servers.
A few limitations:
We have not considered data replication and backup, as memcached does not support them. If one throttle server is down, the data stored on it is lost and we do not automatically fall back to another server.
Load distribution is not even, as some keys may be accessed more frequently than others, implying that the corresponding server will take more load.
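The hash partitioning step can be sketched as below. The text does not say which murmur variant is used, so the 32-bit MurmurHash3 with seed 0 shown here is an assumption, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Picks a throttle server by hashing the throttle key (illustrative sketch). */
class ThrottleServerPicker {
    /** 32-bit MurmurHash3 (x86 variant); the choice of variant is an assumption. */
    static int murmur3_32(byte[] data, int seed) {
        int h = seed;
        int len = data.length;
        int i = 0;
        while (i + 4 <= len) { // process the body in little-endian 4-byte blocks
            int k = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                  | ((data[i + 2] & 0xff) << 16) | ((data[i + 3] & 0xff) << 24);
            k *= 0xcc9e2d51;
            k = Integer.rotateLeft(k, 15);
            k *= 0x1b873593;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
            i += 4;
        }
        int k = 0;
        switch (len & 3) { // remaining 1-3 tail bytes, intentional fall-through
            case 3: k ^= (data[i + 2] & 0xff) << 16;
            case 2: k ^= (data[i + 1] & 0xff) << 8;
            case 1: k ^= (data[i] & 0xff);
                    k *= 0xcc9e2d51;
                    k = Integer.rotateLeft(k, 15);
                    k *= 0x1b873593;
                    h ^= k;
        }
        h ^= len; // final avalanche mix
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Maps a throttle key to a server index in [0, serverCount). */
    static int serverIndex(String throttleKey, int serverCount) {
        int hash = murmur3_32(throttleKey.getBytes(StandardCharsets.UTF_8), 0);
        return Math.floorMod(hash, serverCount); // floorMod keeps the index non-negative
    }
}
```

Math.floorMod is used instead of % because the hash can be negative in Java. Note that with plain modulo partitioning, changing the number of servers remaps most keys, which is the case consistent hashing is meant to soften.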
During implementation, we made some observations that can improve performance further:
The ban list can be stored on the central throttle server as well as on the API gateway server. Keeping rejected keys local to the gateway saves network bandwidth and is effective against DDoS attacks. If a key is rejected, it reaches the central throttle server only the first time, and the key is then added to the gateway server’s local ban list. Subsequent throttle requests for the same key are rejected directly by the gateway server and do not reach the central throttle server.
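A gateway-side ban list could be sketched as follows. The central throttle server is represented by a plain Predicate, which is a stand-in and not a real client, and the sketch assumes that a ban lasts until the end of the current time window. All names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Gateway-local ban list in front of a central throttle (illustrative sketch). */
class GatewayBanList {
    private final ConcurrentHashMap<String, Long> bannedUntil = new ConcurrentHashMap<>();
    private final Predicate<String> centralThrottle; // stand-in: true means "allowed"
    private final long windowMillis;

    GatewayBanList(Predicate<String> centralThrottle, long windowMillis) {
        this.centralThrottle = centralThrottle;
        this.windowMillis = windowMillis;
    }

    boolean allow(String throttleKey, long nowMillis) {
        Long until = bannedUntil.get(throttleKey);
        if (until != null) {
            if (nowMillis < until) {
                return false; // rejected locally, no call to the central server
            }
            bannedUntil.remove(throttleKey, until); // ban has expired
        }
        if (centralThrottle.test(throttleKey)) {
            return true;
        }
        // First rejection: remember it until the current window ends (assumption).
        long windowEnd = nowMillis - (nowMillis % windowMillis) + windowMillis;
        bannedUntil.put(throttleKey, windowEnd);
        return false;
    }
}
```

Once a key is banned, a flood of requests for it costs the gateway only a local map lookup per request.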
The counter can be distributed between the central throttle server and the API gateway servers.
Say the throttle limit is 1000 requests per minute with 20 API gateway servers in a cluster. We could allocate a quota of 40 as the local throttle limit for each gateway server, so that the first 40 throttle requests on each gateway are handled locally. Note that this is a heuristic approach, based on the premise that requests for a particular throttle key are distributed evenly across all gateway cluster nodes, and it is meaningful only if the throttle limit is fairly large.
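One possible reading of the local quota idea is sketched below. The central throttle is again a Predicate stand-in, and the sketch assumes that the central server then only enforces the remainder of the limit, which in the example would be 1000 - 20 * 40 = 200; the text does not spell this out. All names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

/** Serves the first requests of a window from a local quota (illustrative sketch). */
class LocalQuotaThrottle {
    private final int localQuota;
    private final Predicate<String> centralThrottle; // stand-in: true means "allowed"
    // Counter key (throttle key plus window start) -> requests seen on this gateway.
    private final ConcurrentHashMap<String, AtomicInteger> localCounts = new ConcurrentHashMap<>();

    LocalQuotaThrottle(int localQuota, Predicate<String> centralThrottle) {
        this.localQuota = localQuota;
        this.centralThrottle = centralThrottle;
    }

    boolean allow(String counterKey) {
        int used = localCounts
            .computeIfAbsent(counterKey, k -> new AtomicInteger())
            .incrementAndGet();
        if (used <= localQuota) {
            return true; // within the local quota: no network round trip
        }
        // Quota exhausted: the central server decides on the shared remainder.
        return centralThrottle.test(counterKey);
    }
}
```

If traffic for a key is skewed toward a few gateways, those gateways exhaust their quota early and compete for the shared remainder while other quotas sit unused, which is why the even-distribution premise matters.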
CAS can be expensive. We observed that under high concurrency, CAS can fail easily and needs to be retried many times. Each CAS attempt sends an additional request to the central throttle server and degrades performance drastically, so we should try to reduce CAS usage as much as possible. Consider the following scenario:
Throttle limit = 1000 requests per minute with 20 API gateway servers in a cluster
The first 980 requests are safe without using CAS; a simple atomic increment operation will do. Contention only arises when we are near the throttle limit, so we should use CAS only when counter >= 980. To further reduce contention, we can sleep for a short random time before each CAS attempt.
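The two-phase idea can be sketched with an AtomicInteger standing in for the counter on the central throttle server. The threshold of limit minus gateway count mirrors the 980 in the example and assumes at most one in-flight increment per gateway. The sketch sleeps only after a failed CAS attempt, which is a small variation on sleeping before each attempt, and all names are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

/** Uses a plain atomic increment far from the limit and CAS near it (sketch). */
class NearLimitCasThrottle {
    private final AtomicInteger counter = new AtomicInteger(); // stand-in for the central counter
    private final int limit;
    private final int casThreshold;

    NearLimitCasThrottle(int limit, int gatewayCount) {
        this.limit = limit;
        this.casThreshold = limit - gatewayCount; // e.g. 1000 - 20 = 980
    }

    boolean tryAcquire() {
        if (counter.get() < casThreshold) {
            counter.incrementAndGet(); // cheap path: no retry loop needed
            return true;
        }
        while (true) {
            int seen = counter.get();
            if (seen >= limit) {
                return false;
            }
            if (counter.compareAndSet(seen, seen + 1)) {
                return true;
            }
            try {
                // Random short back-off so competing gateways do not retry in lockstep.
                Thread.sleep(ThreadLocalRandom.current().nextInt(1, 5));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    int current() {
        return counter.get();
    }
}
```

With a limit of 1000 and 20 gateways, only the last 20 admissions go through the CAS loop; everything before that costs a single increment.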