Simulate Rate limit process based on webserver logs

Problem

You are working as a backend engineer for alice.com. Recently, various users started abusing the company's API server with a flood of requests, and your team has decided to limit the rate at which a user can access certain API endpoints.
The domain api.alice.com is served by an application server. To study the effectiveness of various rate-limiting algorithms with realistic traffic, you have collected server logfiles, in which each request to the API server is recorded. The structure of the access logfile is:

id user_ip timestamp http_request_url http_response_code user_client

Although many rate-limiting algorithms exist, you are interested in a sliding-window strategy, which limits the number of API calls to a maximum of 10 requests per second, per IP address. This means that for an arbitrary window of one-second duration, the API server responds to 10 or fewer API requests, per IP address.
Write a function rateLimiter that simulates the sliding-window limiter described above. Specifically, the function accepts an input ArrayList<ArrayList<String>> logs, in which each ArrayList in the first dimension corresponds to a line of the log file and each String in the second dimension corresponds to a field of that line. The function should return a list of integers representing the request_id of the requests that are rejected by the rate limiter, e.g. [23, 30, 55].

Additional requirements

  • The duration of the sliding-window is exactly one second, and includes both extremes of the interval.
  • Requests that are rejected also count towards the limit of 10 per second.
  • Requests from the IP address 11.22.33.44 should never be rate-limited, because this is an IP used by the internal Acme crawler that indexes the site.
  • Requests to any URL starting with /admin/ should never be rate-limited, because they correspond to the administrator pages of the Acme site.
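The two exemptions above can be expressed as a single predicate. The following is only a minimal sketch: the class name ExemptionCheck, the method isExempt, and the index constants are hypothetical names introduced for illustration, and the field positions are assumed to follow the log structure given earlier (user_ip at index 1, http_request_url at index 3).

```java
import java.util.List;

class ExemptionCheck {
    // Assumed field positions, following "id user_ip timestamp http_request_url ...".
    static final int IP = 1;
    static final int URL = 3;

    // Returns true when a log line must never be rate-limited:
    // either it comes from the internal crawler IP or it targets an /admin/ URL.
    static boolean isExempt(List<String> logLine) {
        return "11.22.33.44".equals(logLine.get(IP))
            || logLine.get(URL).startsWith("/admin/");
    }
}
```

Note that startsWith only matches URLs whose path begins with /admin/, so a URL such as /api/admin/ would still be subject to the limit.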

Assumptions

  • All entries in the input log will be chronologically ordered.
  • There are no simultaneous requests.
  • All fields described in the above log structure are guaranteed to be present, and have a valid value.

Analysis

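At first glance the problem seems to come with many awkward requirements. Once abstracted, however, it reduces to the following: given a two-dimensional ArrayList, find all rows that share the same IP and occur more than ten times within one second, and return the ids of the rows that come after the tenth. Because the problem states that all input rows are already sorted by time, we only need to examine the rows of each IP in first-in, first-out order, which a Queue handles. Since many different IPs may appear, a HashMap that stores {ip, queue} pairs solves the problem conveniently.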
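To make the {ip, queue} layout concrete, here is a small sketch that only groups request ids by IP, with no rate limiting yet. The names PerIpQueues and groupIdsByIp are hypothetical, and the sketch assumes the id is at index 0 and the IP at index 1 of each row.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PerIpQueues {
    // Builds one FIFO queue of request ids per IP.
    // Because the input is chronologically ordered, each queue is too.
    static Map<String, Deque<String>> groupIdsByIp(List<List<String>> logs) {
        Map<String, Deque<String>> byIp = new HashMap<>();
        for (List<String> line : logs) {
            // computeIfAbsent creates the queue the first time an IP is seen.
            byIp.computeIfAbsent(line.get(1), ip -> new ArrayDeque<>())
                .addLast(line.get(0));
        }
        return byIp;
    }
}
```

The rate limiter uses the same shape, except that each queue is trimmed as time advances so that it only ever holds the requests of the last second.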

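The algorithm in detail:

  • For each row of the two-dimensional ArrayList, create the Queue for its IP in the HashMap if it does not exist yet.
  • For each row, read its timestamp. Inspect the corresponding Queue and discard every request that is more than one second old. Then count the requests that remain. If there are 10 or more, the current request is rejected: save its id, and still append the current request to the end of the Queue (the problem requires rejected requests to count as well). If there are fewer than 10, only append it to the Queue.
  • Repeat this process until all input has been processed.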
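The window step described in the list above can be sketched in isolation for a single IP. This is an illustrative simplification, not the full solution: it stores only timestamps instead of whole log rows, it assumes timestamps are decimal seconds given as strings, and the names SlidingWindowSketch and rejectedPositions are hypothetical.

```java
import java.math.BigDecimal;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class SlidingWindowSketch {
    static final int LIMIT = 10;

    // Given one IP's timestamps in chronological order, returns the
    // 0-based positions of the requests that would be rejected.
    static List<Integer> rejectedPositions(List<String> timestamps) {
        Deque<BigDecimal> window = new ArrayDeque<>();
        List<Integer> rejected = new ArrayList<>();
        for (int i = 0; i < timestamps.size(); i++) {
            BigDecimal now = new BigDecimal(timestamps.get(i));
            // Drop requests strictly more than one second older than "now";
            // a request exactly one second old stays (the window is inclusive).
            while (!window.isEmpty()
                    && now.subtract(window.peekFirst()).compareTo(BigDecimal.ONE) > 0) {
                window.pollFirst();
            }
            // 10 requests already in the window means this one would be the 11th.
            if (window.size() >= LIMIT) {
                rejected.add(i);
            }
            // Rejected requests still count towards the limit.
            window.addLast(now);
        }
        return rejected;
    }
}
```

For twelve requests at 0.0, 0.1, ..., 1.1 the sketch rejects positions 10 and 11: the request at 1.0 shares an inclusive one-second window with the ten requests from 0.0 to 0.9, and the request at 1.1 shares one with the ten requests from 0.1 to 1.0, the rejected one included.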

Time/space complexity

Let n be the number of requests in logs.

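We use a HashMap<String, Queue<ArrayList<String>>> to store the requests. In the worst case (every request is kept in a queue), the space complexity is O(n).

Each request is added to and removed from its Queue at most once, so the time complexity is O(n).

The Java implementation is as follows: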

import java.math.BigDecimal;
import java.util.*;

public class RateLimiter {
    public static List<Integer> rateLimiter(ArrayList<ArrayList<String>> logs) {
        List<Integer> rejectedRequestIds = new ArrayList<>();
        if (logs == null || logs.isEmpty()) {
            return rejectedRequestIds;
        }
        if (logs.get(0) == null || logs.get(0).isEmpty()) {
            return rejectedRequestIds;
        }
        return getInvalidRequest(logs);
    }

    private static List<Integer> getInvalidRequest(ArrayList<ArrayList<String>> logs) {
        List<Integer> res = new ArrayList<>();

        // Maps each IP to its requests from the last second, oldest first.
        Map<String, Queue<ArrayList<String>>> map = new HashMap<>();
        for (ArrayList<String> logLine : logs) {
            // Skip the lines that are not subject to rate limiting.
            if (logLine.get(1).equals("11.22.33.44") || logLine.get(3).startsWith("/admin/")) {
                continue;
            }
            if (map.containsKey(logLine.get(1))) {
                Queue<ArrayList<String>> queue = map.get(logLine.get(1));

                // Remove all requests that are more than 1 second older than the current one.
                // The isEmpty() check avoids a NullPointerException from peek()
                // once every earlier request has expired.
                BigDecimal currTime = new BigDecimal(logLine.get(2));
                while (
                    !queue.isEmpty()
                    && currTime.compareTo(
                        new BigDecimal(queue.peek().get(2)).add(BigDecimal.ONE)
                    ) > 0
                ) {
                    queue.poll();
                }

                // If this IP already has 10 or more requests within the last second,
                // the current request is rejected.
                if (queue.size() >= 10) {
                    res.add(Integer.parseInt(logLine.get(0)));
                }
                // Rejected requests also count towards the limit, so always enqueue.
                queue.offer(logLine);
            } else {
                // First request seen from this IP: start its queue.
                Queue<ArrayList<String>> queue = new LinkedList<>();
                queue.offer(logLine);
                map.put(logLine.get(1), queue);
            }
        }
        return res;
    }
}
