Load Balancer Using Monte Carlo Algorithms
The objective
You have N servers behind a load balancer. When a request arrives it is automatically assigned to one of the available servers. Requests arrive at random intervals according to the exponential distribution with an average of 100 requests per second. When a server receives a request, it processes the request, and this takes a random time described by the Pareto distribution. The minimum time is 2 seconds and the average time is 3 seconds. The servers can only process one request at a time. The cost of one server is $2000/month. If a request arrives and all servers are busy, the request is dropped. You earn $0.01 for every request you process and you pay $0.10 in penalties for every request you drop. What is the value of N that allows you to guarantee that 90% of the requests are processed? What is the value of N that maximizes profit? What is the value of N below which you would operate at a loss?
Problem definition
To address this problem, I used Monte Carlo simulations to estimate the number of servers (N) needed to attend to the expected number of requests (Q), which follows an exponential distribution. To do that, I need to simulate multiple values of Q to determine the overall demand (D) and find the number of servers that allows processing of 90% of the distribution of D. In addition, there has to be a pipeline that completes each request and identifies when a server becomes available to take a new job. At each iteration, I computed the profit from accepting requests or the loss from rejecting them (Pi), recording both N and the profit Pi.
The profit is determined by the income per request processed minus the penalty for rejected requests and the cost of the servers. When the number of servers is greater than the number of requests received, the penalty disappears and only the server fee remains. When the number of requests is larger than the number of servers, a penalty is incurred in addition to the server fee. Let x be the rate of accepted requests, r the rate of arriving requests, and S the cost per second of each server; then the profit per second would be:

Profit = 0.01*x − 0.10*(r − x) − N*S
Request analysis
Requests arrive at the load balancer at exponentially distributed random intervals, at an average rate of 100 requests per second. A priori, if we want to be able to process everything, we would need to have 100 servers available every second. However, the time to process each request averages 3 seconds, meaning roughly 200 requests will accumulate if only 100 servers are utilized. Considering that we simulate a minute-long operation period, the raw numbers are as follows:
I accepted requests in such a way that they are fully processed by the time the minute is over, which means I stopped accepting requests at 57 seconds. As an initial approximation of the number of requests: assuming we always get exactly 100 requests per second, we would have a total of 5,700 requests. Every accepted request is then fully processed within the minute and therefore paid. Let's see what the distribution of ten thousand generated inter-arrival times looks like. With the following code, we can store each random number generated in a data list and then plot a graph.
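The original listing is not reproduced here; the following is a minimal sketch of what it could look like, assuming Python's random.expovariate for the exponential samples and matplotlib for the plot:

import random
import matplotlib.pyplot as plt

lamb = 100              # arrival rate: 100 requests per second
data = []               # store each generated inter-arrival time
for _ in range(10000):
    data.append(random.expovariate(lamb))

plt.hist(data, bins=50)
plt.xlabel("inter-arrival time (seconds)")
plt.ylabel("frequency")
plt.show()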
Exponentially distributed inter-arrival times correspond to a Poisson process: when we count the number of requests arriving in a time interval of length 1, the expected count equals the same lambda. Now, let's see the distribution of the function we need in order to simulate the number of arriving requests. We set lambda at 100 as indicated and start the time variable T at 0. We keep adding inter-arrival times up to the moment we stop accepting requests, which is 56 seconds; that is, random numbers generated from the exponential distribution are accumulated until T is greater than 56.
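As a sketch (again assuming random.expovariate), the counting loop could look like this:

import random

lamb = 100        # average of 100 requests per second
T = 0.0           # simulated clock
k = 0             # number of requests accepted so far
while T < 56:     # keep accepting until the 56-second cutoff
    T += random.expovariate(lamb)   # add the next inter-arrival time
    k += 1
print(k)          # total requests accepted in the window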
The resulting distribution for lambda equal to 100, running the loop for T < 1, is shown in Figure 1. The histogram has a mean value of 95 requests, close to the number expected per second. When we execute the code above for the full 56 seconds, the distribution is close to the a priori expectation, at 5,602 requests. When simulating an entire minute, the expected range of requests is between 5,322 and 5,887, with an average of 5,602 and a sigma of 74 requests.
Processing requests
When a server receives a request, it processes it in a minimum time of 2 seconds and an average time of 3 seconds, with the processing time following a Pareto distribution. The simulation of the processing runs for the entire minute; its result tells how many requests are actually completed and can therefore be counted towards the profit. The Python code that computes the processing builds on what was developed in the request analysis. Now we need the parameters that give the required average processing time. For a Pareto distribution with minimum x_m and shape alpha, the mean is μ = alpha·x_m/(alpha − 1), so alpha = μ/(μ − x_m) = 3/(3 − 2) = 3.
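As a quick sanity check, a sketch assuming Python's random.paretovariate (which samples a Pareto with minimum 1, so we scale by x_m) draws processing times and compares the empirical mean with the 3-second target:

import random

xm = 2.0        # minimum processing time in seconds
alpha = 3.0     # shape derived above: 3/(3 - 2)

def process_time():
    # paretovariate(alpha) has minimum 1, so multiply by xm
    return xm * random.paretovariate(alpha)

samples = [process_time() for _ in range(100000)]
print(sum(samples) / len(samples))   # should be close to 3 seconds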
Simulation
We have seen what to expect in terms of the number of requests and the time needed to process them. The next step is to combine both measures to determine the profit for one minute. To that end, we start by defining the time variable (T). Just as before, it starts at zero and we stop receiving requests at the 57-second mark. Lambda is 100 and the time interval between requests is modeled as defined above. The following Python code uses these distributions and simulates the equivalent of one second.
Simulation code
import random

class MCEngine(object):
    def __init__(self, N, lamb):
        self.N = N              # number of servers
        self.lamb = lamb        # average requests per second
        self.avServers = N      # servers currently available

    def simulate_once(self):    # simulate one second
        N = self.N
        # server cost per second, assuming months of 30.4 days
        fix_expenses = 2000.0 / (30.4 * 24 * 60 * 60) * N
        number_request = int(max(0, self.requests(57)))
        if number_request <= N:
            income = number_request * 0.01
        else:
            income = N * 0.01
        # income minus the penalty for dropped requests minus the server fee
        profit = income - max(0, number_request - N) * 0.1 - fix_expenses
        return profit, max(0, number_request - N)   # profit and dropped requests

    def measure(self, results):
        return float(sum(results)) / len(results)

    def simulate_many(self):
        self.results = []
        for k in range(1, 10000):
            y, dropped = self.simulate_once()
            self.results.append(y)
        self.mu = self.measure(self.results)

    def requests(self, sec=57):
        # count arrivals in one second by accumulating exponential gaps
        lamb = self.lamb
        T = 0
        k = 0
        while T < 1.0:  # *sec/60
            t = random.expovariate(lamb)
            T = T + t
            k = k + 1
        return k

    def process(self):
        # Pareto processing time with minimum 2 s and mean 3 s
        xm = 2
        alpha = 3.0 / (3 - xm)
        return xm * random.paretovariate(alpha)

lamb = 100
data = []
for N in range(0, 300, 20):
    engine = MCEngine(N, lamb)
    engine.simulate_many()
    profit = engine.mu
    data.append((N, profit))
For one second we get an average of 100 requests. When we have no servers available, the only cost we face is the rejection penalty, which amounts to roughly $10. As we start adding servers to take the requests during that second, break-even is reached at approximately N = 92. If we continue adding servers, the fixed cost keeps increasing and we slide back into operating at a loss. Figure 5 shows the profit for one second for values of N from 0 to 300.
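The plotting code is not shown in the original; a minimal sketch, assuming matplotlib and the data list of (N, profit) pairs produced by the loop above, could generate a Figure-5-style curve:

import matplotlib.pyplot as plt

Ns = [n for n, p in data]          # number of servers
profits = [p for n, p in data]     # average profit per second

plt.plot(Ns, profits, marker="o")
plt.axhline(0, linestyle="--")     # break-even line
plt.xlabel("number of servers (N)")
plt.ylabel("average profit per second ($)")
plt.show()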
Results
What is the value of N that allows you to guarantee that 90% of the requests are processed?
To solve this problem, we can use Little's Law to estimate the average number of requests being processed in a minute (L = λ·W). For that we need W, the average time needed to process a request, and λ, the average number of requests arriving in a minute. We run a simulation to estimate the average of L; the distribution of the requests is shown in Figure 6. The number of servers needed to guarantee that 90% of the requests are processed is 26,122.
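As a sketch of that estimate under stated assumptions (a 100 requests/second Poisson arrival process and the Pareto service time defined earlier; the exact figures come from the full simulation described above, not from this snippet):

import random

def arrivals_per_minute(lamb=100):
    # count arrivals in 60 seconds of a Poisson process
    T, k = 0.0, 0
    while T < 60:
        T += random.expovariate(lamb)
        k += 1
    return k

def service_time(xm=2.0, alpha=3.0):
    return xm * random.paretovariate(alpha)

# Little's Law: L = lambda * W (with both in consistent time units)
lam = sum(arrivals_per_minute() for _ in range(200)) / 200.0    # requests per minute
W = sum(service_time() for _ in range(100000)) / 100000.0       # seconds per request
L = lam * (W / 60.0)    # average number of requests in service
print(lam, W, L)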
What is the value of N that maximizes profit? What is the value of N below which you would operate at a loss?
The profit function is
Profit = income − (rejected * 0.1 + N * serverCost)
Where:
income: $0.01 × min(requests, N)
rejected: requests − N when requests > N, otherwise 0
N: number of servers
serverCost (S): cost per server over the simulated period
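As an illustrative check (my own numbers, not taken from the original figures): in one second with 100 arrivals and N = 92 servers, income = 92 × $0.01 = $0.92, the penalty is 8 × $0.10 = $0.80, and the server cost is 92 × $2000/(30.4 × 24 × 3600) ≈ $0.07, giving a profit of roughly $0.05 per second, consistent with the break-even point near N = 92 found above.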
The simulation measures the profit at many levels of N against the number of requests. Figure 7 shows the profit on the y-axis and the number of servers on the x-axis. The N that maximizes the profit is around 27,000 servers. Break-even is reached at around 15,000 active servers.
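A sketch of how those two values could be read off the simulation output, assuming a data list of (N, profit) pairs like the one built above (the driver loop would need a finer step than 20 to resolve the exact figures):

# N that maximizes the average profit
best_N, best_profit = max(data, key=lambda pair: pair[1])

# smallest N with non-negative profit (break-even point)
break_even_N = min((n for n, p in data if p >= 0), default=None)

print(best_N, best_profit, break_even_N)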