In this blog post, we will cover the architectural topics that come up while scaling and performance-tuning large-scale web applications.
The performance of a web application can mean several things. Most developers are primarily concerned with two of them: the response time and the scalability of the system.
- Response Time: The time taken by the web application to process a request and return a response. Applications should respond to requests within an acceptable duration; if an application takes longer than that limit, it is considered a non-performing application.
- Scalability: A web application is said to be scalable if, by adding more hardware, it can handle proportionally more requests than before. There are two ways of adding more hardware:
- Scaling Up (Vertical Scaling): increasing the number of CPUs, or adding faster CPUs, in a single box.
- Scaling Out (Horizontal Scaling): increasing the number of boxes.
Scaling Up vs. Scaling Out
Scaling out (horizontal scaling) is generally considered more important, as commodity hardware is cheap compared to specially configured high-end hardware. But increasing the number of requests an application can handle on a single commodity box also matters. An application is said to perform well if it can handle more requests, without degrading response time, just by adding more resources.
Response Time vs. Scalability
Response time and scalability don't always go together: an application might have acceptable response times but be unable to handle more than a certain number of requests, or it might handle an increasing number of requests but with poor or long response times. We have to strike a balance between scalability and response time to get good performance out of the application.
Capacity planning is the exercise of figuring out the hardware required to handle the expected load in production. It usually involves measuring the performance of the application on a few boxes, projecting from the per-box performance, and finally verifying the projection with load/performance tests.
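As a rough illustration, the projection step often boils down to simple arithmetic on measured per-box throughput. The numbers and the headroom factor below are made up for the example:

```python
import math

def required_boxes(expected_peak_rps, measured_rps_per_box, headroom=0.3):
    """Project the number of boxes needed for an expected peak load,
    keeping some headroom for traffic spikes and box failures."""
    effective_capacity = measured_rps_per_box * (1 - headroom)
    return math.ceil(expected_peak_rps / effective_capacity)

# E.g. load tests show one box sustains 500 requests/sec and the
# production peak is projected at 4000 requests/sec.
print(required_boxes(4000, 500))  # a projection, to be verified with load tests
```

The result is only a starting point; the final box count should always be confirmed with load/performance tests against production-like traffic.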
An application architecture is scalable if each layer in the multi-layered architecture is scalable. For example, as shown in the following diagram, we should be able to scale horizontally by adding boxes in the application layer and/or in the database layer.
Scaling Load Balancer
Load balancers can be scaled out by pointing a DNS record at multiple IP addresses and using DNS round-robin for IP address lookup. Another option is to add a load balancer in front of other load balancers, which distributes the load to the next level of load balancers.
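The round-robin behavior can be sketched as rotating through a fixed list of addresses on each lookup; the IP addresses below are placeholders from the documentation range, not real load balancers:

```python
from itertools import cycle

# Hypothetical A records that a single DNS name resolves to.
lb_addresses = cycle(["203.0.113.10", "203.0.113.11", "203.0.113.12"])

def resolve():
    """Return the next IP address, round-robin style, the way a DNS
    server rotates the order of A records between lookups."""
    return next(lb_addresses)

# Successive lookups spread clients across the load balancer boxes.
print([resolve() for _ in range(4)])
```

Each client ends up on a different load balancer, so no single box sees all the traffic.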
Adding multiple load balancers is rare, though: a single box running HAProxy can handle more than 20K concurrent connections, while a web application box can handle only a few thousand concurrent requests. So a single load balancer box can front several web application boxes.
Scaling Database
Scaling the database is one of the most common issues faced. Adding business logic (stored procedures, functions) to the database layer brings in additional overhead and complexity.
An RDBMS can be scaled by running in master-slave mode, with reads/writes going to the master database and only reads going to the slave databases. Master-slave replication provides limited scaling of reads, beyond which developers have to split the database into multiple databases.
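The read/write split can be sketched as a small connection router; the connection names here are stand-in strings rather than a real database driver:

```python
from itertools import cycle

class MasterSlaveRouter:
    """Route writes to the master and spread reads across slaves,
    the usual read-scaling pattern for a master-slave RDBMS setup."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = cycle(slaves)  # round-robin across read replicas

    def connection_for(self, is_write):
        return self.master if is_write else next(self.slaves)

router = MasterSlaveRouter("master-db", ["slave-db-1", "slave-db-2"])
print(router.connection_for(is_write=True))   # writes always hit the master
print(router.connection_for(is_write=False))  # reads rotate across the slaves
```

In practice this routing is often hidden inside the data-access layer or an ORM, but the principle is the same: every write goes to the master, and reads fan out across replicas.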
The CAP theorem has shown that it is not possible to get Consistency, Availability, and Partition tolerance simultaneously.
NoSQL databases usually compromise on consistency to get high availability and partition tolerance.
A database can be split vertically (Partitioning) or horizontally (Sharding).
- Vertical Splitting (Partitioning): The database can be split into multiple loosely coupled sub-databases based on domain concepts, e.g. a Customer database, a Product database, etc. Another way to split a database is to move a few columns of an entity into one database and a few other columns into another, e.g. a Customer database, a Customer Contact Info database, a Customer Orders database, etc.
- Horizontal Splitting (Sharding): The database can be split horizontally into multiple databases based on some discrete attribute, e.g. an American Customers database and a European Customers database.
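A minimal sketch of the two splitting strategies as routing tables; the database names and the customer record are illustrative, not a real schema:

```python
# Vertical split (partitioning): route by domain concept.
PARTITIONS = {"customer": "customer_db", "product": "product_db"}

def partition_for(entity):
    """Pick the sub-database that owns a given domain concept."""
    return PARTITIONS[entity]

# Horizontal split (sharding): route by a discrete attribute,
# here the customer's region.
SHARDS = {"US": "american_customers_db", "EU": "european_customers_db"}

def shard_for(customer):
    """Pick the shard that holds a given customer's row."""
    return SHARDS[customer["region"]]

print(partition_for("product"))               # -> product_db
print(shard_for({"id": 42, "region": "EU"}))  # -> european_customers_db
```

Note that the routing key choice matters: a partition routes by *what kind* of data it is, while a shard routes by *which rows* of the same kind it holds.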
Transitioning from a single database to multiple databases using partitioning or sharding is a challenging task.
Scaling bottlenecks are formed by two kinds of issues:
- Centralized component: A component in the application architecture that cannot be scaled out puts an upper limit on the number of requests that the entire architecture, or request pipeline, can handle.
- High latency component: A slow component in the request pipeline puts a lower limit on the response time of the application. The usual fix is to turn high-latency work into background jobs or to execute it asynchronously via queuing.
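Offloading a high-latency step to a background worker can be sketched with the standard library's queue; `send_welcome_email` here is a stand-in for any slow operation, and the sleep simulates its latency:

```python
import queue
import threading
import time

jobs = queue.Queue()

def send_welcome_email(user):
    # Stand-in for a slow, high-latency operation (e.g. an SMTP call).
    time.sleep(0.1)
    print(f"email sent to {user}")

def worker():
    # Drain jobs until the sentinel (None) arrives.
    while True:
        user = jobs.get()
        if user is None:
            break
        send_welcome_email(user)
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

# The request handler enqueues the work and returns immediately;
# the slow step no longer inflates the response time.
jobs.put("alice@example.com")
jobs.put(None)  # sentinel: let the worker drain and exit
t.join()
```

In production this role is usually played by a dedicated message broker and worker fleet rather than an in-process queue, but the shape is the same: the request path only pays for the enqueue.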