Rate Limiter - System Design Interview Question [Solved]

Disclosure: This post includes affiliate links; I may receive compensation if you purchase products or services from the different links provided in this article.
Rate Limiter Architecture diagram

credit --- ByteByteGo

Hello friends, System design interviews often test your ability to solve problems that balance performance, scalability, and correctness. One of the most common questions I've encountered is:

"How would you design a Rate Limiter?"

I've been asked this exact question multiple times, and each time the interviewer wanted to see how I approached it systematically.

The rate limiter is not just an academic problem; it's at the heart of many real systems. APIs, login attempts, payment systems, and messaging platforms all use rate limiting to prevent abuse, control costs, and ensure fairness among users.

In the past, I have shared common questions like how to design WhatsApp or YouTube, as well as concept-based questions like API Gateway vs Load Balancer, Horizontal vs Vertical Scaling, and Forward Proxy vs Reverse Proxy.

In this article, I'll walk you through the problem, the key requirements, different design approaches, and show you code examples (including the simple timestamp array method I used in interviews).

What is a Rate Limiter?

A Rate Limiter is a system component that restricts the number of actions a user (or client) can perform in a given timeframe.

Examples:

  • API Gateway: Only allow 100 requests per user per minute.
  • Login System: Allow only 5 failed attempts in 10 minutes.
  • Messaging App: Prevent users from sending more than 20 messages per second.

If users exceed these limits, the system should block their requests (often returning an HTTP status code 429 Too Many Requests).

Here is a nice diagram from ByteByteGo which shows Rate Limiter in action:

Rate Limiter Design Solution


Key Requirements in Interviews

When designing a rate limiter, interviewers usually want to see if you can handle:

  1. Correctness --- Ensuring requests beyond the limit are rejected.
  2. Efficiency --- Handling millions of requests per second with low latency.
  3. Scalability --- Working in a distributed system across multiple servers.
  4. Fairness --- Avoiding loopholes where burst traffic is allowed.
  5. Configurability --- Easy to change limits per user, per API, etc.

You can also ask clarifying questions about any other requirements the interviewer has in mind; for example, they may ask you to apply a limit to a particular URL or a particular HTTP method.


Top 4 Rate Limiting Algorithms for Interviews

Many different algorithms exist for rate limiting, each with trade-offs. Here are the most popular rate-limiting algorithms, which are also commonly asked about in technical interviews:

Fixed Window Counter

  • Divide time into fixed windows (e.g., every minute). Count requests.
  • Simple but can allow bursts at window boundaries.
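Since the fixed window counter usually comes up first, here is a minimal single-node sketch of it in Java. The class and method names are my own for illustration, not from any particular library:

```java
public class FixedWindowCounter {
    private final int maxRequests;
    private final long windowSizeInMillis;
    private long windowStart;
    private int count;

    public FixedWindowCounter(int maxRequests, int windowSizeInSeconds) {
        this.maxRequests = maxRequests;
        this.windowSizeInMillis = windowSizeInSeconds * 1000L;
        this.windowStart = System.currentTimeMillis();
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        // Start a fresh window once the current one has elapsed
        if (now - windowStart >= windowSizeInMillis) {
            windowStart = now;
            count = 0;
        }
        if (count < maxRequests) {
            count++;
            return true;
        }
        return false;
    }
}
```

Note the boundary problem this simple version has: a client can send the full quota at the very end of one window and again at the start of the next, allowing a burst of up to twice the limit across the boundary.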

Sliding Window Log

  • Store timestamps of requests in a log (array/queue). Remove old timestamps.
  • More accurate but requires memory proportional to the request volume.

Sliding Window Counter

  • Uses counters for current and previous windows, weighted by time.
  • Memory efficient, smoother than a fixed window.
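The weighted-counter idea can be sketched in Java as follows. This is a simplified, single-node illustration; the rolling logic and names are my own, not a standard API:

```java
public class SlidingWindowCounter {
    private final int maxRequests;
    private final long windowMillis;
    private long currentWindowStart;
    private int currentCount;   // requests in the current fixed window
    private int previousCount;  // requests in the window before it

    public SlidingWindowCounter(int maxRequests, int windowSizeInSeconds) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowSizeInSeconds * 1000L;
        this.currentWindowStart = System.currentTimeMillis();
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        long windowsPassed = (now - currentWindowStart) / windowMillis;
        if (windowsPassed >= 1) {
            // Roll forward: the old current window becomes the previous one
            previousCount = (windowsPassed == 1) ? currentCount : 0;
            currentCount = 0;
            currentWindowStart += windowsPassed * windowMillis;
        }
        // Weight the previous window by how much of it still overlaps
        // the sliding window that ends now
        double elapsedFraction = (double) (now - currentWindowStart) / windowMillis;
        double estimated = previousCount * (1.0 - elapsedFraction) + currentCount;
        if (estimated < maxRequests) {
            currentCount++;
            return true;
        }
        return false;
    }
}
```

The estimate assumes requests in the previous window were evenly distributed, which is an approximation, but it smooths out the boundary bursts that the fixed window allows while storing only two counters.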

Token Bucket / Leaky Bucket

  • Tokens are added at a fixed rate, and requests consume tokens.
  • Smooths traffic and is widely used in production systems.

How to design a Rate Limiter in Coding Interviews?

As a Java developer, it's important not just to explain the algorithm but also to write clean, interview-ready Java code. In this article, I'll explain the approaches and show you Java implementations for two popular solutions:

  1. Sliding Window Log (array of timestamps) --- the one I personally used in interviews.
  2. Token Bucket --- the production-grade solution widely used in APIs.

1. Sliding Window Log in Java

This method maintains a queue of timestamps for each request. Before processing a new request:

  • Remove timestamps older than the configured time window.
  • If the queue size is below the limit, allow the request and insert the new timestamp.
  • Otherwise, reject it.

Here is how it works:

rate limiter using sliding window log

Now, let's see the implementation in Java code:

import java.util.*;

public class RateLimiter {
    private final int maxRequests;
    private final long windowSizeInMillis;
    private final Deque<Long> requestTimestamps;

    public RateLimiter(int maxRequests, int windowSizeInSeconds) {
        this.maxRequests = maxRequests;
        this.windowSizeInMillis = windowSizeInSeconds * 1000L;
        this.requestTimestamps = new ArrayDeque<>();
    }

    public synchronized boolean allowRequest() {
        long now = System.currentTimeMillis();
        // Remove old timestamps
        while (!requestTimestamps.isEmpty() &&
               requestTimestamps.peekFirst() <= now - windowSizeInMillis) {
            requestTimestamps.pollFirst();
        }
        if (requestTimestamps.size() < maxRequests) {
            requestTimestamps.addLast(now);
            return true;
        } else {
            return false;
        }
    }

    // Demo
    public static void main(String[] args) throws InterruptedException {
        RateLimiter limiter = new RateLimiter(5, 10); // 5 requests per 10 seconds
        for (int i = 1; i <= 7; i++) {
            if (limiter.allowRequest()) {
                System.out.println("Request " + i + ": Allowed");
            } else {
                System.out.println("Request " + i + ": Blocked");
            }
            Thread.sleep(1000);
        }
    }
}

Sample Output
Request 1: Allowed
Request 2: Allowed
Request 3: Allowed
Request 4: Allowed
Request 5: Allowed
Request 6: Blocked
Request 7: Blocked

This solution is perfect for interviews because it's simple, intuitive, and demonstrates your understanding of sliding windows.


2. Token Bucket in Java

The Token Bucket algorithm is widely used in production (e.g., API gateways, microservices).

  • Tokens are added at a fixed rate.
  • Each request consumes one token.
  • If no tokens are available, the request is rejected.

Here is how the Token Bucket algorithm works:

Rate limiter using Token Bucket algorithms

Now, let's see the Java code:

public class TokenBucket {
    private final int capacity;
    private final double refillRate; // tokens per second
    private double tokens;
    private long lastRefillTimestamp;

    public TokenBucket(int capacity, double refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.tokens = capacity;
        this.lastRefillTimestamp = System.nanoTime();
    }

    public synchronized boolean allowRequest() {
        long now = System.nanoTime();
        double tokensToAdd = ((now - lastRefillTimestamp) / 1e9) * refillRate;
        tokens = Math.min(capacity, tokens + tokensToAdd);
        lastRefillTimestamp = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        } else {
            return false;
        }
    }

    // Demo
    public static void main(String[] args) throws InterruptedException {
        TokenBucket bucket = new TokenBucket(10, 5); // 5 tokens/sec, burst up to 10
        for (int i = 1; i <= 20; i++) {
            if (bucket.allowRequest()) {
                System.out.println("Request " + i + ": Allowed");
            } else {
                System.out.println("Request " + i + ": Blocked");
            }
            Thread.sleep(200);
        }
    }
}

This implementation is thread-safe and performs well under concurrent loads.


Interview Strategy (for Java Developers)

When asked, "How would you design a rate limiter?" in a Java system design interview:

  1. Start with Fixed Window Counter (simple but has edge cases).
  2. Move to Sliding Window Log (use Deque<Long> in Java).
  3. Mention Token Bucket (useful in production systems).
  4. For distributed systems, bring up Redis-based counters or API Gateway features (e.g., Nginx, Envoy).

This shows both breadth (knowledge of algorithms) and depth (working Java code).
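For the distributed case, a common pattern is keeping per-user counters in a shared store like Redis (e.g., INCR on a "user:window" key plus EXPIRE to evict old windows). Here is a minimal Java sketch of that per-user fixed-window logic; a ConcurrentHashMap stands in for Redis so the example is self-contained, and the class and key names are my own illustrations:

```java
import java.util.concurrent.ConcurrentHashMap;

public class DistributedStyleLimiter {
    private final int maxRequestsPerWindow;
    private final long windowSizeInMillis;
    // Stand-in for a shared store such as Redis
    private final ConcurrentHashMap<String, Integer> store = new ConcurrentHashMap<>();

    public DistributedStyleLimiter(int maxRequestsPerWindow, int windowSizeInSeconds) {
        this.maxRequestsPerWindow = maxRequestsPerWindow;
        this.windowSizeInMillis = windowSizeInSeconds * 1000L;
    }

    public boolean allowRequest(String userId) {
        long window = System.currentTimeMillis() / windowSizeInMillis;
        String key = userId + ":" + window;   // one counter per user per window
        // Atomic increment, analogous to Redis INCR on the same key
        int count = store.merge(key, 1, Integer::sum);
        return count <= maxRequestsPerWindow;
    }
}
```

In a real deployment the increment and expiry should happen atomically on the shared store (Redis supports this via a Lua script or its INCR/EXPIRE commands), so that multiple app servers cannot race past the limit.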


System Design Interview Resources

In order to do well in any interview, resources are very important. Before any System Design or coding interview, I used to read the following resources:

ByteByteGo: click here

I have personally bought their System Design books to speed up my preparation and joined ByteByteGo for comprehensive coverage.

They are now also giving a 50% discount on their lifetime plan, which is what I have, and I highly recommend that to anyone preparing for the System Design interview.

Join ByteByteGo now for a 50% Discount: click here

ByteByteGo 50% discount code

Codemia.io: Click here

This is another great platform to practice System design problems for interviews. It has more than 120 System design problems, many of which are free, and also a proper structure to solve them.

They also have a great platform, editorial solution, and tools to help you practice system design questions online, and the best thing is that they are also offering a 60% discount on their lifetime plan.

I usually combine ByteByteGo (theory), Codemia (practice), and Exponent (mock interviews) for a complete prep.

Here is the link to get the discount --- Join Codemia for a 60% Discount

Codemia.io discount code

Exponent: Click here
A specialized site for interview prep, especially for FAANG companies like Amazon and Google. They also have a great system design course and many other materials and mock interviews that can help you crack FAANG interviews.

They are also offering a 70% discount now on their annual plan, which makes it a great time to join them.

Here is the link to get the discount --- Join Exponent for 70% OFF

Exponent discount code

Conclusion

Rate limiting is one of those interview questions that tests both your algorithm knowledge and system design intuition.

  • If you just need something clean in an interview, go with the Sliding Window Log approach (with a Deque<Long> in Java).
  • If you want to demonstrate production-grade knowledge, mention and explain the Token Bucket algorithm.

That way, you cover both the practical coding side and the system design side in one answer.

    Database Sharding 101: The One Topic You Must Nail in Every System Design Interview

    Hello friends, in this data-driven world, the ability to efficiently handle vast amounts of data is crucial for businesses and organizations. Traditional monolithic databases often struggle to keep pace with the demands of modern applications and services and become a performance bottleneck. This is where database sharding comes into play, offering a powerful solution for horizontally scaling your data. If you don't know what sharding is: it is a database architecture technique that involves partitioning a large database into smaller, more manageable pieces, called "shards," which are distributed across multiple servers.

    Each shard contains a subset of the data, and together they form the complete dataset. This approach enhances performance and scalability by distributing the workload, reducing latency, and enabling parallel processing.

    Top 10 Caching Strategies for System Design


    top 5 caching strategies for System design interviews

    image_credit - ByteByteGo

    Hello friends, in System design, efficiency and speed are paramount, and in order to enhance performance and reduce response times, caching plays an important role. If you don't know what caching is, let me give you a brief overview first.

    Caching is a technique that involves storing copies of frequently accessed data in a location that allows for quicker retrieval.

    For example, you can cache the most visited page of your website inside a CDN (Content Delivery Network) or similarly a trading engine can cache symbol table while processing orders.

    In the past, I have shared several system design interview articles like API Gateway vs Load Balancer and Forward Proxy vs Reverse Proxy, as well as common System Design problems. In this article, we will explore the fundamentals of caching in system design and delve into the different caching strategies that are essential knowledge for technical interviews.

    It's also one of the essential System design topics or concepts for programmers to know.

    By the way, if you are preparing for System design interviews and want to learn System Design in depth, then you can also check out sites like ByteByteGo, InterviewKickStart, Design Guru, Exponent, Educative, Codemia.io, Bugfree.ai and Udemy, which have many great System design courses.

    how to answer system design question

    P.S. Keep reading until the end. I have a free bonus for you.


    What is Caching in Software Design?

    At its core, caching is a mechanism that stores copies of data in a location that can be accessed more quickly than the original source.

    By keeping frequently accessed information readily available, systems can respond to user requests faster, improving overall performance and user experience.

    In the context of system design, caching can occur at various levels, including:

    1. Client-Side Caching
      The client (user's device) stores copies of resources locally, such as images or scripts, to reduce the need for repeated requests to the server.

    2. Server-Side Caching
      The server stores copies of responses to requests so that it can quickly provide the same response if the same request is made again.

    3. Database Caching
      Frequently queried database results are stored in memory for faster retrieval, reducing the need to execute the same database queries repeatedly.

    Here is a diagram which shows the client side and server side caching:

    server side vs client side caching on system design


    9 Caching Strategies for System Design Interviews

    Understanding different caching strategies is crucial for acing technical interviews, especially for roles that involve designing scalable and performant systems. Here are some key caching strategies to know:

    1. Least Recently Used (LRU)

    This type of cache removes the least recently used items first. You can implement it by tracking the usage of each item and evicting the one that hasn't been used for the longest time.

    If asked in an interview, you can use a doubly linked list to implement this kind of cache, as shown in the following diagram.

    Though in the real world you don't need to create your own cache; you can use an existing data structure like ConcurrentHashMap in Java, or an open-source caching solution like Ehcache.

    Least Recently Used (LRU) caching strategy
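    As a shortcut in Java, LinkedHashMap (which maintains a doubly linked list internally) can give you LRU eviction with very little code: constructing it with accessOrder=true and overriding removeEldestEntry is enough. A minimal sketch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LinkedHashMap with accessOrder=true keeps entries in access order,
// so the eldest entry is always the least recently used one.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true => LRU ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict automatically once we exceed capacity
        return size() > capacity;
    }
}
```

For example, with a capacity of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", since "b" is the least recently accessed entry.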


    2. Most Recently Used (MRU)

    In this type of cache, the most recently used item is removed first. Similar to the LRU cache, it requires tracking the usage of each item, but here you evict the one that has been used most recently.


    3. First-In-First-Out (FIFO)

    This type of cache evicts the oldest items first. If asked during an interview, you can use a queue data structure to maintain the order in which items were added to the cache.

    First-In-First-Out (FIFO)
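    Following the queue idea above, here is a minimal Java sketch of a FIFO cache: a Deque records insertion order while a HashMap holds the entries. The names are illustrative, not a standard API:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class FifoCache<K, V> {
    private final int capacity;
    private final Map<K, V> map = new HashMap<>();
    private final Deque<K> insertionOrder = new ArrayDeque<>();

    public FifoCache(int capacity) {
        this.capacity = capacity;
    }

    public V get(K key) {
        return map.get(key); // reads do NOT affect eviction order under FIFO
    }

    public void put(K key, V value) {
        if (!map.containsKey(key)) {
            if (map.size() >= capacity) {
                // Evict the oldest inserted key, regardless of recent reads
                map.remove(insertionOrder.pollFirst());
            }
            insertionOrder.addLast(key);
        }
        map.put(key, value);
    }
}
```

The key contrast with LRU: reading an entry does not protect it, so a frequently read but early-inserted item is still the first to go.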


    4. Random Replacement

    This type of cache randomly selects an item for eviction. While it is simpler to implement, it may not be optimal in all scenarios.


    5. Write-Through Caching

    In this type of caching, data is written to both the cache and the underlying storage simultaneously. One advantage of this approach is that it ensures the cache is always up-to-date.

    On the flip side, write latency increases due to the dual writes.

    Write-Through Caching
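    A minimal Java sketch of the write-through idea follows. The Store interface here is a hypothetical stand-in for the underlying storage (a database, for instance), not a real library API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical backing-store interface for illustration
interface Store {
    void save(String key, String value);
    String load(String key);
}

public class WriteThroughCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Store store;

    public WriteThroughCache(Store store) {
        this.store = store;
    }

    public void put(String key, String value) {
        store.save(key, value); // write to storage first (durable, slower)
        cache.put(key, value);  // then keep the cache in sync
    }

    public String get(String key) {
        // Serve from cache; fall back to storage on a miss
        return cache.computeIfAbsent(key, store::load);
    }
}
```

Because every put hits the storage before returning, the cache can never serve stale data, at the cost of the extra write latency mentioned above.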


    6. Write-Behind Caching (Write-Back)

    In this type of caching, data is written to the cache immediately, and the update to the underlying storage is deferred.

    This reduces write latency, but there is a risk of data loss if the system fails before updates are written to the storage.

    Here is how it works:

    Write-Behind Caching (Write-Back) cache working


    7. Cache-Aside (Lazy-Loading)

    This means the application code is responsible for loading data into the cache. It provides control over what data is cached, but it also requires additional logic to manage cache population.

    Cache-Aside (Lazy-Loading) working
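    A minimal Java sketch of cache-aside: the application checks the cache first and only loads from the source on a miss, then populates the cache itself. The loader function stands in for the database call, and the names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class CacheAside<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader; // stand-in for the database read
    private int misses = 0;

    public CacheAside(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        V value = cache.get(key);
        if (value == null) {           // cache miss: go to the source
            misses++;
            value = loader.apply(key);
            cache.put(key, value);     // lazily populate the cache
        }
        return value;
    }

    public int getMisses() {
        return misses;
    }
}
```

The first get for a key pays the full load cost; repeated gets for the same key are served from the cache without touching the loader again.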


    Cache Invalidation

    Along with caching and different caching strategies, this is another important concept which a Software engineer should be aware of.

    Cache Invalidation removes or updates cache entries when the corresponding data in the underlying storage changes.

    The biggest benefit of cache invalidation is that it ensures that cached data remains accurate, but at the same time it also introduces complexity in managing cache consistency.

    And, here is a nice diagram from DesignGuru.io which explains various Cache Invalidation strategies for system design interviews:

    top 3 Cache Invalidation strategies


    Global vs. Local Caching

    In global caching, a single cache is shared across multiple instances, while in local caching, each instance has its own cache. One advantage of global caching is that it promotes data consistency, whereas local caching reduces contention and can improve performance.

    Global vs. Local Caching


    Best System Design Interview Resources

    And, here is a curated list of the best system design books, online courses, and practice websites which you can check out to better prepare for System design interviews. Most of these courses also answer the questions I have shared here.

    1. ByteByteGo: A live book and course by Alex Xu for System design interview preparation. It contains all the content of the System Design Interview book volumes 1 and 2, and will be updated with volume 3, which is coming soon.

    2. Codemia.io: This is another great platform to practice System design problems for interviews. It has more than 120 System design problems, many of which are free, and also a proper structure to solve them.

    3. Bugfree.ai: This is another popular platform for technical interview preparation. It contains AI-based mock interviews as well as interview experiences and more than 3,200 real questions on System Design, Machine Learning, and other topics for practice.

    4. DesignGuru's Grokking System Design Course: An interactive learning platform with hands-on exercises and real-world scenarios to strengthen your system design skills.

    5. "System Design Interview" by Alex Xu: This book provides an in-depth exploration of system design concepts, strategies, and interview preparation tips.

    6. "System Design Primer" on GitHub: A curated list of resources, including articles, books, and videos, to help you prepare for system design interviews.

    7. Educative's System Design Course: An interactive learning platform with hands-on exercises and real-world scenarios to strengthen your system design skills.

    8. High Scalability Blog: A blog that features articles and case studies on the architecture of high-traffic websites and scalable systems.

    9. YouTube Channels: Check out channels like "Gaurav Sen" (ex-Google engineer and founder of InterviewReddy.io) and "Tech Dummies" for insightful videos on system design concepts and interview preparation.

    10. "Designing Data-Intensive Applications" by Martin Kleppmann: A comprehensive guide that covers the principles and practices for designing scalable and reliable systems.

    11. Exponent: A specialized site for interview prep, especially for FAANG companies like Amazon and Google. They also have a great system design course and many other materials that can help you crack FAANG interviews.

    how to prepare for system design

    image_credit - ByteByteGo

    Conclusion:

    That's all about caching and different types of cache a Software engineer should know. As I said, Caching is a fundamental concept in system design, and a solid understanding of caching strategies is crucial for success in technical interviews.

    Whether you're optimizing for speed, minimizing latency, or ensuring data consistency, choosing the right caching strategy depends on the specific requirements of the system you're designing.

    As you prepare for technical interviews, delve into these caching strategies, understand their trade-offs, and be ready to apply this knowledge to real-world scenarios.

    Bonus
    As promised, here is the bonus for you, a free book. I just found a new free book to learn Distributed System Design, you can also read it here on Microsoft --- https://info.microsoft.com/rs/157-GQE-382/images/EN-CNTNT-eBook-DesigningDistributedSystems.pdf