A detailed look at the component design
What is a Book Lending Application?
A book lending application serves a community of readers, who can upload their personal libraries online and offer the books for lending. Anyone can search for a particular book using filters such as author, genre, etc.
In other words, it is a scrollable version of other people’s libraries (friends or people nearby). Every book has its own details and a refundable security deposit.
The security deposit is refunded to the borrower once they have returned the book to the lender.
Requirements and Goals of the System
Let’s design the application with the following requirements:
- An option to log in using any popular login API, e.g. Google or Facebook.
- The book list/feed can be generated for any logged-in user.
- To upload a book, you’ll have to enter complete book details like author, title, bought on (date), image front and back, softcover/hardcover etc. Every book will have a unique identification number.
- Any library member should be able to search books by title, author, and subject category, as well as by publication date and search distance.
- We could also have a rating filter, where people rate a particular book for its quality (useful for choosing between books with the same title).
- There could be more than one copy of a book, so library members should be able to check out and reserve any copy.
- The book feed/list will contain an image of the book, its title, and its author.
- Our application should support appending new books as they arrive to the list for all active users.
- A personal user account page listing the books borrowed and lent.
- There should be a maximum limit (5) on how many books a member can check out.
- There should be a maximum limit (30) on how many days a member can keep a book
- Notification system to notify a user if a particular book becomes available.
- A chat system (optional) between lender and borrower.
- A wallet service that holds the security deposit for every book. The deposit is deducted if the book is not returned by the due date; otherwise it is refunded.
- Users can follow each other (optional)
- Our system should be able to generate any user’s book list in real time; the maximum latency seen by the end user should be 2s.
- Any new book shouldn’t take more than 5s to make it to a user’s list assuming a new list request comes in.
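The lending limits and the deposit rule from the list above can be sketched in a few lines. This is only an illustration: the `Loan` class and both helper functions are hypothetical names, and the sketch assumes the whole deposit is forfeited on a late return, which the requirements do not spell out.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MAX_BOOKS_PER_MEMBER = 5   # limit taken from the requirements
MAX_LOAN_DAYS = 30         # limit taken from the requirements


@dataclass
class Loan:
    """Hypothetical record of one checked-out book."""
    book_id: str
    borrower_id: str
    deposit: float
    checked_out_on: date

    @property
    def due_date(self) -> date:
        return self.checked_out_on + timedelta(days=MAX_LOAN_DAYS)


def can_check_out(active_loans: list, borrower_id: str) -> bool:
    """A member may hold at most MAX_BOOKS_PER_MEMBER books at once."""
    held = sum(1 for loan in active_loans if loan.borrower_id == borrower_id)
    return held < MAX_BOOKS_PER_MEMBER


def refund_on_return(loan: Loan, returned_on: date) -> float:
    """Assumption: full refund if returned by the due date, nothing otherwise."""
    return loan.deposit if returned_on <= loan.due_date else 0.0
```

A real wallet service would also need to handle partial deductions, disputes, and lost books, which are out of scope here.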
Capacity Estimation and Constraints
Traffic estimate: Let’s assume 1M daily active users, with each user fetching the latest book list for their area five times a day on average. This results in 5M requests per day, or approximately 58 requests per second (5M / 86,400s).
Storage estimate: Let’s assume we keep about 100 posts per user in memory for quick fetching, and that an average book post is 1KB in size. That is roughly 100KB of data per user, so storing it for all 1M active users needs about 100GB of memory (1M × 100KB). If a server can hold up to 100GB, this fits on a single machine in principle, although in practice we would spread it across a few machines for headroom and redundancy.
Read requests will be ~5 times more than write requests.
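The estimates can be reproduced with a short back-of-the-envelope script. The constants are the assumptions stated in this section (1M daily active users, five fetches a day, 100 cached posts of 1KB each, 100GB per server); the variable names are illustrative, and decimal units (1GB = 10^6 KB) are assumed.

```python
import math

DAILY_ACTIVE_USERS = 1_000_000
FETCHES_PER_USER_PER_DAY = 5
SECONDS_PER_DAY = 24 * 60 * 60
POSTS_CACHED_PER_USER = 100
POST_SIZE_KB = 1
SERVER_MEMORY_GB = 100
KB_PER_GB = 1_000_000  # decimal units, an assumption

# Traffic: total daily feed requests spread evenly over the day.
requests_per_day = DAILY_ACTIVE_USERS * FETCHES_PER_USER_PER_DAY
requests_per_second = requests_per_day / SECONDS_PER_DAY

# Storage: cached posts per user, summed over all active users.
cache_per_user_kb = POSTS_CACHED_PER_USER * POST_SIZE_KB
total_cache_gb = DAILY_ACTIVE_USERS * cache_per_user_kb / KB_PER_GB
servers_needed = math.ceil(total_cache_gb / SERVER_MEMORY_GB)

print(f"{requests_per_second:.0f} requests/s")   # 58 requests/s
print(f"{total_cache_gb:.0f} GB of feed cache")  # 100 GB of feed cache
print(f"{servers_needed} server(s) at minimum")  # 1 server(s) at minimum
```

These are averages; peak traffic is usually several times higher, so the real deployment would be sized with a safety margin.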
There are two primary objects: User (a lender or a borrower) and FeedItem (a book post). Here are some observations.
- A user can lend as well as borrow books.
- Users can post FeedItem containing title, images, or videos.
- Each FeedItem will have a UserID which will point to a User who created it.
- Each FeedItem can optionally have an EntityID pointing to the borrower, which is also a User.
- A relational database looks more appropriate here, as it offers ACID guarantees and other transactional benefits over a non-relational database. MySQL and Postgres are good examples.
If using a relational database, we would need to model two relations: User-Entity relation and FeedItem-book image relation.
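As a rough illustration of the two objects described above, here is a minimal in-memory sketch. Only `UserID` and `EntityID` come from the observations; every other field name (location, image URLs, timestamps) is an assumption made for the example, not a fixed schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class User:
    user_id: int
    name: str
    latitude: float    # assumed: needed for "books near me" queries
    longitude: float


@dataclass
class FeedItem:
    feed_item_id: int
    user_id: int                       # the User who created (lends) the book
    title: str
    author: str
    image_urls: list = field(default_factory=list)
    entity_id: Optional[int] = None    # the borrower (also a User), if lent out
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    @property
    def is_available(self) -> bool:
        """A book is available while no borrower is attached to it."""
        return self.entity_id is None
```

In a relational database these would map to a users table and a feed items table, with `user_id` and `entity_id` as foreign keys into the users table.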
High-Level System Design
At a high level, this problem can be divided into two parts.
Book Feed generation and Feed/Book publishing
Book Feed generation: The book feed is generated from the posts (feed items) of users. So whenever our system receives a request to generate the feed for a user (say Kritika), we will perform these steps:
- Retrieve IDs of all users that Kritika follows or are within a radius of 15 km
- Retrieve the latest, most popular, or relevant book post of those IDs. These are potential posts we show on Kritika’s book feed.
- Rank these posts based on their relevance to Kritika. This represents Kritika’s book feed.
- Store this feed in the cache and return the top posts (say 20) to be rendered on Kritika’s feed.
- On frontend, when Kritika reaches the end of her current feed, she can fetch the next 20 posts from the server and so on.
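The steps above can be sketched as follows. This is a simplified, in-memory illustration: `Member`, `Post`, `generate_feed`, and `page` are hypothetical names, distance is computed with the haversine formula, and ranking is reduced to newest-first as a placeholder for a real relevance score.

```python
import math
from dataclasses import dataclass

RADIUS_KM = 15   # search radius from the steps above
PAGE_SIZE = 20   # posts returned per request


@dataclass
class Member:
    user_id: int
    lat: float
    lon: float
    follows: frozenset = frozenset()


@dataclass
class Post:
    post_id: int
    owner_id: int
    created_at: float  # Unix timestamp


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def generate_feed(reader, members, posts):
    # Step 1: users the reader follows, or who are within the radius.
    source_ids = {
        m.user_id
        for m in members
        if m.user_id != reader.user_id
        and (
            m.user_id in reader.follows
            or haversine_km(reader.lat, reader.lon, m.lat, m.lon) <= RADIUS_KM
        )
    }
    # Step 2: collect candidate posts from those users.
    candidates = [p for p in posts if p.owner_id in source_ids]
    # Step 3: rank (placeholder: newest first).
    return sorted(candidates, key=lambda p: p.created_at, reverse=True)


def page(feed, page_number):
    """Steps 4 and 5: serve the cached feed PAGE_SIZE posts at a time."""
    start = page_number * PAGE_SIZE
    return feed[start:start + PAGE_SIZE]
```

A production system would replace the linear scan over all members with a geospatial index, since comparing the reader against every user does not scale.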
One thing to note here is that we generated the feed once and stored it in the cache. What about new incoming posts from people around Kritika?
If Kritika is online, we should have a mechanism to rank and add those new posts to her feed. We can periodically (say every 5 minutes) repeat the above steps to rank and add newer posts to her feed. Kritika can then be notified that newer items in her feed are available to fetch.
Feed/Book publishing: At a high level we need the following components in our BooksFeed service:
- Web servers: To maintain a connection to the user. This connection will be used to transfer data between the user and the server.
- Application servers: To execute the workflows of storing new posts in the database servers. We will also need some servers to retrieve and push the book feed to the end-user.
- Metadata db and cache: To store metadata about users
- Posts db and cache: To store metadata about posts and their content
- Photo storage and cache: Blob storage like GCS or S3 to store media items like book photos included in the post
- Book feed generation: To gather and rank all relevant posts for a user, generate the book feed, and store it in the cache. This service will also receive live updates and add these newer feed items to any user’s timeline.
- Feed notification service: To notify users that there are newer items available for their book feed.
Detailed Component Design
Users who have a lot of other users nearby, or who follow a lot of other users, can face very slow feed generation.
Offline generation of feed: We can have dedicated servers that continuously generate users’ book feeds and store them in memory. Whenever a user requests new posts, we can simply serve them from this pre-generated, stored location. With this scheme, the user’s book feed is not compiled on load but on a regular basis, and is returned to the user whenever they request it.
Whenever these servers need to generate the feed for a user, they will first query when the feed was last generated for that user. New feed data is then generated from that time onwards.
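The incremental refresh described above could look roughly like this. It is a sketch under simple assumptions: posts carry a numeric `created_at` timestamp, the cached feed is a newest-first list, and `refresh_feed` is a hypothetical helper name.

```python
from typing import NamedTuple


class Post(NamedTuple):
    post_id: int
    created_at: float  # Unix timestamp


def refresh_feed(cached_feed, candidate_posts, last_generated_at, now, max_items=100):
    """Merge posts created after the last run into the cached feed.

    Returns the refreshed feed (newest first, capped at max_items) together
    with the timestamp to store as the new "last generated" time.
    """
    new_posts = [p for p in candidate_posts if p.created_at > last_generated_at]
    merged = sorted(new_posts + cached_feed, key=lambda p: p.created_at, reverse=True)
    return merged[:max_items], now
```

Only posts newer than the stored timestamp are considered, so each run does work proportional to what changed rather than rebuilding the whole feed.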
How many feed items should we store in memory for a user’s feed? Initially, we can decide to store 100 feed items per user, but this number can be adjusted later based on the usage pattern. For example, if we assume that one page of a user’s feed has 20 posts and most users never browse more than 3 pages of their feed, we can decide to store only 60 posts per user. For any user who wants to see more posts (more than what is stored in memory), we can always query the backend servers.
Should we generate book feeds for all users? There will be a lot of users who don’t log in frequently.
- A straightforward way would be to use an LRU cache, which evicts from memory the users who haven’t accessed their feed for a long time.
- A more intelligent way would be to track users’ login patterns and pre-generate their book feeds accordingly, e.g. at what time of day a user is active and on which days they access their feed.
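The LRU option can be illustrated with Python’s `collections.OrderedDict`. The `FeedCache` class and its `max_users` parameter are hypothetical names for this sketch; a real deployment would more likely rely on the eviction policy of a cache server.

```python
from collections import OrderedDict


class FeedCache:
    """Keeps pre-generated feeds for the most recently active users only."""

    def __init__(self, max_users):
        self.max_users = max_users
        self._feeds = OrderedDict()  # user_id -> feed, oldest access first

    def get(self, user_id):
        if user_id not in self._feeds:
            return None  # cache miss: caller regenerates the feed
        self._feeds.move_to_end(user_id)  # mark as most recently used
        return self._feeds[user_id]

    def put(self, user_id, feed):
        self._feeds[user_id] = feed
        self._feeds.move_to_end(user_id)
        if len(self._feeds) > self.max_users:
            self._feeds.popitem(last=False)  # evict least recently used user
```

For example, with `max_users=2`, storing feeds for users A and B, reading A, and then storing C evicts B, because B is the user who accessed their feed least recently.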
The process of pushing a post to all the people around (or following) the poster is called fanout; the push approach is known as fanout-on-write and the pull approach as fanout-on-load. Let’s discuss the different options for publishing.
- Pull model: This method involves keeping all the recent feed data in memory so users can pull it from the server when they need it. Clients can pull the feed data on a regular basis or manually whenever they need it. Possible problems: a) new data might not be shown to the user until they issue a pull request; b) most pull requests may return an empty response when there is no new data, wasting resources.
- Push model: Once a user has published a book post, we can immediately push this post to all the people in the radius and to all followers. The advantage is that users don’t have to fetch manually. We can use long polling or WebSockets for this. The possible problem with this approach is that when a user has thousands of users in the area or thousands of followers, the server has to push updates to a lot of people.
- Hybrid: An alternative is to combine the pull and push models. We stop pushing for people with a lot of users around in the radius and only use the pull model for them, which can save a lot of resources. Another option is that once a user publishes a post, we limit the fanout to only the online people around and not everybody. We can also combine “push to notify” and “pull for serving”.
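One way to picture the hybrid option is a small decision function that picks who receives a push when a post is published. The threshold value and the function name are assumptions for illustration; the article does not prescribe a specific cutoff.

```python
PUSH_FANOUT_LIMIT = 1000  # hypothetical cutoff for "a lot of users around"


def plan_push_targets(audience_ids, online_ids, limit=PUSH_FANOUT_LIMIT):
    """Return the set of users to push a new post to.

    - If the audience (nearby users plus followers) exceeds the limit,
      push to nobody and let everyone pull.
    - Otherwise push only to audience members who are currently online;
      offline members pick the post up on their next pull.
    """
    audience = set(audience_ids)
    if len(audience) > limit:
        return set()
    return audience & set(online_ids)
```

Everyone not in the returned set is served by the pull path, so the cost of publishing stays bounded even for users in dense areas.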
Should we always notify users if there are new posts available in their book feed?
It could be useful for users to get notified whenever new data is available. However, on mobile devices, where data usage is relatively expensive, it can consume unnecessary bandwidth. Hence, at least for mobile devices, we can choose not to push data, instead, let users “Pull to Refresh” to get new posts.
The most straightforward way to rank posts in a book feed is by the creation time of the book posts, but we can do a lot more than that to ensure important posts are ranked higher. The high-level idea of ranking is first to select the key “signals” that make a post important and then to find out how to combine them to calculate a final ranking score.
More specifically, we can select features that are relevant to the importance of any feed item, e.g. the number of likes, the number of comments, the time of the update, whether the book post has images/videos, etc., and then calculate a score from these features. This is generally enough for a simple ranking system. A better ranking system can significantly improve itself by constantly evaluating whether we are making progress in user stickiness, retention, ads revenue, etc.
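A simple scoring function built from the signals mentioned above might look like this. The weights and the 24-hour half-life are made-up values for illustration only; in practice they would be tuned against metrics such as retention.

```python
# Hypothetical weights: how much each signal contributes to the score.
WEIGHTS = {"likes": 1.0, "comments": 2.0, "has_media": 5.0}
HALF_LIFE_HOURS = 24.0  # assumed: a post loses half its score every day


def score(likes, comments, has_media, age_hours):
    """Combine engagement signals, then decay the result with post age."""
    engagement = (
        WEIGHTS["likes"] * likes
        + WEIGHTS["comments"] * comments
        + WEIGHTS["has_media"] * (1 if has_media else 0)
    )
    decay = 0.5 ** (age_hours / HALF_LIFE_HOURS)
    return (1.0 + engagement) * decay


def rank(posts):
    """Sort post dicts (keys matching score's parameters) best first."""
    return sorted(posts, key=lambda p: score(**p), reverse=True)
```

The `1.0 +` term keeps brand-new posts with no engagement from scoring zero, so they can still surface in the feed.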
System Design: A Book Lending Application was originally published in Better Programming on Medium, where people are continuing the conversation by highlighting and responding to this story.