This guide discusses how to handle high-volume messaging bots through the API and webhooks for the official WhatsApp Business API.
WhatsApp uses an interesting approach that is completely different from other messengers such as Messenger or Telegram.
To run a bot on WhatsApp, we need to run a node on an instance of our own. This node maintains connectivity to WhatsApp over a long-lived TCP connection.
Our backend connects to this node, not directly to the WhatsApp servers. The architecture is explained in more detail below.
Instance / Node
Each WhatsApp number needs a dedicated machine to run the WhatsApp client. On this machine we run three components:
CoreApp
MySQL
WebApp
CoreApp is the application that does all the magic. It connects to the WhatsApp servers, keeps the encryption keys (WhatsApp is end-to-end encrypted, and the node stores all of these keys in MySQL), manages incoming and outgoing messages, backs up contacts, and so on.
MySQL is where WhatsApp stores all incoming and outgoing messages, along with media, in encrypted format.
WebApp is the application that interfaces between the backend and CoreApp. Basically it is a web service authenticated by tokens for sending messages out.
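As a rough illustration of a token-authenticated call to the WebApp, the sketch below builds (but does not send) an HTTP request using only the Python standard library. The endpoint path `/v1/messages`, the payload fields, the bearer-token header and the host name are assumptions made for illustration; use the values given in your own API documentation.

```python
import json
import urllib.request


def build_send_request(base_url: str, token: str, to: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a token-authenticated request to the WebApp.

    The path and payload shape are illustrative assumptions, not a confirmed API.
    """
    payload = {"to": to, "type": "text", "text": {"body": text}}
    return urllib.request.Request(
        url=f"{base_url}/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical host, token and recipient number, for illustration only.
request = build_send_request("https://node.example", "YOUR_TOKEN", "15550001111", "Hello")
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would actually send the message; it is left out here so the sketch has no side effects.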
Throughput per second (TPS) plays a big role in handling high messaging volumes. Based on your selected plan, we allocate a dedicated TPS, such as 10, 15 or 20, to each number.
The TPS is calculated based on the following events:
Getting Delivery Reports
Getting Read Reports
For example, with 10 TPS you can handle up to 10 interactions per second. Please note that delivery and read reports are also counted towards the TPS limit.
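To see what the TPS limit means for sending, here is a small back-of-the-envelope calculation. It assumes that every outgoing message counts as one interaction and later produces one delivery report and one read report; real traffic will vary, since not every message is read and incoming messages also use the budget.

```python
def max_outgoing_per_second(tps: int, reports_per_message: int = 2) -> float:
    """Outgoing messages per second that fit into the TPS budget when each
    message also produces `reports_per_message` report events."""
    events_per_message = 1 + reports_per_message  # the send itself plus its reports
    return tps / events_per_message


print(round(max_outgoing_per_second(10), 2))  # 3.33
```

Under these assumptions, a 10 TPS allocation sustains roughly three outgoing messages per second once the reports are taken into account.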
The Webhook pushes incoming messages to your server in real time.
The Event Webhook pushes delivery and read reports to your server when they are available.
The Push API sends messages out. We recommend using the Webhook Instant reply if you want to reply instantly to the user.
Please make sure your web server and application can handle the concurrent load for your allocated TPS, especially for incoming messages, as that load is not predictable.
Make sure your firewall allows the maximum number of concurrent connections from our IP.
Make sure your web server can handle concurrent HTTPS requests from our server.
Make sure your application responds to the webhook instantly (less than 500 ms is recommended).
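One common way to keep webhook responses fast is to acknowledge each request immediately and do the slow work in the background. The sketch below shows this pattern with the Python standard library only. The names `handle_event`, `processed` and `serve`, the port, and the JSON request body are assumptions made for illustration, and HTTPS termination is assumed to happen in a reverse proxy in front of this process.

```python
import json
import queue
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

events: "queue.Queue[dict]" = queue.Queue()
processed: list = []


def handle_event(event: dict) -> None:
    """Placeholder for slow business logic (database writes, bot replies, ...)."""
    processed.append(event)


def worker() -> None:
    """Drain the queue in the background so the HTTP handler never blocks."""
    while True:
        event = events.get()
        try:
            handle_event(event)
        finally:
            events.task_done()


def start_worker() -> threading.Thread:
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            events.put(json.loads(body))  # enqueue only, no processing here
            status = 200
        except ValueError:  # malformed JSON or invalid encoding
            status = 400
        self.send_response(status)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Start the background worker and a threaded HTTP listener."""
    start_worker()
    ThreadingHTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. Because the handler only parses and enqueues the payload, its response time stays largely independent of how long `handle_event` takes.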
Below is the basic server configuration your web server needs to handle concurrent requests. Please note that it is for reference only and was prepared for an Apache web server on a Linux machine with no other applications running.
2 CPU, 2 GB RAM, SSD or IO-optimised hard disk recommended
4 CPU, 4 GB RAM, SSD or IO-optimised hard disk recommended
6 CPU, 6 GB RAM, SSD or IO-optimised hard disk recommended
You need to have enough bandwidth if your bot is sending and receiving many media files.
Incoming messages are queued at the WhatsApp end once the allocated TPS is reached. If the queue exceeds the capacity of the core app, WhatsApp starts rejecting incoming messages, and those messages are lost permanently.
Delivery and read receipts also count towards the incoming message queue, since WhatsApp pushes them to the Docker container when they are available.
There is no queue at the WhatsApp end for outgoing messages, so you need to manage the outgoing message queue at your end based on your available TPS and message priority.
By default, we do not manage a message queue for API requests, as we do not know the message priority. If you wish to enable a queue at our end, please get in touch with us. We can manage the queue only for the Push API, i.e. we are not able to queue messages that are sent as an instant response to the webhook.
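A client-side outgoing queue of the kind described above could look like the following minimal sketch: a priority queue that releases messages no faster than a configured rate. The class name `OutgoingQueue`, the priority scale (lower numbers first) and the `send` callback are hypothetical; in practice `send` would call the Push API.

```python
import heapq
import itertools
import time
from typing import Callable, List, Tuple


class OutgoingQueue:
    """Priority queue that releases at most `tps` messages per second."""

    def __init__(self, tps: int, send: Callable[[str], None]) -> None:
        self.interval = 1.0 / tps
        self.send = send
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def put(self, message: str, priority: int = 5) -> None:
        """Queue a message; lower `priority` numbers are sent first."""
        heapq.heappush(self._heap, (priority, next(self._counter), message))

    def drain(self, sleep: Callable[[float], None] = time.sleep) -> int:
        """Send everything queued, pausing between sends to respect the rate."""
        sent = 0
        while self._heap:
            _, _, message = heapq.heappop(self._heap)
            self.send(message)
            sent += 1
            if self._heap:
                sleep(self.interval)
        return sent


outbox: List[str] = []
q = OutgoingQueue(tps=5, send=outbox.append)
q.put("newsletter", priority=9)
q.put("OTP 1234", priority=1)
q.drain()
print(outbox)  # ['OTP 1234', 'newsletter']
```

Because delivery and read reports also count against the TPS limit, the `tps` value passed here would normally be set below the allocated figure to leave headroom for them.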
The standard WhatsApp Business API Client solution runs on a single Docker container. If you want to split the load and have multiple servers send and receive messages to and from WhatsApp, you can use the multiconnect solution that WhatsApp offers on top of it.
High availability allows you to keep Docker containers on stand-by in case the primary Docker container goes down.
A high availability cluster requires at least two Master nodes and two Coreapp nodes as seen in the following diagram:
With high availability, only one Docker container is responsible for sending and receiving messages from WhatsApp servers. If messaging traffic exceeds the maximum throughput of a single Docker container, there will be a backlog of message sends and message delivery latency will increase. To scale out the WhatsApp Business API Client, multiconnect supports sharding to spread loads across multiple Docker containers. Currently, WhatsApp only supports static sharding with a shard number of 1, 2, 4, 8, 16, or 32. High availability is a special case of multiconnect where the shard number is 1.
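Since only a fixed set of shard numbers is supported, choosing one comes down to picking the smallest allowed value that covers the expected load. The sketch below does this under the simplifying assumption that throughput scales linearly with the number of shards; the per-container rate is a hypothetical figure you would have to measure yourself.

```python
ALLOWED_SHARDS = (1, 2, 4, 8, 16, 32)


def pick_shard_count(required_tps: float, per_container_tps: float) -> int:
    """Smallest allowed shard count whose combined capacity covers the required rate.

    Assumes capacity grows linearly with the shard count, which is a simplification.
    """
    for shards in ALLOWED_SHARDS:
        if shards * per_container_tps >= required_tps:
            return shards
    raise ValueError("required throughput exceeds 32 shards at this per-container rate")


print(pick_shard_count(required_tps=50, per_container_tps=20))  # 4
```

A result of 1 corresponds to the plain high availability setup with no sharding.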
If you are using WhatsApp Groups in your WhatsApp Official Account, please be aware of the following limitations:
WhatsApp Groups is in beta
Delivery / Read reports cannot be disabled in Docker
If you have a large number of groups with highly engaged users, this can badly affect your TPS, as the following example shows:
Let's assume you have 10 groups with 256 members each, i.e. 2,560 members in total. Whenever you or someone else sends a message to a group, an event webhook is triggered to your server each time a member reads it. One message sent to every group can therefore lead to up to 2,560 event triggers (up to 256 per group), all of which consume your TPS.
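The group fan-out above can be checked with a short calculation. It assumes one message is posted to each of the 10 groups and that every member generates exactly one read event; the 10 TPS allocation is just an example value.

```python
def group_read_events(groups: int, members_per_group: int) -> int:
    """Upper bound on read events when one message is posted to every group."""
    return groups * members_per_group


def seconds_to_absorb(events: int, tps: int) -> float:
    """Time needed to receive `events` webhook events at `tps` events per second."""
    return events / tps


events_total = group_read_events(10, 256)
print(events_total)                         # 2560
print(seconds_to_absorb(events_total, 10))  # 256.0
```

Under these assumptions, a single round of group messages would occupy a 10 TPS allocation for more than four minutes with read reports alone.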