Pocket Network’s decentralized infrastructure relies on thousands of nodes in 20+ countries, servicing around 1 billion data relays every day. Each of those nodes needs to stay as performant as possible in order to serve the applications that rely on it. In addition, under Pocket’s reward mechanism, rewards are distributed based on the number of relays a node performs. Viewed through this lens, how can we maintain the best possible performance for the applications that use those nodes, while also incentivizing and rewarding the most performant nodes?
Before answering the question, let’s provide some background on how the Portal API works, as it is the main entry point for applications to send their requests to the network.
The Portal API is our main gateway for connecting blockchain applications with nodes. It works by routing their data requests (or “relays”, as we refer to them) to Pocket’s blockchain sessions, each of which contains a collection of nodes that are ready to serve requests for such applications. The Portal API is built on top of Pocket’s blockchain and has several features to improve the latency and stability of relays, such as the Quality of Service (QoS) checks and the Node Selector Algorithm, the latter of which is the primary subject of this article.
The Portal API abstracts away all the blockchain-specific details and exposes only a single set of chain-specific endpoints for developers to use. There is no need to set up SDKs or manage private keys: developers simply create an “application” in the Pocket Portal, copy the endpoint(s), and start sending relays to it.
Shortly after the launch of mainnet, in August of 2020, no checks were performed by the Portal API, so nodes within a session would serve roughly the same number of relays regardless of their quality and health. This meant that there was a high number of errors due to misconfigured Pocket nodes, unsynced blockchain nodes, or nodes that simply took too long to respond.
The first measure to reduce these types of errors was the addition of Quality of Service (QoS) checks for the nodes. These checks make sure that nodes are staked correctly and are currently on the latest block height of their corresponding blockchain. Adding these checks significantly reduced the number of errors that end users were experiencing, as it ensured that only healthy nodes were allowed to perform relays.
But it wasn’t enough, as users were still experiencing latency issues and nodes sometimes failed even when they passed the QoS checks. Also, some nodes were much faster than others, so a user sending a continuous stream of requests to the same application could experience vastly different response times, thus hindering the experience. More needed to be done.
Enter the Node Selector Algorithm, which is the Portal API’s way of distributing relays (and subsequently rewards) between all the nodes within an application’s session.
The Node Selector Algorithm was designed to address the following service issues:
With all this in mind, the Node Selector Algorithm allows Pocket to better rely on the most performant nodes, so applications and users can receive their service in the fastest time possible, while also rewarding the best performing nodes.
The algorithm works by measuring the latency of each node’s relay responses and comparing it against that of the other nodes within the same session. The information gathered covers both successful and failed relays. For successful relays, the time taken is recorded. For failed relays, the time is not taken into account; instead, only the number of failures over the preceding 5 minutes is measured.
From these two data points, the following are determined for each node: its success rate and its median (weighted) success latency.
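As a rough illustration of this bookkeeping, the sketch below derives a success rate and a median success latency from a list of relay records. The `RelayRecord` type, the `summarize` function, and the choice to compute the success rate over the same 5-minute window as the failure count are all assumptions made for this example; they are not the Portal API's actual data model.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional

WINDOW_SECONDS = 5 * 60  # failures are counted over the preceding 5 minutes


@dataclass
class RelayRecord:
    """Hypothetical record of one relay served by a node."""
    timestamp: float  # when the relay finished, in seconds
    success: bool
    elapsed: float    # response time in seconds


def summarize(
    records: list[RelayRecord], now: float
) -> tuple[Optional[float], Optional[float]]:
    """Return (success_rate, median_success_latency) for one node."""
    recent = [r for r in records if now - r.timestamp <= WINDOW_SECONDS]
    if not recent:
        return None, None  # no data yet for this node
    # Only successful relays contribute a latency; failures are just counted.
    success_times = [r.elapsed for r in recent if r.success]
    success_rate = len(success_times) / len(recent)
    median_latency = median(success_times) if success_times else None
    return success_rate, median_latency
```

Note that the elapsed time of a failed relay never enters the latency figure here, which mirrors the rule that only the number of failures is measured.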
The two data points above (success rate and the median/weighted success time) are recalculated, for all the nodes in the session that are going to be weighted, every time a relay is made. With this data, the weighting happens.
To visualize the weighting process, think of a raffle. Intuitively, the more raffle tickets one possesses, the more likely one is to win the raffle, though it is of course never guaranteed. This is the case with Pocket nodes and relays too.
The raffle tickets, in this case, are the number of times that a node’s ID is copied into a weighting array. A node’s ID can be copied up to 10 times for the best performing nodes, or as few as a single time for the least performant nodes.
The calculation to weight a node, based on its values above, uses this formula:
The top-rated nodes are given a weightFactor of 10. Each subsequent node is given a weightFactor based on the difference in latency (latencyDifference) between it and the previously ranked node, multiplied by the WEIGHT_MULTIPLIER. The WEIGHT_MULTIPLIER is a constant designed to produce a curve that adequately pushes slow nodes down the ranking, with the effect of rewarding better optimized nodes and limiting the relays that are routed to failing or slow nodes. The WEIGHT_MULTIPLIER’s current value is 35.
Some more notes on weighting:
Once the weighting is complete, the array is generated, and the Node Selector Algorithm will pick a node at random from the array. This node has “won the raffle”, and will perform the relay and receive the appropriate reward.
Because of this process, you can see that a node with a higher QoS will have a greater chance of being picked to perform relays.
Here are some other points to take into account regarding the Node Selector Algorithm:
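The weighting and raffle steps described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Portal API's actual code: it assumes latencies are expressed in seconds, and that each node's weight is the previously ranked node's weight minus the rounded product of latencyDifference and WEIGHT_MULTIPLIER, floored at 1. The function names `weight_nodes`, `build_raffle`, and `pick_node` are hypothetical.

```python
import random

WEIGHT_MULTIPLIER = 35  # current value quoted in the article
MAX_WEIGHT = 10         # best nodes are copied into the array 10 times
MIN_WEIGHT = 1          # least performant nodes are copied once


def weight_nodes(latencies: dict[str, float]) -> dict[str, int]:
    """Map node ID -> number of raffle tickets (latencies in seconds)."""
    ranked = sorted(latencies.items(), key=lambda item: item[1])
    weights: dict[str, int] = {}
    weight = MAX_WEIGHT
    previous_latency = None
    for node_id, latency in ranked:
        if previous_latency is not None:
            latency_difference = latency - previous_latency
            # Assumed rule: step down from the previously ranked node's weight.
            step = round(latency_difference * WEIGHT_MULTIPLIER)
            weight = max(MIN_WEIGHT, weight - step)
        weights[node_id] = weight
        previous_latency = latency
    return weights


def build_raffle(weights: dict[str, int]) -> list[str]:
    """Copy each node ID into the array once per ticket."""
    return [node_id for node_id, count in weights.items() for _ in range(count)]


def pick_node(raffle: list[str], rng=random) -> str:
    """Draw one winning ticket uniformly at random."""
    return rng.choice(raffle)


weights = weight_nodes({"a": 0.10, "b": 0.16, "c": 0.30, "d": 1.00})
raffle = build_raffle(weights)
winner = pick_node(raffle)
```

Because the draw is uniform over the array, a node's chance of serving the next relay is simply its ticket count divided by the array length, so even the slowest node in this sketch keeps a small, nonzero chance.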
The following table shows a snapshot of summarized data from a real listing of Node Selector results. The data shows ten nodes from a single session, within a single region, in the Portal API:
[Table: Excerpt of Node Selector results from a live session of a single region in the Portal API. Columns: Median Success Latency (in milliseconds), Number of relays.]
There are several factors contributing to the values shown in the table above. First, as the top 5 nodes have a latency of less than 150 milliseconds (0.15 seconds), they are automatically deemed to be in the top tier of nodes and are entered into the array 10 times, the maximum allowed.
Another factor in play is the number of failures, which can cause a node to be flagged as failing and removed from consideration by the Node Selector for up to 5 minutes. As the #2 node had double the number of failures of the #1 node, it was flagged as a failure for double the time and thus spent more time not being chosen at all.
This table is just a snapshot. Node performance changes all the time during a session, mainly due to load, so the values shown here may not have been constant for the life of the session (which is 4 blocks, or approximately 1 hour).
Of course, there is also the randomness factor. A node that is higher up in the ranking may have a better chance of being chosen to perform relays, but the randomness of the selection process means that a lower-performing node may still be chosen. This explains why the number of relays declines as you go down the list, but not in a perfectly consistent pattern.
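The effect of randomness on relay counts can be reproduced with a small simulation. The ticket counts below are made-up values chosen for illustration, not the data from the table, and `simulate_relays` is a hypothetical helper rather than part of the Portal API.

```python
import random
from collections import Counter


def simulate_relays(tickets: dict[str, int], relays: int, seed: int = 0) -> Counter:
    """Draw `relays` winners from a raffle array built from `tickets`."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    raffle = [node_id for node_id, count in tickets.items() for _ in range(count)]
    return Counter(rng.choice(raffle) for _ in range(relays))


# Hypothetical ticket counts for five nodes in one session.
tickets = {"node-1": 10, "node-2": 10, "node-3": 7, "node-4": 3, "node-5": 1}
counts = simulate_relays(tickets, relays=10_000)
for node_id in tickets:
    print(node_id, counts[node_id])
```

Nodes with equal ticket counts will typically end up with slightly different relay totals, and changing the seed shifts those totals again, which is the same unevenness visible in a real snapshot.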
The Node Selector Algorithm hasn’t changed much since its initial release. However, one aspect that has changed is the weighting mechanism. Due to an initial lack of QoS checks, the Node Selector was set up to punish nodes much more harshly. After the QoS checks were implemented, the punishment for failing was to not be sent to the Node Selector Algorithm at all, in order to guarantee that a node that is being weighted by the Node Selector Algorithm is a functioning, reasonably performant one.
From there on, most changes have been updates to the WEIGHT_MULTIPLIER constant in order to offer fairer reward distribution. These decisions were originally made internally as part of development by the engineering team at Pocket Foundation.
One other change implemented to improve the algorithm was switching the weighting calculation from average to median latency, as nodes that failed only once or twice ended up being disproportionately punished. By changing to median, we were able to more accurately track the overall performance of the node.
In terms of deployment, when an improvement or bug fix is added, it must pass a series of unit tests residing on the API, after which it is sent to staging and then to a semi-production environment, where a small percentage of the production traffic is sent to review the behavior and results. If all goes as expected, it is then deployed to production.
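The difference between average and median latency mentioned above is easy to see with a small example. The response times below are invented for illustration, and treating an occasional bad relay as a very slow response is one plausible reading of how outliers skewed the average.

```python
from statistics import mean, median

# Hypothetical response times in seconds: mostly fast, with two slow outliers.
latencies = [0.12, 0.11, 0.13, 0.12, 0.10, 0.14, 0.12, 4.80, 5.20, 0.11]

average_latency = mean(latencies)
median_latency = median(latencies)

print(f"average: {average_latency:.3f}s, median: {median_latency:.3f}s")
```

Two outliers out of ten relays push the average above one second, while the median stays at the node's typical response time, so a ranking based on the median is far less sensitive to a handful of bad relays.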
The process that takes a feature, bug fix, or refactoring from development to production is similar to how other features are developed within Pocket Network’s engineering team. Once the code changes are made, they must pass a series of steps before going live in production.
These steps are:
While updates to the Node Selector Algorithm may be infrequent, we are making much larger changes regarding how we communicate those changes to the community. Some of these changes include:
All of these process changes are intended to be open and public, so that community members can give feedback on the implementation and results. After all, we share the same goals: to provide the best service possible to our applications, while fairly rewarding the nodes that accomplish that.
We hope that this gives you a better understanding not only of how the Node Selector Algorithm works, but also of how rewards are allocated to nodes. The Node Selector Algorithm code is open-source and available on GitHub. We invite you to review the code and join the conversation on our discussion forum if you have suggestions on how we can improve our algorithms.