Reasons for zkSync Outage and its Fix: A Comprehensive Overview

On April 2, the zkSync team explained the cause of the outage on Twitter. Block production stopped because of a failure in the block queue database, while the server API was not affected: transactions continued to be added to the memory pool, and the query service worked normally. Although all components had comprehensive monitoring, logging, and alerts, no alert was triggered, because the API kept operating normally. The entire team was offline when the incident occurred. The fix was implemented in 5 minutes. To prevent similar issues, zkSync granted a special role to its database monitoring agents so that they can connect to the database and continuously collect metrics. The team also introduced an alert that fires when a database monitoring agent fails or cannot establish a connection to the database. In addition, if the situation escalates significantly, the on-call team will be notified immediately through multiple channels. The team added that the only long-term solution is decentralization.

zkSync: A database failure led to downtime, and decentralization is the only long-term solution

On April 2, the zkSync team reported that an outage had been caused by a failure in its block queue database. While the server API remained unaffected, block production stopped, which delayed transaction confirmation for users. This article explores the causes of the zkSync outage and how the team fixed it.

What Led to the Outage?

According to the official announcement, block production stopped because of a failure in the block queue database. All components had comprehensive monitoring, logging, and alerts, but no alert was triggered because the API continued to operate normally. The entire team was offline when the outage occurred.

What Happened During The Outage?

Transactions continued to be added to the memory pool, and the query service functioned correctly. However, because the block queue database had failed, no new blocks were produced, so pending transactions were not confirmed until the fix was in place.
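
The failure mode described here, a healthy API in front of a stalled block pipeline, can be sketched as a simple liveness check. This is a minimal illustration and not zkSync's actual monitoring: the `ChainSnapshot` type, the `block_production_stalled` function, the 120-second threshold, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChainSnapshot:
    """Hypothetical point-in-time view of the system (not a zkSync type)."""
    api_healthy: bool                # the API answers queries
    pending_transactions: int        # current size of the memory pool
    seconds_since_last_block: float  # age of the most recent block


def block_production_stalled(
    snapshot: ChainSnapshot,
    max_block_age_seconds: float = 120.0,  # assumed threshold, tune per system
) -> bool:
    """Return True when transactions are waiting but no block appeared recently.

    The check deliberately ignores api_healthy: in the incident described
    above the API was fine, so an API-based check alone would stay silent.
    """
    return (
        snapshot.pending_transactions > 0
        and snapshot.seconds_since_last_block > max_block_age_seconds
    )


# Made-up values for illustration only.
outage = ChainSnapshot(api_healthy=True, pending_transactions=350,
                       seconds_since_last_block=900.0)
normal = ChainSnapshot(api_healthy=True, pending_transactions=12,
                       seconds_since_last_block=4.0)

print(block_production_stalled(outage))  # True
print(block_production_stalled(normal))  # False
```

The point of the sketch is that the signal comes from block age and the memory pool, so it still fires when every API-level check passes.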

How Was The Outage Fixed?

The zkSync team implemented the fix in five minutes. To avoid similar issues in the future, the team assigned a special role to its database monitoring agents, which lets them connect to the database and collect metrics continuously.

The team also deployed an alert that fires when a database monitoring agent fails or cannot establish a connection to the database. If the situation escalates significantly, the on-call team is notified immediately through multiple channels.
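
One way to picture such an alert is a staleness check on the agent's last successful metrics sample: whether the agent crashed or simply cannot reach the database, the visible symptom is that no fresh metrics arrive. The sketch below assumes that model. The function name `channels_to_notify`, the thresholds, and the channel names are hypothetical and do not come from the zkSync announcement.

```python
from typing import List


def channels_to_notify(
    seconds_since_last_metrics: float,
    warn_after: float = 60.0,       # assumed: first alert after 1 minute of silence
    escalate_after: float = 300.0,  # assumed: escalate after 5 minutes of silence
) -> List[str]:
    """Map the age of the agent's last metrics sample to notification channels.

    Returns an empty list while metrics are fresh, a single channel once the
    agent has gone quiet, and several channels once the silence has lasted
    long enough to count as an escalation.
    """
    if seconds_since_last_metrics <= warn_after:
        return []
    if seconds_since_last_metrics <= escalate_after:
        return ["chat"]
    return ["chat", "email", "phone"]


print(channels_to_notify(10.0))   # []
print(channels_to_notify(120.0))  # ['chat']
print(channels_to_notify(900.0))  # ['chat', 'email', 'phone']
```

Alerting on the absence of metrics, not on a specific error, covers both failure cases with one rule and also catches the case where the monitoring itself is what has broken.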

The Future of zkSync

While the fix addresses the immediate problem, the only long-term solution is decentralization. As long as the network depends on centralized components, users remain exposed to outages like the one described here.

Conclusion

In conclusion, technical outages happen and can seriously affect users. The zkSync outage shows why monitoring has to cover every critical component and why response mechanisms matter. Although the team implemented the fix quickly, decentralization remains the best long-term solution.

FAQs

1. What is zkSync?

zkSync is a Layer 2 scaling solution for Ethereum that uses zero-knowledge-proof technology to enable high transaction throughput.

2. How does zkSync prevent outages?

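zkSync runs comprehensive monitoring, logging, and alerts on all components so that the team can respond to outages promptly. In this incident, however, no alert fired because the API kept operating normally.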

3. How does the zkSync team avoid future outages?

The team has deployed an alert that fires when a database monitoring agent fails or cannot connect to the database, and the on-call team is notified through multiple channels if the situation escalates.
