DDoS attacks on our primary network

Starting on February 15th, our primary network AS401720 was hit with a series of L3/L4 DDoS attacks from a Mirai botnet. For those not aware, a DDoS attack is designed to take services and networks offline, not to breach them. The attacks peaked at over 100 Mpps (million packets per second) and over 10 Gbps in traffic bursts. The primary target was our Matrix homeserver.

We know who is responsible, and we’re confident this was not a state-sponsored attack. We’re not going to name the individual or group publicly at this time, but we are preserving all of the evidence. No one has publicly come forward to claim responsibility.

This post covers what happened, what we did to mitigate it, and what we’re doing to harden our infrastructure going forward.

Background

AS401720 is our core network. It hosts most of our public-facing Secure Infrastructure (SI) services, including the Unredacted Matrix server, XMPP.is, and several internal systems. Our other networks and services, such as NoiseNet – our experimental anycast network designed for privacy – were not affected.

Our core network typically carries a low volume of traffic and is built with reliability in mind. However, our router could not handle the flood of high-rate, small-packet attacks we received. These attacks are specifically designed to exhaust CPU cycles, and that’s exactly what they achieved.
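To see why small packets are the problem: at a fixed bandwidth, the packet rate, and therefore the per-packet work the router must do (lookups, filtering, interrupts), scales inversely with packet size. A rough back-of-envelope in Python, with illustrative figures rather than measurements from this incident:

```python
def packets_per_second(bandwidth_gbps: float, packet_bytes: int) -> float:
    """Packets per second needed to fill `bandwidth_gbps` with packets
    of `packet_bytes` each (simplified: per-packet L1 overhead ignored)."""
    bits_per_packet = packet_bytes * 8
    return bandwidth_gbps * 1e9 / bits_per_packet

# A 10 Gbps flood of 1500-byte packets vs. 100-byte packets:
large = packets_per_second(10, 1500)   # ~0.83 Mpps
small = packets_per_second(10, 100)    # 12.5 Mpps

# Same bandwidth, 15x the packets -- and routers do per-packet work,
# so the small-packet flood costs roughly 15x the CPU.
print(f"{large / 1e6:.2f} Mpps vs {small / 1e6:.2f} Mpps")
```

This is why an attack can stay within a link’s bandwidth and still take the router down.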

Timeline

Feb 15, 17:30 UTC – The first wave of volumetric attacks hit our router, completely saturating its CPU.

Reports of service instability started to come in, and our moderation team began investigating and scouring our metrics for the cause. They traced the problems to our network. Unmitigated attacks continued sporadically until 22:07 UTC.

[Graph: packets passing through our WAN, in million packets per second]
[Graph: packets dropped on our WAN interface, in million packets per second]
[Graph: CPU usage on our router during the initial wave of attacks]

Feb 15, 20:35 UTC – Our core team got online and went through our infrastructure with a fine-toothed comb to figure out what was going on. We discovered that our router was the point of congestion.

We scrambled for ways to address the problem and contacted our upstream network. We quickly discovered that they were in the middle of reworking their DDoS mitigation systems; as a result, no mitigations were being applied.

We decided the best course of action was to migrate our services to an IP prefix protected by Cloudflare’s Magic Transit, which can properly handle volumetric floods.

Feb 15, 21:45 UTC – We started moving our services behind our Magic Transit protected IP prefix. To do this, we had to change the IP on every machine and update DNS records.
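The renumbering itself is mechanical: each host keeps its offset within the prefix, and only the prefix changes. A minimal sketch of that mapping in Python, using RFC 5737 documentation prefixes rather than our real ones:

```python
import ipaddress

def remap(addr: str, old_prefix: str, new_prefix: str) -> str:
    """Map a host address to the same host offset inside a new prefix
    (e.g. the Magic Transit protected one)."""
    old_net = ipaddress.ip_network(old_prefix)
    new_net = ipaddress.ip_network(new_prefix)
    offset = int(ipaddress.ip_address(addr)) - int(old_net.network_address)
    return str(new_net.network_address + offset)

# Hypothetical prefixes (documentation ranges, not our actual ones):
print(remap("198.51.100.25", "198.51.100.0/24", "203.0.113.0/24"))
# -> 203.0.113.25
```

Each remapped address then needs a matching DNS record update.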

Feb 16, 00:05 UTC – The first attack came through on our new prefix. Magic Transit mitigated the entire flood, protecting our network completely.

[Graph: the first flood, which Magic Transit mitigated successfully]

Feb 16, 00:45 UTC – We finished moving our services behind Magic Transit, and stopped advertising our original prefixes to our upstream provider to protect our network from pivoting attacks.

Feb 16, 06:30 UTC – Another flood came in, and was completely mitigated by Magic Transit.

[Graph: the second flood, which Magic Transit mitigated successfully]

Feb 16, 10:29 -> 11:49 UTC – The attacker changed tactics, sending floods across multiple protocols. Magic Transit mitigated most of the traffic, but some leaked through, causing a small amount of packet loss on our network.

[Graph: the third set of floods, which Magic Transit mostly mitigated]

Attack characteristics

  • Source: Mirai botnet
  • Distribution: Worldwide
  • Protocols: UDP, TCP, GRE
  • TCP flags: Nearly all
  • Ports: Varied
  • Packet sizes: Low (less than 300 bytes)
  • Packet rate: 2-100+ Mpps
  • Bandwidth: 10+ Gbps
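Taken together, a high packet rate combined with a small mean packet size is a usable fingerprint for this class of flood. A minimal detection sketch in Python, with made-up thresholds rather than our production tooling:

```python
from dataclasses import dataclass

@dataclass
class TrafficSample:
    pps: float  # packets per second on the interface
    bps: float  # bits per second on the interface

# Illustrative thresholds, not values from our monitoring:
PPS_LIMIT = 2e6        # 2 Mpps, the low end of the observed attacks
MEAN_PKT_LIMIT = 300   # bytes, matching the small packets seen

def looks_like_volumetric_flood(s: TrafficSample) -> bool:
    """High packet rate plus small mean packet size: the signature
    of a CPU-exhaustion flood like the ones described above."""
    mean_pkt_bytes = (s.bps / 8) / s.pps if s.pps else 0.0
    return s.pps >= PPS_LIMIT and mean_pkt_bytes <= MEAN_PKT_LIMIT

print(looks_like_volumetric_flood(TrafficSample(pps=20e6, bps=10e9)))  # True
print(looks_like_volumetric_flood(TrafficSample(pps=1e6, bps=9e9)))    # False
```

A check like this, fed from interface counters, can page operators before the router CPU saturates.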

Impact

The primary target was our Matrix homeserver. However, the attacks affected nearly all of our core services due to the CPU exhaustion they caused on our router.

You can always check the status of our services at status.unredacted.org.

Mitigation

The timeline above covers the sequence of events, so we won’t repeat it here. The short version is that we migrated all affected services behind Cloudflare’s Magic Transit in about three hours, and withdrew our original prefix announcements to prevent the attacker from pivoting around it. No services went down after the migration.

What didn’t work was relying on our upstream’s DDoS mitigation. We were caught at a bad time with their systems being reworked, but it exposed a dependency we shouldn’t have had. Our router was also never specced for this kind of packet rate, and that’s something we should have planned for.

Attribution

This was a targeted attack against our Matrix homeserver. We know who did it. We’re not disclosing that information publicly right now, but this was not the work of a government or state-sponsored actor. No one has publicly come forward to claim responsibility.

The attack used a Mirai botnet, which is widely available and commonly rented. The targeting of our Matrix homeserver specifically, combined with other indicators we’ve observed, makes us confident in who did this.

What we’re doing next

This attack exposed weaknesses in our core network that we’re now addressing.

Magic Transit stays. We’re keeping all of our core services behind Cloudflare’s Magic Transit going forward. It proved itself during this incident, and we’re not going back to relying on upstream mitigations alone (at least until we are very confident they work well).

Router hardening. Our router was the single point of failure. It was never built to handle 100+ Mpps of small-packet floods, and ideally we can fix that. We’re evaluating hardware upgrades that can handle high packet rates without CPU exhaustion. We’re not certain we’ll do this given rising hardware costs, but we’ll be looking into it.

Reviewing what else belongs behind NoiseNet. NoiseNet was designed in part to distribute load and mitigate DDoS attacks through its anycast edge. Our core services on AS401720 were not behind it. As we stabilize NoiseNet, we’ll evaluate whether some of these services should be routed through it for the additional resilience it provides. This will take additional time, however, as we have yet to develop proper automation for NoiseNet.

Final thoughts

Running infrastructure that helps people access the open Internet and communicate privately means accepting that it will be attacked. We’ve dealt with abuse reports, threats, and now sustained DDoS attacks. None of it changes what we do or why we do it.

We spent countless hours building out our infrastructure and network to be resilient, and we’re working on making them even more so. Although we’re just a small non-profit, the reliability of our services like our Matrix server, XMPP.is, FreeSocks, and our Tor relays is crucial to us. We aim to be as transparent as possible when things go wrong.

If you want to help us build more resilient infrastructure, consider making a donation. If you have questions, join one of our community chats.

Donate