
Why does the internet keep going down?


Hannah Gallop

10 min read

In a world that never sleeps, a sudden internet drop can feel like a major problem. One moment everything is working, the next you’re stuck on a timeout page wondering if the whole web has crashed.

“It always feels more worrying than it is,” says Katie Gale, digital project director at Barques. “Most of the time you are not watching the internet fall over. You are just bumping into one part of it having a bad day.”

When big platforms go offline, like during the M&S cyberattack or Cloudflare’s recent disruption, it’s easy to feel like outages are on the rise and to wonder if something bigger is behind them.

If you run a business that depends on digital infrastructure or you simply can’t imagine life without instant access, any drop in connectivity can feel like a crisis.

While these outages are inconvenient, they’re rarely a sign of anything catastrophic. As specialists in web design, development and systems integration, we know that the more you understand what’s really happening behind the scenes, the easier it is to stay calm.

“As soon as clients understand the layers involved, the panic usually drops,” Katie adds. “Clear, jargon-free communication is key. When you know what’s within your control, and what isn’t, it all feels far less overwhelming. And we’re here to talk you through that, openly and head-on.”

So, why does the internet actually “go down”?

The internet isn’t controlled by one switch. It’s a patchwork of networks, servers, cables and software all working together behind the scenes. When something goes wrong, it’s usually one layer that falters rather than the whole web. Here’s what tends to cause the trouble:

1. Hardware failure

Every digital experience relies on physical infrastructure. Servers, fibre cables, routers, power supplies and cooling systems keep everything running, from your home Wi-Fi to the world’s biggest e-commerce platforms.

And while most of it works quietly in the background, any failure in the core systems can have serious knock-on effects. When a large provider’s routing equipment goes down, millions of users can feel it instantly.

A famous example is the 2007 incident where a ship’s anchor accidentally severed multiple undersea cables near the coast of Alexandria in Egypt. Those cables carried a huge amount of international traffic, and the damage slowed internet access across large parts of the Middle East, India and even parts of Europe.

“People forget how physical the internet is,” says Katie. “We talk about ‘the cloud’ but there are real cables, real data centres, real bits of kit involved. If something happens to those, it can travel very quickly.”

Hardware will always be vulnerable to natural wear, heat, power issues or simple accidents, but providers build in a lot of contingency, so these failures are often fixed quickly.

2. Software bugs and automation flaws

Not every failure is physical; sometimes the problem lies in the code. A small bug in a large system can have enormous consequences, and automated systems can make the impact worse if they propagate the error.

Take the major AWS outage that occurred earlier this year, which was triggered by a defect in its automated DNS management. What began as a single problematic record cascaded into widespread downtime across thousands of platforms. Banks, apps, gaming platforms and even smart beds stopped working.
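One lesson from incidents like this is that when a service stumbles, the systems that depend on it shouldn't hammer it with instant retries, or they amplify the outage. A common defensive pattern is retrying with exponential backoff and a little randomness. This is a minimal illustrative sketch in Python (not AWS's actual mechanism, and the function names are our own):

```python
import random
import time

def call_with_backoff(operation, max_attempts=5, base_delay=0.5):
    """Retry a flaky operation, waiting longer after each failure so
    dependent systems back off instead of piling on during an outage."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller handle it
            # Wait 0.5s, 1s, 2s, ... plus a small random jitter so
            # thousands of clients don't all retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

The jitter matters: without it, every client retries in lockstep and the recovering service gets hit by synchronised waves of traffic.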

“Automation is brilliant until the thing that is being automated goes wrong,” Katie explains. “You get speed and scale, but if something is misconfigured, you get fast, large-scale problems too.”

We see this often: modern infrastructure is powerful, but it only takes one faulty process or misconfigured update to destabilise an entire stack.

This is why our developers test thoroughly across devices, browsers and environments; we also carry out regular maintenance to keep everything up to date and working exactly as it should, whether we’re building in WordPress or creating custom PHP systems.

3. Human error

Even the most sophisticated systems still rely on humans, and humans make mistakes. Some of the largest outages in history have been caused by:

• a mistyped command
• the wrong configuration
• a cable unplugged by accident

The 2017 AWS S3 outage happened because an engineer mistyped a single command while fixing a billing issue. That small error accidentally took far too many servers offline, including the systems that store S3’s metadata and manage new storage.

Restarting those parts of S3 took hours, and during that time huge platforms like Slack, Medium, Coursera, Quora and Expedia went down. The financial impact was significant too, with major companies losing hundreds of millions collectively.

“Most big failures are not down to one ‘stupid’ person,” says Katie. “They are the result of complicated systems, pressure, time constraints and a tiny slip in the middle of that. Good processes make those slips less likely and easier to recover from.”

Robust processes, peer review and good documentation go a long way in reducing these risks, but they’ll never eliminate them entirely.

4. Cyberattacks and DDoS events

This is the one that causes the biggest headlines (and often the biggest headaches for security teams).

A distributed denial-of-service (DDoS) attack works by overwhelming a service with fake traffic until it can’t respond to real users. Cyberattacks can force entire sites offline as teams contain the damage, as seen with the recent M&S incident.
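One of the standard building blocks for absorbing traffic floods is rate limiting: each client gets a budget of requests, and anything beyond it is dropped. As a hedged illustration of the idea (real DDoS mitigation involves many more layers, and this class is ours, not any particular vendor's), here is a simple token bucket in Python:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: requests spend tokens, tokens refill
    at a steady rate, and a client that exhausts its budget is refused."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for the time elapsed, capped at the bucket size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True  # within budget: serve the request
        return False     # budget exhausted: drop or queue the request
```

In practice you would run one bucket per client IP (or per API key) at the network edge, so legitimate users keep their own budgets while a flood from one source gets throttled.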

For businesses, this is where secure development, monitored integrations and resilient architecture become essential.

“Security is not a one-off task,” Katie notes. “Regular maintenance, patching and monitoring keep your website secure and far less vulnerable to attack. You cannot stop every attempt, but you can put yourself in a much stronger position when something does happen.”

Building resilient digital foundations

Outages like the recent Cloudflare event highlight a simple truth: the modern internet is both powerful and fragile. We’ve centralised so much of our infrastructure in a handful of major providers that when one hiccups, millions feel it.

But this is no reason to panic. Instead, it’s a reminder that resilience matters. From an agency point of view, that means:

  • choosing trustworthy hosting providers
  • building integrations that fail gracefully
  • monitoring performance and uptime
  • securing your data and platforms
  • testing thoroughly before launch
  • carrying out regular maintenance for proactive updates and reactive bug fixes

Most importantly, it means accepting that outages will happen and being ready to respond when they do.
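"Failing gracefully" has a concrete shape in code: when an upstream service is down, serve the last known-good data instead of an error page. This sketch assumes a simple in-memory cache and an arbitrary `fetch` function (both hypothetical names for illustration):

```python
def fetch_with_fallback(fetch, cache, key):
    """Try the live service first; if it is unavailable, degrade
    gracefully by serving the last successful result from a cache."""
    try:
        value = fetch(key)
        cache[key] = value  # refresh the cache on every success
        return value
    except Exception:
        # The upstream is having a bad day: prefer stale data to a blank page.
        if key in cache:
            return cache[key]
        raise  # nothing cached either - surface the error honestly
```

A visitor served slightly stale content rarely notices an outage at all, which is exactly the point of building integrations that fail gracefully.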

“For us, resilience is not a nice-to-have,” says Katie. “It is baked into how we plan, design and build. We cannot control the global internet, but we can make sure our clients are not caught off guard.”

The bottom line

A “server not found” error might feel like the beginning of a digital meltdown, but most outages are temporary bumps rather than signs of a collapsing internet.

The web is complex, but it’s also remarkably robust. And with smart, resilient web design and development, your digital presence can weather the inevitable moments of disruption.

“It’s about being prepared, responding quickly and giving our clients confidence that we can guide them through whatever happens, even when the internet is unpredictable,” Katie concludes.

Don’t let the headlines worry you. With the right foundations, you can stay confident – even when the internet has a wobble.