AIBU to be concerned about the stability of the worldwide web

OMG12 · 2023-08-01T13:10:34+00:00

I’ll caveat this by saying I have no IT knowledge beyond switch it off and on again. I was recently reading about the fall of Alexandria and the loss of so much knowledge. Nowadays most of our knowledge is stored out there in computer land (told you I had no knowledge on these matters). Are there any circumstances whereby all this information could disappear or we wouldn’t be able to access it? Someone will probably be along soon to tell me I’m being stupid, but I’m genuinely curious.

GasPanic · 01/08/2023 14:24

I think the word you are looking for is resilience rather than stability.

Most places keep their info in multiple places to guard against catastrophic events.

There is maybe a possibility that something such as a severe solar storm might knock out a significant fraction of the storage and leave the world intact but without the data or much advanced electronics. But my guess is a lot of it would survive in some form or another. If there was another catastrophic event big enough to destroy a significant fraction of the data on the net (say asteroid strike), my guess is that we would have greater things to worry about than the web.

I think the bigger problem is actually managing the amount of crap that is out there, and knowing when to delete it/get rid of it. You could probably lose 95% of the stuff on the web with little consequence for humanity. The question is selecting the 5% of stuff worth preserving from the 95% of absolute nonsense.

ntmdino · 01/08/2023 14:40

There would have to be a catastrophe on a global scale to cause significant data loss, given the distributed nature of the Internet.

With that said...the question isn't entirely without merit, because the Internet isn't quite as distributed as a lot of people think. For example, AWS (Amazon's cloud computing arm) hosts some 10% of the data available to the web. Microsoft's Azure and Google account for roughly another 10% - so that's 20% of the Internet hosted by three companies.

Now, they don't put all that stuff in one place - AWS, for example, uses 102 different physical zones across the world to provide its hosting infrastructure, and the number's roughly the same for Google and Microsoft. Still, it's quite a small number given the sheer volume of data. They do have masses of redundancies and backups in place, though, so...back to my original statement: there would have to be a global catastrophe or an incredibly well-resourced coordinated attack to even make a dent in all of that. The only thing I can think of is a solar flare the likes of which have never been seen in the history of the planet.

What the Internet is susceptible to is loss of routing - which is to say, parts of the Internet simply going dark for a period of time. This has happened a few times - to understand why, you have to know how traffic gets from one place to another. Effectively, the Internet is made up of millions of smaller networks which all talk to each other (hence Inter-net), and the routers at the edge of those networks are constantly analysing traffic to find the best routes from A to B (ie the best balance of reliability vs speed). Sometimes, that goes a bit wrong, and one of two things will happen - either traffic starts to disappear (it'll loop, or just go into a black hole) or too much traffic will be sent across a particular link to the point where it becomes a single point of failure. If that link does fail - because of a hardware fault, sabotage or just plain overload - then everything it was serving will simply drop off the Internet.

However, it's not that much of a problem beyond the temporary loss of service - when it comes back, everything that was behind it will pick up where it left off.
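To make the routing idea a bit more concrete, here's a minimal Python sketch. It is a toy, not how real routers work: the network names and links are made up for illustration, and the "best route" is simply the one with the fewest hops. It shows traffic finding a way around one failed link, and a site dropping off entirely when the only link serving it fails.

```python
from collections import deque

# Hypothetical toy networks; each entry lists the networks it links to directly.
LINKS = {
    "home_isp": {"backbone_a", "backbone_b"},
    "backbone_a": {"home_isp", "backbone_b", "hosting_co"},
    "backbone_b": {"home_isp", "backbone_a"},
    "hosting_co": {"backbone_a", "small_site"},
    "small_site": {"hosting_co"},
}


def find_route(links, start, goal):
    """Breadth-first search: return a route with the fewest hops, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(links.get(path[-1], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no route left: the destination has "gone dark"


def without_link(links, a, b):
    """Return a copy of the map with the a<->b link removed (a simulated failure)."""
    copy = {name: set(peers) for name, peers in links.items()}
    copy[a].discard(b)
    copy[b].discard(a)
    return copy


# Normal day: traffic goes via backbone_a.
print(find_route(LINKS, "home_isp", "small_site"))

# One link fails, but there's another way round, so traffic is re-routed.
rerouted = without_link(LINKS, "home_isp", "backbone_a")
print(find_route(rerouted, "home_isp", "small_site"))

# The single link serving hosting_co fails: everything behind it is unreachable.
cut_off = without_link(LINKS, "backbone_a", "hosting_co")
print(find_route(cut_off, "home_isp", "small_site"))
```

Nothing on `small_site` is lost in the last case - put the link back and the route reappears, which is the "pick up where it left off" point above.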

Original poster

OMG12 · 01/08/2023 15:01

Thanks for that incredibly helpful response. Tbh I had no idea how the internet works. It’s a little more settling to know about the distribution. I guess the thing to be more worried about then is not the destruction of information but to understand how easy or not it would be to manipulate the data, esp since such a large amount is guarded by a small number of companies. Would that be possible?

OMG12 · 01/08/2023 15:03

Yes I suppose the flooding of the world with data is a real risk. Mind you if you delete it then who decides what’s deleted? How are ethics developing around that sort of thing?

INeedAnotherName · 01/08/2023 15:12

Interesting question OP, and I also wonder about blocked access to such knowledge. What would happen if an ISP (or whatever has the power) decided "no, you can't have it".

On a less serious note:
"The question is selecting the 5% of stuff worth preserving from the 95% of absolute nonsense."
Deleting all those Insta perfect dinners should get rid of 50% at least 😂

ntmdino · 01/08/2023 15:16

OMG12 · 01/08/2023 15:01

Thanks for that incredibly helpful response. Tbh I had no idea how the internet works. It’s a little more settling to know about the distribution. I guess the thing to be more worried about then is not the destruction of information but to understand how easy or not it would be to manipulate the data, esp since such a large amount is guarded by a small number of companies. Would that be possible?

Yes, it would be possible - but it would be a massive undertaking because the data is all in custom formats, for each of their customers. Remember, AWS has millions of different customers, most of which are running their own code on Amazon's infrastructure. To manipulate data, you'd have to understand it first.

I suppose it would be possible with a gargantuan AI system, but that's not really something that's realistic. The most realistic possibility is one of those companies going out of business, but - given how reliant the world is on them (governments included) - they'd be propped up pretty much forever.

The only other possibility is a Y2K-style bug in very popular data storage software - something like MySQL, PostgreSQL, MongoDB, SQL Server etc (they're all databases). If something like that came out of the blue, it could easily cause loss of data on a massive scale because that software underpins so many of the systems out there.

Don't forget, though, that every company out there has backups, and backups of backups (at least, the ones that matter do), and even small startups run disaster recovery drills every few months to verify that they can recover from a complete infrastructure loss (or a data integrity disaster) in a short period of time.

Basically...lots of very smart people get paid an absolute fortune to worry about all the things you're talking about, so nobody else has to ;) Which, of course, isn't to say that you shouldn't be concerned - you should still run your own backups of your own data, and don't just trust that whoever's running your cloud services will do the job properly. I'm absolutely paranoid about this, which is why I have a fairly hefty server in my back room which stores every bit of data I've ever created (well, the bits I care about, anyway), and that gets backed up regularly to a set of disks that I keep elsewhere. Just in case.
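For anyone wondering what "run your own backups" can look like in practice, here's a minimal Python sketch. It is an illustration under assumptions rather than a recommended tool: the folder names (`my_stuff`, `backup_disk`) are hypothetical, and it only covers the two basics - copy everything, then check that each copy still matches the original by comparing SHA-256 checksums.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path):
    """SHA-256 checksum of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_backup(source, destination):
    """Copy the whole folder tree; existing files in the backup are overwritten."""
    shutil.copytree(source, destination, dirs_exist_ok=True)


def verify_backup(source, destination):
    """Return the relative paths of files that are missing or differ in the backup."""
    source, destination = Path(source), Path(destination)
    problems = []
    for original in sorted(source.rglob("*")):
        if original.is_file():
            relative = original.relative_to(source)
            copy = destination / relative
            if not copy.is_file() or file_digest(copy) != file_digest(original):
                problems.append(relative.as_posix())
    return problems


def demo():
    """Back up a throwaway folder, then damage the copy to show the check catching it."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "my_stuff"          # hypothetical folder of things to keep
        (source / "photos").mkdir(parents=True)
        (source / "notes.txt").write_text("Hypatia of Alexandria")
        (source / "photos" / "cat.jpg").write_bytes(b"\x00\x01\x02")

        backup = Path(tmp) / "backup_disk"       # hypothetical second disk
        make_backup(source, backup)
        after_backup = verify_backup(source, backup)

        (backup / "notes.txt").write_text("bit rot")  # simulate a damaged copy
        after_damage = verify_backup(source, backup)
        return after_backup, after_damage


print(demo())
```

The verify step is the part people tend to skip: a backup you've never checked is only a hope, which is roughly what the disaster recovery drills mentioned above are for.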

ntmdino · 01/08/2023 15:19

INeedAnotherName · 01/08/2023 15:12

Interesting question OP, and I also wonder about blocked access to such knowledge. What would happen if an ISP (or whatever has the power) decided "no, you can't have it".

On a less serious note:
"The question is selecting the 5% of stuff worth preserving from the 95% of absolute nonsense."
Deleting all those Insta perfect dinners should get rid of 50% at least 😂

That already happens - and the solution is a VPN (sort of a network connection within a network connection, and it's strongly encrypted). Decrypting properly encrypted VPN traffic without the keys isn't practical with today's computers, so your ISP can't see what's inside it, no matter how much the government would love them to. What they can still do is spot that you're using a VPN and try to block the VPN itself.

OMG12 · 01/08/2023 15:20

INeedAnotherName · 01/08/2023 15:12

Interesting question OP, and I also wonder about blocked access to such knowledge. What would happen if an ISP (or whatever has the power) decided "no, you can't have it".

On a less serious note:
"The question is selecting the 5% of stuff worth preserving from the 95% of absolute nonsense."
Deleting all those Insta perfect dinners should get rid of 50% at least 😂

Could delete anything posted by an “influencer” and we would be nearly there.

But yes, blocking users is a very valid concern

OMG12 · 01/08/2023 15:23

I wouldn’t have a clue how to back up my stuff on servers and disks (I print everything I want to keep 😀). I suppose I just don’t really understand it.

OMG12 · 01/08/2023 15:34

Actually how exactly does it all work? Say I wanted to find out about Hypatia (let’s stick with Alexandria). I type into google “Hypatia” and hit return. What exactly happens then?

ntmdino · 01/08/2023 15:47

Oooof, really? That's a question...

OK, it starts way earlier than that. Google starts with a list of domains (eg mumsnet.com) that have been bought. It hits up each of those, and saves the page to its database, along with a list of keywords. Then it looks for links on that page, and follows them - and saves the page, along with a list of keywords. Repeat ad nauseam for billions of pages and domains.

It then has a condensed version of the visible Internet (*) - indexed by keywords and keyphrases (think like the index at the end of a book, only waaaay bigger and more detailed).

When you search Google, you're searching that index of keywords and phrases, and it sends you back the pages it thinks are most relevant (ie most likely to get you to click on them). Every time you click a result, it adds to that page's score - you've just proven that it's a little bit useful to you. The higher the score, the higher up the results that page goes.
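The crawl, index and search steps described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not Google's actual method: the "web" is a made-up dictionary of four pages, the addresses are hypothetical, and the ranking is just the click count described in this post (real search engines combine many more signals).

```python
import re
from collections import defaultdict, deque

# Hypothetical stand-in for the web: address -> (page text, links on that page).
PAGES = {
    "alexandria.example": ("Hypatia taught mathematics in Alexandria", ["library.example"]),
    "library.example": ("The library of Alexandria held many scrolls", ["hypatia.example"]),
    "hypatia.example": ("Hypatia was a philosopher and astronomer", []),
    "unlinked.example": ("A Hypatia page that nothing links to", []),
}


def crawl(seeds):
    """Start from known addresses, save each page, then follow its links."""
    queue, seen = deque(seeds), set(seeds)
    saved = {}
    while queue:
        address = queue.popleft()
        text, links = PAGES[address]
        saved[address] = text
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return saved


def build_index(saved_pages):
    """Like the index at the back of a book: keyword -> pages containing it."""
    index = defaultdict(set)
    for address, text in saved_pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(address)
    return index


def search(index, clicks, query):
    """Look the keyword up in the index; pages with more clicks come first."""
    matches = index.get(query.lower(), set())
    return sorted(matches, key=lambda address: (-clicks.get(address, 0), address))


index = build_index(crawl(["alexandria.example"]))
clicks = {}
print(search(index, clicks, "Hypatia"))

# Someone clicks the second result, so its score goes up and it moves to the top.
clicks["hypatia.example"] = 1
print(search(index, clicks, "Hypatia"))
```

Note that `unlinked.example` never shows up in the results: the crawler only reaches pages it can get to by following links, so a page nothing links to never makes it into the index.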

(*) Incidentally, the converse of this is what's known as "the Deep Web". Basically, it's not some big separate network, or even a single "thing" - it's still the Internet, accessible from the same connections. It's just that the pages on there aren't linked to from anywhere on the visible Internet (or they sit behind a login) - you can't simply follow a bunch of links to get there, so unless you already know the address, a search engine won't find it for you. "The Dark Web" is a smaller part of that, which you can only reach with special software such as Tor.

OMG12 · 01/08/2023 17:19

Ah ok, thanks. That makes sense. Does google personalise these search results or is it a generic thing?

gwenneh · 01/08/2023 17:25

Does google personalise these search results or is it a generic thing?

As much as is possible.

For example, if you're logged in as a Google user for any product, like Gmail or YouTube, the results will be very personal - it will use the data from things like the videos you watch to determine how useful the results it displays are, and serve you the results in that order.

If you're not logged in, it will use data such as the location of your IP address and any data from browser sessions it can access to choose which results to display. So even if you're not logged in, you can google a term like "restaurants" and the results displayed will be near you - Google assumes where you are based on your IP, and displays results based on that.
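As a rough illustration of the IP-based guess, here's a small Python sketch. Everything in it is an assumption for the sake of the example: the address ranges are reserved documentation ranges rather than real ISP allocations, the city assignments and restaurant names are made up, and real services use far larger and fuzzier location databases.

```python
import ipaddress

# Hypothetical lookup table: block of IP addresses -> city it is assumed to serve.
IP_RANGES = {
    "192.0.2.0/24": "Manchester",
    "198.51.100.0/24": "Cardiff",
    "203.0.113.0/24": "Edinburgh",
}

# Hypothetical listings, keyed by city.
RESTAURANTS = {
    "Manchester": ["Curry Mile Kitchen", "Canal Street Cafe"],
    "Cardiff": ["Bay View Bistro"],
    "Edinburgh": ["Royal Mile Grill"],
}


def guess_city(ip_text):
    """Return the city whose address block contains this IP, or None if unknown."""
    address = ipaddress.ip_address(ip_text)
    for block, city in IP_RANGES.items():
        if address in ipaddress.ip_network(block):
            return city
    return None


def restaurants_near(ip_text):
    """Pick results for the guessed city; fall back to everything if we can't guess."""
    city = guess_city(ip_text)
    if city is None:
        return sorted(name for names in RESTAURANTS.values() for name in names)
    return RESTAURANTS[city]


print(guess_city("198.51.100.7"))
print(restaurants_near("198.51.100.7"))
print(guess_city("8.8.8.8"))
```

The guess is only as good as the table, which is why search results sometimes think you're in the wrong town.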