Comment on No JS, No CSS, No HTML: online "clubs" celebrate plainer websites
AlteredEgo@lemmy.ml 3 days ago
That is just stupid. How about slightly more complex Markdown?
What I really want is a P2P archive of all the relevant news articles of the last few decades, stored as markdown like in Firefox's "reader view". And some super-advanced, LLM-powered text compression, so you can easily store a copy of 20% of them on your PC and share it P2P.
Much of the information on the internet could vanish within months if we face some global economic crisis.
rottingleaf@lemmy.world 3 days ago
Nothing needs to be that advanced; zstd is good enough.
The idea is cool. Pure P2P exchange would be the fallback, with something like BitTorrent trackers as the main mechanism for yielding nodes per space (suppose there's more than one such archive you'd want to replicate) and per partition (if an archive is too big, partitioning might make sense, but then some of what I wrote further down should be reconsidered).
The problem with torrents and similar systems is that people only store what's interesting to them.
If you have to store one humongous archive, search it efficiently, and avoid losing pieces, then, I think, you need a partitioned, roughly equal distribution of it over the nodes.
The key space (suppose keys are hashes of blocks of the whole archive) is partitioned by prefix, so that a node stores an equal number of blocks from every prefix. Within each partition, a node should first store the values closest to its own identifier (a bit like in Kademlia). OK, I'm thinking the first sentence of this paragraph might even be unneeded.
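To make that concrete, here's a toy sketch of the idea, not any real protocol: assign each block to a partition by hash prefix, and rank candidate nodes by Kademlia-style XOR distance to the block's key. All names and the prefix width are made up for illustration.

```python
import hashlib

PREFIX_BITS = 8  # hypothetical: 256 partitions of the key space

def block_key(block: bytes) -> int:
    # Key = SHA-256 hash of the block, as a 256-bit integer.
    return int.from_bytes(hashlib.sha256(block).digest(), "big")

def partition(key: int) -> int:
    # Partition = top PREFIX_BITS bits of the key.
    return key >> (256 - PREFIX_BITS)

def closest_nodes(key: int, node_ids: list[int], k: int = 3) -> list[int]:
    # Kademlia-style closeness: smaller XOR distance means closer,
    # so these nodes should store the block first.
    return sorted(node_ids, key=lambda nid: nid ^ key)[:k]

key = block_key(b"some archived article text")
print(partition(key))                          # which partition holds it
print(closest_nodes(key, [0x1234, 0xABCD, 0x0F0F], k=2))
```

With equal-sized partitions, "store an equal number of blocks from every prefix" just means each node keeps roughly the same count of blocks per partition value.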
The data itself should probably be in some supercool format where you don't need the whole archive to decompress the small part you need: just the beginning with the dictionary, plus the interval in question.
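A rough sketch of that property, using the stdlib's zlib with a preset dictionary (a toy stand-in for something like zstd's trained dictionaries): each block is compressed independently against a shared dictionary, so a node holding just the dictionary and one block can decompress that block alone.

```python
import zlib

# Hypothetical shared dictionary; a real one would be trained on the corpus.
DICT = b"the of and a to in is that it news article government said "

def compress_block(block: bytes) -> bytes:
    # Compress one block independently, against the shared dictionary.
    c = zlib.compressobj(level=9, zdict=DICT)
    return c.compress(block) + c.flush()

def decompress_block(data: bytes) -> bytes:
    # Needs only the dictionary and this one block, not the whole archive.
    d = zlib.decompressobj(zdict=DICT)
    return d.decompress(data) + d.flush()

blocks = [b"the government said that the news is good",
          b"a new article in the news said it is fine"]
stored = [compress_block(b) for b in blocks]
# Any single block round-trips without the others:
assert decompress_block(stored[1]) == blocks[1]
```

The trade-off is the usual one: per-block compression loses cross-block redundancy, which is exactly what the shared dictionary is meant to win back.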
There should also be, as a separate piece of functionality, keyword search inside intervals, so that a search yields the intervals where a given keyword is encountered. Nodes would index the contiguous intervals they can decompress and respond to search requests for those keywords. Ideally it should be possible to decompress a single block given only the dictionary. I suppose I should do my reading on compression algorithms and formats.
The search function could probably also return Google-like context around each hit, depending on the space needed.
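A minimal sketch of both ideas together, assuming intervals are just id-to-text mappings (everything here is invented for illustration): an inverted index from keyword to interval ids, with a snippet of surrounding context in each result.

```python
from collections import defaultdict

def build_index(intervals: dict[int, str]) -> dict[str, list[int]]:
    # Inverted index: keyword -> sorted list of interval ids containing it.
    index = defaultdict(set)
    for iid, text in intervals.items():
        for word in text.lower().split():
            index[word].add(iid)
    return {word: sorted(ids) for word, ids in index.items()}

def search(index, intervals, keyword: str, ctx: int = 20):
    # Yield (interval id, snippet with ctx characters of context).
    keyword = keyword.lower()
    for iid in index.get(keyword, []):
        text = intervals[iid]
        pos = text.lower().find(keyword)
        yield iid, text[max(0, pos - ctx): pos + len(keyword) + ctx]

intervals = {0: "The archive stores compressed news articles",
             1: "Search yields the intervals where a keyword appears"}
idx = build_index(intervals)
for iid, snippet in search(idx, intervals, "intervals"):
    print(iid, snippet)
```

In the real system the index itself would be sharded across nodes along with the intervals it covers, and returning context is what costs the extra space: the node has to decompress the interval to produce the snippet.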
There would also need to be some way to reward contribution, that is, to pay a node owner for storing and serving blocks.