Comment on How to self-host a distributed git server cluster?
marauding_gibberish142@lemmy.dbzer0.com 6 days agoI think I messed up my explanation again.
The load-balancer in front of my git servers doesn’t really matter. I can use whatever, really. What matters is: how do I make sure that when the client writes to a repo in one of the 5 servers, the changes are synced in real-time to the other 4 as well? Running rsync every 0.5 second doesn’t seem to be a viable solution
solrize@lemmy.world 6 days ago
Why do you want 5 git servers instead of, say, 2? Are you after something more than high availability? Are you trying to run something like GitHub where some repos might have stupendous concurrent read traffic? What about update traffic?
What happens if the servers sometimes get out of sync for 0.5 sec or whatever, as long as each is in a consistent state at all times?
Anyway my first idea isn’t rsync, but rather, use update hooks to replicate pushes to the other servers, so the updates will still look atomic to clients.
What real world workload do you have, that appeared suddenly enough that your devs couldn’t stay in top of it, and you find yourself seeking advice from us relatively clueless dweebs on Lemmy? It’s not a problem most git users deal with. Git is pretty fast and most users are ok with a single server and a backup.
marauding_gibberish142@lemmy.dbzer0.com 6 days ago
Thanks for the comment. There’s no special use-case: it’ll just be me and a couple of friends using it anyway. But I would like to make it highly available. It doesn’t need to be 5 - 2 or 3 would be fine too but I don’t think the number would change the concept.
Ideally I’d want all servers to be updated in real-time, but it’s not necessary. I simply want to run it like so because I want to experience what the big cloud providers run for their distributed git services.
Well the other choice was Reddit so I decided to post here (Reddit flags my IP and doesn’t let me create an account easily). I might ask on a couple of other forums too.
Thanks
solrize@lemmy.world 6 days ago
I see, fair enough. Replication is never instantaneous, so do you have definite bounds on how much latency you’ll accept? Do you really want independent git servers online? Most HA systems have a primary and a failover, so users only see one server. If you want to use Ceph, in practice all servers would be in the same DC. Is that okm
I think I’d look in one of the many git books out there to see what they say about replication schemes. This sounds like something that must have been done before.
marauding_gibberish142@lemmy.dbzer0.com 6 days ago
Well it’s a tougher question to answer when it’s an active-active config rather than a master slave config because the former would need minimum latency possible as requests are bounced all over the place. For the latter, I’ll probably set up to pull every 5 minutes, so 5 minutes of latency (assuming someone doesn’t try to push right when the master node is going down).
I don’t think the likes of Github work on a master-slave configuration. They’re probably on the active-active side of things for performance. I’m surprised I couldn’t find anything on this from Codeberg though, you’d think they have already solved this problem and might have published something. Maybe I missed it.
I didn’t find anything in the official git book either, which one do you recommend?