Comment on I keep waffling on Proxmox. Sell me. For or against.
tmjaea@lemmy.world 1 day ago
Please elaborate. How does it handle SSH keys? And what is fragile regarding corosync?
dbtng@eviltoast.org 23 hours ago
SSH key management in PVE is handled in a set of secondary files, while the original debian files are replaced with symlinks. Well, that’s still debian. And in some circumstances the symlinks get b0rked or replaced with the original SSH files, the keys get out of sync, and one machine in the cluster can’t talk to another. The really irritating thing about this is that the tools meant to fix it (pvecm updatecerts) don’t work. I’ve got an elaborate set of procedures to gather the certs from the hosts and fix the files when it breaks, but it sux bad enough that I’ve got two clusters I’m putting off fixing.
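For anyone who wants to spot the broken-symlink case before a node stops talking to the others, here is a minimal sketch of a check. The two paths in `EXPECTED_LINKS` are an assumption about a typical PVE layout and are not taken from the post above (newer releases may handle known_hosts differently), and `check_symlink` and `audit` are hypothetical helper names. It only reports state; it does not repair anything.

```python
import os


def check_symlink(path, expected_target):
    """Classify path as 'missing', 'not_symlink', 'wrong_target', or 'ok'."""
    if not os.path.lexists(path):
        return "missing"
    if not os.path.islink(path):
        # A regular file here means the symlink was replaced by a plain copy.
        return "not_symlink"
    target = os.readlink(path)
    if not os.path.isabs(target):
        # Relative link targets are resolved against the link's directory.
        target = os.path.join(os.path.dirname(path), target)
    if os.path.normpath(target) != os.path.normpath(expected_target):
        return "wrong_target"
    return "ok"


# ASSUMPTION: typical layout, verify against your own nodes first.
EXPECTED_LINKS = {
    "/root/.ssh/authorized_keys": "/etc/pve/priv/authorized_keys",
    "/etc/ssh/ssh_known_hosts": "/etc/pve/priv/known_hosts",
}


def audit(links=EXPECTED_LINKS):
    """Return {path: status} for every expected symlink."""
    return {path: check_symlink(path, target) for path, target in links.items()}
```

Running `print(audit())` on each node and comparing the results would show which host has drifted. Anything other than `ok` is a hint to look closer, not proof of a fault.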
Corosync is the cluster. It's the messaging layer under the shared file system (pmxcfs) that immediately replicates any changes to all members. That’s essentially anything under /etc/pve/. Corosync is very sensitive. I believe they ask for 10ms lag or less between hosts, so it can’t work over a WAN connection. Shit like VM restores or live migration between hosts can flood it out. Looks fukin awful when it goes down. Your whole cluster goes kaput.
You can put corosync on its own network, but you obviously need a network for that. All it does is push around this set of config files, so a dedicated NIC is overkill, but in busy environments, you might wind up resorting to that. I have my systems provisioned on a dedicated corosync vlan and also use a secondary IP, but corosync is too dumb to fall back to the secondary if the primary is still “up”, regardless of whether it's actually communicating, so I get calls on my day off about “the cluster is down!!!1” when people restore backups.
tmjaea@lemmy.world 22 hours ago
Thanks for your answer.
I've used Proxmox since version 2.1 in my home lab and since 2020 in production at work. We have not had issues with the SSH files yet. Also, corosync is working fine although it shares its 10g network with Ceph.
In all that time I was not aware of how the certs are handled, despite the fact I had two official proxmox trainings. Ouch.
dbtng@eviltoast.org 20 hours ago
Cool.
Here. SSH key issues. There was a huge forum war.
…proxmox.com/…/ssh-keys-in-a-proxmox-cluster-reso…
But it's still a thing. That still needs to be fixed by a human. Today that’s me.
Regarding CEPH and corosync on the same network … well I’m just getting started with that now. I do have them on different vlans, but it's the same 10gb set of nics. I’m hoping if it gets really lousy, my netadmin can prioritize the corosync vlan. I’ll burn that bridge when I come to it.