Spent all week chasing VMware cluster problems, and it'll now roll over into Monday. It always sucks when things fail in ways so subtle that the fault tolerance doesn't catch them and you need to start pressing buttons.

A hypervisor has had a weird storage failure where a couple of devices have just vanished, but vSAN hasn't properly accommodated this, so it's just sitting there protesting.

If it considered the node totally failed, it would all be so much easier.

@sullybiker Ceph has the opposite problem: if a node sneezes it starts recovering data immediately unless you go out of your way to tell it not to.
