2014-08-15

How I added my LVM volumes as OSDs in Ceph

This article expands on how I added an OSD backed by an LVM logical volume to my Ceph cluster. It might be useful to somebody else who is having trouble getting
ceph-deploy osd create ... 
or
ceph-deploy osd prepare ...
to work nicely.

Here's how Ceph likes to have its OSDs set up. Each OSD's data volume is mounted, by OSD id, under
/var/lib/ceph/osd/ceph-*
. Within that folder should be a file called
journal
. The journal can either be a regular file on that drive or a symlink. Ideally the symlink points to a raw partition on another device (e.g. partition one on an SSD), though a symlink to a regular file works too.
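
To see which of these layouts an existing OSD uses, a small helper along these lines can be handy. This is only a sketch: the function name journal_kind is my own invention, and it does nothing more than inspect the journal entry inside whatever directory you pass it.

```shell
# Report how the journal of an OSD data directory is set up.
# Usage: journal_kind /var/lib/ceph/osd/ceph-{OSDNUM}
journal_kind() {
    journal="$1/journal"
    if [ -L "$journal" ]; then
        # A symlink: print where it points (e.g. a raw SSD partition).
        echo "symlink -> $(readlink "$journal")"
    elif [ -f "$journal" ]; then
        # A plain file living on the OSD data volume itself.
        echo "regular file on the OSD volume"
    else
        echo "no journal found"
    fi
}
```

The symlink test comes first on purpose, because a test for a regular file would follow the link and hide the fact that it is one.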

Here's a run-down of the steps that worked for me:

First
mkfs
the file system on the intended OSD data volume. I use XFS because BTRFS would add to the strain on my netbook but YMMV. After the mkfs is complete you'll have a drive with an empty filesystem.

Then issue
ceph osd create
which will return a single number: this is your OSDNUM. Mount your OSD data drive to
/var/lib/ceph/osd/ceph-{OSDNUM}
remembering to substitute in your actual OSDNUM. Update your
/etc/fstab
to automount the drive to the same folder on reboot. (This isn't quite what I do for LVM on USB keys: there I have noauto in fstab and a script that mounts the LVM logical volumes later in the boot sequence.)
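
For reference, the fstab entry could be generated with something like the following. Treat it as a sketch: osd_fstab_line is a hypothetical helper, /dev/vg0/osd0 is a made-up LVM path, and "defaults" is just a stand-in for whatever mount options you prefer.

```shell
# Print an /etc/fstab entry for an XFS OSD data volume.
# $1 = device (e.g. an LVM path such as /dev/vg0/osd0; a made-up name)
# $2 = the OSDNUM returned by `ceph osd create`
# $3 = mount options (optional; falls back to "defaults")
osd_fstab_line() {
    printf '%s /var/lib/ceph/osd/ceph-%s xfs %s 0 0\n' "$1" "$2" "${3:-defaults}"
}

# Example (only prints; append the line to /etc/fstab yourself once it looks right):
# osd_fstab_line /dev/vg0/osd0 3 noauto
```

Passing noauto as the third argument gives the late-mount variant described above, where a separate script mounts the logical volume after boot.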

Now prepare the drive for Ceph with
ceph-osd -i {OSDNUM} --mkfs --mkkey
. Once this is done you'll have a newly minted but inactive OSD complete with a shiny new authentication key. There will be a bunch of files in the filesystem. You can now go ahead and symlink the journal if you want. Everything up to this point is somewhat similar to what
ceph-deploy osd prepare ...
does.
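
Put together, the manual steps so far might look roughly like this. It is a sketch under a few assumptions: the run and prepare_osd helpers are my own names, the device path is whatever logical volume you pass in, the mkdir is there only because the mount point has to exist, and by default nothing is executed (DRY_RUN=1 just prints the commands).

```shell
# Sketch of the manual "prepare" sequence for one LVM-backed OSD.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Either print a command or run it, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = data device, e.g. a logical volume path of your own
prepare_osd() {
    dev="$1"
    run mkfs.xfs "$dev"
    if [ "$DRY_RUN" = "1" ]; then
        osdnum="{OSDNUM}"              # placeholder: the real number comes from the cluster
        echo "+ ceph osd create"
    else
        osdnum="$(ceph osd create)"    # prints the newly allocated OSD number
    fi
    dir="/var/lib/ceph/osd/ceph-$osdnum"
    run mkdir -p "$dir"                # the mount point must exist before mounting
    run mount "$dev" "$dir"
    run ceph-osd -i "$osdnum" --mkfs --mkkey
}
```

Review the printed commands first, and only then rerun as root with DRY_RUN=0 on a volume you are happy to wipe.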

Doing the next steps manually can be a bit tedious so I use ceph-deploy, substituting the OSD host's name for hostname:
ceph-deploy osd activate hostname:/var/lib/ceph/osd/ceph-{OSDNUM}



There are a few things that might go wrong.

If you've removed OSDs from your cluster then
ceph osd create
might give you an OSDNUM that is free in the CRUSH map but still has an old
ceph auth
entry. That's why you should run
ceph auth del osd.{OSDNUM}
when you delete an OSD. Another useful command is
ceph auth list
so you can see if there are any entries that need cleaning up. The key shown by
ceph auth list
should match the key in
/var/lib/ceph/osd/ceph-{OSDNUM}/keyring
. If it doesn't then delete the auth entry with
ceph auth del osd.{OSDNUM}
. The
ceph-deploy osd activate ... 
command will take care of adding the correct key for you but will not overwrite an existing (old) key.
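
The key comparison can be scripted along these lines. This is a rough sketch rather than anything ceph-deploy does itself: keyring_key and check_osd_key are hypothetical helper names, and it assumes the on-disk key sits in a file called keyring inside the OSD directory in the usual "key = ..." form.

```shell
# Extract the secret from a Ceph keyring file (lines look like "key = AQ...==").
keyring_key() {
    awk '$1 == "key" && $2 == "=" { print $3 }' "$1"
}

# Compare the key the cluster holds for osd.N with the one on disk.
# Needs a working `ceph` client with admin rights, so it is only defined here.
check_osd_key() {
    osdnum="$1"
    if ! cluster_key="$(ceph auth get-key "osd.$osdnum" 2>/dev/null)"; then
        echo "osd.$osdnum: no auth entry in the cluster"
        return 0
    fi
    disk_key="$(keyring_key "/var/lib/ceph/osd/ceph-$osdnum/keyring")"
    if [ "$cluster_key" = "$disk_key" ]; then
        echo "osd.$osdnum: keys match"
    else
        echo "osd.$osdnum: key mismatch, consider: ceph auth del osd.$osdnum"
        return 1
    fi
}
```

Running check_osd_key with your OSDNUM before activating tells you whether a stale entry is likely to get in the way.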

Check that the new OSD is up and in the CRUSH map using
ceph osd tree
. If the OSD is down then try restarting it with
/etc/init.d/ceph restart osd.{OSDNUM}
. Also check that the weight and reweight columns are not zero. If they are then get the CRUSHID from
ceph osd tree
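. Change the weight with
ceph osd crush reweight {CRUSHID} {WEIGHT}
, where {WEIGHT} is the non-zero CRUSH weight you want for the OSD (conventionally about the volume's size in TB). If the reweight column is not 1 then set it using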
ceph osd reweight {CRUSHID} 1.0
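
The weight checks can be turned into a tiny helper that prints whichever fix-up commands apply. It is a sketch with assumptions: suggest_weight_fix is a made-up name, you read the weight and reweight values off ceph osd tree yourself, the target CRUSH weight is your own choice, and the OSD is written as osd.N for the crush reweight, which is how it appears in the name column of ceph osd tree.

```shell
# Print the commands needed to fix the weights of one OSD.
# $1 = OSD number, $2 = weight column, $3 = reweight column,
# $4 = the CRUSH weight you want (your choice; not taken from the cluster)
suggest_weight_fix() {
    awk -v id="$1" -v w="$2" -v rw="$3" -v want="$4" 'BEGIN {
        # A zero CRUSH weight means no data will be placed on the OSD.
        if (w + 0 == 0)  printf "ceph osd crush reweight osd.%s %s\n", id, want
        # The reweight override should normally be 1.
        if (rw + 0 != 1) printf "ceph osd reweight %s 1.0\n", id
    }'
}

# Example: suggest_weight_fix 3 0 0 0.05
```

It only prints the commands, so you can check them before running anything against the cluster.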


(Here is more general information about how OSDs can be removed from a cluster, the drives joined using LVM and then added back to the cluster).