2016-01-22

Ceph Cluster Diary January 2016

Another node has been added to my Ceph cluster: a second-hand dual Pentium-D with 8GB of RAM and a 250GB SATA HDD. I removed a dual-head Quadro graphics card and a RAID controller with its SCSI2 drives from the box, since they were not compatible with Linux.

Cluster speed is a concern, so three 220GB SSDs were installed in the box and a writeback cache tier was created in front of the cephfs data pool. The online examples were very useful.
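For reference, the basic wiring follows the standard Ceph cache-tiering commands. This is a sketch, assuming the backing pool is named cephfs_data (the cache pool name cachedata matches what I use later in this post):

```shell
# Sketch: put a writeback cache tier in front of the cephfs data pool.
# "cephfs_data" is an assumed pool name; adjust to your cluster.
ceph osd tier add cephfs_data cachedata
ceph osd tier cache-mode cachedata writeback
ceph osd tier set-overlay cephfs_data cachedata
# The cache pool needs a hit set so Ceph can track object usage.
ceph osd pool set cachedata hit_set_type bloom
```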

At first, the metadata pool was also cached, but I began to get cluster-thrashing problems. I mitigated this, and reduced some data risk, by removing the metadata cache and instead adding a CRUSH rule that keeps two replicas of the metadata on SSDs and the final copy on HDDs.
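In Ceph of this vintage there are no device classes, so a rule like this needs the SSDs and HDDs arranged under separate CRUSH roots. A sketch for a size-3 pool, assuming hypothetical root names ssd and hdd:

```
rule metadata_ssd_primary {
    ruleset 2
    type replicated
    min_size 3
    max_size 3
    # first two replicas from hosts under the SSD root
    step take ssd
    step chooseleaf firstn 2 type host
    step emit
    # the remaining replica (firstn -2 means pool size minus 2) from the HDD root
    step take hdd
    step chooseleaf firstn -2 type host
    step emit
}
```

The pool is then pointed at the rule with ceph osd pool set on its crush ruleset.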

There were also numerous page faults, so I gave up some space on one of the SSDs for a Linux swap partition and most of the page faults disappeared.
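Setting that up is just the usual swap dance; the partition name here is hypothetical:

```shell
# Hypothetical partition carved from one of the cache SSDs.
mkswap /dev/sdc3
swapon /dev/sdc3
# Make it survive reboots.
echo '/dev/sdc3 none swap sw 0 0' >> /etc/fstab
```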

Most file operations are about three times faster than before. When the metadata was also cached there was roughly a 7x speed-up, but the cluster was less reliable. My backing storage devices are mainly external USB hard drives on old USBv1 hardware, so any speed-up is welcome.

The result is a much more reliable cluster that gives consistent enough speed to run virtual hard-drive files for some virtual machines that I occasionally run on my main desktop. Previously, those virtual machines had a tendency to crash when run from cephfs.

Early on, I did have a problem with the cache filling up, but I fixed that by applying more aggressive cache-sizing policies. In particular, I set target_max_bytes to 85% of my SSD size.

ceph osd pool set cachedata target_max_bytes .....
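The byte value has to be worked out by hand. A quick sketch of the arithmetic for one of the 220GB SSDs (85% of 220GB, using decimal gigabytes; the exact figure for your drive will differ):

```shell
# 85% of a 220GB (decimal) SSD, in bytes
SSD_BYTES=$((220 * 1000 * 1000 * 1000))
TARGET=$((SSD_BYTES * 85 / 100))
echo "$TARGET"
# then: ceph osd pool set cachedata target_max_bytes "$TARGET"
```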

I'm very pleased with the setup now. One or two more tweaks and I might be ready to begin retiring my dedicated NAS box and switch all my network storage to Ceph.