2016-01-22

Evicting and Flushing from Ceph Cache Tier/Cache Pools

Disaster! The OSDs backing my cache pools were reporting full. This occurred because the node carrying most of the backing pools crashed, leaving insufficient replicas of the backing pool. Even once that node was brought back online, recovery operations were going to take a long time. Here's what I did. The first thing was to set an absolute maximum size on the cache tier pool:
ceph osd pool set cachedata target_max_bytes ....
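
As a rough sketch of how the byte count might be worked out and then checked: the gib_to_bytes helper and the 100 GiB figure below are hypothetical illustrations, not the value I actually used. The commented commands assume the same cachedata pool as above.

```shell
#!/bin/sh
# Hypothetical helper: convert a size in GiB to bytes, since
# target_max_bytes takes a plain byte count.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Example only (100 GiB is an assumed figure, not the one from my cluster):
#   ceph osd pool set cachedata target_max_bytes "$(gib_to_bytes 100)"
#   ceph osd pool get cachedata target_max_bytes   # read the setting back
#   ceph df                                        # watch per-pool usage
```

Reading the value back with ceph osd pool get is a cheap way to confirm the limit took effect before moving on.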

The next thing was to start manually evicting objects from the pool. (Flushing writes dirty objects back to the backing pool; evicting boots clean objects out of the cache.) Flushing would need the backing pool up and able to accept new writes, but evicting would not. Evicting would free up space in the cache tier without waiting for the backing pool to stabilise.

The standard command to evict objects is:

rados -p cachepool cache-flush-evict-all

I found that it locked up on me, complaining that some objects were locked. I also tried the "try" variant, but that did not shrink the pool either:

rados -p cachepool cache-try-flush-evict-all

My next trick was to use parallel (apt-get install parallel if you don't have it!) to try evicting objects one by one. I ran this command until I was satisfied that the cache pool had shrunk to a reasonable size, then hit Ctrl-C to terminate the evictions.

rados -p cachepool ls | parallel -j16 rados -p cachepool cache-evict {}

What this command does is list the contents of cachepool and spawn an instance of rados to try to evict each object separately. The -j16 means to run 16 rados processes at a time.
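
If parallel isn't to hand, xargs can do roughly the same job. This is only a sketch, not what I ran: evict_all and the RADOS override are names I made up for illustration, it assumes object names contain no newlines, and it uses cache-evict, the per-object evict command that rados lists.

```shell
#!/bin/sh
# Hypothetical wrapper: evict every object in a pool, a few at a time.
# RADOS can point at a stub script for a dry run; it defaults to the real tool.
RADOS="${RADOS:-rados}"

evict_all() {
    pool="$1"          # cache pool name, e.g. cachepool
    jobs="${2:-16}"    # number of concurrent rados processes
    # One object name per line; convert to NUL-separated so that names
    # containing spaces reach rados as a single argument.
    "$RADOS" -p "$pool" ls |
        tr '\n' '\0' |
        xargs -0 -n1 -P "$jobs" "$RADOS" -p "$pool" cache-evict
}

# Example: evict_all cachepool 16
```

As with the parallel version, Ctrl-C stops it once the pool has shrunk far enough.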

For completeness, the other cache flushing or evicting commands that rados recognises are:

cache-flush 
cache-try-flush 
cache-evict 
cache-flush-evict-all
cache-try-flush-evict-all

I believe that variants with "try" in the name are non-blocking while the rest will block.
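
Given that the blocking variants can hang, one option is to wrap them in a time limit. A minimal sketch, assuming GNU coreutils timeout is installed; run_with_limit is a hypothetical helper name, and I haven't checked how rados behaves when killed part-way beyond what Ctrl-C already does.

```shell
#!/bin/sh
# Hypothetical guard: run a command but give up after a time limit.
# GNU timeout exits with status 124 when the limit is reached.
run_with_limit() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "gave up after $limit: $*" >&2
    fi
    return "$status"
}

# Example: run_with_limit 10m rados -p cachepool cache-flush-evict-all
```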

Soon the SSD OSDs that back my cache tier were back under warning levels. My cluster continued recovering overnight and all the data lived happily ever after (at least until next time).