2015-07-26

Play The Maryland Extra Credit Problem

A screenshot of a class problem from the University of Maryland has been doing the rounds. The teacher invites the students to vote to receive extra credit:

Here you have the opportunity to earn some extra credit on your final paper grade. Select whether you want 2 point or 6 points added onto your final paper grade. But there's a small catch: if more than 10% of the class selects 6 points, then nobody gets any extra points. Your responses will be anonymous to the rest of the class, only I will see the responses.

This situation is a little different from the Prisoner's Dilemma made famous by Game Theory because nobody stands to lose anything: every outcome is either neutral or a gain. From that vantage point, the best course of action is always to vote for six points. However, I think there are some political dynamics at play that might lead you to change your vote to reduce the likelihood of the neutral outcome.
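To make the rule concrete, here is a small sketch of the payoff for a single student, written as a POSIX shell function. The function name payoff and the class size of 30 used below are made up for illustration.

```shell
# Payoff for one student under the quoted rule: if more than 10% of the
# class picks 6, everyone gets 0; otherwise each student gets their pick.
# Usage: payoff CLASS_SIZE SIX_VOTES MY_CHOICE
payoff() {
  class_size=$1   # total number of students voting
  six_votes=$2    # number of students picking 6 (including you, if you did)
  my_choice=$3    # 2 or 6
  # "More than 10%" in integer arithmetic: six_votes * 10 > class_size
  if [ $((six_votes * 10)) -gt "$class_size" ]; then
    echo 0
  else
    echo "$my_choice"
  fi
}
```

With 30 students, up to three can pick 6 safely (payoff 30 3 6 gives 6, payoff 30 4 6 gives 0). If three others have already picked 6, switching your own vote from 2 to 6 takes your payoff from 2 to 0, which is exactly the neutral outcome the advice below tries to steer around.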

Depending on your expected final grade, here's my advice on how to play:

If you're a troll then go for six points. #YOLO

If you're a high-scoring student (A / A+), select 2 points. You don't need the extra marks. You're doing so well that you're above all this competitive stuff. Give other people a chance at a few extra points. Unless you reckon that people should have to earn their place and that the mountain top has room only for you... then go for 6 points.

Jo Beeplus: Always go for 6 points. You stand to lose nothing if nobody ends up getting any bonus points, and the six points might just lift your grade into A territory. You work hard; you deserve a shot at an A, right?

Jo SeePlus: Go for the 2 points. You might need the bonus marks to ensure you pass so you don't want to risk getting zero bonus points.

Jo "NearFail": Go for six points. You might get them, and two points or zero points won't make a difference.

Jo "TotalFailure": You're so far behind that you should give others a shot to shine: go for 2 points. Unless you're spiteful.

Bonus evilness karma if you vote for 6 points while talking up a big game regarding the virtues of choosing 2 points.

Go play!

2015-07-04

Ceph and BitRot

BitRot is the tendency for data to degrade over time on storage devices. CERN reports error rates at the 10⁻⁷ level, so BitRot is a significant problem. This short article talks about how to deal with BitRot on Ceph clusters.

Ceph's main weapon against BitRot is the deep scrubbing process. Deep scrubbing verifies data in a placement group against its checksum. If an object fails this test, the placement group is marked inconsistent and the administrator should repair it. Note that deep scrubbing only detects an inconsistency; it does not attempt an automatic repair. By contrast, a normal scrub only checks object sizes and attributes.
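Finding the placement groups that need repair can be scripted. The sketch below is a hypothetical helper (repair_cmds is my name for it, not a Ceph command) that reads the output of ceph health detail and prints a ceph pg repair command for each placement group flagged as inconsistent. It assumes health lines of the form "pg 2.5 is active+clean+inconsistent, acting [1,0,2]"; check that against your release.

```shell
# Hypothetical helper: read "ceph health detail" output on stdin and print
# (not run) a "ceph pg repair" command for each inconsistent placement group.
# Assumed line format: pg 2.5 is active+clean+inconsistent, acting [1,0,2]
repair_cmds() {
  awk '$1 == "pg" && $3 == "is" && $4 ~ /inconsistent/ { print "ceph pg repair " $2 }'
}

# Usage on a live cluster (review the output before running anything):
#   ceph health detail | repair_cmds
```

It prints rather than runs the repairs on purpose: that leaves room to work out which replica actually holds the bad copy first, since how repair picks the authoritative copy has varied between Ceph releases.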

Deep scrubbing is resource intensive and can cause a noticeable performance drop. You can temporarily disable scrubbing and deep-scrubbing:

ceph osd set noscrub
ceph osd set nodeep-scrub

And then re-enable scrubbing with:

ceph osd unset noscrub
ceph osd unset nodeep-scrub

The configuration options for scrubbing allow the administrator to suggest how quiet the cluster should be before initiating a scrub, how long the cluster is allowed to go before it must scrub, and how many scrubs can run in parallel.
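As an illustration, the options behind those three knobs live in the [osd] section of ceph.conf. The helper below (scrub_conf, a hypothetical name) just prints such a section so you can review it before adding it. The option names are real Ceph settings, but the values are only examples close to the defaults of that era, so check them against the documentation for your release.

```shell
# Hypothetical helper: print an example [osd] section for ceph.conf.
# Values are illustrative, not recommendations.
scrub_conf() {
  cat <<'EOF'
[osd]
# How quiet the host must be: skip scheduled scrubs while system load is above this
osd scrub load threshold = 0.5
# Scrub a PG no more often than once a day (seconds), load permitting
osd scrub min interval = 86400
# ...but force a scrub after a week (seconds), regardless of load
osd scrub max interval = 604800
# Deep-scrub each PG roughly weekly (seconds)
osd deep scrub interval = 604800
# How many scrubs a single OSD may run in parallel
osd max scrubs = 1
EOF
}

# Usage: scrub_conf, review the output, then merge it into /etc/ceph/ceph.conf
```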

It's also possible to trigger scrubs manually from the command line, with ceph pg scrub or ceph pg deep-scrub followed by a placement group id. This means that an administrator who doesn't mind writing code could scrub the placement groups in different pools according to different policies. This article describes scrubbing on a seven-day cycle at night-time.
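Here is a minimal sketch of that kind of policy. The hypothetical nightly_scrub_cmds helper reads placement group ids on stdin, one per line, and prints a ceph pg deep-scrub command for every seventh PG, offset by the day of the week, so each PG is deep-scrubbed once per seven-day cycle. How you produce the PG list (for example, by filtering ceph pg dump output for a single pool) is left as an assumption, since that output format varies between releases; feeding each pool's list on its own schedule gives per-pool policies.

```shell
# Hypothetical helper: print deep-scrub commands for one seventh of the PGs.
# $1 is the day number 0-6 (e.g. from `date +%w`); PG ids come on stdin.
# Every PG is selected on exactly one day of the week, provided the input
# list and its order stay the same from night to night.
nightly_scrub_cmds() {
  day=$1
  # n counts non-empty lines; PG number n is scrubbed when n mod 7 == day
  awk -v day="$day" 'NF { if ((n++ % 7) == day + 0) print "ceph pg deep-scrub " $1 }'
}
```

A nightly cron job could then run something like list_pgs_for_pool rbd | sort | nightly_scrub_cmds "$(date +%w)" | sh, where list_pgs_for_pool is a placeholder for your own listing command. Remember that % must be escaped as \% inside a crontab.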

Data can also be corrupted in RAM before it is written to the primary placement group. The best way to guard against this is to use ECC RAM. This problem is not unique to Ceph, but clusters exacerbate it because they increase the number of potential corruption points in the supply chain between the application and the storage device.

Ceph uses an underlying filesystem as a backing store, and that filesystem in turn sits on a block device. Choices an administrator makes in those layers can also help guard against BitRot, though they come with performance trade-offs. For example, ext4 and XFS do not protect against BitRot, but ZFS and btrfs can if they are configured correctly. Ars Technica has an excellent article on the topic called BitRot and Atomic COWs: Inside next-gen filesystems. Also, don't expect RAID to detect or repair BitRot.

For more technical details, you can read this Q&A post by Sage Weil of Inktank on the Ceph mailing list.