2016-05-16

Verisimilitude and Decision-Making

Last week IVX presented “Speed Mazer IVX” at the Waikato University Open Day 2016. The creation of Speed Mazer IVX was a good time to reflect on how verisimilitude forms perhaps the key criterion for our production decisions.

Verisimilitude is the appearance of being true or real. Verisimilitude is the seduction that teases the audience willingly into the world of the work. Things present or things-expected-but-missing will affect verisimilitude. Verisimilitude is the macro-clothing in which the work dresses. It is an overall sensory experience that emerges from the aesthetics of individual elements.

There are always pragmatic considerations when creating installation works. No concept proceeds cleanly from plan to execution intact. In fact, the sign of a great concept is that there are more ideas for how to improve the work than there are available resources to do them all. Why is that?

The obvious answer is that not all ideas are good ideas so having more ideas means that creators do not become wedded to bad ideas just because they are the only ideas. Comparing ideas against each other forces greater articulation of the desired verisimilitude.

Speed Mazer IVX is a player vs. player maze racer controlled by Guitar Hero instruments. The simplicity of the concept meant quick buy-in from the audience, a sense of competition and an element of mastery. Having randomly generated mazes meant that players could not develop muscle memory to complete the task.

Initially, we had cosmos-inspired backgrounds. IVX loves cosmos pictures – indeed, last year’s Open Day installation was “Write Your Name Among the Stars”. However, during testing, it was apparent that the cosmos background images distracted from the competitive feeling of racing through a maze. We went with black backgrounds to give an 8-bit arcade feel.

We tried to add sound – which always goes a long way to elevating verisimilitude. However, there were problems with the sound library that were going to take considerable time to resolve. We instead elected to put the time into visual aspects.

The maze is bumped when a player hits a wall, and the bump reaction signals to the player that they have hit it. The maze feels more real because bumps behave with believable physics. The bump reactions happen in 3D, but the 3D renderer limited what we could do with the walls (without more work). The positive verisimilitude effect of the bumps in 3D was greater than the verisimilitude of nicer-looking walls, so we kept the bump effects.

Particles are spat out opposite to the direction of player movement. Some of the particles are added to the opposing player’s maze, and each player spits their own colour. When players are concentrating on their own maze, they still get feedback that the other player is nearby. Particles also help build up a visual intensity proportional to the speed at which the player moves through the maze. Getting the right feel to the particle fields was the subject of much experimentation.

Initially, we wanted to have the players represented by a shape that would change direction as the player traversed the maze. While this might have been a great idea, it was judged as having a lesser effect on the verisimilitude so was not attempted as the deadline approached. Other ideas – random maze wall art, generative backgrounds, power-ups, a cheat mode, and more visual prompts – were similarly prioritised.

The functional aspects - making a heads-up maze racer that one side could win - did not take long to program. However, the bulk (probably 75% or more) of the coding time went into tuning the verisimilitude to create the right experience for the audience.

Reflecting on our decision-making process during the creation of Speed Mazer IVX made me realise how central verisimilitude is to our work. We dropped ideas that broke the verisimilitude and prioritised our features against our limited resources (time) according to verisimilitude. I look forward to advancing this thinking in future IVX work.

2016-01-22

Evicting and Flushing from Ceph Cache Tier/Cache Pools

Disaster! The OSDs backing my cache pools were reporting full. This occurred because the node carrying most of the backing pools crashed, leaving insufficient replicas of the backing pool. Even when that node was brought back online, recovery operations were going to take a long time. Here's what I did. The first thing was to set an absolute maximum size for the cache tier pool:
ceph osd pool set cachedata target_max_bytes ....

The next thing was to start manually evicting objects from the pool. (Flushing writes dirty objects back to the backing pool; evicting boots out clean objects.) Flushing would need the backing pool up and able to accept new writes - but evicting would not. Evicting would free up space in the cache tier without the backing pool having stabilised.
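Before deciding which to do, it can help to see how much of the cache pool is dirty (needs flushing) versus clean (can simply be evicted). The standard pool report includes a DIRTY count per pool, which is meaningful for cache pools:

ceph df detail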

The standard command to evict objects is:

rados -p cachepool cache-flush-evict-all

I found that it locked up on me, complaining that some objects were locked. I also tried another variant, but that was not shrinking the pool either:

rados -p cachepool cache-try-flush-evict-all

My next trick was to use parallel (apt-get install parallel if you don't have it!) to try evicting objects one by one. I'd run this script until satisfied that the cache pool had shrunk to a reasonable size and then Ctrl-C to terminate the evictions.

rados -p cachepool ls | parallel -j16 rados -p cachepool cache-try-evict {}

What this command does is list the contents of cachepool, take each entry, and spawn an instance of rados to try to evict each object separately. The -j16 means to spawn 16 rados processes at a time.
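To judge when enough has been evicted, I could watch the pool's usage fall from another terminal (standard commands; the 10-second refresh is an arbitrary choice):

watch -n 10 rados -p cachepool df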

For completeness, the other cache flushing or evicting commands that rados recognises are (from here):

cache-flush 
cache-try-flush 
cache-evict 
cache-flush-evict-all
cache-try-flush-evict-all

I believe that variants with "try" in the name are non-blocking while the rest will block.

Soon the SSD OSDs that back my cache tier were back under warning levels. My cluster continued recovering overnight and all the data lived happily ever after (at least until next time).

Ceph Cluster Diary January 2016

Another node is added to my Ceph cluster. This is a second-hand dual Pentium-D with 8GB of RAM and a 250GB SATA HDD. I removed from the box a dual-head Quadro graphics card, plus a RAID controller and SCSI2 drives that were not compatible with Linux.

Cluster speed is a concern, so three 220GB SSDs were installed in the box and a writeback cache tier was created around the cephfs data pool. The online examples were very useful.

At first, the metadata pool was also cached, though I began to get cluster thrashing problems. I mitigated this, and some data-risk, by removing the metadata cache but adding a CRUSH rule to keep two replicas of the metadata on SSDs and the final copy on HDDs.
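For anyone curious, here is a sketch of what such a rule can look like in decompiled CRUSH map form, assuming hypothetical root buckets named ssds and hdds (your bucket names and ruleset numbers will differ):

rule metadata_ssd_first {
    ruleset 2
    type replicated
    min_size 2
    max_size 3
    # first two replicas on hosts under the ssds root
    step take ssds
    step chooseleaf firstn 2 type host
    step emit
    # the remaining replica on hosts under the hdds root
    step take hdds
    step chooseleaf firstn -2 type host
    step emit
}

The metadata pool is then pointed at the rule with ceph osd pool set metadata crush_ruleset 2.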

There were also numerous page faults, so I gave up some space on one of the SSDs for a Linux swap partition and most of the page faults disappeared.

Most file operations are about three times faster than before. When the metadata was also cached there was about a 7x speed up, but the cluster was less reliable. My backing storage devices are mainly external USB hard drives running on old USBv1 hardware so any speed up is welcome.

The result is a much more reliable cluster that gives consistent enough speed to run virtual hard-drive files for some Virtual Machines that I occasionally run on my main desktop. Previously, those Virtual Machines had a tendency to crash when run from cephfs.

Early on, I did have a problem with the cache filling up, but fixed that by applying more aggressive cache sizing policies. In particular, I set the target_max_bytes to 85% of my SSD size.

ceph osd pool set cachedata target_max_bytes .....
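As a concrete sketch of what I mean by aggressive sizing (the byte figure below is hypothetical - roughly 85% of a 220GB SSD - and the two ratio options are related knobs worth tuning at the same time):

ceph osd pool set cachedata target_max_bytes 187000000000
ceph osd pool set cachedata cache_target_dirty_ratio 0.4
ceph osd pool set cachedata cache_target_full_ratio 0.8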

I'm very pleased with the setup now. One or two more tweaks and I might be ready to begin retiring my dedicated NAS box and switch all my network storage to Ceph.

2015-12-20

Encouraging Good Discussion on YikYak

YikYak, the anonymous location-based social media app, can be used for serious discussion because participants' identities change from thread to thread. This article goes over the upvoting and downvoting guidelines I use to encourage better discussion behaviour on YikYak. Maybe you will agree with my decisions, maybe not. At the very least I hope my reasoning encourages you to consider how your actions affect discourse.

In serious discussions, there are some who downvote comments based on disagreement alone. The problem is that YikYak has an automated filtering system that removes comments once they hit five downvotes. Five downvotes is all it takes for a tiny group of like-minded people to thought-police a discussion into a useless echo chamber. So, I follow these rules when commenting on YikYak.

  1. I hardly ever downvote. I do downvote needlessly hurtful comments, comments that contribute nothing, and comments advocating self-harm in posts where people are asking for help and advice.
  2. I upvote any downvoted comment that is relevant to the discussion regardless of whether or not I agree with the statement. If a substantive comment has downvotes, then I upvote in response.
  3. I upvote comments that I like, agree with, or find witty and that contribute to the discussion.
  4. I do not downvote a comment that is addressing the topic of conversation even when I do not agree. I do not upvote these comments either.
  5. I do not downvote incorrect facts. My reasoning is that if one commenter believes that fact then others might too, so it is usually better to counter false facts with another comment. Thought-policing a false fact out of the discussion misses the chance to change the minds of those that hold it.

Encountering uncomfortable opinions is an important part of a functioning democracy. While discussing issues, we learn about each other and find ways to live together in relative harmony despite differences in our beliefs. Platforms like YikYak enable people to discuss ideas openly. Please join in upvoting good discussion regardless of whether or not you agree with it.

2015-10-02

Battle for Zendikar Pre-release Experience

I played at the Battle for Zendikar pre-release weekend. It’s been some time since I’ve had the time and energy to attend a pre-release so I was glad to go. I was particularly looking forward to the gorgeous full art lands. I played a sealed pre-release event; the first of the weekend, where nobody else really knows the cards.

The pre-release pack was a random foil rare promo card, a life counter, six boosters and a deck box. There is no seeded booster. The deck box had a little hollow in the bottom that holds the life counter. All the basic lands in the boosters are full art lands. There are also special expedition lands – but I was not lucky enough to get one. My foil promo card was Smothering Abomination, but I did not have the supporting cards to play him.

I was quite tired and concerned about my alertness so I wanted a deck that didn’t leave me too many difficult decisions. Normally the go-to colour for easy decks is green, but this time my green had mostly defensive cards and some ramp with only one or two of the Ally cards. I elected to go for an aggressive Boros (Red-White) Allies deck. I figured that the local meta would have enough people trying to play the big Eldrazi creatures. Here’s what I played.

1x Kitesail Scout
2x Reckless Cohort
2x Kozilek’s Sentinel
1x Kor Castigator
2x Makindi Patrol
1x Makindi Sliderunner
1x Firemantle Mage
1x Ondu Champion
1x Belligerent Whiptail
1x Ondu Greathorn
1x Vestige of Emrakul
1x Resolute Blademaster
1x Ghostly Sentinel
1x Ulamog’s Despoiler
1x Angel of Renewal
3x Gideon’s Reproach
1x Encircling Fissure
1x Touch of the Void
1x Turn Against

8x Plains
8x Mountains

That’s nine ally cards in total and some with awesome Rally abilities. I kept the mana curve low and focused on cheap creatures and cheap removal; in the mid-game my ideal play would be to drop a land for a landfall trigger, drop a cheap ally and have two mana left open to Reproach a blocker. This is a deck that would win by aggression and forcing the opponent to make unprofitable blocks before they could stabilise their board. The strategy needs to get enough board presence to swarm around any big blockers mid-game and win before going into a late game. But, it delays the late game by forcing the opponent to play defensively and thus delay their ramp.

Certain creatures did not synergise well in the deck except that they kept the curve low; Kitesail Scout and the Kozilek’s Sentinels. I originally had an extra land and Vestige of Emrakul, but dropped these after losing the first match in order to lower my curve. I had an Evolving Wilds in my pool but did not play it – aggro decks cannot afford the turn it takes to mana fix. My pool also contained a Quarantine Field but this was also too expensive to play. I ran Encircling Fissure instead of Sheer Drop because a Fog effect is more useful when swarming and I had enough removal with the Gideon’s Reproaches. The Turn Against was for an Act of Treason effect – also useful for mid-game swarming. I never expected to have the mana where the awaken abilities would become relevant.

My first match was against a person playing a better version of my deck. I lost 2-nil, some of it unlucky draws, the rest due to poorer deck quality. I had faith that my deck was about as good as I could make it from my pool and so tweaked it only a little after this match; removing a land and lowering the curve.

The second match I won 2-1. I had an unwritten rule that I should mulligan any hand of 7 or 6 cards that did not put two creatures on the board by the end of turn three. The new mulligan rule made mulligans more bearable - I mulliganed the most in this match, but I think I mulliganed at least once in every match. My opponent switched decks after his first loss. I appreciated the sense of fun that brought to the match, though I doubt a six booster pool would have the card quality to support two forty card decks without some overlap between them.

The third match I won 2-nil. My deck played out perfectly and the rally triggers stacked up for some brutal combinations, especially with the one or two landfall creatures getting their buffs in the same turn. This guy played a sixty card deck evenly balanced around three colours. There are not many six booster sealed pools that have the card quality to support a sixty card deck. Typically you might go two colours with a third colour splash for something particularly good – but three colours will need some serious mana fixing. He did get colour screwed in one game. Still, pre-release is a good time to mess about and try something you otherwise wouldn’t normally do. I suspect he might have been trying to support Converge cards.

The flight contained five matches but I only had time to attend three, so I dropped after the third match. That was enough to win a participation prize of one booster pack. I don’t think I ever played a six drop and I won one game with only three mana. Since my games were relatively quick I also had time for some friendly games between rounds. The deck smashed face there too against slower decks.

The rally ability is amazing once some board presence has built up. I ran nine ally cards in total and five of them grant abilities to others. My opponents remarked that Makindi Patrol’s Vigilance ability seemed very unfair. Firemantle Mage granting Menace was great for swarming around defenders in the mid-game. The amount of damage that a Resolute Blademaster granting double-strike can cause is insane – especially if there are cheap beasts on the battlefield that have been pumped that turn by a landfall trigger. The most brutal combo was Resolute Blademaster and Ondu Champion giving double strike and trample to my team, though Firemantle Mage and Resolute Blademaster giving Menace and Double-strike won a game for me too.

I wasn’t lucky enough to get an expedition land, but it was still a fun day of magic.

2015-08-30

Ceph Cluster Thrash and Rebooting Nodes

I have a home cluster with low traffic volumes but terabytes of data - mostly photos. Ceph provides peace of mind that my data is resilient against failure, but my nodes are made with recycled equipment, so a node reboot places considerable stress on the cluster to bring it back in.

Cluster thrash occurs when activity on a Ceph cluster causes it to start timing out OSDs. It's bad because recovery processes will begin - further stressing the cluster and potentially causing more problems. Then some of those outed OSDs will attempt to rejoin, causing yet more problems. The usual way to avoid cluster thrash is to properly resource the cluster in the first place to handle the loads. In my case that's not a justifiable expense - my nodes are recycled (the HDs are new) and my normal load doesn't stress the cluster much until a node reboots.

Here's how to deal with cluster thrash.

Tell Ceph not to rebalance the cluster when OSDs leave. If possible, do this before rebooting your storage node, but it can be done at any time.
ceph osd set nodown
ceph osd set noout
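You can confirm the flags took effect before proceeding - they show up in the OSD map summary:

ceph osd dump | grep flags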

Temporarily disable access to the cluster. My cluster predominantly serves files so stopping the network file system daemon does the job. Also unmount cephfs.
sudo service samba stop
sudo umount -lf /mnt/ceph
Your file serving might not be Windows networking (so not Samba). Run this on the node that serves the files - which is not necessarily the same as the Ceph MDS or monitor nodes.

Also, shut down the MDS service on each node that runs it. The MDS is the process that oversees cephfs.
sudo service ceph stop mds 

The reason for stopping file access is to prevent load being put on the cluster while it is recovering, and to avoid potentially causing different versions of data to exist on the cluster. Ceph handles the latter situation very well on its own but consumes some effort in doing so. If disabling cluster access is not an option then see the tips at the end of the article.

Temporarily disable scrubbing since that takes resources:
ceph osd set noscrub
ceph osd set nodeep-scrub

Add the OSDs back into the cluster one or two at a time and allow the cluster to stabilise as much as it can in between. Newly added OSDs will go through a process of peering. Wait until all placement groups have finished peering before adding another OSD to the cluster.
sudo service ceph start osd.<X>
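Between additions, the standard status commands show whether peering has settled - wait until no placement groups report as peering:

ceph pg stat
watch -n 5 ceph -s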

This guide wouldn't be complete without listing the actions to re-enable normal operations on the cluster.
Re-start the MDS service. I prefer to do this first because it takes some time before it's ready to serve my cephfs again. I watch the monitor until the IO ceases - or I keep attempting to mount cephfs until it eventually succeeds.
sudo service ceph start mds

Remount cephfs
sudo mount /mnt/ceph
Restart the samba service
sudo service samba start

At this point the cluster is again serving files, but we still need to re-enable scrubbing and allow the cluster to re-balance when there are errors. Never run a cluster without deep-scrubbing because that's your defence against data corruption.
ceph osd unset noscrub
ceph osd unset nodeep-scrub
ceph osd unset noout
ceph osd unset nodown
And you're done.

If you cannot afford to disable file access then all of the other tips might still be useful. In addition, if you have three or more replicas then you can also temporarily lower the minimum number of placement group replicas the cluster requires to be available. You should not put this lower than (number_of_replicas / 2) + 1 or you risk data inconsistency. Check the number of replicas by getting the size of the pools (in my case 3), record the usual value for min_size and then use set to change the min_size temporarily.
ceph osd pool get <poolname> size
ceph osd pool get <poolname> min_size
ceph osd pool set <poolname> min_size 2
If you're using a fairly standard cephfs setup then there are actually two pools, called data and metadata. Change the min_size on both of them, but always check the size of each pool first because they might be different. I run more replicas of my metadata pool.
Don't forget to set the min_size back to whatever value you normally have it set to once the cluster stabilises.
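A small sketch for checking both pools at once, assuming the standard pool names data and metadata:

for pool in data metadata; do
    ceph osd pool get $pool size
    ceph osd pool get $pool min_size
done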

It does look like a lot of work to reboot a node. In practice I don't reboot nodes all that often, and usually I don't need even half of the above tips to bring my current cluster back without cluster thrash. Since moving from USB flash drives to spinning HDs, I no longer have the cluster thrash problems I once did. Though, following the above tips does bring OSDs back into the cluster much quicker than letting it all happen by itself.

Happy cephing.

2015-08-22

Bulletproof Processing: try…catch

Processing is stable and crashes are infrequent. However, my experience doing performances and exhibitions for IVX has shown me it is handy to add a bit of extra resiliency to sketches. This is especially true when I expect to run for extended periods of time or when the sketch runs largely unattended. The first tip I have is to use a Java try…catch structure inside the draw() method.

Since Processing is based on Java, why not take advantage of Java’s native error handling? A try…catch structure attempts to execute all code within the try part of the structure and only executes the catch block if there is a problem. This particular pattern helps if the errors are transient, meaning that the error will generally go away on its own.

void draw() {
    try {

        // Normal drawing code goes here

    } catch (Exception e) {
        println("draw(): " + e.getMessage());
        // Pause for a quarter second. Hope the problem goes away.
        try {
            Thread.sleep(250);
        } catch (Exception e1) {
            // Sleeping can itself throw an exception; ignore it
        }
    }
}

This particular skeleton will catch any exceptions that occur in the draw() method, print them and then pause for 250 milliseconds. That quarter second is usually enough for a transient error to correct itself – perhaps a file finishes loading or some memory becomes available. Either way, the sketch will pause for a short time and then attempt to resume itself. Without this exception handling, any Exception will cause the sketch to stop running.

Note that the sleeping function itself can throw an exception so Java forces us to declare a try…catch block – even though the catch block is empty.

Using try…catch in the setup() method might be useful if the code will be run by others, but in that case make extra sure that error messages are informative. I would prefer that setup() fails outright since most setup() errors are not transient.

So, you’ll see that it does not take much extra code to add a good amount of resiliency to a Processing sketch. Give it a try. Java's website has good resources that take a more in-depth look at exception handling.

2015-08-16

Generative Advertising with Feedback: Bahio Coffee

M&C Saatchi have combined generative design with feedback analysis to try to evolve the most engaging ad. The campaign is called Bahio coffee. I think this is an interesting idea and would like to talk about the good, the bad and the ugly. It has been written about here and has a good website to explore progress here. A two minute video overview is on YouTube.

The generative algorithm is given “copy, layout, fonts, colours and images” and this is expressed as a gene-string. There is still considerable human expertise that goes into the basic assets that are input into the Bahio generative system.

The feedback is attention – which appears to be tracked by watching the amount of eyeball engagement viewers have with the poster. Individual posters are scored by the amount of attention they get – with better posters having their genes preserved for future generations.

The algorithm to generate new posters is genetic. Mathematically this is a method for searching a large multi-variate space in a non-exhaustive manner. It works best when we assume the fitness landscape is hill-like – that is, there are smooth ways to improve towards a “best” poster. Though I suspect it has limited usefulness without some sort of similarity measurement between input possibilities. What that means is, how does it determine that a small mutation on the “image” variable results in an image that is different in a similarly small way? Though, for the relatively small number of images the campaign appears to run, this doesn’t appear to be a big problem.

However, in the offline world, we don’t yet have an easy way to target particular demographics. This technique is, without modification, limited to products that have a general appeal to everybody. Some control could be exercised by creative directors on the input materials to the generative system but then the evaluation is still going to be limited.

That doesn’t make this a bad approach at all. M&C Saatchi will learn a lot of useful things from conducting this experiment. So far, this is a form of multivariate testing which is a technique already employed in the web world. It’s great to see experiments with transferring this to the offline world.

2015-07-26

Play The Maryland Extra Credit Problem

A screenshot of a class problem from the University of Maryland has been doing the rounds. The teacher invites the students to vote to receive extra credit:

Here you have the opportunity to earn some extra credit on your final paper grade. Select whether you want 2 points or 6 points added onto your final paper grade. But there's a small catch: if more than 10% of the class selects 6 points, then nobody gets any extra points. Your responses will be anonymous to the rest of the class, only I will see the responses.

This situation is a little different from the Prisoner's Dilemma made famous in Game Theory because nobody stands to lose anything. All outcomes are either neutral or a gain. From that vantage point the best course of action is to always vote for six points. However, I think there are some political dynamics at play that might alter your decision to decrease the likelihood of the neutral outcome.

Depending on your expected final grade, here's my advice on how to play:

If you're a troll then go for six points. #YOLO

If you're a high scoring student (A / A+) then select 2 points. You don't need the extra marks. You're doing so well that you're above all this competitive stuff. Give the other people a chance for a few extra points. Unless you reckon that people should have to earn their place and you think the mountain top has room for only you... then go for 6 points.

Jo Beeplus: Always go 6 points. You stand to lose nothing if it ends up that nobody gets any bonus points, and you might just get the six points that lift your grade into A territory. You work hard; you deserve a shot at an A, right?

Jo SeePlus: Go for the 2 points. You might need the bonus marks to ensure you pass so you don't want to risk getting zero bonus points.

Jo "NearFail": Go for six points. You might get them and two points or zero points won't make a difference.

Jo "TotalFailure" You're so far behind you should give others a shot to shine: go for 2 points. Unless you're spiteful.

Bonus evilness karma if you vote for 6 points while talking up a big game regarding the virtues of choosing 2 points.

Go play!

2015-07-04

Ceph and BitRot

BitRot is the tendency for data to degrade over time on storage devices. CERN reports error rates at the 10^-7 level, so BitRot is a significant problem. This short article talks about how to deal with BitRot on Ceph clusters.

Ceph's main weapon against BitRot is the deep scrubbing process. Deep scrubbing verifies data in a placement group against its checksums. If an object fails this test then the placement group is marked inconsistent and the administrator should repair it. Note that deep-scrub only detects an inconsistency and does not attempt an automatic repair. By contrast, a normal scrub only checks object sizes and attributes.
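The detect-then-repair workflow is short (standard commands; substitute whichever PG id ceph health detail reports as inconsistent):

ceph health detail | grep inconsistent
ceph pg repair <pg-id>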

Deep scrubbing is resource intensive and can cause a noticeable performance drop. You can temporarily disable scrubbing and deep-scrubbing:

ceph osd set noscrub
ceph osd set nodeep-scrub
And then re-enable scrubbing with:
ceph osd unset noscrub
ceph osd unset nodeep-scrub

The configuration options for scrubbing allow the administrator to suggest how quiet the cluster should be before initiating a scrub, how long the cluster is allowed to go before it must scrub, and how many scrubs can run in parallel.
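As a sketch of the relevant knobs, these live in ceph.conf under the [OSD] section; the values shown are illustrative, not recommendations:

[OSD]
osd scrub load threshold = 0.5    # only begin a scrub below this load average
osd scrub min interval = 86400    # may scrub after one day
osd scrub max interval = 604800   # must scrub within one week
osd deep scrub interval = 604800  # deep-scrub each PG weekly
osd max scrubs = 1                # concurrent scrubs per OSD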

It's also technically possible to manually trigger scrubs via the command line. This means that an administrator who doesn't mind writing code could scrub the placement groups in different pools according to different policies. This article scrubs on a seven day cycle at night-time.

Another source of data degradation can occur in RAM before the data is written into the primary placement group. The best way to guard against this is to use ECC RAM. This particular problem is not unique to Ceph, but it is exacerbated because clusters, by their nature, increase the number of potential corruption points in the supply-chain between application and storage device.

Ceph uses an underlying filesystem as a backing store and this in turn sits on a block device. There are choices an administrator might make in those layers to also help guard against BitRot - but there are also performance trade-offs. For example, ext4 and XFS do not protect against BitRot but ZFS and btrfs can if they are configured correctly. Ars Technica has an excellent article on the topic called BitRot and Atomic COWs: Inside next-gen filesystems. Also, don't expect that RAID will detect or repair BitRot.

For more technical details you can read this Q&A post by Sage Weil of InkTank from the Ceph mailing list.

2015-06-03

Discussing Human-Centric and Sentience-First Ethics

This blog post comes out of a discussion on the morality of eating meat. The no-meat camp use a sentience-first moral justification for rejecting the eating of meat. My position is different. Please be aware that I am not a professional philosopher and have no formal training in the topic. Those new to ethics should also be aware that two people can disagree on moral justification without their daily behaviour necessarily differing. For example, I would generally agree that we eat too much meat, though I don't go so far as to say "Eat zero meat" is the only morally justified position.

My definition of human-centric morality has to presuppose humans are sentient. Though, the declaration of sentience as core to morality is an arbitrary claim.

I think my position comes from that of moral skepticism (anti-realist? nominalist?). That is I think that any moral system includes some axiomatic declarations: at the base of it we have to declare "X is good" and derive from there. Whether there's a declared ontic good, some deontological axiom or other... I don't believe that these moral claims exist as anything other than an emergent abstract object. In other words our morality does not exist without us.

A contributor raised the point that moral skepticism can derive consistent systems but these are not morals per se (rough paraphrase). I'd counter that all moral systems make axiomatic claims but not all axioms are created equal - we can measure them by making some epistemological assumptions.

Mine is roughly:
Survival of my species is good (axiom).
How do I know this: Morals evolved as decision making short-cuts to guide behaviours to benefit survival. If our morals didn't broadly fulfill this goal then we wouldn't be here to have morals. (There's the self-reference).

What about sentience first? Strawmen unintended.
Sentience is good (axiom).
How do I know this: I think therefore I am. If I wasn't then that's not good. (there's the self reference). It's no great stretch to assume that other sentiences exist. If I want my sentience respected then I should also respect the sentience of others.

I'd agree with this so far. But sentience-first embeds two further assumptions that I don't think are justified:

  1. Sentience is binary; you either have it or you don't.
    I don't think we have a scientific basis to draw a hard line between what is and is not a conscious/sentient being. It might be easy when we talk about humans, mammals, reptiles, insects ... but it gets harder in the relevantly-edible edge cases: Are colony organisms conscious? What about if my computer becomes conscious?*
  2. All sentiences are worthy of equal "don't eat me" consideration. Why, especially if 1) is not clear?
*FWIW I think of sentience/consciousness as nominal objects; it's useful to talk about them but they don't actually exist except as emergent phenomena. To think otherwise might open the door to mind/body dualism.

A common rebuttal to the meat eaters is to claim that it is not necessary for humans to eat meat. I asked if we should then attempt to convert the other omnivores to vegetarianism. The most logically consistent response I got was "Yes, we should, but to do so would lead to BadStuff". That is question begging: we probably agree that such a conversion would cause BadStuff, but what is it about the BadStuff that makes it bad? How does that badness link back to sentience-first? I'm interested in thoughts on the matter.

2015-03-21

Ceph Cluster Diary March 2015

I am decommissioning all the USB-flash-backed OSDs. On the old hardware that I have, the USB flash OSDs are much slower than USB spinning drives. With newer hardware I might get the full USB 3.0 speeds that these flash drives are capable of. This does not mean I am done with Ceph. The truth is the complete opposite: I am migrating my bulk network storage to Ceph and so I need speeds comparable to my current RAID6 NAS box. This is why I have attached USB spinner drives: 2 x 2TB, 1 x 1TB and 1 x 300GB. Ceph reports I have just under 5TB of usable space.

I will remove the netbook from the cluster because it’s too underpowered for Ceph and I have other projects that can use it. That leaves a single older USB2 Toshiba Satellite to run Ceph until I get my two other planned nodes online. Those nodes are in decent sized tower cases with plenty of internal bays for more drives.

At present I’m not pressed for storage space. I backup the NAS to the Ceph cluster using rsync. That will do for now.

My future plan is to consider an SSD-based cache tier if I try another big data project. The speed I get from the object store will be the biggest driver of how much effort / budget goes in this direction.

I’m also watching developments in single board computers (SBCs). They're almost getting cheap enough, and maybe even powerful enough, to consider having an SBC run an OSD. Place an SBC and an HD into a stackable enclosure and there’s an easy way to grow out a home storage cluster a node at a time. If conditions are right I could even try placing a remote Ceph node or two at a friend’s premises for automatic offsite redundancy over a VPN.

About those USBs: It was a great way to learn about Ceph for not much money. You could do the same thing with virtual machines and virtual devices I guess. But now – it’s time to be serious.

2015-02-14

Forcing AIO on Ceph OSD journals

My Ceph cluster doesn't run all that quickly. On reads it was about 20% slower than my RAID5 NAS and writes were 4x slower! Ouch. A good part of that is probably down to using USB flash keys but...

Upon starting the OSDs I see this message:

journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
My OSDs are XFS backed, which supports async writing to the journal, so let's set that up.

First, ssh into the ceph-deploy node and get the running version of the .conf file, replacing {headnode} with the hostname of the main monitor:

ceph-deploy --overwrite-conf config pull {headnode}
You can skip this step if your .conf is up to date.

Next edit the .conf file and add the following. If you already have an [OSD] section then update accordingly.

[OSD]
journal aio = true
journal dio = true
journal block align = true
journal force aio = true
This will try to apply these settings to all OSDs. You can control this on a per-OSD basis by adding sections named after the OSD. E.g.
[OSD.4]
journal aio = true
journal dio = true
journal block align = true
journal force aio = true
And here's the official documentation.

Next, push the config back out to all the ceph nodes and restart your OSDs. Separate hostnames with spaces.

ceph-deploy --overwrite-conf config push {headnode} {cephhost1} {cephhost2} ....
A word of warning: the --overwrite-conf flag is destructive. I'll leave it to you to take backups.
Then SSH into the various nodes, restarting the ceph service as you go. I just restarted all ceph services:
sudo service ceph restart
But it's probably okay to just restart the OSDs:
sudo service ceph restart osd
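Once the OSDs are back, grepping an OSD log confirms whether the warning is gone - no output means the journal is now using aio. The path below is the default Debian/Ubuntu log location; adjust the OSD number to suit:

grep -i "disabling aio" /var/log/ceph/ceph-osd.0.log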

I experienced an almost 2x speed increase on writes, until the journals fill. Still slow, but getting much better. My fault for not having better hardware!

Read more about my Ceph Cluster.

2015-02-05

Howto Deep-Scrub on All Ceph Placement Groups

Ceph automatically takes care of deep-scrubbing all placement groups periodically. The exact timing of that is tunable but you're probably here because you want to force deep-scrubs.

The basic command for deep-scrubbing is:

ceph pg deep-scrub <pg-id>

and you can find the placement group ID using:
ceph pg dump

And if you want to instruct all placement groups to deep-scrub, use the same script from repairing inconsistent PGs. Basically loop over all the active PGs, instructing each to deep-scrub:

ceph pg dump | grep -i active | cut -f 1 | while read i; do ceph pg deep-scrub ${i}; done

The repair article explains how this line of script works.

You can be more specific about which PGs are deep-scrubbed by altering the grep part of the script. For example, to only scrub active+clean PGs:
ceph pg dump | grep -i active+clean | cut -f 1 | while read i; do ceph pg deep-scrub ${i}; done

Some general caveats are in order. Repair your PGs before attempting to deep-scrub; it's safer to only scrub PGs that are active and clean. You can use ceph pg dump_stuck and ceph health detail to help find out what's going on. Here's a link to Ceph placement group statuses.

Good luck!

2015-02-02

Bringing back an LVM backed volume

What can you do when an LVM backed logical volume goes offline? This happens on my slower netbook with an LVM logical volume spanning about 20 USB flash drives. Sometimes those PVs go missing and the filesystem stops! Here are the steps I take to fix this problem without a reboot. My volume group is called "usb" and my logical volume is called "osd.2".

Since my volume is part of a ceph cluster, I should ensure that the ceph osd is stopped. service ceph stop osd.2. You probably don't need to do this since the OSD probably exited once it saw errors on the filesystem.

Next, unmount the filesystem and mark the logical volume as inactive. We use the -f -l switches to force the dismount and lazily deal with the dismount in the background. Without those switches the umount might freeze.
umount -f -l /dev/mapper/usb-osd.2
Marking the logical volume as inactive can be done in two ways. Prefer the first method since it is more specific. The second method will mark inactive all dismounted logical volumes and that might be overkill.
lvchange -a n usb/osd.2 -or- vgchange -a n

At this point I unplug all the USB drives and check the hubs. Plug in the USB keys a few at a time and use pvscan as you go to ensure that each USB key is being recognised. If you have a dead USB key then try again in another port. If that doesn't work then check the hubs have power - even replug the hub. Failing that try a reboot. Failing that... attempt to repair the LVM volume some other way. Since ceph already replicates data I don't bother running the LVM backed logical volumes on RAID - I just overwrite the LV and make a new one from the remaining USB flash drives.

Once all the PVs have come back, run pvscan one last time and then vgscan. Now you should see your volume groups have all their PVs in place. Now it's time to reactivate the logical volumes. Both methods will work but again I prefer the first one since it is more specific.
lvchange -a y usb/osd.2 -or- vgchange -a y
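Before carrying on, it's worth confirming that LVM sees everything again (usb and osd.2 are my names; substitute your own):

pvscan
vgs usb
lvs usb/osd.2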

All things going well, the logical volume is now active. It's a good idea to do a filesystem consistency check before you remount the drive. Since I use XFS I'll carry on with the steps for that. You should use whatever tools work for your filesystem.
mount /dev/mapper/usb-osd.2 - mounting the drive allows the journal to replay, which usually fixes any file inconsistency problems.
umount /dev/mapper/usb-osd.2 - unmount the drive again before checking.
xfs_check /dev/mapper/usb-osd.2 - check the drive, and use xfs_repair /dev/mapper/usb-osd.2 if there are any errors. (On newer xfsprogs, xfs_check is gone; xfs_repair -n does the read-only check.)

Now we're ready to mount the logical volume again: mount /dev/mapper/usb-osd.2

And since I'm running ceph I want to restart the OSD process: service ceph restart osd.2

Done!

Read more about my ceph cluster running on USB drives.

2015-01-06

A Quick Review of USB flash drives from Apacer, Sandisk and Strontium

In the course of building my USB thumbdrive based ceph cluster I tried USB keys from three different manufacturers with a total of five different varieties of USB drives. Here are my impressions of them.

Apacer
I have used 3 of the 8GB and 3 of the 32GB drives, both of the USB 3.0 Pen-Cap model (PBTech: 8GB | 32GB). One of the 8GB and 2 of the 32GB sticks failed (50%!). I have personal data on them so I don't want to return them for a refund in case the failures are not total. I have had great feedback about these sticks from others and I did love the speed. However, they do run quite hot and perhaps the heavy IO loads of ceph melted them. They do have a blinky activity LED that is a gentle blue. The drives will stack on top of each other but are too wide to stack side-by-side. However, stacking does increase the heat problem. The actual usable space is about 28-29GB, which was low compared to competitors, but the drives tended to be a bit cheaper.
The price probably makes them great for your briefcase/backpack but I wouldn't recommend them for high-usage.

Sandisk
I have 12 of the 8GB Cruzer Blade drives and one of the larger 32GB slide-cover Ultra3 thumb drives (PBTech: 8GB | 32GB). The Cruzer Blade style drives do not have an LED but thankfully I have had zero failures. The Cruzer Blade drives stack both horizontally and vertically in USB ports with a tiny bit of touching. The Ultra drive is too wide and tall to stack in USB ports but it does have a small blue activity LED. The Ultra3 would be my favourite USB drive for the briefcase/backpack because you get more usable storage than Apacer, the price is not much more and there's no cap to lose.

Strontium
I have four of the 32GB JET USB DRIVEs (PBTech: 32GB). These are much too large to stack multiples in standard USB ports, though you might squeeze them in stacking vertically. I love the price, performance and reliability. The Strontium thumb drives have a red activity LED and they have never failed me. These are my favourite drives for Ceph when I want reliability. They do come with a cap - which I don't like for a briefcase/backpack drive though.

It should be said that the economics of running Ceph on USB flash drives don't add up. USB hard-drives give better price per GB and probably better performance too (particularly on my old laptops).

Read more about my Ceph Cluster.

IFCOMP2014: With Those We Love Alive

With Those We Love Alive, written by Porpentine and scored by Brenda Neotenomie, placed 5th in the 20th Interactive Fiction Competition IFCOMP2014. You can play online at ifdb. This series of blog posts are mini-reviews I wrote as a fellow author to document my impressions of other games.

Spoilers below

Porpentine is one of the few authors I know much about. I have played CyberQueen, Cry$tal Warrior Ke$ha and Howling Dogs. Those works, and Porpentine’s interviews and posts, keep me thinking about the relationships between the tools, the medium, the story, and how these produce a final work. Hagiography over.

I particularly liked WTWLA for the rich symbolism invoked with small amounts of text. I like how this game looks like it could go well onto a small screen. I’m not sure if the "select my preference" purple links had an effect on the underlying game but that didn’t matter because they helped ME construct a coherent picture of the world and how the relationships within it worked.

The game invites you to draw symbols onto your skin. I did not do this but I see how it could enhance the game. It would further immerse the player into the world and fit well with the way that time passes in the game.

The game is technically well crafted as prior actions etch onto other parts of the environment. The recognition of my prior craft reinforced how much my character had supported the machine from which I wanted to escape. Going along to go on. The music and colour changes support the story well.

Overall I was interested in where the story was going and how it got there. There are some lovely symbolic moments that will mean different things to many. I particularly like the princess uprising. Beautiful.

Highly recommended.

IFCOMP2014: Laterna Magica review

Laterna Magica by Jens Byriel placed 42nd (last) in the 20th Interactive Fiction Competition IFCOMP2014. You can play online at ifdb. This series of blog posts are mini-reviews I wrote as a fellow author to document my impressions of other games.

Spoilers below

This work occupies an interesting place for me. I’m not sure it intends to be FICTION. I read this as a dialogue between yourself and yourself as a way of exploring / confronting your own thinking about new age spirituality. YMMV on subject matter like this... I found myself having to just buy in to certain beliefs in order to continue. Ugh fine, it’s fiction.

As an IF-work: I like the idea of using IF in a dialogue manner; a form of education, a conversation a reader can have to learn about things. This is different to Wikipedia where knowledge is presented in chunked totality. This mode of education is an unfolding journey. I’d love to see work where the conversation changes based on what has come before – that would unlock great educational potential.

I initially couldn’t find an ending and went around in loops trying to explore as many answers as I could. I get this was the whole point; to come and go as I choose but there is no end to this quest/questioning. I’m going to admit to cheating and eventually reading the source code to make sure I got everything – ha! It turns out there is a proper ending - this is sort of a maze game after all.

I don't recommend this game.

IFCOMP2014: ICEPUNK review

ICEPUNK by pageboy placed 31st in the 20th Interactive Fiction Competition IFCOMP2014. You can play online at ifdb. This series of blog posts are mini-reviews I wrote as a fellow author to document my impressions of other games.

Spoilers below

A neat post-apocalyptic loner back story and an interesting narrative where you go around slurping up data out of the landscape. The interface riffs on old text console games and features some retro ASCII art. The map is randomised. The map was a bit clunky and slow to navigate – probably a feature of my impatience and my older computer.

The task of slurping up data got a bit tedious. It became more score-keeping than a chance to revisit the items of culture the game presented as data for the taking. It became less an unfolding story than a chore to complete. I did encounter one potential show stopper bug that gave a totally blank screen. I got around this with some console tricks.

Booting up the computer gave a simple victory screen. About what you’d expect.

A story-line with the inhabitants of another bunker basically opting out of the game wasn’t particularly well followed. Neither were the initial screens of gender selection.

I score this game well in technical merit. The back story was cool, but too much was exposed via exposition in the opening scenes rather than discovered as the story unfolded. I suspect the author had high ambitions but unfortunately ran out of time. I don’t want to sound at all discouraging, because this could be a great game story if polished and honed.

IFCOMP2014: Begscape review

Begscape by Porpentine placed 28th in the 20th Interactive Fiction Competition IFCOMP2014. You can play online at ifdb. This series of blog posts are mini-reviews I wrote as a fellow author to document my impressions of other games.

Spoilers below

I went for this game because of its simple aesthetic. It feels like the PeopleSoft style text games from the 1970s (e.g. Chris Gaylo’s Highnoon). Those games helped me learn to program! I have done some fairly faithful conversions of these old games to Twine and Arduino. This means I already have an affinity for the style of game.

The choices are brutal and limited; even by the standards of the 1970s games. But those limits are the story. Poverty is not only about money, though that becomes the basic means of maintaining health. Poverty is about resources: health, mental wellbeing, charisma, and connectedness. That this game doesn’t let PC me do a bunch of things RL me could do (borrow, ask a relative or friend, apply for state benefits, work for food, etc) speaks exactly to that.

Perfectly sized and I love it.