[Pulp-list] [devel] Write Concern Test

Randy Barlow rbarlow at redhat.com
Fri May 22 04:06:16 UTC 2015


I made a simple lab environment with Vagrant that demonstrates why we
should not use {fsync: True} as our write concern in our queries. The
attached tarball contains a directory with a Vagrantfile that configures
a simple mongo replica set, plus three simple tests that you can run
against the set.

The tests themselves are not automated, though it is pretty simple to
drive them yourself. The Vagrantfile assumes that your box is already
configured for Vagrant + NFS. If it is not, you can use rsync by
replacing the line:

config.vm.synced_folder ".", "/home/vagrant/devel", type: "nfs",
nfs_version: 4, nfs_udp: false

with:

config.vm.synced_folder ".", "/home/vagrant/devel", type: "rsync"

Once you are ready to begin, simply run vagrant up:

$ time vagrant up
<snip>
real	2m14.050s
user	0m1.898s
sys	0m0.569s

You now have a VM on your computer with three mongods on ports
2701{7,8,9}. You can SSH into your VM with vagrant ssh:

$ vagrant ssh

There are some scripts in /home/vagrant/devel. You can configure your
hosts to be a replica set with a default write concern of {w: 3} using
the replica_set.js file in there:

[vagrant@replicationtest ~]$ mongo --port 27017 devel/replica_set.js
MongoDB shell version: 2.4.13
connecting to: 127.0.0.1:27017/test
{
	"_id" : "rs0",
	"version" : 1,
	"members" : [
		{
			"_id" : 0,
			"host" : "localhost:27017"
		},
		{
			"_id" : 1,
			"host" : "localhost:27018"
		},
		{
			"_id" : 2,
			"host" : "localhost:27019"
		}
	],
	"settings" : {
		"getLastErrorDefaults" : {
			"w" : 3
		}
	}
}
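
For reference, replica_set.js presumably just feeds a config document
like the one printed above to rs.initiate(). If you'd rather drive that
step from Python, a rough (hypothetical) equivalent using pymongo would
be something like:

from pymongo import MongoClient

# The same config document that the shell prints above.
config = {
    '_id': 'rs0',
    'version': 1,
    'members': [
        {'_id': 0, 'host': 'localhost:27017'},
        {'_id': 1, 'host': 'localhost:27018'},
        {'_id': 2, 'host': 'localhost:27019'},
    ],
    # Make {w: 3} the default write concern for the whole set.
    'settings': {'getLastErrorDefaults': {'w': 3}},
}

# replSetInitiate must be issued against the admin database.
MongoClient('localhost', 27017).admin.command('replSetInitiate', config)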

There is a connection.py that all three test scripts use; it defines a
simple model and establishes the connection. The connection is
established with the NEAREST read_preference, which allows mongoengine
to auto-discover the current primary (if you are unlucky and connect to
a secondary, reads will cause an error without this setting). There are
other options for read_preference that you can read about[0]. If 27017
happens to be down, it'll try 27018 instead. I've put little while
loops in all the tests so that they keep retrying their writes until
they are connected to the current primary. Often no AutoReconnects
happen at all, but the loops are there so you don't get any
scary-sounding errors if you happen to be connected to a secondary.
Spend a little time becoming familiar with each test before we start.
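
To make the discussion concrete, here is a minimal sketch of what
connection.py looks like (an approximation; the real file is in the
attached tarball, and the model/field names here simply match the shell
output shown below):

import mongoengine
from pymongo import ReadPreference


class TestDocument(mongoengine.Document):
    test_field = mongoengine.StringField()


# NEAREST lets the driver discover the current primary even if the
# first host we reach happens to be a secondary.
mongoengine.connect(
    'test',
    host='localhost:27017,localhost:27018,localhost:27019',
    replicaSet='rs0',
    read_preference=ReadPreference.NEAREST)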

Go ahead and run the first test, which only writes a single entry that
will get replicated to all three nodes:

[vagrant@replicationtest ~]$ time python devel/test_0.py
Successfully written in 1 attempts.

real	0m4.986s
user	0m0.066s
sys	0m0.021s
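
Each test script follows roughly the same pattern as this sketch of
test_0.py (again an approximation of what's in the tarball): retry the
save() on AutoReconnect until we are talking to the current primary,
then report how many attempts it took:

from pymongo.errors import AutoReconnect

from connection import TestDocument

attempts = 0
while True:
    attempts += 1
    try:
        TestDocument(test_field='test 0').save()
        break
    except AutoReconnect:
        # We were pointed at a secondary; the driver will find the
        # primary on a later attempt, so just try again.
        continue

print('Successfully written in %s attempts.' % attempts)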

Feel free to connect to one of the mongods to inspect the replication.
If you happen to connect to the primary, everything will work as you
are used to. If you connect to a secondary, it will not allow you to
perform even read queries:

rs0:SECONDARY> show collections
Thu May 21 23:27:55.849 error: { "$err" : "not master and
slaveOk=false", "code" : 13435 } at src/mongo/shell/query.js:128

You can tell mongo that reads are OK on a secondary like this:

rs0:SECONDARY> rs.slaveOk()
rs0:SECONDARY> db.test_document.find()
{ "_id" : ObjectId("555ea1ea99038b0b7d8c0b22"), "test_field" : "test 0" }

By the way, you can also use rs.status() to see the current state of
replication when connected to any of the nodes:

rs0:SECONDARY> rs.status()
<snip>

Note that our test 0 object has indeed made it to this secondary
replica. This is all normal so far. Next, let's kill one of the
mongods. You can do it with a regular old SIGTERM if you like. I
decided to kill the oldest, but it doesn't really matter which you
choose:

[vagrant@replicationtest ~]$ sudo pkill -o mongod

OK, now we have a degraded replica set as we've gone from three nodes
down to two (feel free to inspect with rs.status() on one of the
remaining nodes):

[vagrant@replicationtest ~]$ pgrep mongod | wc -l
2

However, we have also configured our replica set to have a default
write concern of {w: 3}. This means that, by default, our calls to
getLastError() will not return success until our writes have propagated
to and been confirmed by three nodes. In this state, let's try to run
test_1.py, which will simply try to insert one more document into the
database without specifying any write concern:

[vagrant@replicationtest ~]$ time python devel/test_1.py
Successfully written in 1 attempts.

real	1m30.918s
user	0m0.159s
sys	0m0.053s

That code above ran for 1.5 minutes, and it only exited because in
another shell I started the missing mongod back up like this:

[vagrant@replicationtest ~]$ sudo mongod --fork --nojournal --syslog
--port 27017 --dbpath /home/vagrant/rs0-0 --replSet rs0

If you wish to try that too, make sure you start the one that was
missing (if you didn't kill the one on port 27017, adjust the port and
dbpath above to match the one you killed; you can see all of the mongod
args in the Vagrantfile). Note that even though it took 1.5 minutes, it
wasn't due to AutoReconnects - it was the save() call hanging until the
write made it to all three nodes. Now is a good time to play around in
the mongo shell to prove to yourself that the replica set is healthy
and that all nodes have the expected writes.
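
For reference, the write in test_1.py is presumably just a plain save()
with no write_concern argument, something like:

    TestDocument(test_field='test 1').save()

Because nothing overrides the defaults, getLastError falls back to the
replica set's getLastErrorDefaults of {w: 3}, which is why the call
blocked until the third node came back.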

So far, all we've done is prove that we can write to the DB when it's
up and that writes hang when a node is down, which is what we expect
given how we configured the replica set. Now let's perform the last
test. We need to get back to two running mongods again, so if you
started your missing one back up like I did, kill one again:

[vagrant@replicationtest ~]$ sudo pkill -o mongod

You should now have two mongods running again, meaning your replica set
is degraded (again, rs.status() will give you more detail):

[vagrant@replicationtest ~]$ pgrep mongod | wc -l
2

Now, have a look at test_2.py and note that it is overriding the write
concern in the way that has been suggested in pull request #1864[1]:

[vagrant@replicationtest ~]$ grep write_concern < devel/test_2.py
        t.save(write_concern={'fsync': True})

Note that we've only passed fsync, and have not provided the w option.
Let's run test_2.py:

[vagrant@replicationtest ~]$ time python devel/test_2.py
Successfully written in 1 attempts.

real	0m0.186s
user	0m0.055s
sys	0m0.017s

Note that the write went through and returned before it had reached all
the nodes, even though the replica set had been configured for a safer
write concern. Go ahead and inspect the replica set in the mongo shell
and have a look at rs.status() again.
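
For what it's worth, if fsync really is wanted, the w option can be
passed alongside it so that the replication guarantee isn't silently
dropped. This is only a hypothetical sketch of mongoengine's save()
with both getLastError options set, not what the pull request does:

# w=3 preserves the wait for all three members; fsync additionally
# waits for the primary to flush the write to disk.
t = TestDocument(test_field='test 2')
t.save(write_concern={'w': 3, 'fsync': True})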

One cool thing is that if you start the missing node back up, it should
automatically figure out who the primary is, rejoin the replica set,
and pick up that last write. In my case, the one on port 27018 was the
dead one, so I'll start it back up:

[vagrant@replicationtest ~]$ sudo mongod --fork --nojournal --syslog
--port 27018 --dbpath /home/vagrant/rs0-1 --replSet rs0

To end, I'll connect to the one that had been dead and demonstrate that
everything is back to normal:

[vagrant@replicationtest ~]$ mongo --port 27018
MongoDB shell version: 2.4.13
connecting to: 127.0.0.1:27018/test
rs0:SECONDARY> rs.slaveOk()
rs0:SECONDARY> rs.status()
{
	"set" : "rs0",
	"date" : ISODate("2015-05-22T03:47:04Z"),
	"myState" : 2,
	"syncingTo" : "localhost:27017",
	"members" : [
		{
			"_id" : 0,
			"name" : "localhost:27017",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 38,
			"optime" : Timestamp(1432266200, 1),
			"optimeDate" : ISODate("2015-05-22T03:43:20Z"),
			"lastHeartbeat" : ISODate("2015-05-22T03:47:04Z"),
			"lastHeartbeatRecv" : ISODate("2015-05-22T03:47:03Z"),
			"pingMs" : 0,
			"syncingTo" : "localhost:27019"
		},
		{
			"_id" : 1,
			"name" : "localhost:27018",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 38,
			"optime" : Timestamp(1432266200, 1),
			"optimeDate" : ISODate("2015-05-22T03:43:20Z"),
			"errmsg" : "syncing to: localhost:27017",
			"self" : true
		},
		{
			"_id" : 2,
			"name" : "localhost:27019",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 38,
			"optime" : Timestamp(1432266200, 1),
			"optimeDate" : ISODate("2015-05-22T03:43:20Z"),
			"lastHeartbeat" : ISODate("2015-05-22T03:47:04Z"),
			"lastHeartbeatRecv" : ISODate("2015-05-22T03:47:03Z"),
			"pingMs" : 0
		}
	],
	"ok" : 1
}
rs0:SECONDARY> db.test_document.find()
{ "_id" : ObjectId("555ea1ea99038b0b7d8c0b22"), "test_field" : "test 0" }
{ "_id" : ObjectId("555ea46799038b149e1e4d7a"), "test_field" : "test 1" }
{ "_id" : ObjectId("555ea5d899038b19d92f5449"), "test_field" : "test 2" }


[0]
http://api.mongodb.org/python/current/api/pymongo/read_preferences.html#pymongo.read_preferences.ReadPreference
[1] https://github.com/pulp/pulp/pull/1864

-- 
Randy Barlow
-------------- next part --------------
A non-text attachment was scrubbed...
Name: write_concern_test.tar.bz2
Type: application/x-bzip
Size: 1505 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/pulp-list/attachments/20150522/72325809/attachment.bin>