[dm-devel] Another cache target

Darrick J. Wong darrick.wong at oracle.com
Fri Dec 14 02:34:25 UTC 2012


On Thu, Dec 13, 2012 at 09:19:19PM -0500, Mike Snitzer wrote:
> On Thu, Dec 13 2012 at  8:16pm -0500,
> Darrick J. Wong <darrick.wong at oracle.com> wrote:
> 
> > On Thu, Dec 13, 2012 at 04:57:15PM -0500, Mike Snitzer wrote:
> > > On Thu, Dec 13 2012 at  3:19pm -0500,
> > > Joe Thornber <ejt at redhat.com> wrote:
> > > 
> > > > Here's a cache target that Heinz Mauelshagen, Mike Snitzer and I
> > > > have been working on.
> > > > 
> > > > It's also available in the thin-dev branch of my git tree:
> > > > 
> > > > git@github.com:jthornber/linux-2.6.git
> > > 
> > > This url is best for others to clone from:
> > > git://github.com/jthornber/linux-2.6.git
> > > 
> > > > The main features are a plug-in architecture for policies which decide
> > > > what data gets cached, and reuse of the metadata library from the thin
> > > > provisioning target.
> > > 
> > > It should be noted that there are more cache replacement policies
> > > available in Joe's thin-dev branch via the "basic" policy, see:
> > > drivers/md/dm-cache-policy-basic.c
> > > 
> > > (these basic policies include fifo, lru, lfu, and many more)
> > >  
> > > > These patches apply on top of the dm patches that agk has got queued
> > > > for 3.8.
> > > 
> > > agk's patches are here:
> > > http://people.redhat.com/agk/patches/linux/editing/series.html
> > > 
> > > But agk hasn't staged all the required patches yet.  I've imported agk's
> > > editing tree (and a couple other required patches that I previously
> > > posted to dm-devel, which aren't yet in agk's tree) into the
> > > 'dm-for-3.8' branch on my github tree here:
> > > git://github.com/snitm/linux.git
> > > 
> > > This 8-patch patchset from Joe should apply cleanly on top of my
> > > 'dm-for-3.8' branch.
> > > 
> > > But if all you care about is a tree with all the changes then please
> > > just use Joe's github 'thin-dev' branch.
> > 
> > A full list of broken-out patches would've been nice, but oh well, I ate this
> > git tree. :)
> > 
> > Curiously, the Documentation/device-mapper/dm-cache.txt says to specify devices
> > in the order metadata, origin, cache, but the code (and Joe's mail) seem
> > to want metadata, cache, origin.  This makes me wonder what's going on.
> 
> The patch Joe posted has the proper order (metadata, cache, origin -- I
> fixed the ordering in dm-cache.txt and Joe pulled it in before posting
> the patches).  Seems Joe forgot to push his last few tweaks to his
> thin-dev branch.

Ahh. :)
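
(For anyone else following along: as far as I can tell, the table format
the code wants is

  0 <len> cache <metadata dev> <cache dev> <origin dev> <block size>
      <#feature args> [<feature arg>]* <policy> <#policy args> [<policy arg>]*

i.e. metadata, cache, origin, matching what Joe posted and the commands
further down in this mail.)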

> > Also, I found a bug when using the mru policy.  If I do this:
> 
> Pretty sure you'd be best served to focus on the code Joe posted.  Maybe
> best to clone my github tree and start with my 'dm-for-3.8' branch.  And
> then apply all the patches Joe posted.
> 
> I'd stick to the "default" policy -- aka "mq".
> 
> Joe purposely didn't post the "basic" policies because they are less
> well tested.

Ok, I'll stick to mq for now, then, and try to figure out exactly what it does.
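
Concretely, I'd recreate the device with something like this (assuming mq,
like mru, takes zero policy arguments):

# echo 0 67108864 cache /dev/sda2 /dev/sda1 /dev/vda 512 0 mq 0 | dmsetup create fubar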

> > <set up a scsi_debug "ssd" with a 448M /dev/sda1 for cache and the rest for
> >  metadata on /dev/sda2>
> > # echo 0 67108864 cache /dev/sda2 /dev/sda1 /dev/vda 512 0 mru 0 | dmsetup create fubar
> > ...<use fubar, fill up the cache>...
> > # dmsetup remove fubar
> > # echo 0 67108864 cache /dev/sda2 /dev/sda1 /dev/vda 512 0 mru 0 | dmsetup create fubar
> > 
> > I see the following crash in dmesg:
> > 
> > [  426.661458] scsi1 : scsi_debug, version 1.82 [20100324], dev_size_mb=512, opts=0x0
> > [  426.663955] scsi 1:0:0:0: Direct-Access     Linux    scsi_debug       0004 PQ: 0 ANSI: 5
> > [  426.667005] sd 1:0:0:0: Attached scsi generic sg0 type 0
> > [  426.667020] sd 1:0:0:0: [sda] 1048576 512-byte logical blocks: (536 MB/512 MiB)
> > [  426.667046] sd 1:0:0:0: [sda] Write Protect is off
> > [  426.667057] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
> > [  426.667203]  sda: unknown partition table
> > [  426.667311] sd 1:0:0:0: [sda] Attached SCSI disk
> > [  426.694055]  sda: sda1 sda2
> > [  448.155368] bio: create slab <bio-1> at 1
> > [  460.762930] promote thresholds = 65/4 queue stats = 1/0
> > [  468.121084] promote thresholds = 65/4 queue stats = 1/1
> > [  471.970865] dm-cache statistics:
> > [  471.974809] read hits:	887895
> > [  471.976948] read misses:	499
> > [  471.978195] write hits:	0
> > [  471.979380] write misses:	0
> > [  471.980716] demotions:	7
> > [  471.982391] promotions:	1799
> > [  471.983798] copies avoided:	7
> > [  471.985137] cache cell clashs:	0
> > [  471.986886] commits:		1653
> > [  471.988410] discards:		0
> > [  474.177476] bio: create slab <bio-1> at 1
> > [  474.206000] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> > [  474.209037] IP: [<ffffffffa01b1aad>] queue_evict_default+0x1d/0x50 [dm_cache_basic]
> > [  474.209969] PGD 0 
> > [  474.209969] Oops: 0002 [#1] PREEMPT SMP 
> > [  474.209969] Modules linked in: scsi_debug dm_cache_basic dm_cache_mq dm_cache dm_bio_prison dm_persistent_data dm_bufio crc_t10dif nfsv4 sch_fq_codel eeprom nfsd auth_rpcgss exportfs af_packet btrfs zlib_deflate libcrc32c [last unloaded: scsi_debug]
> > [  474.209969] CPU 0 
> > [  474.209969] Pid: 1285, comm: kworker/u:2 Not tainted 3.7.0-dmcache #1 Bochs Bochs
> > [  474.209969] RIP: 0010:[<ffffffffa01b1aad>]  [<ffffffffa01b1aad>] queue_evict_default+0x1d/0x50 [dm_cache_basic]
> > [  474.209969] RSP: 0018:ffff880055641be8  EFLAGS: 00010282
> > [  474.209969] RAX: ffff880073a85eb0 RBX: ffff880037ca5c00 RCX: 0000000000000000
> > [  474.209969] RDX: 0000000000000000 RSI: 0007fff80005ffff RDI: ffff880073a85eb0
> > [  474.209969] RBP: ffff880055641be8 R08: e000000000000000 R09: ffff880072d619a0
> > [  474.209969] R10: 0000000000000034 R11: fffffff80005ffff R12: ffff880037f33d30
> > [  474.209969] R13: ffff880037ca5c78 R14: ffff880055641c98 R15: 000000000001ffff
> > [  474.209969] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
> > [  474.209969] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  474.209969] CR2: 0000000000000008 CR3: 0000000001a0c000 CR4: 00000000000407f0
> > [  474.209969] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  474.209969] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [  474.209969] Process kworker/u:2 (pid: 1285, threadinfo ffff880055640000, task ffff88007cb62de0)
> > [  474.209969] Stack:
> > [  474.209969]  ffff880055641c58 ffffffffa01b28a4 0000000000000040 0000000000000286
> > [  474.209969]  ffff880000000000 ffffffffa017658c 0000000000000000 ffff880155641cd0
> > [  474.209969]  ffff880055641c58 ffff88007cac7400 ffff880055641d50 ffff880037f33d30
> > [  474.209969] Call Trace:
> > [  474.209969]  [<ffffffffa01b28a4>] basic_map+0x484/0x708 [dm_cache_basic]
> > [  474.209969]  [<ffffffffa017658c>] ? dm_bio_detain+0x5c/0x80 [dm_bio_prison]
> > [  474.209969]  [<ffffffffa019c221>] process_bio+0x101/0x4c0 [dm_cache]
> > [  474.209969]  [<ffffffffa019cb4f>] do_worker+0x56f/0x630 [dm_cache]
> > [  474.209969]  [<ffffffff81081ab6>] ? finish_task_switch+0x56/0xb0
> > [  474.209969]  [<ffffffff8106fa31>] process_one_work+0x121/0x490
> > [  474.209969]  [<ffffffffa019c5e0>] ? process_bio+0x4c0/0x4c0 [dm_cache]
> > [  474.209969]  [<ffffffff81070be5>] worker_thread+0x165/0x3f0
> > [  474.209969]  [<ffffffff81070a80>] ? manage_workers+0x2a0/0x2a0
> > [  474.209969]  [<ffffffff81076010>] kthread+0xc0/0xd0
> > [  474.209969]  [<ffffffff81075f50>] ? flush_kthread_worker+0xb0/0xb0
> > [  474.209969]  [<ffffffff815680ac>] ret_from_fork+0x7c/0xb0
> > [  474.209969]  [<ffffffff81075f50>] ? flush_kthread_worker+0xb0/0xb0
> > [  474.209969] Code: de 48 89 47 08 48 89 f8 5d c3 0f 0b 66 90 66 66 66 66 90 55 48 8b bf f8 01 00 00 48 89 e5 e8 ab ff ff ff 48 8b 48 28 48 8b 50 30 <48> 89 51 08 48 89 0a 48 ba 00 01 10 00 00 00 ad de 48 b9 00 02 
> > [  474.209969] RIP  [<ffffffffa01b1aad>] queue_evict_default+0x1d/0x50 [dm_cache_basic]
> > [  474.209969]  RSP <ffff880055641be8>
> > [  474.209969] CR2: 0000000000000008
> > [  474.333040] ---[ end trace 20dda5f362594054 ]---
> > [  474.336010] BUG: unable to handle kernel paging request at ffffffffffffffd8
> > [  474.336680] IP: [<ffffffff810761f0>] kthread_data+0x10/0x20
> > [  474.336680] PGD 1a0e067 PUD 1a0f067 PMD 0 
> > [  474.336680] Oops: 0000 [#2] PREEMPT SMP 
> > [  474.336680] Modules linked in: scsi_debug dm_cache_basic dm_cache_mq dm_cache dm_bio_prison dm_persistent_data dm_bufio crc_t10dif nfsv4 sch_fq_codel eeprom nfsd auth_rpcgss exportfs af_packet btrfs zlib_deflate libcrc32c [last unloaded: scsi_debug]
> > [  474.336680] CPU 0 
> > [  474.336680] Pid: 1285, comm: kworker/u:2 Tainted: G      D      3.7.0-dmcache #1 Bochs Bochs
> > [  474.336680] RIP: 0010:[<ffffffff810761f0>]  [<ffffffff810761f0>] kthread_data+0x10/0x20
> > [  474.336680] RSP: 0018:ffff8800556417a8  EFLAGS: 00010096
> > [  474.336680] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81bb2f80
> > [  474.336680] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88007cb62de0
> > [  474.336680] RBP: ffff8800556417a8 R08: 0000000000000001 R09: 0000000000000083
> > [  474.336680] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
> > [  474.336680] R13: ffff88007cb631d0 R14: 0000000000000000 R15: 0000000000000001
> > [  474.336680] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
> > [  474.336680] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  474.336680] CR2: ffffffffffffffd8 CR3: 0000000001a0c000 CR4: 00000000000407f0
> > [  474.336680] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  474.336680] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [  474.336680] Process kworker/u:2 (pid: 1285, threadinfo ffff880055640000, task ffff88007cb62de0)
> > [  474.336680] Stack:
> > [  474.336680]  ffff8800556417c8 ffffffff81071445 ffff8800556417c8 ffff88007fc12880
> > [  474.336680]  ffff880055641848 ffffffff81565a58 ffff8800556417f8 ffff880037daeba0
> > [  474.336680]  ffff88007cb62de0 ffff880055641fd8 ffff880055641fd8 ffff880055641fd8
> > [  474.336680] Call Trace:
> > [  474.336680]  [<ffffffff81071445>] wq_worker_sleeping+0x15/0xc0
> > [  474.336680]  [<ffffffff81565a58>] __schedule+0x5f8/0x7c0
> > [  474.336680]  [<ffffffff81565d39>] schedule+0x29/0x70
> > [  474.336680]  [<ffffffff81057748>] do_exit+0x678/0x9e0
> > [  474.336680]  [<ffffffff8155fe50>] ? printk+0x4d/0x4f
> > [  474.336680]  [<ffffffff8100662b>] oops_end+0xab/0xf0
> > [  474.336680]  [<ffffffff8155f7a6>] no_context+0x201/0x210
> > [  474.336680]  [<ffffffff8155f986>] __bad_area_nosemaphore+0x1d1/0x1f0
> > [  474.336680]  [<ffffffff8110ba75>] ? mempool_kmalloc+0x15/0x20
> > [  474.336680]  [<ffffffff8155f9b8>] bad_area_nosemaphore+0x13/0x15
> > [  474.336680]  [<ffffffff810311a2>] __do_page_fault+0x322/0x4d0
> > [  474.336680]  [<ffffffff8111109f>] ? get_page_from_freelist+0x1bf/0x460
> > [  474.336680]  [<ffffffff81335eca>] ? virtblk_request+0x44a/0x460
> > [  474.336680]  [<ffffffff81232d56>] ? cpumask_next_and+0x36/0x50
> > [  474.336680]  [<ffffffff81232d56>] ? cpumask_next_and+0x36/0x50
> > [  474.336680]  [<ffffffff8108fa53>] ? update_sd_lb_stats+0x123/0x610
> > [  474.336680]  [<ffffffff8103138e>] do_page_fault+0xe/0x10
> > [  474.336680]  [<ffffffff8102e425>] do_async_page_fault+0x35/0xa0
> > [  474.336680]  [<ffffffff81567925>] async_page_fault+0x25/0x30
> > [  474.336680]  [<ffffffffa01b1aad>] ? queue_evict_default+0x1d/0x50 [dm_cache_basic]
> > [  474.336680]  [<ffffffffa01b1aa5>] ? queue_evict_default+0x15/0x50 [dm_cache_basic]
> > [  474.336680]  [<ffffffffa01b28a4>] basic_map+0x484/0x708 [dm_cache_basic]
> > [  474.336680]  [<ffffffffa017658c>] ? dm_bio_detain+0x5c/0x80 [dm_bio_prison]
> > [  474.336680]  [<ffffffffa019c221>] process_bio+0x101/0x4c0 [dm_cache]
> > [  474.336680]  [<ffffffffa019cb4f>] do_worker+0x56f/0x630 [dm_cache]
> > [  474.336680]  [<ffffffff81081ab6>] ? finish_task_switch+0x56/0xb0
> > [  474.336680]  [<ffffffff8106fa31>] process_one_work+0x121/0x490
> > [  474.336680]  [<ffffffffa019c5e0>] ? process_bio+0x4c0/0x4c0 [dm_cache]
> > [  474.336680]  [<ffffffff81070be5>] worker_thread+0x165/0x3f0
> > [  474.336680]  [<ffffffff81070a80>] ? manage_workers+0x2a0/0x2a0
> > [  474.336680]  [<ffffffff81076010>] kthread+0xc0/0xd0
> > [  474.336680]  [<ffffffff81075f50>] ? flush_kthread_worker+0xb0/0xb0
> > [  474.336680]  [<ffffffff815680ac>] ret_from_fork+0x7c/0xb0
> > [  474.336680]  [<ffffffff81075f50>] ? flush_kthread_worker+0xb0/0xb0
> > [  474.336680] Code: 00 48 89 e5 5d 48 8b 40 c8 48 c1 e8 02 83 e0 01 c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 87 98 03 00 00 55 48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 
> > [  474.336680] RIP  [<ffffffff810761f0>] kthread_data+0x10/0x20
> > [  474.336680]  RSP <ffff8800556417a8>
> > [  474.336680] CR2: ffffffffffffffd8
> > [  474.336680] ---[ end trace 20dda5f362594055 ]---
> > [  474.336680] Fixing recursive fault but reboot is needed!
> > [  477.004016] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
> > [  477.004016] Shutting down cpus with NMI
> > [  477.004016] panic occurred, switching back to text console
> > 
> > *Before* it crashes, though, I can run my iops exerciser and watch the numbers
> > climb from ~300 to ~100000.  Nice work! :)
> > 
> > (The default policy engine doesn't seem to have this problem, but I haven't
> > figured out how to make it cache blocks yet...)
> 
> What is your iops exerciser?  Do you have a pointer?  You're running the
> same workload against "default" and not seeing what you'd expect?

Actually, I decided to try out "mru" to see what it would (or wouldn't) do.  My
current theory is that mq doesn't promote blocks into the cache until after you
write them(?)  There's no way to spit out the cache stats while the device is
running, so it's difficult to make observations.
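
(So far the only way I've found to get the statistics dump is to tear the
device down and read the kernel log, e.g.:

# dmsetup remove fubar
# dmesg | tail

which obviously doesn't help for watching a live device.)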

The "exerciser" is called maxiops, from:
http://djwong.org/programs/bogodisk/bogoseek-0.6.2.tar.gz

untar, make, ./maxiops /dev/somethingorother -b 4096

The third column of output is a rough estimate of IOPS.  maxiops is really just
an AIO port of bogoseek -n, which ships in the same package.  If you want to run
a destructive write test with either tool, pass -w.
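
Spelled out, that's something like this (the unpacked directory name is my
guess from the tarball name; /dev/mapper/fubar is the cache device created
above):

$ tar xzf bogoseek-0.6.2.tar.gz
$ cd bogoseek-0.6.2
$ make
$ ./maxiops /dev/mapper/fubar -b 4096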

--D
> 
> Mike



