[Linux-cluster] Possible problem with gfs2

Juan Pablo Lorier jplorier at gmail.com
Fri Oct 25 16:36:54 UTC 2013


Hi Digimer,

Thanks for the reply. Here are the contents of cluster.conf and the
messages log, as requested. That day I had to re-add a disk to the RAID
array, and the array was rebuilding while the server was in production
(see the md recovery lines at 11:46:37 in the log).
Regards,

<?xml version="1.0"?>
<cluster config_version="7" name="NAS">
  <clusternodes>
    <clusternode name="host2" nodeid="1"/>
  </clusternodes>
  <rm>
    <resources>
      <clusterfs device="72aea53c-7a96-2c5a-3064-034d8a8ed797" fsid="3261"
                 fstype="gfs2" mountpoint="/disco" name="disco"/>
    </resources>
    <service name="Samba" recovery="relocate">
      <samba config_file="/etc/samba/smb.conf" name="Samba" shutdown_wait="0"/>
    </service>
  </rm>
</cluster>

messages log:

Oct 22 11:40:18 nas kernel: GFS2: fsid=: Trying to join cluster
"lock_dlm", "NAS:disco"
Oct 22 11:40:18 nas kernel: GFS2: fsid=NAS:disco.0: Joined cluster. Now
mounting FS...
Oct 22 11:40:18 nas kernel: GFS2: fsid=NAS:disco.0: jid=0, already
locked for use
Oct 22 11:40:18 nas kernel: GFS2: fsid=NAS:disco.0: jid=0: Looking at
journal...
Oct 22 11:40:18 nas kernel: GFS2: fsid=NAS:disco.0: jid=0: Done
Oct 22 11:40:18 nas kernel: GFS2: fsid=NAS:disco.0: jid=1: Trying to
acquire journal lock...
Oct 22 11:40:18 nas kernel: GFS2: fsid=NAS:disco.0: jid=1: Looking at
journal...
Oct 22 11:40:18 nas kernel: GFS2: fsid=NAS:disco.0: jid=1: Done
Oct 22 11:40:18 nas kernel: GFS2: fsid=NAS:disco.0: jid=2: Trying to
acquire journal lock...
Oct 22 11:40:18 nas kernel: GFS2: fsid=NAS:disco.0: jid=2: Looking at
journal...
Oct 22 11:40:19 nas kernel: GFS2: fsid=NAS:disco.0: jid=2: Done
Oct 22 11:40:19 nas kernel: GFS2: fsid=NAS:disco.0: jid=3: Trying to
acquire journal lock...
Oct 22 11:40:19 nas kernel: GFS2: fsid=NAS:disco.0: jid=3: Looking at
journal...
Oct 22 11:40:19 nas kernel: GFS2: fsid=NAS:disco.0: jid=3: Done
Oct 22 11:40:19 nas kernel: Connecting to MDS 192.168.30.8
Oct 22 11:40:19 nas kernel: MDS list size change 1->4
Oct 22 11:40:19 nas kernel: MDS IPs: 192.168.30.8 192.168.30.9
192.168.30.10 192.168.30.11
Oct 22 11:40:19 nas kernel: ClusterUid is DDB859E30A176302
Oct 22 11:40:19 nas kernel: using MDS 192.168.30.8 version=3.1 sess
id=243ae251
Oct 22 11:44:04 nas smbd[3176]: [2013/10/22 11:44:04.545789, 0]
../lib/util/pidfile.c:153(pidfile_unlink)
Oct 22 11:44:04 nas smbd[3176]: Failed to delete pidfile
/var/run/samba/smbd.pid. Error was No existe el fichero o el directorio
Oct 22 11:46:37 nas ata_id[4211]: HDIO_GET_IDENTITY failed for '/dev/sde'
Oct 22 11:46:37 nas kernel: md: bind<sde>
Oct 22 11:46:37 nas kernel: md: recovery of RAID array md127
Oct 22 11:46:37 nas kernel: md: minimum _guaranteed_ speed: 1000
KB/sec/disk.
Oct 22 11:46:37 nas kernel: md: using maximum available idle IO
bandwidth (but not more than 200000 KB/sec) for recovery.
Oct 22 11:46:37 nas kernel: md: using 128k window, over a total of
1953382912k.
Oct 22 12:07:48 nas kernel: Session 0xffff8802c748c000 expired.
reauthenticating
Oct 22 12:08:09 nas kernel: Connecting to MDS 192.168.30.8
Oct 22 12:08:09 nas kernel: MDS list size change 1->4
Oct 22 12:08:09 nas kernel: MDS IPs: 192.168.30.8 192.168.30.9
192.168.30.10 192.168.30.11
Oct 22 12:08:09 nas kernel: ClusterUid is DDB859E30A176302
Oct 22 12:08:09 nas kernel: using MDS 192.168.30.8 version=3.1 sess
id=545e5ed7
Oct 22 13:02:42 nas kernel: Session 0xffff880321122000 expired.
reauthenticating
Oct 22 13:10:24 nas esets_daemon[3078]: error[0c060000]: Error updating
Antivirus modules: Bad link to update server.
Oct 22 13:35:31 nas kernel: Session 0xffff880321122000 expired.
reauthenticating


There are a lot of these Samba entries from an hour earlier, but they
seem unrelated, so I'm pasting just a few:

Oct 22 15:53:33 nas smbd[5630]: [2013/10/22 15:53:33.225581, 0]
../lib/util/charset/convert_string.c:438(convert_string_talloc_handle)
Oct 22 15:53:33 nas smbd[5630]: Conversion error: Incomplete multibyte
sequence(��_TeI_cran 2010-08-13 aI_ 08.20.58.png)
Oct 22 15:53:33 nas smbd[5630]: [2013/10/22 15:53:33.225657, 0]
../lib/util/charset/convert_string.c:438(convert_string_talloc_handle)
Oct 22 15:53:33 nas smbd[5630]: Conversion error: Incomplete multibyte
sequence(��_TeI_cran 2010-08-13 aI_ 08.20.58)
Oct 22 15:53:33 nas smbd[5630]: [2013/10/22 15:53:33.226076, 0]
../lib/util/charset/convert_string.c:438(convert_string_talloc_handle)
Oct 22 15:53:33 nas smbd[5630]: Conversion error: Incomplete multibyte
sequence(��_TeI_cran 2010-08-13 aI_ 08.20.58)
Oct 22 15:53:33 nas smbd[5630]: [2013/10/22 15:53:33.226496, 0]
../lib/util/charset/convert_string.c:438(convert_string_talloc_handle)
Oct 22 15:53:33 nas smbd[5630]: Conversion error: Incomplete multibyte
sequence(��_TeI_cran 2010-08-13 aI_ 08.34.19.png)
Oct 22 15:53:33 nas rsyslogd-2177: imuxsock begins to drop messages from
pid 5630 due to rate-limiting
Oct 22 17:50:04 nas kernel: INFO: task smbd:5994 blocked for more than
120 seconds.
Oct 22 17:50:04 nas kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 22 17:50:04 nas kernel: smbd D 0000000000000000 0 5994 4870 0x00000084
Oct 22 17:50:04 nas kernel: ffff88017c61b948 0000000000000086
ffff8801bb44b500 0000000000000000
Oct 22 17:50:04 nas kernel: 00000000ffffffff 00000000ffffffff
ffff88017c61b998 ffffffff8105b4d3
Oct 22 17:50:04 nas kernel: ffff880175b03058 ffff88017c61bfd8
000000000000fb88 ffff880175b03058
Oct 22 17:50:04 nas kernel: Call Trace:
Oct 22 17:50:04 nas kernel: [<ffffffff8105b4d3>] ?
perf_event_task_sched_out+0x33/0x80
Oct 22 17:50:04 nas kernel: [<ffffffffa0557570>] ?
gfs2_glock_holder_wait+0x0/0x20 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa055757e>]
gfs2_glock_holder_wait+0xe/0x20 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffff814fef1f>] __wait_on_bit+0x5f/0x90
Oct 22 17:50:04 nas kernel: [<ffffffffa0557570>] ?
gfs2_glock_holder_wait+0x0/0x20 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffff814fefc8>]
out_of_line_wait_on_bit+0x78/0x90
Oct 22 17:50:04 nas kernel: [<ffffffff81092190>] ?
wake_bit_function+0x0/0x50
Oct 22 17:50:04 nas kernel: [<ffffffffa05594f5>]
gfs2_glock_wait+0x45/0x90 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa055a8f7>]
gfs2_glock_nq+0x237/0x3d0 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa055c5c9>]
gfs2_inode_lookup+0x129/0x300 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa054ea4d>] ?
gfs2_dirent_search+0x16d/0x1a0 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa054efbe>]
gfs2_dir_search+0x5e/0x80 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa055c32e>] gfs2_lookupi+0xde/0x1e0
[gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa0559ca8>] ?
do_promote+0x208/0x330 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa055c39d>] ?
gfs2_lookupi+0x14d/0x1e0 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa0569c76>] gfs2_lookup+0x36/0xd0
[gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffff81193ed7>] ? d_alloc+0x137/0x1b0
Oct 22 17:50:04 nas kernel: [<ffffffff81189935>] do_lookup+0x1a5/0x230
Oct 22 17:50:04 nas kernel: [<ffffffff81189ccd>]
__link_path_walk+0x20d/0x1030
Oct 22 17:50:04 nas kernel: [<ffffffff814a1d1a>] ? inet_recvmsg+0x5a/0x90
Oct 22 17:50:04 nas kernel: [<ffffffff8118ad7a>] path_walk+0x6a/0xe0
Oct 22 17:50:04 nas kernel: [<ffffffff8118af4b>] do_path_lookup+0x5b/0xa0
Oct 22 17:50:04 nas kernel: [<ffffffff8118bbb7>] user_path_at+0x57/0xa0
Oct 22 17:50:04 nas kernel: [<ffffffff81092150>] ?
autoremove_wake_function+0x0/0x40
Oct 22 17:50:04 nas kernel: [<ffffffff811807ec>] vfs_fstatat+0x3c/0x80
Oct 22 17:50:04 nas kernel: [<ffffffff8118095b>] vfs_stat+0x1b/0x20
Oct 22 17:50:04 nas kernel: [<ffffffff81180984>] sys_newstat+0x24/0x50
Oct 22 17:50:04 nas kernel: [<ffffffff810d6ce2>] ?
audit_syscall_entry+0x272/0x2a0
Oct 22 17:50:04 nas kernel: [<ffffffff8100b0f2>]
system_call_fastpath+0x16/0x1b
Oct 22 17:50:04 nas kernel: INFO: task flush-253:0:6356 blocked for more
than 120 seconds.
Oct 22 17:50:04 nas kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 22 17:50:04 nas kernel: flush-253:0 D 0000000000000001 0 6356 2
0x00000080
Oct 22 17:50:04 nas kernel: ffff88031c7339a0 0000000000000046
0000000000000000 ffff88013fd94260
Oct 22 17:50:04 nas kernel: ffff88031c733950 ffffea0003f85498
ffff880154372198 ffff880154372198
Oct 22 17:50:04 nas kernel: ffff88033a74b098 ffff88031c733fd8
000000000000fb88 ffff88033a74b098
Oct 22 17:50:04 nas kernel: Call Trace:
Oct 22 17:50:04 nas kernel: [<ffffffffa0557570>] ?
gfs2_glock_holder_wait+0x0/0x20 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa055757e>]
gfs2_glock_holder_wait+0xe/0x20 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffff814fef1f>] __wait_on_bit+0x5f/0x90
Oct 22 17:50:04 nas kernel: [<ffffffffa0561720>] ?
gfs2_get_block_noalloc+0x0/0x40 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa0557570>] ?
gfs2_glock_holder_wait+0x0/0x20 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffff814fefc8>]
out_of_line_wait_on_bit+0x78/0x90
Oct 22 17:50:04 nas kernel: [<ffffffff81092190>] ?
wake_bit_function+0x0/0x50
Oct 22 17:50:04 nas kernel: [<ffffffffa05594f5>]
gfs2_glock_wait+0x45/0x90 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa055a8f7>]
gfs2_glock_nq+0x237/0x3d0 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa057244e>]
gfs2_glock_nq_init+0x1e/0x40 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa05732ee>]
gfs2_write_inode+0x28e/0x2f0 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffffa0572446>] ?
gfs2_glock_nq_init+0x16/0x40 [gfs2]
Oct 22 17:50:04 nas kernel: [<ffffffff811a547c>]
writeback_single_inode+0x20c/0x290
Oct 22 17:50:04 nas kernel: [<ffffffff811a575e>]
writeback_sb_inodes+0xce/0x180
Oct 22 17:50:04 nas kernel: [<ffffffff811a58bb>]
writeback_inodes_wb+0xab/0x1b0
Oct 22 17:50:04 nas kernel: [<ffffffff811a5c5b>] wb_writeback+0x29b/0x3f0
Oct 22 17:50:04 nas kernel: [<ffffffff814fddd0>] ? thread_return+0x4e/0x76e
Oct 22 17:50:04 nas kernel: [<ffffffff8107ebc2>] ? del_timer_sync+0x22/0x30
Oct 22 17:50:04 nas kernel: [<ffffffff811a5f49>] wb_do_writeback+0x199/0x240
Oct 22 17:50:04 nas kernel: [<ffffffff811a6053>]
bdi_writeback_task+0x63/0x1b0
Oct 22 17:50:04 nas kernel: [<ffffffff81092017>] ? bit_waitqueue+0x17/0xd0
Oct 22 17:50:04 nas kernel: [<ffffffff81138940>] ? bdi_start_fn+0x0/0x100
Oct 22 17:50:04 nas kernel: [<ffffffff811389c6>] bdi_start_fn+0x86/0x100
Oct 22 17:50:04 nas kernel: [<ffffffff81138940>] ? bdi_start_fn+0x0/0x100
Oct 22 17:50:04 nas kernel: [<ffffffff81091de6>] kthread+0x96/0xa0
Oct 22 17:50:04 nas kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Oct 22 17:50:04 nas kernel: [<ffffffff81091d50>] ? kthread+0x0/0xa0
Oct 22 17:50:04 nas kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
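
In case it helps anyone scanning a long messages log for the same
symptom, here is a minimal sketch of how the hung-task reports above
could be pulled out automatically. The function name find_hung_tasks and
the regular expression are my own assumptions for illustration, not part
of any cluster tool. The pattern stops at "blocked for more" because the
mail client wrapped the rest of the line.

```python
import re

# Matches the kernel's hung-task report line as it appears in this log, e.g.
# "Oct 22 17:50:04 nas kernel: INFO: task smbd:5994 blocked for more than".
# The task name may itself contain colons (flush-253:0), so the pid is taken
# to be the last colon-separated number before " blocked".
HUNG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel:\s+"
    r"INFO: task (?P<task>.+):(?P<pid>\d+) blocked for more"
)


def find_hung_tasks(lines):
    """Return a list of (timestamp, task name, pid) for each hung-task report."""
    hits = []
    for line in lines:
        m = HUNG_RE.match(line)
        if m:
            hits.append((m.group("stamp"), m.group("task"), int(m.group("pid"))))
    return hits


if __name__ == "__main__":
    # Hypothetical path; adjust to wherever the log excerpt was saved.
    with open("messages", encoding="utf-8", errors="replace") as fh:
        for stamp, task, pid in find_hung_tasks(fh):
            print(stamp, task, pid)
```

Run against the excerpt above, this should report the two tasks shown in
the traces, smbd (pid 5994) and flush-253:0 (pid 6356), both at 17:50:04
and both waiting in gfs2_glock_wait.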



