[Linux-cluster] cluster send request failed: Bad address

Tue Aug 31 18:49:13 UTC 2004

I am trying to set up a simple 3-node cluster (iota6-8). I get as far as
running clvmd on each node. At that point iota8 works fine: all LVM
commands work there (although with some error messages about lock failures
on the other nodes). However, any attempt to run an LVM command on the
other nodes fails with a locking error. For example:

[root@iota8g LVM2]# pvremove /baddev
  /baddev: Couldn't find device.

[root@iota6g LVM2]# pvremove /baddev
  cluster send request failed: Bad address
  Can't get lock for orphan PVs

I have tracked the failure down to the call to dlm_ls_lock() from
sync_lock() in LVM2/daemons/clvmd/clvmd-cman.c, which is failing, but I
cannot figure out why.

In particular, I am perplexed that it works on one machine but not on
the others.  Any hints about what might be causing this would be
appreciated.

	Thanks,
	Fred

The failure in more detail:

[root@iota6g cluster]# pvremove -vvv /baddev
      Setting global/locking_type to 2
      Setting global/locking_library to liblvm2clusterlock.so
      Setting global/library_dir to /lib
      Opening shared locking library /lib/liblvm2clusterlock.so
    Loaded external locking library liblvm2clusterlock.so
      External locking enabled.
  FRED - called lock_resource(cmd, , 0x24)
      Locking P_orphans at 0x4
  FRED - called _lock_for_cluster(51, 0x4, P_orphans)
  FRED - _cluster_request(51, ., data='\x04\x00P_orphans\x00', len=12)
  FRED - in _send_request: outheader =
  cmd=1, flags=0x0, xid=0, cid=134912944, status=-14, arglen=1, node=
  cluster send request failed: Bad address
  Can't get lock for orphan PVs
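
For anyone reading the trace above, here is a minimal Python sketch that
unpacks the request payload printed by the _cluster_request debug line.
The field layout (one command byte, one flags byte, then a NUL-terminated
lock name) is an assumption inferred from the trace itself, not taken from
the clvmd sources.

```python
# Payload exactly as printed in the _cluster_request trace line.
payload = b"\x04\x00P_orphans\x00"

# Assumed layout (inferred from the trace, not from the clvmd sources):
# byte 0 = lock command, byte 1 = lock flags, rest = NUL-terminated name.
lock_cmd = payload[0]
lock_flags = payload[1]
name = payload[2:].split(b"\x00", 1)[0].decode("ascii")

print(f"len={len(payload)} cmd=0x{lock_cmd:x} flags=0x{lock_flags:x} name={name}")
```

The 12-byte length agrees with the len=12 in the trace, and the first byte
matches the 0x4 in "Locking P_orphans at 0x4", so the request itself looks
well-formed under this reading.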

[root@iota6g root]# clvmd -d
CLVMD[13066]: 1093975495 CLVMD started
CLVMD[13066]: 1093975495 FRED - init_cluster
CLVMD[13066]: 1093975496 Cluster ready, doing some more initialisation
CLVMD[13066]: 1093975496 starting LVM thread
CLVMD[13066]: 1093975496 LVM thread function started
CLVMD[13066]: 1093975496 clvmd ready for work
CLVMD[13066]: 1093975496 Using timeout of 60 seconds
  No volume groups found
CLVMD[13066]: 1093975496 LVM thread waiting for work
CLVMD[13066]: 1093975500 Got new connection on fd 7
CLVMD[13066]: 1093975500 Read on local socket 7, len = 30
CLVMD[13066]: 1093975500 creating pipe, [8, 9]
CLVMD[13066]: 1093975500 in sub thread: client = 0x80a8b60
CLVMD[13066]: 1093975500 doing PRE command LOCK_VG P_orphans at 4
CLVMD[13066]: 1093975500 FRED - sync_lock(P_orphans, 4, 0x0)
CLVMD[13066]: 1093975500 FRED - sync_lock status = -1
CLVMD[13066]: 1093975500 hold_lock. lock at 4 failed: Bad address
CLVMD[13066]: 1093975500 Writing status 14 down pipe 9
CLVMD[13066]: 1093975500 Waiting to do post command - state = 0
CLVMD[13066]: 1093975500 read on PIPE 8: 4 bytes: status: 14
CLVMD[13066]: 1093975500 background routine status was 14, sock_client=0x80a8b60
CLVMD[13066]: 1093975500 Send local reply
CLVMD[13066]: 1093975500 Read on local socket 7, len = -1
CLVMD[13066]: 1093975500 EOF on local socket: inprogress=0
CLVMD[13066]: 1093975500 Waiting for child thread
CLVMD[13066]: 1093975500 SIGUSR2 received
CLVMD[13066]: 1093975500 Joined child thread
CLVMD[13066]: 1093975500 ret == 0, errno = 104. removing client
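
The numeric codes in the two traces (status=-14 / status 14, and errno = 104)
can be mapped to their symbolic names with a short Python sketch. The names
and messages come from the platform's errno table, so the values shown in the
comments are what I would expect on Linux only.

```python
import errno
import os

# Codes taken from the traces: 14 from "status 14", 104 from "errno = 104".
decoded = {}
for code in (14, 104):
    symbol = errno.errorcode.get(code, "unknown")  # e.g. EFAULT for 14
    decoded[code] = (symbol, os.strerror(code))
    print(code, symbol, os.strerror(code))
```

On Linux, 14 is EFAULT ("Bad address"), which matches the message that
pvremove prints, and 104 is ECONNRESET, which is consistent with the client
closing the local socket after it received the failure reply.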