[Linux-cluster] [RESOLVED] HA iSCSI with DRBD

Eric epretorious at yahoo.com
Wed Jan 23 01:05:57 UTC 2013


I realized, quite accidentally, that any downtime on either node (e.g., a reboot) causes corruption/inconsistencies in the DRBD resources: the node that was DRBD primary before the outage (i.e., the preferred primary) forcefully becomes primary again when it returns, thereby discarding the modifications made on the node that had been acting as primary in the interim.

Therefore, in order to prevent this from happening, it's probably best to REMOVE the location constraint that pins the final primitive of each group to its preferred node:

> crm configure location l_iSCSI-san1+DRBD-r0 p_IP-1_254 10240: san1
> crm configure location l_iSCSI-san1+DRBD-r1 p_IP-1_253 10240: san2
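
With crmsh, that amounts to deleting the constraints by ID. A minimal sketch, assuming the IDs are exactly as they appear above:

> crm configure delete l_iSCSI-san1+DRBD-r0
> crm configure delete l_iSCSI-san1+DRBD-r1
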
This will prevent Pacemaker from re-promoting the preferred-primary node when it returns and overwriting the modifications made on the node that had been acting as primary in its absence. The DRBD resources can still be moved manually...

> crm resource move p_IP-1_254 san1
> crm resource move p_IP-1_253 san2

...in order to distribute the workload between san1 & san2.
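
Note that crm resource move works by inserting its own location constraint behind the scenes; once each group has settled on its node, that constraint can be cleared again. A minimal sketch (older crm shells spell it "unmigrate"):

> crm resource unmove p_IP-1_254
> crm resource unmove p_IP-1_253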

Thoughts? Suggestions?

Eric Pretorious
Truckee, CA





>________________________________
> From: Eric <epretorious at yahoo.com>
>To: linux clustering <linux-cluster at redhat.com> 
>Sent: Friday, January 18, 2013 12:40 PM
>Subject: Re: [Linux-cluster] [RESOLVED] HA iSCSI with DRBD
> 
>
>After rebooting both nodes, I checked the cluster status again and found this:
>Code:
>
>> san1:~ # crm_mon -1
>> ============
>> Last updated: Fri Jan 18 11:51:28 2013
>> Last change: Fri Jan 18 09:00:03 2013 by root via cibadmin on san2
>> Stack: openais
>> Current DC: san2 - partition with quorum
>> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
>> 2 Nodes configured, 2 expected votes
>> 9 Resources configured.
>> ============
>> 
>> Online: [ san1 san2 ]
>> 
>>  Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
>>      Masters: [ san2 ]
>>      Slaves: [ san1 ]
>>  Resource Group: g_iSCSI-san1
>>      p_iSCSI-san1    (ocf::heartbeat:iSCSITarget):    Started san2
>>      p_iSCSI-san1_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>      p_iSCSI-san1_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>      p_iSCSI-san1_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>      p_iSCSI-san1_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>      p_iSCSI-san1_4    (ocf::heartbeat:iSCSILogicalUnit):    Stopped 
>>      p_IP-1_254    (ocf::heartbeat:IPaddr2):    Stopped 
>> 
>> Failed actions:
>>     p_iSCSI-san1_4_start_0 (node=san1, call=25, rc=1, status=complete): unknown error
>>     p_iSCSI-san1_4_start_0 (node=san2, call=30, rc=1, status=complete): unknown error
>
>...and that's when it occurred to me: there are only four volumes defined in the DRBD configuration (0, 1, 2, & 3), not five (0, 1, 2, 3, & 4)! i.e., the p_iSCSI-san1_4 primitive was failing (because there is no volume /dev/drbd4) and that, in turn, was holding up the resource group g_iSCSI-san1 and causing all of the other primitives [e.g., p_IP-1_254] to fail too!
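>
>In crmsh terms, deleting the stray primitive boils down to something like the following sketch (the resource should be stopped before it is removed from the configuration):
>
>> crm resource stop p_iSCSI-san1_4
>> crm configure delete p_iSCSI-san1_4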
>
>So, I deleted p_iSCSI-san1_4 from the CIB and the cluster began working as designed:
>
>> san2:~ # ll /dev/drbd*
>> brw-rw---- 1 root disk 147, 0 Jan 18 11:47 /dev/drbd0
>> brw-rw---- 1 root disk 147, 1 Jan 18 11:47 /dev/drbd1
>> brw-rw---- 1 root disk 147, 2 Jan 18 11:47 /dev/drbd2
>> brw-rw---- 1 root disk 147, 3 Jan 18 11:47 /dev/drbd3
>> 
>> ...
>> 
>
>> san2:~ # crm_mon -1
>> ============
>> Last updated: Fri Jan 18 11:53:03 2013
>> Last change: Fri Jan 18 11:52:58 2013 by root via cibadmin on san2
>> Stack: openais
>> Current DC: san2 - partition with quorum
>> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
>> 2 Nodes configured, 2 expected votes
>> 8 Resources configured.
>> ============
>> 
>> Online: [ san1 san2 ]
>> 
>>  Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
>>      Masters: [ san2 ]
>>      Slaves: [ san1 ]
>>  Resource Group: g_iSCSI-san1
>>      p_iSCSI-san1    (ocf::heartbeat:iSCSITarget):    Started san2
>>      p_iSCSI-san1_0    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>      p_iSCSI-san1_1    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>      p_iSCSI-san1_2    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>      p_iSCSI-san1_3    (ocf::heartbeat:iSCSILogicalUnit):    Started san2
>>      p_IP-1_254    (ocf::heartbeat:IPaddr2):    Started san2
>
>From the iSCSI client (xen2):
>
>> xen2:~ # iscsiadm -m discovery -t st -p 192.168.1.254
>> 192.168.1.254:3260,1 iqn.2012-11.com.example.san1:sda
>> 192.168.0.2:3260,1 iqn.2012-11.com.example.san1:sda
>> 192.168.1.2:3260,1 iqn.2012-11.com.example.san1:sda
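>
>(From there, attaching to the target is the usual iscsiadm login. A sketch, using the floating address rather than the node-specific portals so that failover stays transparent to the client:)
>
>> iscsiadm -m node -T iqn.2012-11.com.example.san1:sda -p 192.168.1.254 --login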
>
>
>Problem fixed!
>
>
>Eric Pretorious
>Truckee, CA
>
>
>
>>________________________________
>> From: Eric <epretorious at yahoo.com>
>>To: linux clustering <linux-cluster at redhat.com> 
>>Sent: Thursday, January 17, 2013 8:59 PM
>>Subject: [Linux-cluster] HA iSCSI with DRBD
>> 
>>
>>I've been attempting to follow the recipe laid out in the Linbit guide "Highly available iSCSI storage with DRBD and Pacemaker" to create a highly-available iSCSI server on the two servers san1 & san2, but can't quite get the details right:
>>
>>
>>> crm configure property stonith-enabled=false
>>> crm configure property no-quorum-policy=ignore
>>> 
>>> crm configure primitive p_IP-1_254 ocf:heartbeat:IPaddr2 params ip=192.168.1.254 cidr_netmask=24 op monitor interval=30s
>>> 
>>> crm configure primitive p_DRBD-r0 ocf:linbit:drbd params drbd_resource=r0 op monitor interval=60s
>>> crm configure ms ms_DRBD-r0 p_DRBD-r0 meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
>>> 
>>> crm configure primitive p_iSCSI-san1 ocf:heartbeat:iSCSITarget params iqn=iqn.2012-11.com.example.san1:sda op monitor interval=10s
>>> crm configure primitive p_iSCSI-san1_0 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=0 path=/dev/drbd0 op monitor interval=10s
>>> crm configure primitive p_iSCSI-san1_1 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=1 path=/dev/drbd1 op monitor interval=10s
>>> crm configure primitive p_iSCSI-san1_2 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=2 path=/dev/drbd2 op monitor interval=10s
>>> crm configure primitive p_iSCSI-san1_3 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=3 path=/dev/drbd3 op monitor interval=10s
>>> crm configure primitive p_iSCSI-san1_4 ocf:heartbeat:iSCSILogicalUnit params target_iqn=iqn.2012-11.com.example.san1:sda lun=4 path=/dev/drbd4 op monitor interval=10s
>>> 
>>> crm configure group g_iSCSI-san1 p_iSCSI-san1 p_iSCSI-san1_0 p_iSCSI-san1_1 p_iSCSI-san1_2 p_iSCSI-san1_3 p_iSCSI-san1_4 p_IP-1_254
>>> crm configure order o_DRBD-r0_before_iSCSI-san1 inf: ms_DRBD-r0:promote g_iSCSI-san1:start
>>> crm configure colocation c_iSCSI_with_DRBD-r0 inf: g_iSCSI-san1 ms_DRBD-r0:Master
>>> crm configure location l_iSCSI-san1+DRBD-r0 p_IP-1_254 10240: san1
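>>
>>A quick way to double-check what actually landed in the CIB is to dump and validate it. A minimal sketch (crm_verify only validates the configuration itself, not whether the backing devices exist):
>>
>>> crm configure show
>>> crm_verify -LV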
>>
>>
>>IET (i.e., iscsitarget) is already running (with the default configuration) and DRBD's already correctly configured to create the resource r0...
>>
>>
>>> resource r0 {
>>>     volume 0 {
>>>         device /dev/drbd0 ;
>>>         disk /dev/sda7 ;
>>>         meta-disk internal ;
>>>     }
>>>     volume 1 {
>>>         device /dev/drbd1 ;
>>>         disk /dev/sda8 ;
>>>         meta-disk internal ;
>>>     }
>>>     volume 2 {
>>>         device /dev/drbd2 ;
>>>         disk /dev/sda9 ;
>>>         meta-disk internal ;
>>>     }
>>>     volume 3 {
>>>         device /dev/drbd3 ;
>>>         disk /dev/sda10 ;
>>>         meta-disk internal ;
>>>     }
>>>     on san1 {
>>>         address 192.168.1.1:7789 ;
>>>     }
>>>     on san2 {
>>>         address 192.168.1.2:7789 ;
>>>     }
>>> }
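>>
>>(Only four volumes, /dev/drbd0 through /dev/drbd3, are defined here. A quick check on either node confirms which DRBD devices actually exist; a minimal sketch:)
>>
>>> ls -l /dev/drbd*
>>> cat /proc/drbd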
>>
>>
>>
>>But the shared IP address won't start, nor will the LUNs:
>>
>>
>>> san1:~ # crm_mon -1
>>> ============
>>> Last updated: Thu Jan 17 20:55:55 2013
>>> Last change: Thu Jan 17 20:55:09 2013 by root via cibadmin on san1
>>> Stack: openais
>>> Current DC: san1 - partition with quorum
>>> Version: 1.1.7-77eeb099a504ceda05d648ed161ef8b1582c7daf
>>> 2 Nodes configured, 2 expected votes
>>> 9 Resources configured.
>>> ============
>>> 
>>> Online: [ san1 san2 ]
>>> 
>>>  Master/Slave Set: ms_DRBD-r0 [p_DRBD-r0]
>>>      Masters: [ san1 ]
>>>      Slaves: [ san2 ]
>>>  Resource Group: g_iSCSI-san1
>>>      p_iSCSI-san1    (ocf::heartbeat:iSCSITarget):    Started san1
>>>      p_iSCSI-san1_0    (ocf::heartbeat:iSCSILogicalUnit):    Stopped 
>>>      p_iSCSI-san1_1    (ocf::heartbeat:iSCSILogicalUnit):    Stopped 
>>>      p_iSCSI-san1_2    (ocf::heartbeat:iSCSILogicalUnit):    Stopped 
>>>      p_iSCSI-san1_3    (ocf::heartbeat:iSCSILogicalUnit):    Stopped 
>>>      p_iSCSI-san1_4    (ocf::heartbeat:iSCSILogicalUnit):    Stopped 
>>>      p_IP-1_254    (ocf::heartbeat:IPaddr2):    Stopped 
>>> 
>>> Failed actions:
>>>     p_iSCSI-san1_0_start_0 (node=san1, call=23, rc=1, status=complete): unknown error
>>>     p_iSCSI-san1_1_start_0 (node=san1, call=26, rc=1, status=complete): unknown error
>>>     p_iSCSI-san1_2_start_0 (node=san1, call=29, rc=1, status=complete): unknown error
>>>     p_iSCSI-san1_3_start_0 (node=san1, call=32, rc=1, status=complete): unknown error
>>>     p_iSCSI-san1_4_start_0 (node=san1, call=35, rc=1, status=complete): unknown error
>>>     p_iSCSI-san1_0_start_0 (node=san2, call=11, rc=1, status=complete): unknown error
>>>     p_iSCSI-san1_1_start_0 (node=san2, call=14, rc=1, status=complete): unknown error
>>>     p_iSCSI-san1_2_start_0 (node=san2, call=17, rc=1, status=complete): unknown error
>>>     p_iSCSI-san1_3_start_0 (node=san2, call=20, rc=1, status=complete): unknown error
>>>     p_iSCSI-san1_4_start_0 (node=san2, call=23, rc=1, status=complete): unknown error
>>
>>
>>
>>What am I doing wrong?
>>
>>
>>
>>TIA,
>>Eric Pretorious
>>Truckee, CA
>>
>>
>>
>
>