Re: [rdo-list] Tripleo QuickStart HA deployment attempts constantly crash



So this last one looks like the telemetry services went down. You could
check the logs on the controllers to see whether they were OOM killed.
My bet is that this is what is happening.
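
A minimal sketch of that check, assuming the controllers log kernel
messages in the usual Linux wording ("Out of memory: Kill process ...",
"... invoked oom-killer", "Killed process ..."). The function name
find_oom_kills is made up for illustration; it only filters whatever
log text is piped into it:

```shell
# Filter kernel log text on stdin down to the lines the OOM killer leaves behind.
# find_oom_kills is a hypothetical helper name, not part of any tool.
find_oom_kills() {
    # grep exits 1 when nothing matches; treat that as "no OOM kills found"
    # rather than as an error.
    grep -i -E 'out of memory|invoked oom-killer|killed process' || true
}
```

On each controller, something like `sudo dmesg | find_oom_kills` or
`sudo journalctl -k | find_oom_kills` should then show whether any of
the telemetry processes were killed.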

The reason that HA is not the default for tripleo-quickstart is exactly
this type of issue. It is pretty difficult to fit a full HA deployment
of TripleO on a 32G virthost. I think there is a near 100% chance that the
default HA config will crash when trying to do anything on the
deployed overcloud, due to running out of memory.
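
To make the squeeze concrete, here is a rough budget sketch. The
function name is hypothetical, and the numbers in the usage note are
only an illustration borrowed from the config quoted further down in
this thread (6144 MB per overcloud node, 8192 MB for the undercloud);
the actual HA defaults may differ.

```shell
# Sum the guest memory (in MB) that a quickstart layout asks of the virthost.
# virthost_mem_needed is a hypothetical helper, not part of tripleo-quickstart.
# Usage: virthost_mem_needed CONTROLLERS CONTROL_MB COMPUTES COMPUTE_MB UNDERCLOUD_MB
virthost_mem_needed() {
    echo $(( $1 * $2 + $3 * $4 + $5 ))
}
```

With those sizes, `virthost_mem_needed 3 6144 1 6144 8192` prints 32768:
three controllers, one compute and the undercloud would already claim
the full 32G before the virthost's own OS gets anything.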

I have had some success in my local test setup using KSM [1] on the
virthost, and then changing the HA config to give the controllers more
memory. This results in overcommitting, but KSM can handle overcommitting
without going into swap. It might even be possible to try to set up KSM
in the environment setup part of quickstart. I would certainly accept an
RFE/patch for this [2,3].
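
For reference, a minimal sketch of turning KSM on by hand through the
kernel's sysfs interface (/sys/kernel/mm/ksm). The helper names and the
KSM_DIR variable are made up for this example, and this is not something
quickstart does today:

```shell
# KSM_DIR is overridable so the helpers can be tried against a scratch directory.
KSM_DIR="${KSM_DIR:-/sys/kernel/mm/ksm}"

# Start the KSM daemon by writing 1 to 'run'; needs root on a real host.
ksm_enable() {
    echo 1 > "$KSM_DIR/run"
}

# Estimate the memory saved by merging, in MB: pages_sharing counts the
# extra mappings that now point at an already-shared page.
ksm_saved_mb() {
    pages=$(cat "$KSM_DIR/pages_sharing")
    echo $(( pages * $(getconf PAGESIZE) / 1024 / 1024 ))
}
```

Run as root on the virthost: `ksm_enable` first, then, once the guests
have been up for a while, `ksm_saved_mb` gives a rough idea of how much
memory is being shared.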

If you have a larger virthost than 32G, you could similarly bump the
memory for the controllers, which should lead to a much higher success rate.

There is also a feature coming in TripleO [4] that will allow choosing
what services get deployed in each role, which will allow us to tweak
the tripleo-quickstart HA config to deploy a minimal service layout in
order to reduce memory requirements.

Thanks a ton for giving tripleo-quickstart a go!

[1] https://en.wikipedia.org/wiki/Kernel_same-page_merging
[2] https://bugs.launchpad.net/tripleo-quickstart
[3] https://review.openstack.org/#/q/project:openstack/tripleo-quickstart
[4] https://blueprints.launchpad.net/tripleo/+spec/composable-services-within-roles

On 06/03/2016 06:20 AM, Boris Derzhavets wrote:
> =====================================
> 
> Fresh HA deployment attempt
> 
> =====================================
> 
> [stack undercloud ~]$ date
> Fri Jun  3 10:05:35 UTC 2016
> [stack undercloud ~]$ heat stack-list
> +--------------------------------------+------------+-----------------+---------------------+--------------+
> | id                                   | stack_name | stack_status    | creation_time       | updated_time |
> +--------------------------------------+------------+-----------------+---------------------+--------------+
> | 0c6b8205-be86-4a24-be36-fd4ece956c6d | overcloud  | CREATE_COMPLETE | 2016-06-03T08:14:19 | None         |
> +--------------------------------------+------------+-----------------+---------------------+--------------+
> [stack undercloud ~]$ nova list
> +--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
> | ID                                   | Name                    | Status | Task State | Power State | Networks            |
> +--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
> | 6a38b7be-3743-4339-970b-6121e687741d | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.0.2.10 |
> | 9222dc1b-5974-495b-8b98-b8176ac742f4 | overcloud-controller-1  | ACTIVE | -          | Running     | ctlplane=192.0.2.9  |
> | 76adbb27-220f-42ef-9691-94729ee28749 | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=192.0.2.11 |
> | 8f57f7b6-a2d8-4b7b-b435-1c675e63ea84 | overcloud-novacompute-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.8  |
> +--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
> [stack undercloud ~]$ ssh heat-admin@192.0.2.10
> Last login: Fri Jun  3 10:01:44 2016 from gateway
> [heat-admin overcloud-controller-0 ~]$ sudo su -
> Last login: Fri Jun  3 10:01:49 UTC 2016 on pts/0
> [root overcloud-controller-0 ~]# .  keystonerc_admin
> 
> [root overcloud-controller-0 ~]# pcs status
> Cluster name: tripleo_cluster
> Last updated: Fri Jun  3 10:07:22 2016        Last change: Fri Jun  3 08:50:59 2016 by root via cibadmin on overcloud-controller-0
> Stack: corosync
> Current DC: overcloud-controller-0 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
> 3 nodes and 123 resources configured
> 
> Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
> 
> Full list of resources:
> 
>  ip-192.0.2.6    (ocf::heartbeat:IPaddr2):    Started overcloud-controller-0
>  Clone Set: haproxy-clone [haproxy]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  ip-192.0.2.7    (ocf::heartbeat:IPaddr2):    Started overcloud-controller-1
>  Master/Slave Set: galera-master [galera]
>      Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: memcached-clone [memcached]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: rabbitmq-clone [rabbitmq]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-core-clone [openstack-core]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Master/Slave Set: redis-master [redis]
>      Masters: [ overcloud-controller-1 ]
>      Slaves: [ overcloud-controller-0 overcloud-controller-2 ]
>  Clone Set: mongod-clone [mongod]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-aodh-evaluator-clone [openstack-aodh-evaluator]
>      Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: neutron-l3-agent-clone [neutron-l3-agent]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  openstack-cinder-volume    (systemd:openstack-cinder-volume):    Started overcloud-controller-2
>  Clone Set: openstack-heat-engine-clone [openstack-heat-engine]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-ceilometer-api-clone [openstack-ceilometer-api]
>      Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-aodh-listener-clone [openstack-aodh-listener]
>      Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-gnocchi-metricd-clone [openstack-gnocchi-metricd]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-aodh-notifier-clone [openstack-aodh-notifier]
>      Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-heat-api-clone [openstack-heat-api]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector]
>      Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-glance-api-clone [openstack-glance-api]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-nova-api-clone [openstack-nova-api]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-sahara-api-clone [openstack-sahara-api]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-sahara-engine-clone [openstack-sahara-engine]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-glance-registry-clone [openstack-glance-registry]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-gnocchi-statsd-clone [openstack-gnocchi-statsd]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-cinder-api-clone [openstack-cinder-api]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: delay-clone [delay]
>      Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: neutron-server-clone [neutron-server]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-ceilometer-central-clone [openstack-ceilometer-central]
>      Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: httpd-clone [httpd]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
>  Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
>      Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
> 
> Failed Actions:
> * openstack-aodh-evaluator_monitor_60000 on overcloud-controller-1 'not running' (7): call=76, status=complete, exitreason='none',
>     last-rc-change='Fri Jun  3 08:47:22 2016', queued=0ms, exec=0ms
> * openstack-ceilometer-central_start_0 on overcloud-controller-1 'not running' (7): call=290, status=complete, exitreason='none',
>     last-rc-change='Fri Jun  3 08:51:18 2016', queued=0ms, exec=2132ms
> * openstack-aodh-evaluator_monitor_60000 on overcloud-controller-2 'not running' (7): call=76, status=complete, exitreason='none',
>     last-rc-change='Fri Jun  3 08:47:16 2016', queued=0ms, exec=0ms
> * openstack-ceilometer-central_start_0 on overcloud-controller-2 'not running' (7): call=292, status=complete, exitreason='none',
>     last-rc-change='Fri Jun  3 08:51:31 2016', queued=0ms, exec=2102ms
> * openstack-aodh-evaluator_monitor_60000 on overcloud-controller-0 'not running' (7): call=77, status=complete, exitreason='none',
>     last-rc-change='Fri Jun  3 08:47:19 2016', queued=0ms, exec=0ms
> * openstack-ceilometer-central_start_0 on overcloud-controller-0 'not running' (7): call=270, status=complete, exitreason='none',
>     last-rc-change='Fri Jun  3 08:50:02 2016', queued=0ms, exec=2199ms
> 
> 
> PCSD Status:
>   overcloud-controller-0: Online
>   overcloud-controller-1: Online
>   overcloud-controller-2: Online
> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> 
> ________________________________
> From: rdo-list-bounces redhat com <rdo-list-bounces redhat com> on behalf of Boris Derzhavets <bderzhavets hotmail com>
> Sent: Monday, May 30, 2016 4:56 AM
> To: John Trowbridge; Lars Kellogg-Stedman
> Cc: rdo-list
> Subject: Re: [rdo-list] Tripleo QuickStart HA deployment attempts constantly crash
> 
> 
> Done one more time :-
> 
> 
> [stack undercloud ~]$ heat deployment-show 9cc8087a-6d82-4261-8a13-ee8c46e3a02d
> 
> Uploaded here :-
> 
> http://textuploader.com/5bm5v
> ________________________________
> From: rdo-list-bounces redhat com <rdo-list-bounces redhat com> on behalf of Boris Derzhavets <bderzhavets hotmail com>
> Sent: Sunday, May 29, 2016 3:39 AM
> To: John Trowbridge; Lars Kellogg-Stedman
> Cc: rdo-list
> Subject: [rdo-list] Tripleo QuickStart HA deployment attempts constantly crash
> 
> 
> The error is the same every time :-
> 
> 
> 2016-05-29 07:20:17 [0]: CREATE_FAILED Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
> 2016-05-29 07:20:18 [0]: SIGNAL_COMPLETE Unknown
> 2016-05-29 07:20:18 [overcloud-ControllerNodesPostDeployment-dzawjmjyaidt-ControllerServicesBaseDeployment_Step2-ufz2ccs5egd7]: CREATE_FAILED Resource CREATE failed: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
> 2016-05-29 07:20:18 [0]: SIGNAL_COMPLETE Unknown
> 2016-05-29 07:20:19 [ControllerServicesBaseDeployment_Step2]: CREATE_FAILED Error: resources.ControllerServicesBaseDeployment_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
> 2016-05-29 07:20:19 [0]: SIGNAL_COMPLETE Unknown
> 2016-05-29 07:20:19 [0]: SIGNAL_COMPLETE Unknown
> 2016-05-29 07:20:20 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
> 2016-05-29 07:20:20 [overcloud-ControllerNodesPostDeployment-dzawjmjyaidt]: CREATE_FAILED Resource CREATE failed: Error: resources.ControllerServicesBaseDeployment_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
> 2016-05-29 07:20:21 [ControllerNodesPostDeployment]: CREATE_FAILED Error: resources.ControllerNodesPostDeployment.resources.ControllerServicesBaseDeployment_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
> 2016-05-29 07:20:21 [0]: SIGNAL_COMPLETE Unknown
> 2016-05-29 07:20:22 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
> 2016-05-29 07:20:22 [0]: SIGNAL_COMPLETE Unknown
> 2016-05-29 07:24:22 [ComputeNodesPostDeployment]: CREATE_FAILED CREATE aborted
> 2016-05-29 07:24:22 [overcloud]: CREATE_FAILED Resource CREATE failed: Error: resources.ControllerNodesPostDeployment.resources.ControllerServicesBaseDeployment_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 6
> Stack overcloud CREATE_FAILED
> Deployment failed:  Heat Stack create failed.
> + heat stack-list
> + grep -q CREATE_FAILED
> + deploy_status=1
> ++ heat resource-list --nested-depth 5 overcloud
> ++ grep FAILED
> ++ grep 'StructuredDeployment '
> ++ cut -d '|' -f3
> + for failed in '$(heat resource-list         --nested-depth 5 overcloud | grep FAILED |
>         grep '\''StructuredDeployment '\'' | cut -d '\''|'\'' -f3)'
> + heat deployment-show 66bd3fbe-296b-4f88-87a7-5ceafd05c1d3
> + exit 1
> 
> 
> Minimal configuration deployments run with no errors and build a completely functional environment.
> 
> 
> However, this template :-
> 
> 
> #################################
> # Test Controller + 2*Compute nodes
> #################################
> control_memory: 6144
> compute_memory: 6144
> 
> undercloud_memory: 8192
> 
> # Giving the undercloud additional CPUs can greatly improve heat's
> # performance (and result in a shorter deploy time).
> undercloud_vcpu: 4
> 
> # We set introspection to true and use only the minimal amount of nodes
> # for this job, but test all defaults otherwise.
> step_introspect: true
> 
> # Define a single controller node and two compute nodes.
> overcloud_nodes:
>   - name: control_0
>     flavor: control
> 
>   - name: compute_0
>     flavor: compute
> 
>   - name: compute_1
>     flavor: compute
> 
> # Tell tripleo how we want things done.
> extra_args: >-
>   --neutron-network-type vxlan
>   --neutron-tunnel-types vxlan
>   --ntp-server pool.ntp.org
> 
> network_isolation: true
> 
> 
> Picks up the new memory settings but doesn't create the second Compute node.
> 
> Every time it is just 1 Controller and 1 Compute.
> 
> 
> HW - i7-4790, 32 GB RAM
> 
> 
> Thanks.
> 
> Boris
> 