[rhos-list] Problems with quantum and dhcp-agent
Gary Kotton
gkotton at redhat.com
Thu Jun 6 11:21:29 UTC 2013
On 06/06/2013 11:29 AM, Gary Kotton wrote:
> On 06/05/2013 11:14 PM, Steven Dake wrote:
>> On 06/05/2013 12:14 AM, Gary Kotton wrote:
>>> On 06/05/2013 01:17 AM, Steven Dake wrote:
>>>> On 06/04/2013 01:09 PM, S Manoo wrote:
>>>>> Looking into this further, I'm observing the same error message
>>>>> about timeouts talking to qpid in dhcp-agent.log after every
>>>>> restart. Perhaps this is why instances are not getting any DHCP
>>>>> responses? Any suggestions on what's causing this and where I
>>>>> might look to troubleshoot further?
>>>
>>> When a host is restarted, each process needs to register with the
>>> message broker. If you are running all of the services on the same
>>> host, they will only be able to connect once the qpid service is
>>> up and running. This usually takes a few seconds after reboot. If a
>>> service does not receive an answer from the qpid service, it
>>> waits and retries. This is why you see the timeouts. The
>>> wait is incremental. I have seen that all services are usually able
>>> to connect within a minute of booting a host (we should try to
>>> reduce this time).
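
As a rough illustration of the incremental wait described above, here is a
minimal sketch of a retry loop whose delay grows after each failed attempt.
It is not the actual quantum or qpid reconnect code: the function name, the
parameters and their defaults are all assumptions made for the example.

```python
import time


def connect_with_backoff(connect, initial_delay=1.0, step=1.0, max_delay=5.0,
                         max_attempts=10, sleep=time.sleep):
    """Call connect() until it succeeds, waiting a little longer each time.

    Every name and default value here is hypothetical; none of them is
    taken from the quantum or qpid sources.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except IOError:
            # Give up once the attempt budget is exhausted.
            if attempt == max_attempts:
                raise
            sleep(delay)
            # Incremental wait: grow the delay, but never beyond max_delay.
            delay = min(delay + step, max_delay)
```

A caller would pass in whatever function opens the broker connection. The
sleep argument is injectable only so that the loop can be exercised without
really waiting.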
>>>
>>> Please note that the quantum cli has an option: quantum agent-list.
>>> This provides the list of agents, their status and hosts that they
>>> are running on.
>>>
>>> If you spin up an instance after the dhcp agent is up and running,
>>> do you still see the problem?
>>>
>>>>>
>>>> S Manoo,
>>>>
>>>> We may have just fixed a bug related to this problem which is not
>>>> fixed in the preview. Please try the workaround in this bugzilla:
>>>>
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=970453
>>>
>>> This fix is good for an all in one setup but will not help if the
>>> DHCP agent is running on another host. In Quantum we have the notion
>>> of a network node. Please look at
>>> https://docs.google.com/drawings/d/167gegaoTBZpd318b2JTgF_Qi9YdkIX8pcQ6YBJLUtGY/edit?usp=sharing
>>>
>>> If the message broker goes down (say for example host reboot or
>>> network problems) then the dhcp agent will try and reconnect.
>>>
>> Gary,
>>
>> I have found that the dhcp agent stops responding permanently in this
>> condition on an all-in-one setup. Perhaps the same is true for
>> multinode (i.e. the retry logic doesn't work as expected). I don't
>> have multiple nodes to test, but it might be worth double-checking if
>> you do.
>
> I have done the following check (on an all in one setup):
> 1. reboot host
> 2. stop quantum service
> 3. check that dhcp agent has a timeout with the quantum service
> 4. restart quantum service
>
> I see a number of issues which I am going to investigate:
>
> 1. The agent is up:
> [root@dhcp-4-126 ~(keystone_admin)]# quantum agent-list
> +--------------------------------------+--------------------+---------------------------+-------+----------------+
> | id                                   | agent_type         | host                      | alive | admin_state_up |
> +--------------------------------------+--------------------+---------------------------+-------+----------------+
> | 11e35126-6c07-4a2f-b681-399cdbc8210d | L3 agent           | dhcp-4-126.tlv.redhat.com | :-)   | True           |
> | 5e75d5d9-edb0-462e-850b-013ad7a518f4 | DHCP agent         | dhcp-4-126.tlv.redhat.com | :-)   | True           |
> | af51a50f-f45e-4736-8517-ed3cda759b3c | Open vSwitch agent | dhcp-4-126.tlv.redhat.com | :-)   | True           |
> | d7588cb1-b287-4b4c-a8a8-539d4c5129b2 | Open vSwitch agent | dhcp-4-227.tlv.redhat.com | :-)   | True           |
> +--------------------------------------+--------------------+---------------------------+-------+----------------+
> [root@dhcp-4-126 ~(keystone_admin)]#
> This means that the agent successfully sent a message to the plugin.
>
> 2. In the DHCP log there are timeouts with the qpid service and no
> notification of a resync (which used to happen in Folsom).
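
The agent-list check from item 1 can also be scripted. Below is a minimal
sketch, assuming the table printed by quantum agent-list has been captured
as text with each row on a single line. The function names are hypothetical
and are not part of the quantum client. The only marker taken from the
output above is ":-)" for a live agent; any other value is treated as not
alive.

```python
def parse_agent_list(text):
    """Parse the ASCII table printed by `quantum agent-list` into dicts.

    Assumes one table row per line (not re-wrapped by a mail client).
    """
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines, shell prompts and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first data line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def dead_agents(rows):
    """Return the rows whose 'alive' column is anything other than ':-)'."""
    return [row for row in rows if row.get("alive") != ":-)"]
```

An empty result from dead_agents() only says that every agent reported in
recently. As the DHCP log shows, that does not rule out RPC timeouts on the
agent side.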
>
> I am on it and will post any progress.
>
> A few days ago I had a problem with nova compute which seemed similar
> to this. I hope that it is not an issue with qpid and is solely
> related to the dhcp agent (which is easier for me to address).
I have found the problem and am pushing patches.
Thanks
Gary
>
>>
>> Regards
>> -steve
>>
>>>>
>>>> Regards
>>>> -steve
>>>>
>>>>
>>>>> */var/log/quantum/dhcp-agent.log:*
>>>>> 2013-06-04 12:50:44 INFO [quantum.common.config] Logging enabled!
>>>>> 2013-06-04 12:50:44 INFO
>>>>> [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server
>>>>> on localhost:5672
>>>>> 2013-06-04 12:50:44 INFO
>>>>> [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server
>>>>> on localhost:5672
>>>>> 2013-06-04 12:50:44 INFO [quantum.agent.dhcp_agent] DHCP agent
>>>>> started
>>>>> 2013-06-04 12:51:44 ERROR [quantum.agent.dhcp_agent] Failed
>>>>> reporting state!
>>>>> Traceback (most recent call last):
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/agent/dhcp_agent.py",
>>>>> line 700, in _report_state
>>>>> self.agent_state)
>>>>> File "/usr/lib/python2.6/site-packages/quantum/agent/rpc.py",
>>>>> line 66, in report_state
>>>>> topic=self.topic)
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/proxy.py",
>>>>> line 80, in call
>>>>> return rpc.call(context, self._get_topic(topic), msg, timeout)
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/__init__.py",
>>>>> line 140, in call
>>>>> return _get_impl().call(CONF, context, topic, msg, timeout)
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py",
>>>>> line 611, in call
>>>>> rpc_amqp.get_connection_pool(conf, Connection))
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py",
>>>>> line 613, in call
>>>>> rv = list(rv)
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py",
>>>>> line 555, in __iter__
>>>>> self.done()
>>>>> File "/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
>>>>> self.gen.next()
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py",
>>>>> line 552, in __iter__
>>>>> self._iterator.next()
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py",
>>>>> line 436, in iterconsume
>>>>> yield self.ensure(_error_callback, _consume)
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py",
>>>>> line 380, in ensure
>>>>> error_callback(e)
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py",
>>>>> line 421, in _error_callback
>>>>> raise rpc_common.Timeout()
>>>>> Timeout: Timeout while waiting on RPC response.
>>>>> 2013-06-04 12:51:44 WARNING
>>>>> [quantum.openstack.common.loopingcall] task run outlasted interval
>>>>> by 56.108887 sec
>>>>> 2013-06-04 12:51:44 INFO [quantum.agent.dhcp_agent]
>>>>> Synchronizing state
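
To see how often the state report fails across restarts, the log can be
scanned mechanically. This is a minimal sketch, assuming the raw
dhcp-agent.log has one record per line in the "date time LEVEL [logger]
message" layout shown above (the copy quoted in this mail has been
re-wrapped, so it would need unwrapping first). The function name and the
regular expression are assumptions made for the example.

```python
import re
from collections import Counter

# One log record: timestamp, level, [logger name], message.
RECORD = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]\s*(?P<msg>.*)$"
)


def scan_dhcp_log(lines):
    """Return (records per level, failed-report timestamps, timeout count)."""
    levels = Counter()
    failures = []
    timeouts = 0
    for line in lines:
        line = line.rstrip("\n")
        match = RECORD.match(line)
        if match:
            levels[match.group("level")] += 1
            if "Failed reporting state" in match.group("msg"):
                failures.append(match.group("ts"))
        elif line.startswith("Timeout:"):
            # Final line of the traceback raised by the RPC layer.
            timeouts += 1
    return levels, failures, timeouts
```

Passing an open file object works because the function only iterates over
lines, for example scan_dhcp_log(open("/var/log/quantum/dhcp-agent.log")).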
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jun 3, 2013 at 11:28 PM, S Manoo <smanoo76 at gmail.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> *dhcp-agent.log:*
>>>>> [root@grizzly ~(keystone_admin)]# cat dhcp-agent.log
>>>>> 2013-06-03 22:27:09 INFO [quantum.common.config] Logging
>>>>> enabled!
>>>>> 2013-06-03 22:27:09 INFO
>>>>> [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP
>>>>> server on 10.0.0.19:5672
>>>>> 2013-06-03 22:27:09 INFO
>>>>> [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP
>>>>> server on 10.0.0.19:5672
>>>>> 2013-06-03 22:27:10 INFO [quantum.agent.dhcp_agent] DHCP
>>>>> agent started
>>>>> 2013-06-03 22:28:10 ERROR [quantum.agent.dhcp_agent] Failed
>>>>> reporting state!
>>>>> Traceback (most recent call last):
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/agent/dhcp_agent.py", line
>>>>> 700, in _report_state
>>>>> self.agent_state)
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/agent/rpc.py", line
>>>>> 66, in report_state
>>>>> topic=self.topic)
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/proxy.py",
>>>>> line 80, in call
>>>>> return rpc.call(context, self._get_topic(topic), msg, timeout)
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/__init__.py",
>>>>> line 140, in call
>>>>> return _get_impl().call(CONF, context, topic, msg, timeout)
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py",
>>>>> line 611, in call
>>>>> rpc_amqp.get_connection_pool(conf, Connection))
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py",
>>>>> line 613, in call
>>>>> rv = list(rv)
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py",
>>>>> line 555, in __iter__
>>>>> self.done()
>>>>> File "/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
>>>>> self.gen.next()
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py",
>>>>> line 552, in __iter__
>>>>> self._iterator.next()
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py",
>>>>> line 436, in iterconsume
>>>>> yield self.ensure(_error_callback, _consume)
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py",
>>>>> line 380, in ensure
>>>>> error_callback(e)
>>>>> File
>>>>> "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py",
>>>>> line 421, in _error_callback
>>>>> raise rpc_common.Timeout()
>>>>> Timeout: Timeout while waiting on RPC response.
>>>>> 2013-06-03 22:28:10 WARNING
>>>>> [quantum.openstack.common.loopingcall] task run outlasted
>>>>> interval by 56.133099 sec
>>>>> 2013-06-03 22:28:10 INFO [quantum.agent.dhcp_agent]
>>>>> Synchronizing state
>>>>> [root@grizzly ~(keystone_admin)]#
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> rhos-list mailing list
>>>>> rhos-list at redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/rhos-list
>>>>
>>>>
>>>>
>>>
>>
>
>
>