
[Rdo-list] Strange TCP reset when using NAT



Hi!

I installed Munin on two of our machines. The master runs in an OpenStack VM; the first node I configured is the controller node of another OpenStack installation. Because of the high number of disks on that node, the "diskstats" plugin generates a lot of output (6154 lines, 218698 bytes). Transferring this much data kills the TCP connection between the master and the node.

I can reproduce this with a plain telnet to port 4949 on the node:
# telnet 192.168.104.61 4949
Trying 192.168.104.61...
Connected to 192.168.104.61.
Escape character is '^]'.
# munin node at CNT64IB003.example.com
config diskstats
... lots of config data ...
graph_info This graph shows the number of IO operations pr second and the average size of these requests.  Lots of small requests should result in in lower throughput (separate graph) and higher service time (separate graph).  Please note Connection closed by foreign host.
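
For anyone who wants to repeat the test without telnet, here is a minimal sketch that sends the same "config diskstats" command over a plain socket and counts how many bytes arrive before the connection ends. The function name fetch_plugin_config and the timeout value are my own choices for illustration. It relies only on the munin-node convention that a response ends with a line containing a single ".".

```python
import socket


def fetch_plugin_config(host, plugin, port=4949, timeout=10.0):
    """Send 'config <plugin>' to a munin-node and count the bytes returned.

    Returns (bytes_received, completed). completed is True only if the
    terminating "." line arrived before the connection ended.
    """
    received = bytearray()
    completed = False
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as reader:
            reader.readline()  # greeting banner: "# munin node at ..."
            sock.sendall(f"config {plugin}\n".encode("ascii"))
            try:
                for line in reader:
                    received += line
                    if line.rstrip(b"\r\n") == b".":
                        completed = True
                        break
            except (ConnectionResetError, socket.timeout):
                # A reset or a stall is exactly the failure being measured.
                pass
    return len(received), completed


if __name__ == "__main__":
    count, ok = fetch_plugin_config("192.168.104.61", "diskstats")
    print(f"received {count} bytes, complete={ok}")
```

If the transfer survives, the byte count should be close to the 218698 bytes mentioned above; a much smaller count with complete=False would match what telnet shows.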


The connection goes from the VM on the internal network through an OpenStack router to the external network, and from there over a "real" router to the node. I ran two cross-checks. A different hardware machine outside of OpenStack can fetch the data from the node without problems, and an OpenStack VM set up as another Munin node also works. So the OpenStack L3 router is to blame, not the internal networking. For completeness, this is up-to-date Grizzly RDO running on up-to-date CentOS 6.4.

Networks:
  • 192.168.163.0/24 OpenStack internal
  • 192.168.142.0/24 OpenStack external
  • 192.168.104.0/24 outside of OpenStack
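
To make the captures below easier to read, here is a small sketch that maps an address to one of the three networks above using Python's standard ipaddress module. The helper name classify is just for illustration.

```python
import ipaddress

# The three networks listed above.
NETWORKS = {
    "OpenStack internal": ipaddress.ip_network("192.168.163.0/24"),
    "OpenStack external": ipaddress.ip_network("192.168.142.0/24"),
    "outside of OpenStack": ipaddress.ip_network("192.168.104.0/24"),
}


def classify(address):
    """Return the name of the network that contains address, or 'unknown'."""
    ip = ipaddress.ip_address(address)
    for name, network in NETWORKS.items():
        if ip in network:
            return name
    return "unknown"


if __name__ == "__main__":
    for addr in ("192.168.163.12", "192.168.142.11", "192.168.104.61"):
        print(addr, "->", classify(addr))
```

The same connection therefore shows up with source address 192.168.163.12 on the master and 192.168.142.11 on the node, because the OpenStack router translates the address in between.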

Here is the tcpdump output from the Munin master:

13:35:02.464812 IP 192.168.163.12.46630 > 192.168.104.61.munin: Flags [.], ack 94081, win 143, options [nop,nop,TS val 83989877 ecr 338498688], length 0
13:35:02.465086 IP 192.168.104.61.munin > 192.168.163.12.46630: Flags [.], seq 94081:95529, ack 727, win 114, options [nop,nop,TS val 338498690 ecr 83989877], length 1448
13:35:02.465282 IP 192.168.104.61.munin > 192.168.163.12.46630: Flags [.], seq 95529:98425, ack 727, win 114, options [nop,nop,TS val 338498690 ecr 83989877], length 2896
13:35:02.469397 IP 192.168.163.12.46630 > 192.168.104.61.munin: Flags [.], ack 98425, win 205, options [nop,nop,TS val 83989881 ecr 338498690], length 0
13:35:02.469686 IP 192.168.104.61.munin > 192.168.163.12.46630: Flags [.], seq 98425:99873, ack 727, win 114, options [nop,nop,TS val 338498694 ecr 83989881], length 1448
13:35:02.469881 IP 192.168.104.61.munin > 192.168.163.12.46630: Flags [.], seq 99873:104217, ack 727, win 114, options [nop,nop,TS val 338498694 ecr 83989881], length 4344
13:35:02.474029 IP 192.168.163.12.46630 > 192.168.104.61.munin: Flags [.], ack 104217, win 189, options [nop,nop,TS val 83989885 ecr 338498694], length 0
13:35:02.474292 IP 192.168.104.61.munin > 192.168.163.12.46630: Flags [R], seq 4001796938, win 0, length 0
13:35:02.605976 IP 192.168.104.61.munin > 192.168.163.12.46630: Flags [.], seq 39738:41186, ack 727, win 114, options [nop,nop,TS val 338498680 ecr 83989867], length 1448
13:35:02.606042 IP 192.168.163.12.46630 > 192.168.104.61.munin: Flags [R], seq 3552918741, win 0, length 0
13:35:02.615483 IP 192.168.104.61.munin > 192.168.163.12.46630: Flags [P.], seq 41186:41197, ack 727, win 114, options [nop,nop,TS val 338498680 ecr 83989867], length 11
13:35:02.615512 IP 192.168.163.12.46630 > 192.168.104.61.munin: Flags [R], seq 3552918741, win 0, length 0
13:35:02.618052 IP 192.168.104.61.munin > 192.168.163.12.46630: Flags [.], seq 41197:42645, ack 727, win 114, options [nop,nop,TS val 338498680 ecr 83989867], length 1448


You can see that the Munin node terminates the connection with a RESET packet (13:35:02.474292). The Munin master isn't running NTP and its clock is about 15 seconds behind the node's, so please disregard the absolute timestamps when comparing the two captures.
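
Since the clocks differ, one way to line up the two captures is to take a packet that is visible in both (same ack number and TCP timestamp values) and use it to estimate the offset. The sketch below does that with the "ack 94081" packet as it appears in the master capture and in the node capture. The helper names are mine, and the result includes the one-way network delay, so treat it as an approximation.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"

# The same ACK as recorded on the master and on the node.
MASTER_ACK = ("13:35:02.464812 IP 192.168.163.12.46630 > 192.168.104.61.munin: "
              "Flags [.], ack 94081, win 143, "
              "options [nop,nop,TS val 83989877 ecr 338498688], length 0")
NODE_ACK = ("13:35:17.485792 IP 192.168.142.11.46630 > 192.168.104.61.munin: "
            "Flags [.], ack 94081, win 143, "
            "options [nop,nop,TS val 83989877 ecr 338498688], length 0")


def capture_time(line):
    """Parse the leading HH:MM:SS.ffffff timestamp of a tcpdump line."""
    return datetime.strptime(line.split()[0], FMT)


def clock_offset(master_line, node_line):
    """Estimate node clock minus master clock from one shared packet."""
    return capture_time(node_line) - capture_time(master_line)


def to_master_clock(node_timestamp, offset):
    """Convert a node-side timestamp string to the master's clock."""
    return (datetime.strptime(node_timestamp, FMT) - offset).strftime(FMT)


if __name__ == "__main__":
    offset = clock_offset(MASTER_ACK, NODE_ACK)
    print("offset in seconds:", offset.total_seconds())
    print("node 13:35:17.495055 on master clock:",
          to_master_clock("13:35:17.495055", offset))
```

With this offset, the node's own RESET at 13:35:17.495055 in the node capture maps to within a fraction of a millisecond of the 13:35:02.474292 entry in the master capture, which suggests the estimate is good enough for ordering events.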

Now, here is a tcpdump output from the node:

13:35:17.485792 IP 192.168.142.11.46630 > 192.168.104.61.munin: Flags [.], ack 94081, win 143, options [nop,nop,TS val 83989877 ecr 338498688], length 0
13:35:17.485812 IP 192.168.104.61.munin > 192.168.142.11.46630: Flags [.], seq 94081:98425, ack 727, win 114, options [nop,nop,TS val 338498690 ecr 83989877], length 4344
13:35:17.490390 IP 192.168.142.11.46630 > 192.168.104.61.munin: Flags [.], ack 98425, win 205, options [nop,nop,TS val 83989881 ecr 338498690], length 0
13:35:17.490406 IP 192.168.104.61.munin > 192.168.142.11.46630: Flags [.], seq 98425:104217, ack 727, win 114, options [nop,nop,TS val 338498694 ecr 83989881], length 5792
13:35:17.492061 IP 192.168.142.11.46630 > 192.168.104.61.munin: Flags [R], seq 3552918741, win 0, length 0
13:35:17.495034 IP 192.168.142.11.46630 > 192.168.104.61.munin: Flags [.], ack 104217, win 189, options [nop,nop,TS val 83989885 ecr 338498694], length 0
13:35:17.495055 IP 192.168.104.61.munin > 192.168.142.11.46630: Flags [R], seq 4001796938, win 0, length 0
13:35:17.501622 IP 192.168.142.11.46630 > 192.168.104.61.munin: Flags [R], seq 3552918741, win 0, length 0
13:35:17.511234 IP 192.168.142.11.46630 > 192.168.104.61.munin: Flags [R], seq 3552918741, win 0, length 0
13:35:17.520809 IP 192.168.142.11.46630 > 192.168.104.61.munin: Flags [R], seq 3552918741, win 0, length 0

This capture shows a RESET packet arriving from the master's NATed address, 192.168.142.11, at 13:35:17.492061, before the node sends its own RESET (13:35:17.495055).
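
To pick the RESET packets out of a longer capture, a short script like the following may help. It is only a sketch: the regular expression assumes the tcpdump line layout shown in this mail, and the function name reset_packets is made up for the example. The sample input is four lines copied from the node capture.

```python
import re

# Matches "<time> IP <src> > <dst>: Flags [<flags>]" as printed by tcpdump here.
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) IP "
    r"(?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]+)\]"
)

NODE_CAPTURE = """\
13:35:17.490406 IP 192.168.104.61.munin > 192.168.142.11.46630: Flags [.], seq 98425:104217, ack 727, win 114, options [nop,nop,TS val 338498694 ecr 83989881], length 5792
13:35:17.492061 IP 192.168.142.11.46630 > 192.168.104.61.munin: Flags [R], seq 3552918741, win 0, length 0
13:35:17.495034 IP 192.168.142.11.46630 > 192.168.104.61.munin: Flags [.], ack 104217, win 189, options [nop,nop,TS val 83989885 ecr 338498694], length 0
13:35:17.495055 IP 192.168.104.61.munin > 192.168.142.11.46630: Flags [R], seq 4001796938, win 0, length 0
"""


def reset_packets(capture_text):
    """Return (time, source) for every packet with the R flag, in capture order."""
    resets = []
    for line in capture_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and "R" in match.group("flags"):
            resets.append((match.group("time"), match.group("src")))
    return resets


if __name__ == "__main__":
    for time, src in reset_packets(NODE_CAPTURE):
        print(time, "RST from", src)
```

Running the same function over the master capture gives the order of RESETs as seen on that side, which makes the two views easy to compare.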

I turned on debug logging for the L3 agent, but I can't tell anything from the output. Without debugging, the log file gets no new entries.

Please advise.

Best regards
Lutz Christoph

--

Lutz Christoph

arago Institut für komplexes Datenmanagement AG

Eschersheimer Landstraße 526 - 532
60433 Frankfurt am Main

eMail: lchristoph arago de - www: http://www.arago.de
Tel: 0172/6301004
Mobil: 0172/6301004


