[Freeipa-users] replica running trust-agents can't resolve AD users - which of these sssd errors should I be focusing on?

Alexander Bokovoy abokovoy at redhat.com
Thu Dec 22 21:34:01 UTC 2016


On to, 22 joulu 2016, Chris Dagdigian wrote:
>Hi folks,
>
>Summary:  Replica w/ Trust agents can't resolve AD users. Not sure 
>which debug_level=log error I should focus on. Would appreciate extra 
>eyeballs on this ..
>
>Have a brand new replica (v4.4) running and after installing the AD 
>trust agents I still can't recognize users who exist in the trusted AD 
>domains.
>
>Running at debug_level=10 for logging as usual however deleting the 
>logs and doing a fresh reboot followed by trying to resolve a users 
>still make 4000+ log entries so rather than include it here I've 
>posted a sanitized sssd_domain.log file here:
>
>http://chrisdag.me/sssd_companyidm.org.log.txt
>
>There are two sets of messages in that massive log file that concern 
>me but I don't know enough yet to figure out which one to focus on.
>
>The first set of messages show what appears to be a fatal error in 
>connecting to the local ldap:// server on the replica.
>
>However -
> - dirsirv logs look fine
> - the various ldapsearch commands in the Free-IPA troublehooting page 
>work to query both the replica and the remote master
> - 'ipactl status' shows directory services running
> - no firewall blocking and AWS VPC flowLogs show no REJECT traffic 
>whatsoever for the NIC on the replica
Can you show us ldap_child.log and krb5_child.log from /var/log/sssd on
the replica?

There seem to be something weird with networking stack, because at
15:43:13 the next attempt to connect gets 'connection refused'. May be
389-ds is just warming up and there is not enough CPU or I/O to handle
the load?

(Thu Dec 22 15:43:13 2016) [sssd[be[companyidm.org]]]
[be_resolve_server_process] (0x0200): Found address for server usaeilidmp002.companyidm.org: [10.127.66.11] TTL 7200
(Thu Dec 22 15:43:13 2016) [sssd[be[companyidm.org]]]
[sssd_async_socket_init_send] (0x4000): Using file descriptor [27] for the connection.
(Thu Dec 22 15:43:13 2016) [sssd[be[companyidm.org]]]
[sssd_async_socket_init_send] (0x0400): Setting 6 seconds timeout for connecting
(Thu Dec 22 15:43:13 2016) [sssd[be[companyidm.org]]]
[sssd_async_connect_done] (0x0020): connect failed [111][Connection refused].
(Thu Dec 22 15:43:13 2016) [sssd[be[companyidm.org]]]
[sssd_async_socket_init_done] (0x0020): sdap_async_sys_connect request failed: [111]: Connection refused.

this is definitely is different from the result of two seconds before:

(Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]]
[sssd_async_socket_init_send] (0x4000): Using file descriptor [21] for the connection.
(Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]]
[sssd_async_connect_send] (0x0020): connect failed [101][Network is unreachable].
(Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]]
[sssd_async_socket_init_send] (0x0400): Setting 6 seconds timeout for connecting
(Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]]
[sssd_async_socket_init_done] (0x0020): sdap_async_sys_connect request failed: [101]: Network is unreachable.
(Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]]
[sssd_async_socket_state_destructor] (0x0400): closing socket [21] 
(Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]]
[sss_ldap_init_sys_connect_done] (0x0020): sssd_async_socket_init request failed: [101]: Network is unreachable.
(Thu Dec 22 15:43:11 2016) [sssd[be[companyidm.org]]]
[sdap_sys_connect_done] (0x0020): sdap_async_connect_call request failed: [101]: Network is unreachable.

Later, in a minute it seems to respond just well:


(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]]
[sssd_async_socket_init_send] (0x4000): Using file descriptor [27] for the connection.
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]]
[sssd_async_socket_init_send] (0x0400): Setting 6 seconds timeout for connecting
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]]
[sdap_ldap_connect_callback_add] (0x1000): New LDAP connection to [ldap://usaeilidmp002.companyidm.org:389/??base] with fd [27].
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]]
[sdap_get_rootdse_send] (0x4000): Getting rootdse
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]]
[sdap_print_server] (0x2000): Searching 10.127.66.11:389
...

(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sdap_parse_range]
(0x2000): No sub-attributes for [supportedSASLMechanisms]
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sdap_parse_range]
(0x2000): No sub-attributes for [defaultNamingContext]
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]] [sdap_parse_range]
(0x2000): No sub-attributes for [lastUSN]
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]]
[sdap_process_result] (0x2000): Trace: sh[0x7f4f63ace530], connected[1], ops[0x7f4f63b283d0], ldap[0x7f4f63b28720]
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]]
[sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_RESULT]
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]]
[sdap_get_generic_op_finished] (0x0400): Search result: Success(0), no errmsg set
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]]
[sdap_op_destructor] (0x2000): Operation 1 finished
(Thu Dec 22 15:44:28 2016) [sssd[be[companyidm.org]]]
[sdap_get_rootdse_done] (0x2000): Got rootdse

So, sssd on the replica is able to retrieve information from the
replica's LDAP server. It also is able to retrieve the trust topology
information and retrieve the trusted domain objects to use against the
forest root domains your deployment trusts.

But at the point when it tries to contact global catalog and domain
controllers from the trusted domains, it cannot access them, so it
considers them offline.

Can you show us your /etc/krb5.conf on this replica, content of files in
/var/lib/sss/pubconf/krb5.include.d subdirectory which get included into
/etc/krb5.conf, and the logs I asked above?

Can you make sure that the replica is actually able to reach AD DCs for
the trusted domains (ports tcp/3268, tcp/389, tcp/88, udp/88, udp/53 at
least)?

>The second scary thing I see relates to some "Invalid Argument" errors 
>related to ipa_get_*_acct stuff that I see when I try to resolve my AD 
>user account:
These are coming from the failure to contact trusted domain's GC or LDAP
servers. Looking at the code flow, it seems an attempt to actually
connect to the server failed and there was no retry allowed
(internally), so SSSD marked the subdomain it tried to connect to as
'offline'.

So I'd bet you have something blocking access from the replica towards
domain controllers of those AD domains.

-- 
/ Alexander Bokovoy




More information about the Freeipa-users mailing list