[Spacewalk-list] Subtask repo-sync failed

Matt Moldvan matt at moldvan.com
Wed Dec 12 18:45:46 UTC 2018


I found a similar issue yesterday while training someone else on
Spacewalk.  I noticed some provisioning/patching issues with our systems,
and found that a spacewalk-repo-sync job had been stuck since October 1st.
From a technical perspective, I have trouble understanding how a process
could be stuck for so long.  Running strace against the process showed no
activity other than a futex (mutex) call that just sat there.  Running
lsof against it showed some outbound connections to EC2 addresses in AWS
(the official CentOS repos) that appeared to have been stuck for over two
months.  Again, I can't comprehend why a connection would stay stuck that
long, as default TCP keepalive and timeout settings would normally drop a
connection left in that state.

We don't generally see that in applications, and since others are seeing
the issue as well, I can only guess it is something that can be addressed
at the application level.  At the very least, it would be good to have
some mechanism to report when a channel synchronization failed, hung, or
timed out.  In my opinion it is better to fail quickly, time out, and
notify than to hang indefinitely and let someone stumble across it two
months later; that handling belongs at the app layer.
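As a sketch of what "fail quickly, time out, and notify" could look like at the app layer, here is a hypothetical Python wrapper (the command and timeout are placeholders; `sleep` stands in for /usr/bin/spacewalk-repo-sync):

```python
import subprocess

def run_sync_with_timeout(cmd, timeout_seconds):
    """Run a sync command, but fail fast instead of hanging indefinitely."""
    try:
        # subprocess.run() kills the child when the timeout expires.
        subprocess.run(cmd, check=True, timeout=timeout_seconds)
        return "ok"
    except subprocess.TimeoutExpired:
        # A real wrapper would alert/notify operators here.
        return "timed out"
    except subprocess.CalledProcessError as e:
        return "failed with exit code %d" % e.returncode

# 'sleep 30' stands in for a hung spacewalk-repo-sync invocation.
print(run_sync_with_timeout(["sleep", "30"], timeout_seconds=0.5))  # -> "timed out"
```

This is only a sketch of the idea, not Taskomatic's actual mechanism; the key design choice is that the caller owns a deadline, so a hung child becomes a visible failure rather than a silent two-month stall.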

For what it's worth, there may have been other issues at play as well,
including the database or taskomatic.  However, in initial troubleshooting
we found that restarting Taskomatic, restarting the external Postgres
instance (the database service, not the whole server), and killing the
stuck process outright did not seem to fix the issue.

On Wed, Dec 12, 2018 at 11:19 AM Dimitri Yioulos <dyioulos at netatlantic.com>
wrote:

> Thanks for the reply, Dennis.  From what I can gather, taskomatic seems
> to be working OK.  The epel repo syncs (for rhel 6 and rhel 7) are the
> only ones that are creating this issue.
>
>
>
> *From:* spacewalk-list-bounces at redhat.com <
> spacewalk-list-bounces at redhat.com> *On Behalf Of *Dennis Pittman
> *Sent:* Wednesday, December 12, 2018 10:09 AM
>
>
> *To:* spacewalk-list at redhat.com
> *Subject:* Re: [Spacewalk-list] Subtask repo-sync failed
>
>
>
> What is error code 137?
>
>
>
> Ans:  Exit codes of the form 128 + n mean the process was terminated by
> fatal signal n.  Here 137 = 128 + 9, i.e. the command was killed with
> SIGKILL (as by "kill -9"), so "$?" reports 137.
>
> So that is more than likely a red herring.  You need to check the
> state of taskomatic, as it tends to be the primary source of problems of
> this nature.
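[The 128 + 9 arithmetic can be verified directly; a small Python sketch, with `sleep` standing in for the sync command. Python reports death-by-signal as a negative return code, which a shell would render as 128 + signum = 137:]

```python
import signal
import subprocess

# Start a long-running child (stands in for spacewalk-repo-sync)
# and SIGKILL it, which is exactly what 'kill -9' does.
proc = subprocess.Popen(["sleep", "30"])
proc.send_signal(signal.SIGKILL)
rc = proc.wait()

# Python reports death-by-signal as -signum; a shell reports 128 + signum.
shell_status = 128 + (-rc)
print(rc, shell_status)  # -> -9 137
```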
>
>
>
> “2018-12-06 08:08:55,736 [DefaultQuartzScheduler_Worker-5] ERROR
> com.redhat.rhn.taskomatic.task.RepoSyncTask  - Stack
> trace:org.quartz.JobExecutionException: Command
> '[/usr/bin/spacewalk-repo-sync, --channel, epel7-x86_64, --type, yum]'
> exited with error code 137”
>
>
>
>
>
> *Dennis J. Pittman *
>
> *(e)      djpittma at outlook.com <djpittma at outlook.com>*
>
> *(m)    919-426-8907*
>
> *(a)     310 Acorn Hollow Pl., Durham, NC 27703*
>
>
>
> *From:* spacewalk-list-bounces at redhat.com [
> mailto:spacewalk-list-bounces at redhat.com
> <spacewalk-list-bounces at redhat.com>] *On Behalf Of *Dimitri Yioulos
> *Sent:* Wednesday, December 12, 2018 9:57 AM
> *To:* spacewalk-list at redhat.com
> *Subject:* Re: [Spacewalk-list] Subtask repo-sync failed
>
>
>
> Anybody on this?  It’s making me crazy.
>
>
>
> Thanks.
>
>
>
> Dimitri
>
>
>
> *From:* Dimitri Yioulos
> *Sent:* Thursday, December 06, 2018 9:17 AM
> *To:* spacewalk-list at redhat.com
> *Subject:* Subtask repo-sync failed
>
>
>
> Hi, all.
>
>
>
> For a while now, scheduled repo syncs with the epel 6 and 7
> repositories have produced emails from our Spacewalk 2.8 server saying
> the following:
>
>
>
> Taskomatic bunch repo-sync-bunch was scheduled to run within the
> repo-sync-1-130 schedule.
>
> Subtask repo-sync failed.
>
> For more information check
> /var/log/rhn/tasko/org1/repo-sync-bunch/repo-sync_10814174_err.
>
>
>
> I’ve looked at the error log identified above, the output of which is:
>
>
>
> 2018-12-06 08:08:55,660 [DefaultQuartzScheduler_Worker-5] ERROR
> com.redhat.rhn.taskomatic.task.RepoSyncTask  - Executing a task threw an
> exception: org.quartz.JobExecutionException
>
> 2018-12-06 08:08:55,667 [DefaultQuartzScheduler_Worker-5] ERROR
> com.redhat.rhn.taskomatic.task.RepoSyncTask  - Message: Command
> '[/usr/bin/spacewalk-repo-sync, --channel, epel7-x86_64, --type, yum]'
> exited with error code 137
>
> 2018-12-06 08:08:55,670 [DefaultQuartzScheduler_Worker-5] ERROR
> com.redhat.rhn.taskomatic.task.RepoSyncTask  - Cause: null
>
> 2018-12-06 08:08:55,736 [DefaultQuartzScheduler_Worker-5] ERROR
> com.redhat.rhn.taskomatic.task.RepoSyncTask  - Stack
> trace:org.quartz.JobExecutionException: Command
> '[/usr/bin/spacewalk-repo-sync, --channel, epel7-x86_64, --type, yum]'
> exited with error code 137
>
>         at
> com.redhat.rhn.taskomatic.task.RhnJavaJob.executeExtCmd(RhnJavaJob.java:103)
>
>         at
> com.redhat.rhn.taskomatic.task.RepoSyncTask.execute(RepoSyncTask.java:70)
>
>         at
> com.redhat.rhn.taskomatic.task.RhnJavaJob.execute(RhnJavaJob.java:88)
>
>         at com.redhat.rhn.taskomatic.TaskoJob.execute(TaskoJob.java:186)
>
>         at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
>
>         at
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
>
>
>
> What is error code 137?
>
>
>
> Previously, I tried removing the schedule with the spacewalk-api (e.g.
> client.taskomatic.org.unscheduleBunch(key, 'repo-sync-1-130')) and
> creating it anew.  I've made sure that spacewalk-backend-2.8.60-1 is
> installed.  I've searched for any other ideas, but found none.  Help
> would be greatly appreciated.
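[The unschedule step above can be sketched with Python's xmlrpc client; the URL, credentials, and schedule name here are placeholders, and the `taskomatic.org.unscheduleBunch` call is taken verbatim from the spacewalk-api command quoted above, not from independently verified API docs:]

```python
import xmlrpc.client

def reschedule_repo_sync(url, user, password, schedule_name):
    """Drop a stuck repo-sync schedule so it can be recreated afterwards."""
    client = xmlrpc.client.ServerProxy(url)
    key = client.auth.login(user, password)
    try:
        # Same call the spacewalk-api one-liner makes.
        client.taskomatic.org.unscheduleBunch(key, schedule_name)
    finally:
        client.auth.logout(key)

# Placeholder values; point these at a real Spacewalk server to use it.
# reschedule_repo_sync("https://spacewalk.example.com/rpc/api",
#                      "admin", "secret", "repo-sync-1-130")
```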
>
>
>
> With thanks,
>
>
>
> Dimitri
>
>
> _______________________________________________
> Spacewalk-list mailing list
> Spacewalk-list at redhat.com
> https://www.redhat.com/mailman/listinfo/spacewalk-list