[Crash-utility] Inaccurate timestamps in log -T output
HAGIO KAZUHITO(萩尾 一仁)
k-hagio-ab at nec.com
Wed Sep 22 08:42:48 UTC 2021
Hi Martin,
Thank you for the detailed report and interesting idea.
The "log -T" option imitates the "dmesg -T" command, and it also
looks inaccurate in nature and has a warning in its help text.
# systemctl restart psacct ; date
Wed Sep 22 17:30:49 JST 2021
# dmesg -T | tail -n 1
[Wed Sep 22 17:30:22 2021] Process accounting resumed
# dmesg --help | grep -- -T
-T, --ctime show human-readable timestamp (may be inaccurate!)
Maybe at least we should add a similar warning to the help page.
Personally I would prefer users to use it with the acknowledgement
of that inaccuracy like dmesg -T, rather than doing a crash specific
adjustment..
Thanks,
Kazu
-----Original Message-----
> From: crash-utility-bounces at redhat.com <crash-utility-bounces at redhat.com> On Behalf Of Moore, Martin
> (Linux ERT)
> Sent: Tuesday, September 21, 2021 1:58 AM
> To: Discussion list for crash utility usage, maintenance and development <crash-utility at redhat.com>
> Subject: [Crash-utility] Inaccurate timestamps in log -T output
>
> Hello,
>
>
>
> As the current maintainer of the PyKdump extension, I have been investigating a reported discrepancy between
> the timestamps generated by crash’s log -T and PyKdump’s tslog command, which provides the same functionality
> of timestamping the kernel log entries. What I have found is that *both* log -T and tslog generate some
> inaccurate timestamps on vmcores with more than a small amount of uptime, but in different ways.
> Specifically, log -T timestamps are accurate at boot time but become less accurate later in the log, while
> tslog timestamps are just the opposite: they’re accurate around crash time but less accurate earlier in
> the log.
>
>
>
> Here’s an example from a panic on a kernel 3.10.0-957.5.1 system with a moderate uptime:
>
>
>
> DATE: Tue Jul 6 09:50:58 +03 2021
>
> UPTIME: 32 days, 16:29:29
>
> PANIC: "BUG: unable to handle kernel "
>
>
>
> Crash correctly computes the boot time based on the date (crash time) and uptime shown above. From “help
> -k”:
>
>
>
> boot_date: Thu Jun 3 17:21:29 +03 2021
>
>
>
> log -T shows the correct timestamps of the boot messages:
>
>
>
> [Thu Jun 3 17:21:29 +03 2021] Initializing cgroup subsys cpuset
>
> [Thu Jun 3 17:21:29 +03 2021] Initializing cgroup subsys cpu
>
> [Thu Jun 3 17:21:29 +03 2021] Initializing cgroup subsys cpuacct
>
>
>
> However, the timestamps of the BUG and surrounding messages are off by 150 seconds or 2.5 minutes (09:53:28,
> but the crash occurred at 09:50:58):
>
>
>
> [Tue Jul 6 09:53:28 +03 2021] BUG: unable to handle kernel
>
> [Tue Jul 6 09:53:28 +03 2021] NULL pointer dereference at (null)
>
> [Tue Jul 6 09:53:28 +03 2021] IP: [<ffffffff91cd6e66>] scsi_uninit_cmd+0x26/0x40
>
>
>
> For comparison, tslog has the correct timestamps around the crash:
>
>
>
> 2021-07-06 09:50:58 [2824319.188837] BUG: unable to handle kernel
>
> 2021-07-06 09:50:58 [2824319.193440] NULL pointer dereference at (null)
>
> 2021-07-06 09:50:58 [2824319.197853] IP: [<ffffffff91cd6e66>] scsi_uninit_cmd+0x26/0x40
>
>
>
> But it’s off by the same 2.5-minute interval during the boot messages (17:18:59, but it should be 17:21:29):
>
>
>
> 2021-06-03 17:18:59 [ 0.000000] Initializing cgroup subsys cpuset
>
> 2021-06-03 17:18:59 [ 0.000000] Initializing cgroup subsys cpu
>
> 2021-06-03 17:18:59 [ 0.000000] Initializing cgroup subsys cpuacct
>
>
>
> The reasons for these inaccuracies are:
>
>
>
> 1. log -T computes boot time as (crash time minus uptime), and then adds the seconds part of the time
> stored in each log entry to generate the entry’s timestamp.
>
>
>
> 2. tslog computes a base time as (crash time minus seconds part of the final log entry) and then adds
> the seconds part of the time stored in each log entry.
>
>
>
> 3. The key to the problem: printk() stores the log entry timestamp based on a call to local_clock(),
> which provides the raw time in the local CPU, which is not adjusted by NTP. As such, inaccuracy accumulates
> as the clock drifts over time; in the example above, the discrepancy has reached 150 seconds by the time
> of the crash.
>
>
>
> I have seen this now in numerous vmcores from kernels 3.0 through 4.12, which is the latest version I’ve
> checked. The discrepancy exists in all of them with more than a few hours of uptime and can be positive
> or negative. In one vmcore it was over 13 hours after 239 days of uptime.
>
>
>
> From the PyKdump perspective, I could live with the current tslog behavior because it’s accurate for log
> entries near the crash time, which are generally the most useful for crash analysis. But in an attempt
> to improve it, I’m testing an experimental version of tslog that smooths out the discrepancy. It computes
> the boot time the same way that crash does (crash time minus uptime) and a scale factor for the time discrepancy
> equal to (uptime)/(seconds part of final log entry). Then each log entry’s timestamp is computed as (boot
> time) + (seconds part of log entry)*(scale factor). This version results in correct timestamps at both
> ends of the log:
>
>
>
> 2021-06-03 17:21:29 [ 0.000000] Initializing cgroup subsys cpuset
>
> 2021-06-03 17:21:29 [ 0.000000] Initializing cgroup subsys cpu
>
> 2021-06-03 17:21:29 [ 0.000000] Initializing cgroup subsys cpuacct
>
> :
>
> 2021-07-06 09:50:58 [2824319.188837] BUG: unable to handle kernel
>
> 2021-07-06 09:50:58 [2824319.193440] NULL pointer dereference at (null)
>
> 2021-07-06 09:50:58 [2824319.197853] IP: [<ffffffff91cd6e66>] scsi_uninit_cmd+0x26/0x40
>
>
>
> This is something of a hack since there’s no guarantee that the clock will drift uniformly during the system
> uptime, so timestamps in the middle of the log may still be somewhat inaccurate (but probably more accurate
> than without this adjustment).
>
>
>
> Regards,
>
> Martin
>
>
>
>
>
> Explore the HPE digital support experience: https://support.hpe.com <https://support.hpe.com>
>
>
>
> Martin Moore
> Linux Engineering Resolution Team
>
> HPE Pointnext Services
> Hewlett Packard Enterprise
>
> Martin.Moore at hpe.com <mailto:Martin.Moore at hpe.com>
> 8AM-5PM EDT (UTC-4) Monday-Friday
> Manager: Becky McBride (becky.mcbride at hpe.com <mailto:becky.mcbride at hpe.com> )
>
>
More information about the Crash-utility
mailing list