[Crash-utility] Fix in bt for ARM64

Karlsson, Jan Jan.Karlsson at sonymobile.com
Fri May 22 09:11:23 UTC 2015


Hi Dave

I have finally had time to look more closely at the problem with module symbols that I reported some time ago. I now know why the problem occurs and have an idea of how to solve it. However, my knowledge in this area is limited, so please review my code change carefully.

The module that causes the problem when it is loaded contains a symbol that is "per_cpu", i.e. it lies in the module's per-cpu area. Its address is higher than that of any other symbol in any module, so the search in value_search_module() in symbols.c terminates too early.

Adding the test below to value_search_module() solves the problem in my example:

    ...
    /*
     *  splast will contain the last module symbol encountered.
     *  Note: "__insmod_"-type symbols will be set in splast only
     *  when they have unique values.
     */
    splast = NULL;
    for ( ; sp <= sp_end; sp++) {

        if (IN_MODULE_PERCPU(sp->value, lm) &&
            !IN_MODULE_PERCPU(value, lm))
            continue;                                   // ADDED

        if (value == sp->value) {
            if (MODULE_END(sp) || MODULE_INIT_END(sp))
                break;
    ...

Jan

Jan Karlsson
Senior Software Engineer
System Assurance

Sony Mobile Communications
Tel: +46 703 062 174
jan.karlsson at sonymobile.com

sonymobile.com



-----Original Message-----
From: crash-utility-bounces at redhat.com [mailto:crash-utility-bounces at redhat.com] On Behalf Of Dave Anderson
Sent: 12 May 2015 15:28
To: Discussion list for crash utility usage, maintenance and development
Subject: Re: [Crash-utility] Fix in bt for ARM64



----- Original Message -----
> Thanks Dave,
> 
> I did not look closely enough at this issue, and you are quite right that
> my fix has no effect. After some more testing I have found that the
> problem has to do with one of the modules.
> 
> In the core file I am investigating, the kernel has six modules, and the
> wlan module (where the strange names occur) is the last one. If I have
> loaded the third module in the mod list with "mod -s" or "mod -S",
> then I get the strange output in bt. If that module is not loaded,
> regardless of whether the other modules are loaded, the bt output is correct.
> 
> I fully understand that this will be very difficult to investigate
> without being able to run and debug the example. I have tried to
> do that myself but have not found anything useful so far. Any hints on what
> to look for? I will also try to find out whether there is anything specific
> about the module itself.

I looked at the data that you sent me offline, but nothing stands out as a smoking gun.

If you take "bt" out of the picture and just run "sym <wlan-module-address>", you should see the same symptom: before the third module is loaded, it finds the correct symbol name for <wlan-module-address>, but after the third module is loaded, it does not. cmd_sym() calls value_search(), which should call value_search_module(), and in that function you will be able to see it cycling through the modules and at least get an idea as to why the wlan module addresses are being prematurely matched in another module's symbol list.

Dave
  

> 
> I will by the way be out of office up to next Monday mainly due to 
> national holidays here in Sweden.
> 
> Jan Karlsson
> -----Original Message-----
> From: crash-utility-bounces at redhat.com [mailto:crash-utility-bounces at redhat.com] On Behalf Of Dave Anderson
> Sent: 11 May 2015 17:49
> To: Discussion list for crash utility usage, maintenance and development
> Subject: Re: [Crash-utility] Fix in bt for ARM64
> 
> 
> 
> ----- Original Message -----
> > Hi Dave
> > 
> > I found a problem with bt on ARM64 when a function belongs to a module.
> > 
> > Output before the fix:
> > 
> > #16 [ffffffc0be96f8d0] __this_module at ffffffbffc15a2f8 [wlan]
> > #17 [ffffffc0be96f9b0] __this_module at ffffffbffc161b18 [wlan]
> > #18 [ffffffc0be96f9c0] __this_module at ffffffbffc16033c [wlan]
> > #19 [ffffffc0be96fa10] __this_module at ffffffbffc1630f8 [wlan]
> > #20 [ffffffc0be96fab0] __this_module at ffffffbffc156ff8 [wlan]
> > #21 [ffffffc0be96faf0] __this_module at ffffffbffc15aa58 [wlan]
> > #22 [ffffffc0be96fb20] __this_module at ffffffbffc15bfc8 [wlan]
> > #23 [ffffffc0be96fb60] __this_module at ffffffbffc115fac [wlan]
> > #24 [ffffffc0be96fb90] tasklet_action at ffffffc000223738
> > #25 [ffffffc0be96fbb0] __do_softirq at ffffffc000222e94
> > 
> > Output after the fix:
> > 
> > #16 [ffffffc0be96f8d0] dhd_bus_rx_frame at ffffffbffc15a2f8 [wlan]
> > #17 [ffffffc0be96f9b0] dhd_update_flow_prio_map at ffffffbffc161b18 [wlan]
> > #18 [ffffffc0be96f9c0] dhd_update_flow_prio_map at ffffffbffc16033c [wlan]
> > #19 [ffffffc0be96fa10] dhd_prot_process_ctrlbuf at ffffffbffc1630f8 [wlan]
> > #20 [ffffffc0be96fab0] dhd_bus_ringbell at ffffffbffc156ff8 [wlan]
> > #21 [ffffffc0be96faf0] dhd_bus_console_in at ffffffbffc15aa58 [wlan]
> > #22 [ffffffc0be96fb20] dhd_bus_dpc at ffffffbffc15bfc8 [wlan]
> > #23 [ffffffc0be96fb60] dhd_sched_dpc at ffffffbffc115fac [wlan]
> > #24 [ffffffc0be96fb90] tasklet_action at ffffffc000223738
> > #25 [ffffffc0be96fbb0] __do_softirq at ffffffc000222e94
> > 
> > From arm64.c:
> > 
> > static int
> > arm64_print_stackframe_entry(struct bt_info *bt, int level, struct arm64_stackframe *frame)
> > {
> >         char *name, *name_plus_offset;
> >         ulong symbol_offset;
> >         struct syment *sp;
> >         struct load_module *lm;
> >         char buf[BUFSIZE];
> > 
> >         name = closest_symbol(frame->pc);
> >         name_plus_offset = NULL;
> > 
> >         if (bt->flags & BT_SYMBOL_OFFSET) {
> >                 /* ADDED */
> >                 if (module_symbol(frame->pc, NULL, &lm, NULL, 0))
> >                         sp = value_search_module(frame->pc, &symbol_offset);
> >                 else
> >                 /* END ADDED */
> >                         sp = value_search(frame->pc, &symbol_offset);
> > ...
> > 
> 
> Hi Jan,
> 
> I don't dispute that this is something to be fixed, but at the same
> time I don't quite understand (1) why it's happening, or (2) how your
> fix addresses it.
> 
> The value_search() function does this if it's a module address:
> 
> struct syment *
> value_search(ulong value, ulong *offset)
> {
> ...
>         if (IS_VMALLOC_ADDR(value))
>                 goto check_modules;
> 
> ...
> check_modules:
>         sp = value_search_module(value, offset);
> 
>         return sp;
> }
> 
> And even if IS_VMALLOC_ADDR() above fails, it should just fail to find 
> it in the base kernel symbols, and fall through to the value_search_module() call.
> 
> Does something different happen in your case?
> 
> I also note that in all cases "__this_module" is in the "(d)" section
> of each module, and is typically the last/highest symbol value in the
> module.  So I'm confused as to how it is being picked up as the
> closest symbol to all of the different text addresses in the wlan module.
> 
> What does "sym -m wlan" look like?
> 
> Thanks,
>   Dave
> 
> 
> 
> > 
> > You probably also want to prevent calling module_symbol a second 
> > time later in the function.
> > 
> > 
> > 
> > Jan
> >
> > --
> > Crash-utility mailing list
> > Crash-utility at redhat.com
> > https://www.redhat.com/mailman/listinfo/crash-utility
> 



