1
Hardware debugging presentation 
2
Don’t be scared of hardware-only bugs
  • Everyone has horror stories, full of embellished gory details.
  • They are like veterans talking about war wounds!
  • Lack of documentation about tools and techniques further mystifies the black art.
  • There will be some difficult hardware-only bugs.
  • But the majority are quite easy to progress using systematic methods.



3
So
  • let’s start with what you already know
  • The emulator and Metrowerks CodeWarrior
4
Using the emulator
  • What happens when a thread panics?
  • A breakpoint is hit, causing the emulator to stop at the line that caused the failure.
  • A source-level call stack is shown.
  • Objects and variables in all functions of the call stack can be examined.


  • You are spoilt! Always try to reproduce problems on the emulator first - it is a good debugging environment.
5
What does a panic look like?
6
What does a panic look like?
  • Find the line of code which calls User::Panic
7
What does a panic look like?
  • Or an access violation
  • e.g. trying to call Cancel() on a NULL pointer

8
Tips
  • Make sure Just In Time debugging is enabled
  • Set the following registry value:

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug]
    "UserDebuggerHotKey"=dword:00000000
    "Debugger"="\"C:\\apps\\Metrowerks\\bin\\IDE.exe\" -p %ld -e %ld"
    "Auto"="0"


  • Also ensure that the following line is removed from \epoc32\data\epoc.ini:

    JustInTime 0


  • Debug messages also appear in %TEMP%\epocwind.out
9
Tips
  • Enable Logging of System messages…
  • From the "Target Settings" panel, go to the "Debugger | Debugger Settings" options and tick the box labelled "Log System Messages"


10
What is a panic?
  • A panic is a Symbian term used to denote an unexpected exit of a thread
  • A thread is the unit of execution on Symbian OS
  • Processes must have at least one thread to begin executing code
  • A panic denotes a serious coding error:
  • either the caller of a function has violated an API contract (e.g. calling a function with invalid parameters)
  • or an object or memory structure has moved into a bad internal state, causing an invariant check to fail
  • Panics are helpful
  • They aim to inform you about the exact nature of the problem during development
11
What does a panic look like?

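      • TReal PercentageToDecimal(TInt aPercentage)
      • {
      • // Precondition, checked in all builds: the caller must pass 0..100
      • __ASSERT_ALWAYS(aPercentage>=0 && aPercentage<=100, Panic(EInvalidInput));
      • // Divide by 100.0 so that the division is not truncated to an integer
      • TReal result = aPercentage/100.0;
      • // Postcondition, checked in debug builds only
      • ASSERT(result>=0.0 && result<=1.0);
      • return result;
      • }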
      12
      Call Stack
      • What is a call stack?
      13
      Call Stack
      • The chain of functions calling functions which resulted in the panic.
      • Shows some history of the current operation.
      • This often gives a pretty good idea of the chain of events leading to a panic.
      • Essential for tracking down problems and knowing where to put breakpoints.
      • Also a good way of identifying duplicate defects.
      14
      Debugging Memory Leaks
      15
      Using Hook Logger
      • Provides logging for:
      • memory allocations
      • process and thread creation
      • leaves
      • more in the future?
      • Main use is pin-pointing the source of leaked memory
      • To use this tool you need to:
      • Install it on your machine: download from Symbian DevNet
      • Attach the hooks to EUSER.DLL
      • Run HookLogger.EXE
      • Run the code to be hooked
      16
      1. Attach the Hooks
      • Run “HookEUser.cmd” from the “emulator” drive.
      • x:\> HookEUser WINSCW
      • ‘x’ is the drive containing the epoc32 folder
      • Replaces EUSER with a hook "parasite" DLL
      • Undo by using the “-r” (remove) option
      17
      2. Start the UI
      • Run “HookLogger.exe”
      • Connection status shown in title bar
      • Set the options for monitoring heaps and threads


      18
      3. Reproduce the leak
      • Start the emulator and reproduce the memory leak
      • the emulator will panic
      • Break into CodeWarrior
      • Walk back up the stack to User::__DbgMarkEnd
      • Take a note of the leaked memory location (badCell) and the thread id.

      19
      3. Reproduce the leak
      20
      4. Find the bad thread
      • Go to the Threads tab in the hook logger
      • find the thread that leaked memory

      21
      5. Show heap allocations
      • Right-click and select “Show allocations”
      • may take 10 to 20 seconds to respond
      22
      6. Find the bad allocation
      • Order list by “Ptr”
      • Find the address indicated by “badCell” in step 3
      • Double click to get a nice callstack
      23
      Panics on Hardware
      24
      Hardware situation
      • What happens when a thread panics?
      • Either a panic dialog appears
      • or device reboots
      • No context is stored
      • Oh dear - no wonder it’s scary.
      • You need to use tools to get the same information that the emulator gives so easily.
      25
      Why are there two kinds?
      • Marking a thread or process as “system critical” means that it is an integral and essential part of the system
      • e.g. the file server
      • The thread or process is being declared necessary for correct functioning of the device
      • If a system critical thread exits or panics the device will reboot
      • This is why panics in some threads cause the device to reset
      26
      Here’s where it happens
      • \src\cedar\generic\base\e32\kernel\sthread.cpp
      • void DThread::Exit()
      • {
      • if (iExitType!=EExitKill && (iFlags & (KThreadFlagSystemPermanent|KThreadFlagSystemCritical)))
      • K::Fault(K::ESystemThreadPanic);
      • <snip>
      • }
      27
      Need some more information!
      • The most important information to get hold of
      • Which thread panicked or caused an access violation?
      • What was the panic reason and number?
      • What was the call stack of the thread when it panicked?
      28
      Hardware Panics
      • There are two kinds
      • Application panic
      • Where a Panic dialog appears
      • not critical - device carries on working


      • System thread panic
      • critical - the device halts and resets.
      • Or the device may enter a special debug mode (called the crash debugger or debug monitor)
      29
      Application panic
      • Dialog will tell you
      • Thread which panicked
      • Panic reason
      • What else do we need?
      • The call stack.

      • A tool called D_EXC can provide the call stack.
      30
      System panic
      • Must enable a tool called the debug monitor (or crash debugger) to get more info
      • Crash debugger tells you
      • which thread panicked
      • the category and number of the panic
      • where the stack for the panicked thread is located in memory
      • The crash debugger can be coaxed to dump the call stack
      31
      Tackling a hardware panic
      32
      Use the OS Library to look up Panic codes
      • E.g. if the dialog says “KERN-EXEC 3”:
      • Type “KERN-EXEC panic” into the search
      • This will help you understand what to look for in the code
      33
      Useful call stacks from Hardware
      • To get a useful call stack two things are always needed:
      • A hex dump of the memory used by the stack of the thread which panicked
      • A ROM symbol file for the software flashed onto the device.
      • With this information a Symbian Perl script can decode a human-readable call stack (similar to the call stack seen in the emulator).


      34
      How do I get a call stack?
      Application panic
      • Run the d_exc tool on the device first
      • Reproduce the panic.
      • A d_exc dialog pops up, telling you some information about the panic. Press OK to save the stack to disk.
      • d_exc will have dumped 2 files to disk:
      • a binary .stk file containing the thread’s stack
      • a .txt file detailing the panic code and category
      • Get those files onto a PC and have your symbol file at hand.
      35
      How do I get a call stack?
      Application panic
      36
      Stack.txt
      • Open the output in notepad.
      • Do a find for “>>>>”.
      • This takes you to the top of the decoded stack!


      •  >>>> current stack pointer >>>>


      • r00=80007204 00000000 80000368 80000003
      • r04=00801bb0 00000001 00000000 00802bc4
      • r08=00000002 50340f15 00802bc4 00000000
      • r12=8041b36c 00801bb0 50160ff8 5000b34c
      • PC = 5000b34c L..P  __ArmVectorSwi(void) + 0x124
      • LR = 50160ff8 ...P  SvSendReceive(int, void *) + 0x1c


      •  >>>> current stack pointer >>>>
      37
      What next
      • Scroll down the text. Sometimes you may see this familiar fingerprint for a panic:


      •  >>>> current stack pointer >>>>
      • r00=80007204 00000000 80000368 80000003
      • r04=00801bb0 00000001 00000000 00802bc4
      • r08=00000002 50340f15 00802bc4 00000000
      • r12=8041b36c 00801bb0 50160ff8 5000b34c
      • PC = 5000b34c L..P  __ArmVectorSwi(void) + 0x124
      • LR = 50160ff8 ...P  SvSendReceive(int, void *) + 0x1c
      •  >>>> current stack pointer >>>>


      • 1bb0  80000001 ....
      • 1bb4  00000082 ....
      • 1bb8  50161018 ...P  SvSendReceiveCheck(int, void *) + 0x8
      • 1bbc  5016594c LY.P  RThread::Panic(TDesC16 const &, int) + 0x24
      • 1bc0  ffff8001 ....
      • 1bc4  00000082 ....
      • 1bc8  00801bdc ....  Stack + 0x1bdc
      • 1bcc  0000003c <...
      • 1bd0  50162024 $ .P  User::Panic(TDesC16 const &, int) + 0x24
      • 1bd4  ffff8001 ....
      • 1bd8  5016ce20  ..P  Panic(TCdtPanic) + 0x24
      • 1bdc  10000004 ....
      • 1be0  50178000 ...P  TUnicode::CjkWidthFoldTable + 0x5408
      38
      And then?
      • Look at all the functions that follow
      • In my case, after cutting out the lines that looked garbled, I got:


      • 1e18  50650031 1.eP  CBaLockChangeNotifier::DoRunL(void) + 0x5d
      • 1e24  5064fdef ..dP  RBaBackupSession::GetBacukupOperationEvent(..
      • 1e5c  5064ff65 e.dP  CBaLockChangeNotifier::RunL(void) + 0x19


      • That was enough to tell me to look at the code for DoRunL(), and to put some logging in there to see what is going on.


      • That’s the basics for d_exc
      39
      But what about system panics?
      • Same idea - we want to get panic reason and call stack:
      • Firstly, get the base porting people to show you how to enable a crash debugger build.
      • Reproduce the problem - if the device enters the crash debugger, you can get more information.
      • You use a terminal program on the PC to “talk” to the crashed device.
      40
      What do I ask it?
      • Same as always
      • Which thread caused a panic or access violation
      • What is the panic reason and number
      • What is the callstack of the thread
      41
      Connect the crash debugger
      • Launch a terminal emulator (e.g. HyperTerminal) on your PC
      • Connect the PC serial port to the device serial port which provides debug tracing
      • The terminal window should show a “password” prompt
      • Type in “replacement” and you have entered the debug monitor prompt
      • The kernel is frozen, allowing you to interrogate its current state
      42
      Find the fault
      • Type ‘f’ into the crash debugger to get the Fault information
      • If the category is KERN 4 then you are in business.
      • KERN 4 simply says that a panic happened in a system thread
      • The actual panic, such as KERN-EXEC 3, is hidden
      • Type ‘i’ into the crash debugger to get information about the real panic reason
      • Sometimes even this doesn’t work - a non-system-critical thread which crashes can cause the process to exit if it is process critical, e.g. the main thread
      • If another thread in the process is marked as system critical, this will take down the platform.
      • A foolproof method of finding the real panic is to look at the output from the KPanic debug tracing


      43
      Find the fault
      • Type ‘r’ to get the values of all the registers
      • The ones to look at depend on the processor mode!
      • Type ‘c0’ to get the details of all the threads
      • ‘C0’ will pause between each screenful

      44
      Some background - APCS
      • The ARM Procedure Calling Standard (APCS)
      • Imposes conventions on the use of registers
      • So we always know the important registers to look at

      45
      Finding the panicked thread
      • If you have KPanic debug tracing enabled use that to identify the panic


      • RLibrary::Load - aFileName: BMPANSRV.DLL, -aPath:  threadName: Wserv
      • RLibrary::Load - OK
      • RLibrary::Load ......1
      • RLibrary::Load ......2
      • RLibrary::Load ......3
      • RLibrary::Load ......4
      • RLibrary::Load ......5
      • RLibrary::Load Init() - OK
      • Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801
      •  R0=00614a40  R1=806a21d7  R2=006029c4  R3=006029c4
      •  R4=0060f448  R5=0060f4c8  R6=006126b8  R7=0060f484
      •  R8=00000012  R9=00000040 R10=c8087d78 R11=00000000
      • R12=8009fced R13=004060e0 R14=8108b0f8 R15=8144710c
      • R13Svc=c924c000 R14Svc=80020108 SpsrSvc=00000010
      • Thread 37, KernCSLocked=0
      • FAULT: KERN 00000004
      • Password: replacement
      46
      What do all those numbers mean?
      • The type of exception (the “Exc” value), i.e. why the processor was unhappy
      • Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801
      47
      What do all those numbers mean?
      • The processor mode (held in “Cpsr”), which determines which registers are valid
      • Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801
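As a rough illustration of reading the processor mode out of the Cpsr value, here is a minimal sketch in standard C++ (not Symbian code). It assumes the classic ARM layout, where the low five bits of the CPSR hold the mode and bit 5 is the Thumb state bit. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Names the ARM processor mode held in the low five bits of the CPSR.
// (Hypothetical helper for decoding crash debugger output on a PC.)
inline std::string CpsrModeName(std::uint32_t cpsr) {
    switch (cpsr & 0x1Fu) {
        case 0x10: return "User";
        case 0x11: return "FIQ";
        case 0x12: return "IRQ";
        case 0x13: return "Supervisor";
        case 0x17: return "Abort";
        case 0x1B: return "Undefined";
        case 0x1F: return "System";
        default:   return "Unknown";
    }
}

// Bit 5 of the CPSR (the T bit) is set when the core was running Thumb code.
inline bool CpsrIsThumb(std::uint32_t cpsr) {
    return (cpsr & 0x20u) != 0;
}
```

For the trace line above, Cpsr=20000010 decodes as user mode in ARM state, so the plain R13/R14/R15 values are the ones that belong to the faulting thread.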

      48
      What do all those numbers mean?
      • The Fault Address Register (FAR) indicates the dodgy address that was accessed
      • Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801

      • Least significant 4 bits of the Fault Status Register (FSR) indicates the MMU fault
      • Exc 1 Cpsr=20000010 FAR=0060fa6d FSR=00000801
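The FSR decoding can be sketched the same way. This is a minimal example in standard C++, assuming the fault status encoding used by ARMv4/ARMv5 MMUs; the core in your device may use a different table, so check its reference manual. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Maps the least significant 4 bits of the FSR to a fault description.
// Assumption: ARMv4/ARMv5 style MMU fault status encoding.
inline std::string FsrFaultName(std::uint32_t fsr) {
    switch (fsr & 0xFu) {
        case 0x1:
        case 0x3: return "Alignment";
        case 0x5: return "Translation (section)";
        case 0x7: return "Translation (page)";
        case 0x9: return "Domain (section)";
        case 0xB: return "Domain (page)";
        case 0xD: return "Permission (section)";
        case 0xF: return "Permission (page)";
        default:  return "Other (see the core's reference manual)";
    }
}
```

Under that assumption, FSR=00000801 reads as an alignment fault, which fits the odd address in FAR=0060fa6d.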
      49
      Finding the panicked thread
      • The “i” command gives you a lot more information, but all you are interested in is finding a fingerprint.


        • <snip>
        • THREAD at c8084ef0 VPTR=00000000 AccessCount=6 Owner=c80848a8
        • Full name apprun.exe::Calcsoft
        • Thread MState READY
        • Default priority 16 WaitLink Priority 16
        • ExitInfo 2,3,KERN-EXEC
        • Flags 00000002, Handles c8084a70
        • Supervisor stack base c9208000 size 4000
        • User stack base 00402000 size 5000
        • Id=29, Alctr=00600000, Created alctr=00600000, Frame=00406e1c
        • <snip>
        • R13_USR 8005e414 R14_USR 000002a8 SPSR_SVC c8084ef0
        •  R4 c8085198  R5 00000000  R6 00000000  R7 00000001
        •  R8 00000000  R9 8005e834 R10 000002a8 R11 c8085198
        •  PC 8005e81c


        • TheCurrentProcess=c80848a8
        • PROCESS at c80848a8 VPTR=00000000 AccessCount=7 Owner=00000000
        • Full name apprun.exe
        • ExitInfo 3,0,
        • <snip>
        50
        What do all those numbers mean?
        • The Exit Type
        • ExitInfo 2,3,KERN-EXEC
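The three ExitInfo fields can be pulled apart mechanically. The sketch below is standard C++ for use on a PC, not Symbian code. It assumes the fields are exit type, reason and category in that order, and that the exit type numbers follow the TExitType enumeration (0 kill, 1 terminate, 2 panic, 3 pending). The struct and function names are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// One decoded "ExitInfo type,reason,category" line (hypothetical struct).
struct ExitInfo {
    int type = 0;
    int reason = 0;
    std::string category;
};

// Parses e.g. "ExitInfo 2,3,KERN-EXEC"; the category may be empty.
inline bool ParseExitInfo(const std::string& line, ExitInfo& out) {
    const std::string tag = "ExitInfo ";
    const std::size_t pos = line.find(tag);
    if (pos == std::string::npos) return false;
    std::istringstream in(line.substr(pos + tag.size()));
    char comma1 = 0, comma2 = 0;
    if (!(in >> out.type >> comma1 >> out.reason >> comma2)) return false;
    if (comma1 != ',' || comma2 != ',') return false;
    out.category.clear();
    std::getline(in, out.category);  // rest of the line, possibly empty
    return true;
}

// Assumed to mirror TExitType: EExitKill, EExitTerminate, EExitPanic, EExitPending.
inline std::string ExitTypeName(int type) {
    switch (type) {
        case 0:  return "Kill";
        case 1:  return "Terminate";
        case 2:  return "Panic";
        case 3:  return "Pending";
        default: return "Unknown";
    }
}
```

Read this way, "ExitInfo 2,3,KERN-EXEC" is a panic with category KERN-EXEC and reason 3, while the process line "ExitInfo 3,0," is still pending, i.e. the process has not exited.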


        51
        What next? - The Program counter
        • From the same information as on the previous slide
        • Look at R15 and copy that number.
        • This is the PC - the address of the last instruction to execute in the thread which panicked
        • Be careful to get the right version depending on the “mode” of the processor

        • Exc 1 Cpsr=48000030 FAR=01000003 FSR=00000001
        •  R0=00600080  R1=00405ee0  R2=08cc014c  R3=00000000
        •  R4=00619ae8  R5=00000000  R6=00ffffff  R7=ffffffff
        •  R8=00000012  R9=00000040 R10=c808a2e0 R11=00000000
        • R12=800c3a35 R13=00405ee0 R14=810b1b59 R15=80728562
        • R13Svc=c924c000 R14Svc=80020234 SpsrSvc=08000010
        • Thread 37, KernCSLocked=0
        • Thread eiksrvs.exe::!EikAppUiServer Die: 2,3,KERN-EXEC
        • Thread eiksrvs.exe::!EikAppUiServer SetDefaultPriority 16
        • Thread eiksrvs.exe::!EikAppUiServer SetRequiredPriority def 16 cleanup -1 nest 0
        • Thread eiksrvs.exe::!EikAppUiServer MState 2 SetActualPriority 16
        • Exec::ThreadId
        • Exec::SemaphoreWait
        • Thread eiksrvs.exe::!EikAppUiServer Panic KERN-EXEC 3
        • FAULT: KERN 00000004
        52
        Program counter
        • Look up the PC to find the “top” of the callstack
        • Either look at the symbol file directly or use printsym to decode the address
        • You were probably half way through a function
        • so you may have to look for the closest match, e.g. where R15=80728562

        • 80728498    0000    CAknViewAppUi::~CAknViewAppUi__sub_object()  avkon.in(.text)
        • 80728584    0010    CAknViewAppUi::~CAknViewAppUi__deallocating()  avkon.in(.text)

        • R14 (the link register – lr) may also give you a clue
        • e.g. where R14=810b1b59

        • 810b1b56    001c    CCoeEnv::CreateResourceReaderLC(TResourceReader&, int) const  CONE.in(.text)
        • 810b1b72    0074    CCoeEnv::ReadResourceAsDes16(TDes16&, int) const  CONE.in(.text)
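The “closest match” lookup is a search for the last symbol whose start address is at or below the register value. Here is a minimal sketch in standard C++, assuming the symbol file has already been loaded into a vector sorted by address. The type and function names are hypothetical, and the real printsym tool may work differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One entry from the symbol file: start address and symbol name.
struct NearSymbol {
    std::uint32_t address;
    std::string name;
};

// Returns the symbol with the highest start address <= addr, or nullptr
// if addr lies below every symbol. `symbols` must be sorted by address.
inline const NearSymbol* NearestSymbolAtOrBelow(
        const std::vector<NearSymbol>& symbols, std::uint32_t addr) {
    auto it = std::upper_bound(
        symbols.begin(), symbols.end(), addr,
        [](std::uint32_t a, const NearSymbol& s) { return a < s.address; });
    if (it == symbols.begin()) return nullptr;
    --it;  // step back to the last symbol starting at or before addr
    return &*it;
}
```

With the two avkon entries above, R15=80728562 resolves to the __sub_object destructor at offset 0xca. The match is only a hint: this sketch ignores the length column, so an address far past the end of a function still "matches" it.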
        53
        What next? - The call stack
        • From the same information as on the previous slide
        • Look at R13 and copy that number.
        • This is the stack pointer - the address of the thread’s stack.

        • Exc 1 Cpsr=48000030 FAR=01000003 FSR=00000001
        •  R0=00600080  R1=00405ee0  R2=08cc014c  R3=00000000
        •  R4=00619ae8  R5=00000000  R6=00ffffff  R7=ffffffff
        •  R8=00000012  R9=00000040 R10=c808a2e0 R11=00000000
        • R12=800c3a35 R13=00405ee0 R14=810b1b59 R15=80728562
        • R13Svc=c924c000 R14Svc=80020234 SpsrSvc=08000010
        • Thread 37, KernCSLocked=0
        • Thread eiksrvs.exe::!EikAppUiServer Die: 2,3,KERN-EXEC
        • Thread eiksrvs.exe::!EikAppUiServer SetDefaultPriority 16
        • Thread eiksrvs.exe::!EikAppUiServer SetRequiredPriority def 16 cleanup -1 nest 0
        • Thread eiksrvs.exe::!EikAppUiServer MState 2 SetActualPriority 16
        • Exec::ThreadId
        • Exec::SemaphoreWait
        • Thread eiksrvs.exe::!EikAppUiServer Panic KERN-EXEC 3
        • FAULT: KERN 00000004
        54
        Yuk! Hex
        • All you need to do now is
        • Type the ‘m’ command into the crash debugger with the address of the stack from R13
        • and dump about 200 bytes of stack - that should be plenty.
        • You can dump the stacks of all threads by using the ‘S’ command
        • Command is…
        • m 00405ee0+200
        55
        More hex!
        • That will dump some HEX and text to your terminal:


          • 00405ee0: 00 00 00 00 00 00 00 00 44 04 77 80 c6 56 00 10 ........D.w..V..
          • 00405ef0: 4c 01 cc 08 e8 9a 61 00 40 04 77 80 50 2a 60 00 L.....a.@.w.P*`.
          • 00405f00: ff ff ff ff e3 78 72 80 00 00 60 00 18 00 00 00 .....xr...`.....
          56
          Decoding the data using printsym
          • Type the following into a Windows command prompt:
          57
          Warning: stack overflow == KE3
          • Always check whether you’re suffering from a stack overflow
          • A stack overflow will cause otherwise inexplicable KERN-EXEC 3 errors
          • If you can’t get hold of the stack (you see a line like that shown below), it may indicate a stack overflow


          • .m 00414ff8 00415fff
          • Exception: Type 1 Code 80073280 Data 00414ff8 Extra 00000007
          58
          Checking for stack overflow
          • You need the value of R13 and the thread id
          • Exc 1 Cpsr=68000030 FAR=00414fe8 FSR=00000807
          •  R0=00415190  R1=00415190  R2=800e19bc  R3=00000038
          •  R4=7fffffff  R5=00000000  R6=00415028  R7=00000100
          •  R8=00000000  R9=00000040 R10=c808a2e8 R11=00000000
          • R12=800c3699 R13=00414ff8 R14=800c8d43 R15=800c8a1c
          • R13Svc=c931c000 R14Svc=8002014c SpsrSvc=08000010
          • Thread 58, KernCSLocked=0
          • Look up the details of the thread from the output of the ‘i’ or ‘c0’ commands to get the stack base
          • THREAD at c809ae88 VPTR=00000000 AccessCount=3 Owner=c8089e30
          • Full name eiksrvs.exe::KeySoundServerThread
          • <snip>
          • Supervisor stack base c9318000 size 4000
          • User stack base 00415000 size 1000
          • Id=58, Alctr=00600000, Created alctr=00600000, Frame=00415bd4
          • <snip>


          59
          Checking for stack overflow
          • The stack has overflowed if R13 < stack base
          • Plus: the exception id will indicate a data abort


          • Exc 1 Cpsr=68000030 FAR=00414fe8 FSR=00000807
          • R12=800c3699 R13=00414ff8 R14=800c8d43 R15=800c8a1c
          • User stack base 00415000 size 1000
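The check itself is a single comparison. A minimal sketch in standard C++, with made-up function names, taking the R13 value from the exception trace and the “User stack base” value from the thread details:

```cpp
#include <cassert>
#include <cstdint>

// True when the user-mode stack pointer (R13) has dropped below the
// base of the thread's user stack, i.e. the stack has overflowed.
inline bool UserStackOverflowed(std::uint32_t r13, std::uint32_t stackBase) {
    return r13 < stackBase;
}

// Number of bytes by which R13 lies below the stack base (0 if none).
inline std::uint32_t OverflowDepth(std::uint32_t r13, std::uint32_t stackBase) {
    return UserStackOverflowed(r13, stackBase) ? stackBase - r13 : 0u;
}
```

For the trace above, R13=00414ff8 against a stack base of 00415000 gives an overflow of 8 bytes.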




          60
          Decoding the stack dump output
          • The output will be similar to the decoded d_exc stack, except:
          • The top of printout represents the function which called Panic() (with d_exc you have to find the top)
          • So your output may start with something like this:


          • 00405f00: ff ff ff ff e3 78 72 80 00 00 60 00 18 00 00 00 .....xr...`.....


          • = ffffffff ....
          • = 807278e3 .xr.  CAknNoteAttributes::ConstructFromResourceL(TResourceReader&)  avkon.in(.text) + 0x1a1
          • = 00600000 ..`.
          • = 00000018 ....


          • 00405f10: 00 00 60 00 a1 40 0d 80 a8 60 40 00 ce 7b 61 00 ..`..@...`@..{a.


          • = 00600000 ..`.
          • = 800d40a1 .@..  RHeap::Alloc(int)                         euser.in(.text) + 0x8b
          • = 004060a8 .`@.
          • = 00617bce .{a.
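The “=” lines come from regrouping each row of dump bytes into 32-bit little-endian words, which are then looked up in the symbol file. The regrouping step can be sketched as follows in standard C++. This is an illustration with a hypothetical function name, not the actual printsym implementation, and it assumes the “address: 16 hex bytes, ASCII column” layout shown above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Converts one crash debugger dump line into little-endian 32-bit words.
// Reads at most 16 two-digit hex bytes after the colon; the trailing
// ASCII column is ignored.
inline std::vector<std::uint32_t> DumpLineToWords(const std::string& line) {
    std::vector<std::uint32_t> words;
    const std::size_t colon = line.find(':');
    if (colon == std::string::npos) return words;

    std::istringstream in(line.substr(colon + 1));
    std::vector<std::uint32_t> bytes;
    std::string token;
    while (bytes.size() < 16 && in >> token) {
        if (token.size() != 2 ||
            !std::isxdigit(static_cast<unsigned char>(token[0])) ||
            !std::isxdigit(static_cast<unsigned char>(token[1]))) {
            break;  // not a byte: we have reached the ASCII column
        }
        bytes.push_back(static_cast<std::uint32_t>(std::stoul(token, nullptr, 16)));
    }
    // The first byte of each group of four is the least significant.
    for (std::size_t i = 0; i + 3 < bytes.size(); i += 4) {
        words.push_back(bytes[i] | (bytes[i + 1] << 8) |
                        (bytes[i + 2] << 16) | (bytes[i + 3] << 24));
    }
    return words;
}
```

Applied to the 00405f00 line, this yields ffffffff, 807278e3, 00600000 and 00000018, matching the decoded values shown above.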
          61
          What if the program is not in ROM?
          • D_EXC knows how to decode RAM-based symbols as well
          • The D_EXC .txt file lists any DLLs which were loaded into RAM (and hence are not present in the ROM symbol file)
          • You have to place the .MAP file of every RAM DLL you are interested in into the same directory as the d_exc trace
          • Run printstk.pl as usual, and it should pick up the addresses correctly
          62
          Anything else?
          • Now that you know the thread, panic code and have the stack for both application panics and system panics:
          • It gives you a good idea of what functions to put logging in
          • It quickly allows you to see if a defect is a duplicate (if the callstack has already been posted on a previous defect)


          63
          Debugging on hardware is hard?
          • But with practice it’s a systematic method - not a black art
          • Application and System thread panics cover 90% of application side hardware crashes
          • So learn how to diagnose these first, then worry about more advanced debugging facilities
          • Print out the documentation as you work; it will help you with other kinds of problems and is a good reference
          • Read the other debugging guides to help you understand what is going on under the hood
          64
          Tips
          • Try these techniques out on a panic you put in the code yourself
          • so that you are confident about the result
          • make sure the call stack matches what you know to be the problem
          • When looking into a defect, it is often enough to find the component which panicked
          • This is what triage may do
          • The component owner may then be able to take over and apply some knowledge, logging, etc.
          • Sometimes it can be helpful to make every thread a system thread
          • so all panics go to the debug monitor

          65
          ROM Symbol file format
          • 5055ced4    004c    CMmPhoneTsy::UpdatePhoneIndicator(RMobileCall::TMobileCallStatus
          • 5055cf20    0054    CMmPhoneTsy::UpdatePhoneIndicator(RMobilePhone::TMobilePhoneRegistr
          • 5055cf74    001c    CMmPhoneTsy::GetSubscriberIdL(TBuf<15> &)
          • 5055cf90    0038    CMmPhoneTsy::CompleteReadNamData(int, TPtrC8)
          • 5055cfc8    0058    CMmPhoneTsy::CompleteProductInfoNumId(TBuf8<50> &)
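Each line of the symbol file appears to hold a hex start address, a hex length and then the symbol name. A minimal parsing sketch in standard C++, assuming that three-column layout; the struct and function names are invented for this example:

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// One parsed symbol file line: start address, length, symbol name.
struct SymbolLine {
    std::uint32_t address = 0;
    std::uint32_t length = 0;
    std::string name;
};

// Parses "<hex address> <hex length> <name...>"; returns false for
// lines that do not follow that layout.
inline bool ParseSymbolLine(const std::string& line, SymbolLine& out) {
    std::istringstream in(line);
    if (!(in >> std::hex >> out.address >> out.length)) return false;
    out.name.clear();
    std::getline(in >> std::ws, out.name);  // the rest of the line is the name
    return !out.name.empty();
}
```

Entries parsed this way can be sorted by address and searched for the symbol nearest to a PC, LR or stack value.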