Q2.1.15: How to debug a SXEmacs problem with a debugger
If SXEmacs does crash on you, one of the most productive things you
can do to help get the bug fixed is to poke around a bit with the
debugger. Here are some hints:
- First of all, if the crash is at all reproducible, consider very
strongly recompiling your SXEmacs with debugging symbols and with no
optimisation (e.g. with GCC use the compiler flags ‘-g -O0’ –
that’s an "oh" followed by a zero), and with the configure options
‘--debug=yes’ and ‘--error-checking=all’. This will make
your SXEmacs run somewhat slower, but you are a lot more likely to
catch the problem earlier (closer to its source). It makes it a lot
easier to determine what’s going on with a debugger.
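The rebuild described above can be sketched as a small shell helper. This is only an illustration: the function name rebuild_debug is made up here, the flags and configure options are the ones quoted in the text, and it assumes you are sitting at the top of the SXEmacs source tree with a generated ./configure script.

```shell
# rebuild_debug (hypothetical helper name): reconfigure and rebuild with
# debugging symbols, no optimisation, and full error checking.
# Assumption: the current directory is the top of the SXEmacs source tree.
rebuild_debug() {
    if [ ! -x ./configure ]; then
        echo "rebuild_debug: no ./configure here; cd to the source tree first" >&2
        return 1
    fi
    # -g keeps debugging symbols, -O0 turns optimisation off.
    CFLAGS="-g -O0" ./configure --debug=yes --error-checking=all && make
}
```

Call it as rebuild_debug from the source directory; it refuses to do anything if no configure script is found.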
- If it’s not a true crash (i.e., SXEmacs is hung, or a zombie
process), or it’s inconvenient to run SXEmacs again because SXEmacs is
already running or is running in batch mode as part of a bunch of
scripts, you may be able to attach to the existing process with your
debugger. Most debuggers let you do this by substituting the process ID
for the core file when you invoke the debugger from the command line, or
by using the attach command or something similar.
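With GDB, attaching to a live process might look like the sketch below. The helper name attach_sxemacs is invented for this example, and it assumes the process is named sxemacs and that pgrep is available; pass a process ID explicitly if either assumption does not hold.

```shell
# attach_sxemacs (hypothetical helper name): attach GDB to a running
# SXEmacs. Usage: attach_sxemacs [PID]
# Assumption: without a PID, the newest process named exactly "sxemacs"
# is the one you want.
attach_sxemacs() {
    pid=${1:-$(pgrep -n -x sxemacs 2>/dev/null || true)}
    if [ -z "$pid" ]; then
        echo "attach_sxemacs: no running sxemacs process found" >&2
        return 1
    fi
    # gdb -p PID attaches to the process; "detach" or "quit" lets it go.
    gdb -p "$pid"
}
```

Once attached, the process is stopped; a C backtrace (bt) is usually the first thing worth looking at for a hung SXEmacs.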
- If you’re able to run SXEmacs under a debugger and reproduce the crash,
here are some things you can do:
- If SXEmacs is hitting an assertion failure, put a breakpoint on
- If SXEmacs is hitting some weird Lisp error that’s causing it to crash
(e.g. during startup), put a breakpoint on
declared static in eval.c.
- If SXEmacs is outputting lots of X errors, put a breakpoint on
x_error_handler(); that will tell you which call is causing them.
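For the X error case, one way to set the breakpoint from the command line is sketched below. The helper name debug_x_errors and the default path ./sxemacs are assumptions made for the example; x_error_handler is the function named above.

```shell
# debug_x_errors (hypothetical helper name): start SXEmacs under GDB with
# a breakpoint already set on x_error_handler.
# Usage: debug_x_errors [PATH-TO-SXEMACS]   (default ./sxemacs is assumed)
debug_x_errors() {
    exe=${1:-./sxemacs}
    if [ ! -x "$exe" ]; then
        echo "debug_x_errors: $exe is not an executable" >&2
        return 1
    fi
    # Each -ex runs one GDB command at startup, in the order given.
    gdb -ex 'break x_error_handler' -ex 'run' --args "$exe"
}
```

When the breakpoint is hit, bt shows the C call chain that led to the X error.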
- Internally, you will probably see lots of variables that hold objects of
type Lisp_Object. These are references to Lisp objects.
Printing them out with the debugger probably won’t be too
useful—you’ll likely just see a number. To decode them, do this:
where OBJECT is whatever you want to decode (it can be a variable,
a function call, etc.). This uses the Lisp printing routines to print a
readable representation on the TTY from which the sxemacs process was
invoked.
- If you want to get a Lisp backtrace showing the Lisp call
stack, do this:
Using dp and db has two disadvantages: they can only be
used with a running (including hung or zombie) sxemacs process, and they
do not display the internal C structure of a Lisp Object. Even if all
you’ve got is a core dump, all is not lost.
If you’re using GDB, there are some macros in the file
src/.gdbinit in the SXEmacs source distribution that should
make it easier for you to decode Lisp objects. This file is
automatically read by gdb if gdb is run in the directory where sxemacs
was built, and contains these useful macros to inspect the state of
sxemacs:
Usage: pobj lisp_object
Print the internal C representation of a lisp object.
Usage: xtype lisp_object
Print the Lisp type of a lisp object.
Print the current Lisp stack trace.
Requires a running sxemacs process. (It works by calling the db
routine described above.)
Usage: ldp lisp_object
Print a Lisp Object value using the Lisp printer.
Requires a running sxemacs process. (It works by calling the dp
routine described above.)
Run temacs interactively, like sxemacs.
Use this with debugging tools (like purify) that cannot deal with
dumping, or when temacs builds successfully, but sxemacs does not.
Run the dumping part of the build procedure.
Use when debugging temacs, not sxemacs!
Use this when temacs builds successfully, but sxemacs does not.
Run the test suite. Equivalent to ‘make check’.
Run the test suite on temacs. Equivalent to ‘make check-temacs’.
Use this with debugging tools (like purify) that cannot deal with dumping,
or when temacs builds successfully, but sxemacs does not.
If you are using Sun’s dbx debugger, there is an equivalent file
src/.dbxrc, which defines the same commands for dbx.
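If a core dump is all you have, a C backtrace can still be pulled out of it non-interactively. The sketch below assumes GDB; the helper name core_backtrace is invented for the example, and both paths are supplied by you.

```shell
# core_backtrace (hypothetical helper name): print the C stack trace
# recorded in a core dump. Usage: core_backtrace EXECUTABLE COREFILE
core_backtrace() {
    exe=$1
    core=$2
    if [ ! -r "$exe" ] || [ ! -r "$core" ]; then
        echo "usage: core_backtrace EXECUTABLE COREFILE" >&2
        return 2
    fi
    # -batch exits after the startup commands have run; -ex 'bt' prints
    # the backtrace of the thread that was current when the core was written.
    gdb -batch -ex 'bt' "$exe" "$core"
}
```

For interactive work on the same core, start gdb with the same two arguments from the build directory so that the macros above are available; pobj and xtype are the ones that do not need a running process.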
- If you’re using a debugger to get a C stack backtrace and you’re seeing
stack traces with some of the innermost frames mangled, it may be due to
dynamic linking. (This happens especially under Linux.) Consider
reconfiguring with ‘--dynamic=no’. Also, sometimes (again under
Linux), stack backtraces of core dumps will have the frame where the
fatal signal occurred mangled; if you can obtain a stack trace while
running the SXEmacs process under a debugger, the stack trace should
be clean.
- If you’re using a debugger to get a C stack backtrace and you’re
getting a completely mangled and bogus stack trace, it’s probably due to
one of the following:
- Your executable has been stripped. Bad news. Tell your sysadmin not to
do this—it doesn’t accomplish anything except to save a bit of disk
space, and makes debugging much, much harder.
- Your stack is getting trashed. Debugging this is hard; you have to
narrow down where the crash occurs, binary-search style, until you
figure out exactly which line is causing the problem. Of course, this
only works if the bug is highly reproducible. Also, in many cases if
you run SXEmacs from the debugger, the debugger can protect the stack
somewhat. However, if the stack is being smashed, it is typically the
case that there is a wild pointer somewhere in the program, often
quite far from where the crash occurs.
- If your stack trace has exactly one frame in it, with address 0x0, this
could simply mean that SXEmacs attempted to execute code at that
address, e.g. through jumping to a null function pointer.
Unfortunately, under those circumstances, GDB under Linux doesn’t know
how to get a stack trace.
Yes, this is the fourth Linux-related problem I’ve mentioned. I
have no idea why GDB under Linux is so bogus. Complain to the GDB
authors, or to comp.os.linux.development.system. Again, you’ll have
to use the narrowing-down process described above.
- You will get a Lisp backtrace output when SXEmacs crashes, so you’ll have