Parallel Debugging Examples

Example 1 is a parallel debugging session started by mpirun.

 

% mpirun -dbg=idb -np 8 cpi

Intel(R) Debugger for Itanium(R)-based applications, Version XX

Reading symbolic information ...done

stopped at [void* MPIR_Breakpoint(void):101 0x40000000000b3060]

    101 {

Process has exited

(idb)

   [0:7] Intel(R) Debugger for Itanium(R)-based applications, Version XX

   [0:7] ------------------

   [0:7] object file name: /home/user/examples/cpi

   [0:7] Reading symbolic information ...   [0:7] done

%1 [0:7] Attached to process id [30596;30636]  ....

   [1:7] stopped at [ 0x20000000001ef962]

   [0] stopped at [void* MPIR_Breakpoint(void):101 0x40000000000b3060]

   [0]     101 {

(idb)

   [0:7] stopped at [int main(int, char**):20 0x4000000000003520]

   [0:7]      20     MPI_Init(&argc,&argv);

(idb)

   [0:7]      16     double startwtime = 0.0, endwtime;

   [0:7]      17     int  namelen;

   [0:7]      18     char processor_name[MPI_MAX_PROCESSOR_NAME];

   [0:7]      19

   [0:7] >    20     MPI_Init(&argc,&argv);

   [0:7]      21     MPI_Comm_size(MPI_COMM_WORLD,&numprocs);

   [0:7]      22     MPI_Comm_rank(MPI_COMM_WORLD,&myid);

   [0:7]      23     MPI_Get_processor_name(processor_name,&namelen);

   [0:7]      24

(idb) stop in f

(idb)

   [0:7] [#1: stop in double f(double) ]

(idb) focus [0:3]

[0:3]>

[0:3]>  cont

[0:3]>  Process 3 on nht6005.spt.intel.com

Process 2 on nht6005.spt.intel.com

Process 0 on nht6005.spt.intel.com

Process 1 on nht6005.spt.intel.com

   [0:3] [1] stopped at [double f(double):7 0x4000000000003390]

   [0:3]       7 {

[0:3]>  where

[0:3]>

   [0:3] >0  0x4000000000003390 in f(a=<no value>) "cpi.c":7

%2 [0:3] #1  0x4000000000003a30 in main(argc=0, argv=0x[0;80000fffffffba7c]) "cpi.c":51

   [0:3] #2  0x20000000000906b0 in /lib/libc.so.6.1

   [0:3] #3  0x4000000000003220 in _start(...) in /home/user/examples/cpi

[0:3]>  focus [4:7]

[4:7]>

[4:7]>  cont

[4:7]>  Process 7 on nht6005.spt.intel.com

Process 4 on nht6005.spt.intel.com

Process 6 on nht6005.spt.intel.com

Process 5 on nht6005.spt.intel.com

   [4:7] [1] stopped at [double f(double):7 0x4000000000003390]

   [4:7]       7 {

[4:7]>  where

[4:7]>

   [4:7] >0  0x4000000000003390 in f(a=<no value>) "cpi.c":7

%3 [4:7] #1  0x4000000000003a30 in main(argc=0, argv=0x[0;80000fffffffba7c]) "cpi.c":51

   [4:7] #2  0x20000000000906b0 in /lib/libc.so.6.1

   [4:7] #3  0x4000000000003220 in _start(...) in /home/user/examples/cpi

[4:7]>  focus [*]

[0:7]>

[0:7]>  next

[0:7]>

   [0:7] stopped at [double f(double):8 0x40000000000033b1]

   [0:7]       8     return (4.0 / (1.0 + a*a));

[0:7]>  where

[0:7]>

%4 [0:7] >0  0x40000000000033b1 in f(a=[0.0050000000000000001;0.074999999999999997]) "cpi.c":8

%5 [0:7] #1  0x4000000000003a30 in main(argc=1, argv=0x[80000fffffffb768;6000000000014a50]) "cpi.c":51

   [0:7] #2  0x20000000000906b0 in /lib/libc.so.6.1

   [0:7] #3  0x4000000000003220 in _start(...) in /home/user/examples/cpi

[0:7]>  show aggregated message

%1 [0:7] Attached to process id [30596;30636]  ....

%2 [0:3] #1  0x4000000000003a30 in main(argc=0, argv=0x[0;80000fffffffba7c]) "cpi.c":51

%3 [4:7] #1  0x4000000000003a30 in main(argc=0, argv=0x[0;80000fffffffba7c]) "cpi.c":51

%4 [0:7] >0  0x40000000000033b1 in f(a=[0.0050000000000000001;0.074999999999999997]) "cpi.c":8

%5 [0:7] #1  0x4000000000003a30 in main(argc=1, argv=0x[80000fffffffb768;6000000000014a50]) "cpi.c":51

[0:7]>

[0:7]>  expand aggregated message 1

%1 [0:7] Attached to process id [30596;30636]  ....

 [3] Attached to process id 30612  ....

 [2] Attached to process id 30606  ....

 [0] Attached to process id 30596  ....

 [1] Attached to process id 30600  ....

 [4] Attached to process id 30618  ....

 [5] Attached to process id 30624  ....

 [7] Attached to process id 30636  ....

 [6] Attached to process id 30630  ....

[0:7]>  disable 1

[0:7]>

[0:7]>  cont

[0:7]>  pi is approximately 3.1416009869231249, Error is 0.0000083333333318

wall clock time = 69.300781

   [0:7] Process has exited with status 0

[0:7]>  quit
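
The numbers in this session are consistent with a midpoint-rule estimate of pi that splits the interval from 0 to 1 into 100 subintervals and deals them out round-robin to the 8 processes. The transcript does not show that part of cpi.c, so the interval count and the loop structure are assumptions. Under those assumptions, the following Python sketch (not the C source; `rank_points` and `estimate_pi` are hypothetical names) reproduces the range of `a` values reported by `where` at the first call to `f` and the final estimate:

```python
import math


def f(a):
    # Same integrand as line 8 of cpi.c shown in the transcript.
    return 4.0 / (1.0 + a * a)


def rank_points(rank, numprocs, n):
    # Assumed work split: process `rank` takes intervals rank+1, rank+1+numprocs, ...
    # and evaluates f at the midpoint of each one.
    h = 1.0 / n
    return [h * (i - 0.5) for i in range(rank + 1, n + 1, numprocs)]


def estimate_pi(numprocs, n):
    # Each process sums its own points; the partial results are then added,
    # which stands in for the reduction step of the parallel program.
    h = 1.0 / n
    partial = [h * sum(f(x) for x in rank_points(r, numprocs, n))
               for r in range(numprocs)]
    return sum(partial)


# First argument passed to f by each of the 8 processes (assumed n = 100).
first_args = [rank_points(r, 8, 100)[0] for r in range(8)]
print(min(first_args), max(first_args))

pi_estimate = estimate_pi(8, 100)
print(pi_estimate, pi_estimate - math.pi)
```

The smallest and largest first arguments are approximately 0.005 and 0.075, which matches the aggregated value `a=[0.0050000000000000001;0.074999999999999997]`, and the estimate agrees with the printed result to within floating-point rounding.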

 

The following notes explain components of Example 1:

Component of Example           Meaning

-np 8                          This parallel session creates 8 processes.

[0:7]                          This is a message from processes 0 to 7.

%1                             This aggregated message combines messages whose
                               contents differ from process to process (in this
                               case, the process IDs); 1 is the message ID.

focus [0:3]                    This focus command sets the current process set
                               to processes 0, 1, 2, and 3.

[0:3]>                         This prompt shows the current process set.

show aggregated message        This command displays all the aggregated
                               messages saved in the message list.

expand aggregated message 1    This command expands the aggregated message
                               with message ID 1 into its per-process messages.
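
To make the aggregation idea concrete, here is a minimal Python sketch of how per-process messages could be collapsed into one line with a process-set prefix and a `[min;max]` range for each numeric field that differs. It is an illustration only: the debugger's actual algorithm is not described in this section, the function names `process_set_label` and `aggregate` are hypothetical, the sketch handles only decimal integers, and the comma-separated form for non-contiguous sets is an assumption.

```python
import re

_NUMBER = re.compile(r"\d+")


def process_set_label(ranks):
    """Format ranks like the transcript prefixes: [0] or [0:7]."""
    ranks = sorted(ranks)
    if len(ranks) == 1:
        return f"[{ranks[0]}]"
    if ranks == list(range(ranks[0], ranks[-1] + 1)):
        return f"[{ranks[0]}:{ranks[-1]}]"
    # Assumption: the transcript never shows a non-contiguous set.
    return "[" + ",".join(str(r) for r in ranks) + "]"


def aggregate(messages):
    """Collapse {rank: text} into one line.

    Identical texts are printed once. Texts that differ only in their
    decimal numbers get a [min;max] range in place of each differing number.
    """
    ranks = sorted(messages)
    texts = [messages[r] for r in ranks]
    label = process_set_label(ranks)
    if len(set(texts)) == 1:
        return f"{label} {texts[0]}"

    # All texts must share the same shape once numbers are masked out.
    shapes = {_NUMBER.sub("<num>", t) for t in texts}
    if len(shapes) != 1:
        raise ValueError("messages differ in more than their numeric fields")

    # Column k holds the k-th number of every message.
    fields = []
    for column in zip(*(_NUMBER.findall(t) for t in texts)):
        if len(set(column)) == 1:
            fields.append(column[0])
        else:
            values = [int(c) for c in column]
            fields.append(f"[{min(values)};{max(values)}]")

    replacements = iter(fields)
    merged = _NUMBER.sub(lambda match: next(replacements), texts[0])
    return f"{label} {merged}"


# Process IDs taken from the expanded message 1 in Example 1.
pids = {0: 30596, 1: 30600, 2: 30606, 3: 30612,
        4: 30618, 5: 30624, 6: 30630, 7: 30636}
print(aggregate({r: f"Attached to process id {p}  ...." for r, p in pids.items()}))
```

With the process IDs from Example 1, the sketch prints the same text as aggregated message 1, without the `%1` message ID. Keeping the per-process texts alongside the merged line is what would make an operation like `expand aggregated message` possible.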

Example 2 demonstrates how to start a parallel debugging session with prun.

 

% idb -parallel `which prun` -n 16 -N 8 ./cpi

Intel(R) Debugger for Itanium(R)-based applications, Version 7.0,

Build 20021118

Reading symbolic information ...done

stopped at [void _rms_breakpoint(void):2150 0x20000000001913e0]

Source file not found or not readable, tried...

    ./loader.cc

    /usr/bin/loader.cc

(Cannot find source file loader.cc)

stopped at [void _rms_breakpoint(void):2150 0x20000000001913e0]

Source file not found or not readable, tried...

    ./loader.cc

    /usr/bin/loader.cc

(Cannot find source file loader.cc)

Process has exited

(idb)

   [0:15] Intel(R) Debugger for Itanium(R)-based applications,

Version 7.0, Build 20021118

   [0:15] ------------------

   [0:15] object file name: cpi

   [0:15] Reading symbolic information ...   [0:15] done

   [0:15]      13     int done = 0, n, myid, numprocs, i;

(idb) where

(idb)

   [0:15] >0  0x4000000000000c60 in _start(...) in cpi

(idb) stop in main

(idb)

   [0:15] [#1: stop in int main(int, char**) ]

(idb) cont

(idb)

   [0:15] [1] stopped at [int main(int, char**):13 0x4000000000000f52]

   [0:15]      13     int done = 0, n, myid, numprocs, i;

 

In this example, you can ignore the initial messages reporting that the source file loader.cc cannot be found; that file usually does not exist on a production system.