X10RT is a library that an X10 program uses to communicate between places. There are several implementations of X10RT available for use. The different implementations have different capabilities and performance characteristics, as described below.
X10RT Implementation | Description |
---|---|
Sockets | An open-source implementation that uses TCP/IP sockets and ssh to support multiple places on one or more hosts. This is the default implementation, and is the only option when using Managed X10. It can be found in x10.runtime/x10rt/sockets. |
Standalone | An open-source implementation that supports multiple places on a single host, using shared memory between places. Standalone has high bandwidth, but limits message sizes and runs only on a single machine. |
MPI | An implementation of X10RT on top of MPI-2. It is fully open source and can be found in x10.runtime/x10rt/mpi. This supports all the hardware that your MPI implementation supports, such as Infiniband and Ethernet. |
PAMI | PAMI is an IBM communications API that supports high-end networks such as HFI (Host Fabric Interface), Blue Gene, Infiniband, and also Ethernet. PAMI code is located in x10.runtime/x10rt/pami. Building and running with PAMI requires the IBM Parallel Environment. |
The default is sockets on all platforms except Blue Gene/Q (which defaults to pami). All platforms except Blue Gene support standalone and sockets.
Properties files
The programmer may specify X10RT properties to be used by x10c++ in a properties file:
$ x10c++ -x10rt FOO
This causes the compiler to read etc/x10rt_FOO.properties. Typically, the contents of this file tell x10c++ how to compile the generated C++ code.
For instance, this file may contain:
$ cat etc/x10rt_mpi.properties
CXX=mpicxx
CXXFLAGS=
LDFLAGS=
LDLIBS=-lx10rt_mpi
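To make the file layout concrete, here is a minimal sketch that reads KEY=VALUE lines of the kind shown above into a dictionary. The function name `parse_x10rt_properties` and the treatment of `#` lines as comments are assumptions for illustration; this is not how x10c++ itself reads the file.

```python
def parse_x10rt_properties(text):
    """Parse simple KEY=VALUE lines into a dict (illustrative only)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (the '#' comment syntax is an assumption).
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values such as "-lx10rt_mpi" stay intact.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# The sample contents of etc/x10rt_mpi.properties from the text above.
example = """\
CXX=mpicxx
CXXFLAGS=
LDFLAGS=
LDLIBS=-lx10rt_mpi
"""

props = parse_x10rt_properties(example)
print(props["CXX"])     # compiler driver named in the file
print(props["LDLIBS"])  # libraries to link against
```

Empty values such as CXXFLAGS= are kept as empty strings rather than dropped, so every key in the file is visible to the caller.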
Building alternative X10RT Implementations
By default, the build process automatically builds the Standalone and Sockets implementations of X10RT. If you have MPI installed on your build machine, you may want to build the MPI implementation as well. To do so, pass ant the extra command-line argument -DX10RT_MPI=true when building X10 (cd x10.dist && ant dist -DX10RT_MPI=true). Similarly, if you have PAMI installed on your machine, you can build PAMI support by passing the extra argument -DX10RT_PAMI=true when building X10.
Selecting alternative X10RT Implementations
When you compile a program, you can optionally select the X10RT implementation you want to use. This can be done on a per-compilation basis. There is no need to rebuild X10 to switch X10RT implementations; simply recompiling the X10 program (relinking the C++ executable) is sufficient.
By default, x10c++ uses the implementation that is appropriate for the target platform, which is usually sockets. The default can be overridden by giving the -x10rt <impl> command-line argument to x10c++ (valid values for <impl> are mpi, standalone, pami, or sockets). The string given corresponds to a properties file in etc that contains the specifics required to build an executable for that X10RT implementation. You can inspect the contents of this directory to see which X10RT implementations are available, and also add custom ones.
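As a rough illustration of inspecting that directory, the sketch below lists the `<impl>` names for which a file named x10rt_<impl>.properties exists. The helper name `available_x10rt_impls` is hypothetical, and the demonstration uses a temporary directory as a stand-in for etc rather than a real X10 installation.

```python
import tempfile
from pathlib import Path


def available_x10rt_impls(etc_dir):
    """Return the <impl> names for which etc_dir/x10rt_<impl>.properties exists."""
    prefix = "x10rt_"
    names = []
    for path in Path(etc_dir).glob("x10rt_*.properties"):
        # path.stem drops the ".properties" suffix; strip the "x10rt_" prefix too.
        names.append(path.stem[len(prefix):])
    return sorted(names)


# Demonstration against a throwaway directory standing in for etc/.
with tempfile.TemporaryDirectory() as etc:
    for impl in ("mpi", "sockets", "standalone"):
        (Path(etc) / f"x10rt_{impl}.properties").write_text("")
    found = available_x10rt_impls(etc)
    print(found)
```

Each name returned this way is a value that could be passed to -x10rt, assuming the corresponding X10RT library was actually built.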
Running X10 programs
Depending on which X10RT implementation you selected, you will execute the resulting executable in slightly different ways.
X10RT Implementation | Execution options |
---|---|
MPI | mpirun |
PAMI | poe, or execute directly |
STANDALONE | Execute directly (no extra tools needed) |
SOCKETS | X10Launcher, or execute directly |
Running with Sockets backend
The sockets transport is currently the default backend if you don't compile with the "-x10rt" flag. Running with sockets is easy: simply execute the binary that was produced when you compiled your program. This runs your program in a single place on your local machine. To use more places, set the environment variable X10_NPLACES to the number of places.
There are two ways to specify the machines to run those places on:
- Set the environment variable X10_HOSTFILE to the full path for a hostfile. The hostfile is a simple text file that contains a list of hostnames to run on, with one line per machine.
- Set the environment variable X10_HOSTLIST to a comma-separated list of hostnames, without spaces. This environment variable is checked only if X10_HOSTFILE was not set.
Both of the above will wrap if there are more places than hostnames specified. For example, setting X10_NPLACES=4 and X10_HOSTLIST=host1,host2 will cause places 0 and 2 to run on host1, and places 1 and 3 to run on host2. If neither of the above is set, and there is more than 1 place, then it defaults to running everything on localhost.
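The wrapping behaviour can be modelled in a few lines. This is only a sketch of the rule described above, not the launcher's code; the function name `assign_hosts` is hypothetical, and the modulo mapping is inferred from the host1/host2 example.

```python
def assign_hosts(nplaces, hosts=None):
    """Map each place number to a hostname, wrapping around the host list."""
    if not hosts:
        # Neither X10_HOSTFILE nor X10_HOSTLIST is set: everything runs on localhost.
        hosts = ["localhost"]
    return {place: hosts[place % len(hosts)] for place in range(nplaces)}


# Equivalent of X10_NPLACES=4 with X10_HOSTLIST=host1,host2
mapping = assign_hosts(4, "host1,host2".split(","))
for place, host in mapping.items():
    print(place, host)
```

With four places and two hosts, places 0 and 2 map to host1 and places 1 and 3 map to host2, matching the example in the text.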
If you're running on more than one machine, you should have public/private key ssh authentication set up, so you can ssh from one machine to another without getting a password prompt. You should also have your executable and hostfile available in the same location on every machine listed in the hostfile. Compile your program with "-x10rt sockets"; then, with ssh in place and the environment variables set, run your executable. You don't have to launch from one of the machines in the hostfile, but you do need to have the ssh authentication set up between the machine you launch from and the first machine in the list.
The sockets backend supports gdb debugging through the X10_GDB environment variable. Its value takes one of two forms:
- "place:port", where place is the place that you want to be debugged or the string "all", and port is the port number that you want gdbserver to use. This launches the runtime for the specified place under gdbserver, which allows you to connect to the remote runtime with your local gdb session. See Using the gdbserver program for more details. If you specify "all" for the place, then all places will be started under gdbserver at the specified port. Be aware that if you have multiple places running on the same machine, then this will cause port number conflicts.
- "place", where place is either the place that you want to be debugged, or the string "all". Setting this to a number causes the specified place to be launched under gdb in a new xterm. For example, setting X10_GDB=0 will cause the x10 runtime for place 0 to be started in a gdb session in a new xterm, while other places run normally. Setting this to "all" will cause all runtimes to execute in separate gdb xterms. Each xterm is given a title showing which place it is running so you can keep track.
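The two forms of X10_GDB can be told apart by the presence of a colon. The sketch below shows one way to interpret a value; the helper names `parse_x10_gdb` and `is_debugged` are hypothetical, the port 9000 is just an example value, and the real launcher may validate its input differently.

```python
def parse_x10_gdb(value):
    """Split an X10_GDB value into (place, port).

    place is an int or the string "all". port is an int for the
    "place:port" (gdbserver) form and None for the "place" (gdb in xterm) form.
    """
    place_text, sep, port_text = value.partition(":")
    place = "all" if place_text == "all" else int(place_text)
    port = int(port_text) if sep else None
    return place, port


def is_debugged(place_number, value):
    """True if the given place would be started under a debugger."""
    place, _ = parse_x10_gdb(value)
    return place == "all" or place == place_number


print(parse_x10_gdb("0"))         # gdb in an xterm for place 0
print(parse_x10_gdb("all:9000"))  # gdbserver for every place on an example port
```

Note that the "all:port" form hands the same port to every place, which is why the text warns about conflicts when several places share a machine.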
Additional flags that may be of use:
X10_NOYIELD
The X10 runtime regularly polls the network to see if data has arrived to work on. If you have more runtime threads or places than CPU cores, then this polling in idle places can starve real work in other places. So the sockets runtime will give up the CPU if one of these polls comes up empty. By setting the X10_NOYIELD flag to true, you disable this yield. Consider doing this if you have as many worker threads as you have cores.
X10_LAZYLINKS
The socket links between places are point-to-point. By default, these links are established at the beginning of communications, from every place to every other place. This gives better performance when running everything locally, but if you are running with a large number of places across multiple machines, you may want to establish the links on demand, by setting X10_LAZYLINKS to true.
X10_FORCEPORTS
Normally the port number that each place opens for other places to connect to is left up to the operating system. This eliminates the possibility of port contention, but it also requires a place lookup mechanism supported by the launchers. It's possible to force the listen port for each place to a specific number by setting X10_FORCEPORTS. If everything is running on localhost, this will bypass the need for the lookup. The value can take one of two forms. First, you can set it to a single number. This will be the port number for place 0, and each additional place will listen at that port number plus the place number. For example, if X10_FORCEPORTS=7000, then place 0 will listen on port 7000, place 1 on port 7001, etc. The other form is a comma-separated list of values, one per place. So if you set X10_FORCEPORTS=7000,7005,7010 and run with 3 places, then place 0 will be at port 7000, place 1 at port 7005, and place 2 at port 7010.
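The two forms of X10_FORCEPORTS reduce to a small calculation, sketched here. The function name `forced_ports` is hypothetical, and rejecting a list whose length differs from the number of places is an assumption of this sketch; the text does not say how the runtime handles that case.

```python
def forced_ports(value, nplaces):
    """Return the listen port for each place given an X10_FORCEPORTS value."""
    parts = [int(p) for p in value.split(",")]
    if len(parts) == 1:
        # Single number: place 0 uses it, and place i listens on base + i.
        return [parts[0] + place for place in range(nplaces)]
    if len(parts) != nplaces:
        # Behaviour for a mismatched list is not documented; this sketch rejects it.
        raise ValueError("expected one port per place")
    return parts


print(forced_ports("7000", 3))
print(forced_ports("7000,7005,7010", 3))
```

The single-number form is the convenient choice on localhost, where consecutive ports avoid collisions; the list form allows the same port on every host when each place runs on a different machine.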
It's possible to run without the launcher, and start up each place manually. To do this, you need to set the X10_LAUNCHER_PLACE and X10_FORCEPORTS environment variables, which would normally be set by the launcher. For example:
# Launch three places on triloka1-3, all using port 7001:
X10_LAUNCHER_PLACE=0 X10_NPLACES=3 X10_FORCEPORTS=7001,7001,7001 X10_HOSTLIST=triloka1,triloka2,triloka3 # run on triloka1
X10_LAUNCHER_PLACE=1 X10_NPLACES=3 X10_FORCEPORTS=7001,7001,7001 X10_HOSTLIST=triloka1,triloka2,triloka3 # run on triloka2
X10_LAUNCHER_PLACE=2 X10_NPLACES=3 X10_FORCEPORTS=7001,7001,7001 X10_HOSTLIST=triloka1,triloka2,triloka3 # run on triloka3
Running with the MPI backend
In MPI mode, the X10 program is launched with the MPI launcher program, e.g. mpirun, mpiexec, poe, etc.
By default, an MPI threading level of MPI_THREAD_MULTIPLE is requested.
As some MPI implementations do not support MPI_THREAD_MULTIPLE, it is possible to request MPI_THREAD_SERIALIZED instead by setting the environment variable X10RT_MPI_THREAD_SERIALIZED=1.
In the special case of single-threaded places with static threads (X10_NTHREADS=1 X10_NUM_IMMEDIATE_THREADS=0 X10_STATIC_THREADS=true), a threading level of MPI_THREAD_SINGLE is requested.
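The three cases above can be summarised as a decision function. This is a model of the documented behaviour only, not the X10RT source: the function name `requested_thread_level` is hypothetical, and the order in which the two overrides are checked (single-threaded case first) is an assumption, since the text does not say which wins when both apply.

```python
def requested_thread_level(env):
    """Pick the MPI threading level name from a dict of environment variables."""
    single_threaded = (
        env.get("X10_NTHREADS") == "1"
        and env.get("X10_NUM_IMMEDIATE_THREADS") == "0"
        and env.get("X10_STATIC_THREADS") == "true"
    )
    if single_threaded:
        # Assumed to take precedence over the SERIALIZED override.
        return "MPI_THREAD_SINGLE"
    if env.get("X10RT_MPI_THREAD_SERIALIZED") == "1":
        return "MPI_THREAD_SERIALIZED"
    # Default when no override applies.
    return "MPI_THREAD_MULTIPLE"


print(requested_thread_level({}))
print(requested_thread_level({"X10RT_MPI_THREAD_SERIALIZED": "1"}))
```

Passing `dict(os.environ)` instead of a literal dictionary would apply the same check to the current process environment.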
Running with the Standalone backend
Standalone mode sets up shared memory regions and forks off one instance of the program per place. The places all run on the local machine. There is one environment variable, "X10_NPLACES", which should be set to the number of places. If not set, it defaults to 1 and prints a warning. The standalone transport has a limit on the size of the data blocks that can be sent (about 512k), because of the shared memory regions. To run, just compile with "-x10rt standalone", set the environment variable, and run your executable. This transport is a good choice if you're running everything on one machine and don't use large messages.