X10RT is a library that an X10 program uses to communicate between places. There are several implementations of X10RT available for use. The different implementations have different capabilities and performance characteristics, as described below.
|Sockets|| An open-source implementation that uses TCP/IP sockets and ssh to support multiple places on one or more hosts. This is the default implementation, and is the only option when using Managed X10. It can be found in
|Standalone||An open-source implementation that supports multiple places on a single host, using shared memory between places. Standalone has high bandwidth, but it limits message sizes and runs only on a single machine.|
|MPI||An implementation of X10RT on top of MPI2. It is fully open source and can be found in
|PAMI||PAMI is an IBM communications API that supports high-end networks such as HFI (Host Fabric Interface), Blue Gene, and InfiniBand, as well as Ethernet. PAMI code is located in
The default is sockets on all platforms except Blue Gene/Q (which defaults to pami). All platforms except Blue Gene support
The programmer may specify X10RT properties to be used by x10c++ in a properties file:
$ x10c++ -x10rt FOO
This causes the compiler to read etc/x10rt_FOO.properties. Typically the contents of this file tell x10c++ how to compile the generated
For instance, this file may contain:
$ cat etc/x10rt_mpi.properties
CXX=mpicxx
CXXFLAGS=
LDFLAGS=
LDLIBS=-lx10rt_mpi
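To make the shape of such a file concrete, here is a minimal Python sketch that reads key=value pairs like the ones above into a dictionary. It is only an illustration: the function name `parse_properties` is hypothetical, skipping blank lines and `#` comments is an assumption, and nothing here reflects how x10c++ itself reads the file.

```python
def parse_properties(text):
    """Parse simple key=value lines into a dict (illustrative model only)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


sample = """CXX=mpicxx
CXXFLAGS=
LDFLAGS=
LDLIBS=-lx10rt_mpi
"""
props = parse_properties(sample)
print(props["CXX"], props["LDLIBS"])
```

Empty values such as CXXFLAGS are kept as empty strings, so a caller can tell an empty setting apart from a missing key.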
By default, the build process automatically builds the Standalone and sockets implementations of X10RT. If you have MPI installed on your build machine, you may want to build the MPI implementation as well. To do so, pass ant the extra command-line argument -DX10RT_MPI=true when building X10 (cd x10.dist && ant dist -DX10RT_MPI=true). Similarly, if you have PAMI installed on your machine, you can build PAMI support by passing the extra argument -DX10RT_PAMI=true when building X10.
When you compile a program, you can optionally select the X10RT implementation you want to use. This can be done on a per-compilation basis. There is no need to rebuild X10 to switch X10RT implementations; simply recompiling the X10 program (relinking the C++ executable) is sufficient.
By default, x10c++ will use the implementation that is appropriate for the target platform, which is usually sockets. The default can be overridden by giving the -x10rt <impl> command line argument to x10c++ (valid values for <impl> are:
pami, or sockets). The string given corresponds to a properties file in etc, containing the specifics required to build an executable for that x10rt implementation. One can inspect the contents of this directory to see what x10rt implementations are available, and also to add custom ones.
Depending on which X10RT implementation you selected, you will execute the resulting executable in slightly different ways.
|X10RT Implementation||Execution options|
|PAMI||poe, or execute directly|
|STANDALONE||Execute directly (no extra tools needed)|
|SOCKETS||X10Launcher, or execute directly|
The sockets transport is currently the default backend if you don't compile with the "-x10rt" flag. Running with sockets is easy: simply execute the binary that was produced when you compiled your program. This runs your program in a single place on your local machine. To use more places, set the environment variable X10_NPLACES to the number of places.
There are two ways to specify the machines to run those places on:
- Set the environment variable X10_HOSTFILE to the full path of a hostfile. The hostfile is a simple text file that contains a list of hostnames to run on, with one line per machine.
- Set the environment variable X10_HOSTLIST to a comma-separated list of hostnames, without spaces. This environment variable is checked only if X10_HOSTFILE is not set.
Both of the above wrap around if there are more places than hostnames. For example, with four places, setting X10_HOSTLIST=host1,host2 causes places 0 and 2 to run on host1, and places 1 and 3 to run on host2. If neither variable is set and there is more than one place, all places run on localhost.
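The wrap-around rule can be sketched in a few lines of Python. This is a model of the behavior described above, not the launcher's code; the function name `assign_hosts` is hypothetical.

```python
def assign_hosts(nplaces, hosts=None):
    """Return the host for each place, wrapping around the host list."""
    if not hosts:
        # Neither X10_HOSTFILE nor X10_HOSTLIST is set: everything runs locally.
        return ["localhost"] * nplaces
    # Place i goes to host i modulo the number of hosts.
    return [hosts[place % len(hosts)] for place in range(nplaces)]


# Equivalent of X10_NPLACES=4 X10_HOSTLIST=host1,host2
hostlist = "host1,host2".split(",")
print(assign_hosts(4, hostlist))
```

With four places and two hosts, places 0 and 2 land on host1 and places 1 and 3 on host2, matching the example in the text.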
If you're running on more than one machine, you should have public/private key ssh authentication set up, so that you can ssh from one machine to another without a password prompt. Your executable and hostfile should also be available in the same location on every machine listed in the hostfile. Compile your program with "-x10rt sockets"; then, with ssh in place and the environment variables set, run your executable. You don't have to launch from one of the machines in the hostfile, but you do need ssh authentication set up between the machine you launch from and the first machine in the list.
The sockets backend supports gdb debugging through the X10_GDB environment variable. Its value takes one of two forms:
- "place:port", where place is the place that you want to be debugged or the string "all", and port is the port number that you want gdbserver to use. This launches the runtime for the specified place under gdbserver, which allows you to connect to the remote runtime with your local gdb session. See Using the gdbserver program for more details. If you specify "all" for the place, then all places will be started under gdbserver at the specified port. Be aware that if you have multiple places running on the same machine, then this will cause port number conflicts.
- "place", where place is either the place that you want to be debugged, or the string "all". Setting this to a number causes the specified place to be launched under gdb in a new xterm. For example, setting X10_GDB=0 will cause the x10 runtime for place 0 to be started in a gdb session in a new xterm, while other places run normally. Setting this to "all" will cause all runtimes to execute in separate gdb xterms. Each xterm is given a title showing which place it is running so you can keep track.
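The two forms of the X10_GDB value can be told apart by whether a port is present. The following Python sketch models that distinction; the function name `parse_x10_gdb` is hypothetical, and the real runtime may validate its input differently.

```python
def parse_x10_gdb(value):
    """Split an X10_GDB value into (place, port).

    port is None for the "place" form (gdb in a new xterm) and an
    integer for the "place:port" form (gdbserver on that port).
    """
    place, sep, port = value.partition(":")
    if place != "all":
        place = int(place)
    return place, (int(port) if sep else None)


print(parse_x10_gdb("0"))
print(parse_x10_gdb("all:9000"))
```

A result with a port corresponds to the gdbserver form; a result without one corresponds to the xterm form.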
Additional flags that may be of use:
X10_NOYIELD The X10 runtime regularly polls the network to see if data has arrived to work on. If you have more runtime threads or places than CPU cores, then this polling in idle places can starve real work in other places. So the sockets runtime will give up the CPU if one of these polls comes up empty. By setting the X10_NOYIELD flag to true, you disable this yield. Consider doing this if you have as many worker threads as you have cores.
X10_LAZYLINKS The socket links between places are point-to-point. By default, these links are established at the beginning of communications, from every place to every other place. This gives better performance when running everything locally, but if you are running with a large number of places across multiple machines, you may want to establish the links on-demand, by setting X10_LAZYLINKS to true.
X10_FORCEPORTS Normally the port number that each place opens for other places to connect to it is left up to the operating system. This eliminates the possibility of port contention, but it also requires a fancy place lookup mechanism supported by the launchers. It's possible to force the listen port for each place to a specific number, by setting X10_FORCEPORTS. If everything is running on localhost, this will bypass the need for the lookup. The value can take one of two forms. First, you can set it to a single number. This will be the port number for place 0, and each additional place will listen at that port number plus the place number. For example, if X10_FORCEPORTS=7000, then place 0 will listen on port 7000, place 1 on port 7001, etc. The other form is a comma-separated list of values, one per place. So if you set X10_FORCEPORTS=7000,7005,7010 and run with 3 places, then place 0 will be at port 7000, place 1 at port 7005, and place 2 at port 7010.
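The two forms of X10_FORCEPORTS map to per-place ports as in this Python sketch. It models the rule described above rather than the runtime's implementation; the function name `resolve_ports` is hypothetical, and raising an error when the list length does not match the number of places is an assumption.

```python
def resolve_ports(forceports, nplaces):
    """Return the listen port for each place from an X10_FORCEPORTS value."""
    parts = [int(p) for p in forceports.split(",")]
    if len(parts) == 1:
        # Single number: base port for place 0, plus the place number.
        return [parts[0] + place for place in range(nplaces)]
    if len(parts) != nplaces:
        # Assumption: the list form needs exactly one port per place.
        raise ValueError("expected one port per place")
    return parts


print(resolve_ports("7000", 3))
print(resolve_ports("7000,7005,7010", 3))
```

The single-number form is convenient on localhost, where consecutive ports avoid collisions; the list form lets places on different hosts share the same port number.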
It's possible to run without the launcher, and start up each place manually. To do this, you need to set X10_FORCEPORTS along with the other environment variables that would normally be set by the launcher. For example:
// Launch three places on triloka1-3, all using port 7001:
X10_LAUNCHER_PLACE=0 X10_NPLACES=3 X10_FORCEPORTS=7001,7001,7001 X10_HOSTLIST=triloka1,triloka2,triloka3 // run on triloka1
X10_LAUNCHER_PLACE=1 X10_NPLACES=3 X10_FORCEPORTS=7001,7001,7001 X10_HOSTLIST=triloka1,triloka2,triloka3 // run on triloka2
X10_LAUNCHER_PLACE=2 X10_NPLACES=3 X10_FORCEPORTS=7001,7001,7001 X10_HOSTLIST=triloka1,triloka2,triloka3 // run on triloka3
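Since the per-place settings differ only in X10_LAUNCHER_PLACE, they are easy to generate. The Python sketch below builds the variable assignments for each place, assuming one place per host and the same port on every host, as in the example above. The function name `manual_launch_envs` is hypothetical.

```python
def manual_launch_envs(hosts, port):
    """Build the environment assignments for each manually started place.

    Assumes one place per host, all listening on the same port.
    """
    nplaces = len(hosts)
    ports = ",".join([str(port)] * nplaces)
    hostlist = ",".join(hosts)
    return [
        f"X10_LAUNCHER_PLACE={place} X10_NPLACES={nplaces} "
        f"X10_FORCEPORTS={ports} X10_HOSTLIST={hostlist}"
        for place in range(nplaces)
    ]


lines = manual_launch_envs(["triloka1", "triloka2", "triloka3"], 7001)
for host, line in zip(["triloka1", "triloka2", "triloka3"], lines):
    print(f"{line}  # run on {host}")
```

Each generated line is meant to be used on the host named in its trailing comment.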
In MPI mode, the X10 program is launched with the MPI launcher program e.g.
By default, an MPI threading level of MPI_THREAD_MULTIPLE is requested.
As some MPI implementations do not support MPI_THREAD_MULTIPLE, it is possible to request MPI_THREAD_SERIALIZED instead by setting the environment variable
In the special case of single-threaded places with static threads (X10_NTHREADS=1 X10_NUM_IMMEDIATE_THREADS=0 X10_STATIC_THREADS=true), a threading level of MPI_THREAD_SINGLE is requested.
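The selection among the three threading levels can be summarized as a small decision function. This Python sketch is a model of the rules above, not the X10RT source: the function name `requested_thread_level` and the `serialized_requested` parameter are hypothetical (the parameter stands in for the environment variable that requests MPI_THREAD_SERIALIZED), and checking the single-threaded case first is an assumption about precedence.

```python
def requested_thread_level(env, serialized_requested=False):
    """Return the name of the MPI threading level that would be requested."""
    single_threaded = (
        env.get("X10_NTHREADS") == "1"
        and env.get("X10_NUM_IMMEDIATE_THREADS") == "0"
        and env.get("X10_STATIC_THREADS") == "true"
    )
    # Assumption: the single-threaded special case takes precedence.
    if single_threaded:
        return "MPI_THREAD_SINGLE"
    if serialized_requested:
        return "MPI_THREAD_SERIALIZED"
    return "MPI_THREAD_MULTIPLE"


static_env = {
    "X10_NTHREADS": "1",
    "X10_NUM_IMMEDIATE_THREADS": "0",
    "X10_STATIC_THREADS": "true",
}
print(requested_thread_level({}))
print(requested_thread_level(static_env))
```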
Standalone mode sets up shared memory regions and forks off one instance of the program per place. The places all run on the local machine. There is one environment variable, "X10_NPLACES", which should be set to the number of places. If it is not set, it defaults to 1 and a warning is printed. Because of the shared memory regions, the standalone transport has a limit on the size of the data blocks that can be sent (about 512k). To run, just compile with "-x10rt standalone", set the environment variable, and run your executable. This transport is a good choice if you're running everything on one machine and don't use large messages.