X10RT Implementations

X10RT is a library that an X10 program uses to communicate between places. There are several implementations of X10RT available for use. The different implementations have different capabilities and performance characteristics, as described below.

X10RT Implementation	Description
Sockets	An open-source implementation that uses TCP/IP sockets and ssh to support multiple places on one or more hosts. This is the default implementation, and is the only option when using Managed X10. It can be found in `x10.runtime/x10rt/sockets`.
Standalone	An open-source implementation that supports multiple places on a single host, using shared memory between places. Standalone has high bandwidth, but it limits message sizes and runs only on a single machine.
MPI	An implementation of X10RT on top of MPI-2. It is fully open source and can be found in `x10.runtime/x10rt/mpi`. This supports all the hardware that your MPI implementation supports, such as InfiniBand and Ethernet.
PAMI	PAMI is an IBM communications API that supports high-end networks such as HFI (Host Fabric Interface), Blue Gene, and InfiniBand, as well as Ethernet. PAMI code is located in `x10.runtime/x10rt/pami`. Building and running with PAMI requires the IBM Parallel Environment.

The default is sockets on all platforms except Blue Gene/Q (which defaults to pami). All platforms except Blue Gene support standalone and sockets.

Properties files

The programmer may specify X10RT properties to be used by x10c++ in a properties file:

$ x10c++ -x10rt FOO

This causes the compiler to read etc/x10rt_FOO.properties. Typically, the contents of this file tell x10c++ how to compile the generated C++ code. For instance, this file may contain:

$ cat etc/x10rt_mpi.properties
CXX=mpicxx
CXXFLAGS=
LDFLAGS=
LDLIBS=-lx10rt_mpi

Building alternative X10RT Implementations

By default, the build process automatically builds the standalone and sockets implementations of X10RT. If you have MPI installed on your build machine, you may want to build the MPI implementation as well. To do so, pass ant the extra command-line argument -DX10RT_MPI=true when building X10 (cd x10.dist && ant dist -DX10RT_MPI=true). Similarly, if you have PAMI installed on your machine, you can build PAMI support by passing the extra argument -DX10RT_PAMI=true when building X10.

Selecting alternative X10RT Implementations

When you compile a program, you can optionally select the X10RT implementation you want to use. This can be done on a per-compilation basis. There is no need to rebuild X10 to switch X10RT implementations; simply recompiling the X10 program (relinking the C++ executable) is sufficient.

By default, x10c++ uses the implementation that is appropriate for the target platform, which is usually sockets. The default can be overridden by giving the -x10rt <impl> command-line argument to x10c++ (valid values for <impl> are mpi, standalone, pami, and sockets). The string given corresponds to a properties file in etc that contains the specifics required to build an executable for that X10RT implementation. You can inspect the contents of this directory to see which X10RT implementations are available, and you can add custom ones there.
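
Because each implementation corresponds to a file named x10rt_<impl>.properties, the available names can be listed by scanning that directory. The following is a minimal sketch in POSIX shell; the function name list_x10rt_impls is hypothetical and is not part of the X10 distribution.

```shell
# List X10RT implementation names by scanning a directory for
# x10rt_<impl>.properties files (hypothetical helper, not an X10 tool).
list_x10rt_impls() {
    dir=$1
    for f in "$dir"/x10rt_*.properties; do
        [ -e "$f" ] || continue      # the glob matched nothing
        name=${f##*/}                # strip the directory part
        name=${name#x10rt_}          # strip the "x10rt_" prefix
        name=${name%.properties}     # strip the ".properties" suffix
        printf '%s\n' "$name"
    done
}
```

Calling list_x10rt_impls etc from the X10 installation directory would print one name per line, each of which should be usable as the argument to -x10rt.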

Running X10 programs

Depending on which X10RT implementation you selected, you will execute the resulting executable in slightly different ways.

X10RT Implementation	Execution options
MPI	mpirun
PAMI	poe, or execute directly
Standalone	Execute directly (no extra tools needed)
Sockets	X10Launcher, or execute directly

Running with Sockets backend

The sockets transport is currently the default backend when you compile without the "-x10rt" flag. To run with sockets, simply execute the binary that was produced when you compiled your program. This runs your program in a single place on your local machine. To use more places, set the environment variable X10_NPLACES to the number of places.

There are two ways to specify the machines to run those places on:

Set the environment variable X10_HOSTFILE to the full path for a hostfile. The hostfile is a simple text file that contains a list of hostnames to run on, with one line per machine.
Set the environment variable X10_HOSTLIST to a comma-separated list of hostnames, without spaces. This environment variable is checked only if X10_HOSTFILE is not set.

Both of the above will wrap if there are more places than hostnames specified. For example, setting X10_NPLACES=4 and X10_HOSTLIST=host1,host2 will cause places 0 and 2 to run on host1, and places 1 and 3 to run on host2. If neither of the above is set, and there is more than 1 place, then it defaults to running everything on localhost.
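
The wrap-around rule can be sketched as a modulo over the host list. The snippet below is an illustration of the behavior described above, not the launcher's actual code; the function name host_for_place is hypothetical, and it handles only the X10_HOSTLIST form.

```shell
# Map a place number to a host, wrapping around the host list.
# $1 = place number, $2 = comma-separated host list (X10_HOSTLIST form).
host_for_place() {
    place=$1
    old_ifs=$IFS
    IFS=,
    set -- $2                  # split the list on commas into $1, $2, ...
    IFS=$old_ifs
    if [ "$#" -eq 0 ]; then
        echo localhost         # no hosts given: everything runs locally
        return
    fi
    shift $(( place % $# ))    # wrap when there are more places than hosts
    printf '%s\n' "$1"
}

# Reproduce the example from the text: 4 places over 2 hosts.
X10_NPLACES=4
X10_HOSTLIST=host1,host2
p=0
while [ "$p" -lt "$X10_NPLACES" ]; do
    echo "place $p -> $(host_for_place "$p" "$X10_HOSTLIST")"
    p=$(( p + 1 ))
done
```

The loop prints host1 for places 0 and 2, and host2 for places 1 and 3.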

If you're running on more than one machine, you should have public/private key ssh authentication set up, so that you can ssh from one machine to another without getting a password prompt. Your executable and hostfile should also be available in the same location on every machine listed in the hostfile. Compile your program with "-x10rt sockets"; then, with ssh in place and the environment variables set, run your executable. You don't have to launch from one of the machines in the hostfile, but ssh authentication must be set up between the machine you launch from and the first machine in the list.

The sockets backend supports gdb debugging through the X10_GDB environment variable. The value of this has two forms:

"place:port", where place is the place that you want to be debugged or the string "all", and port is the port number that you want gdbserver to use. This launches the runtime for the specified place under gdbserver, which allows you to connect to the remote runtime with your local gdb session. See Using the gdbserver program for more details. If you specify "all" for the place, then all places will be started under gdbserver at the specified port. Be aware that if you have multiple places running on the same machine, then this will cause port number conflicts.
"place", where place is either the place that you want to be debugged, or the string "all". Setting this to a number causes the specified place to be launched under gdb in a new xterm. For example, setting X10_GDB=0 will cause the x10 runtime for place 0 to be started in a gdb session in a new xterm, while other places run normally. Setting this to "all" will cause all runtimes to execute in separate gdb xterms. Each xterm is given a title showing which place it is running so you can keep track.

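The two forms of X10_GDB differ only in whether a colon and port follow the place. As a rough illustration (not the runtime's actual parsing code), the hypothetical helper below classifies a value; the port 2345 used in the sample calls is an arbitrary choice.

```shell
# Classify an X10_GDB value into the two documented forms
# (hypothetical helper; the real runtime does its own parsing).
describe_x10_gdb() {
    case $1 in
        *:*) echo "gdbserver place=${1%%:*} port=${1#*:}" ;;  # "place:port"
        *)   echo "gdb-in-xterm place=$1" ;;                  # "place"
    esac
}

describe_x10_gdb 0:2345   # place 0 under gdbserver on an arbitrary port
describe_x10_gdb all      # every place in its own gdb xterm
```
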
Additional flags that may be of use:

X10_NOYIELD The X10 runtime regularly polls the network to see if data has arrived to work on. If you have more runtime threads or places than CPU cores, then this polling in idle places can starve real work in other places. So the sockets runtime will give up the CPU if one of these polls comes up empty. By setting the X10_NOYIELD flag to true, you disable this yield. Consider doing this if you have as many worker threads as you have cores.

X10_LAZYLINKS The socket links between places are point-to-point. By default, these links are established at the beginning of communications, from every place to every other place. This gives better performance when running everything locally, but if you are running with a large number of places across multiple machines, you may want to establish the links on-demand, by setting X10_LAZYLINKS to true.

X10_FORCEPORTS Normally the port number that each place opens for other places to connect to it is left up to the operating system. This eliminates the possibility of port contention, but it also requires a fancy place lookup mechanism supported by the launchers. It's possible to force the listen port for each place to a specific number by setting X10_FORCEPORTS. If everything is running on localhost, this will bypass the need for the lookup. The value can take one of two forms. First, you can set it to a single number. This will be the port number for place 0, and each additional place will listen at that port number plus the place number. For example, if X10_FORCEPORTS=7000, then place 0 will listen on port 7000, place 1 on port 7001, etc. The other form requires a comma-separated list of values, one per place. So if you set X10_FORCEPORTS=7000,7005,7010, and run with 3 places, then place 0 will be at port 7000, place 1 at port 7005, and place 2 at port 7010.
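
The two forms of X10_FORCEPORTS reduce to a small port calculation. The sketch below models the rule as described above; port_for_place is a hypothetical name, and the real runtime may validate its input differently.

```shell
# Compute the listen port for a place from an X10_FORCEPORTS value.
# $1 = place number, $2 = either a single base port or a comma-separated list.
port_for_place() {
    place=$1
    case $2 in
        *,*)
            old_ifs=$IFS
            IFS=,
            set -- $2                        # one port per place
            IFS=$old_ifs
            [ "$place" -lt "$#" ] || return 1  # list is too short for this place
            shift "$place"
            printf '%s\n' "$1"
            ;;
        *)
            echo $(( $2 + place ))           # base port plus place number
            ;;
    esac
}

port_for_place 1 7000              # single-number form
port_for_place 2 7000,7005,7010    # list form
```

The two sample calls print 7001 and 7010, matching the examples in the text.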

It's possible to run without the launcher and start each place manually. To do this, you need to set the X10_LAUNCHER_PLACE and X10_FORCEPORTS environment variables, which would normally be set by the launcher. For example:

# Launch three places on triloka1-3, all using port 7001.
# Each line shows the environment settings for one place.
X10_LAUNCHER_PLACE=0 X10_NPLACES=3 X10_FORCEPORTS=7001,7001,7001 X10_HOSTLIST=triloka1,triloka2,triloka3 # run on triloka1
X10_LAUNCHER_PLACE=1 X10_NPLACES=3 X10_FORCEPORTS=7001,7001,7001 X10_HOSTLIST=triloka1,triloka2,triloka3 # run on triloka2
X10_LAUNCHER_PLACE=2 X10_NPLACES=3 X10_FORCEPORTS=7001,7001,7001 X10_HOSTLIST=triloka1,triloka2,triloka3 # run on triloka3
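
The per-place settings above differ only in X10_LAUNCHER_PLACE, so they can be generated in a loop. The following dry-run sketch only prints the command lines and does not start anything. The function name print_manual_launch is hypothetical, and ./a.out is a placeholder for your own executable.

```shell
# Print one manual-launch command line per place (dry run only).
# $1 = number of places, $2 = X10_FORCEPORTS value,
# $3 = X10_HOSTLIST value, $4 = path to the executable (placeholder).
print_manual_launch() {
    nplaces=$1
    ports=$2
    hosts=$3
    exe=$4
    p=0
    while [ "$p" -lt "$nplaces" ]; do
        echo "X10_LAUNCHER_PLACE=$p X10_NPLACES=$nplaces X10_FORCEPORTS=$ports X10_HOSTLIST=$hosts $exe"
        p=$(( p + 1 ))
    done
}

print_manual_launch 3 7001,7001,7001 triloka1,triloka2,triloka3 ./a.out
```

Each printed line would then be run on the host that corresponds to its place number.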

Running with the MPI backend

In MPI mode, the X10 program is launched with the MPI launcher program, such as mpirun, mpiexec, or poe. By default, an MPI threading level of MPI_THREAD_MULTIPLE is requested. As some MPI implementations do not support MPI_THREAD_MULTIPLE, it is possible to request MPI_THREAD_SERIALIZED instead by setting the environment variable X10RT_MPI_THREAD_SERIALIZED=1.

In the special case of single-threaded places with static threads (X10_NTHREADS=1 X10_NUM_IMMEDIATE_THREADS=0 X10_STATIC_THREADS=true), a threading level of MPI_THREAD_SINGLE is requested.
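
The threading-level rules above can be summarized as a small decision procedure. This is a sketch of the documented behavior only. It assumes that the single-threaded special case takes precedence over X10RT_MPI_THREAD_SERIALIZED, which the text does not state, and the function name mpi_thread_level is hypothetical.

```shell
# Report which MPI threading level would be requested for the current
# environment, following the rules described in the text.
# ASSUMPTION: the single-threaded special case is checked first.
mpi_thread_level() {
    if [ "${X10_NTHREADS:-}" = 1 ] &&
       [ "${X10_NUM_IMMEDIATE_THREADS:-}" = 0 ] &&
       [ "${X10_STATIC_THREADS:-}" = true ]; then
        echo MPI_THREAD_SINGLE
    elif [ "${X10RT_MPI_THREAD_SERIALIZED:-}" = 1 ]; then
        echo MPI_THREAD_SERIALIZED
    else
        echo MPI_THREAD_MULTIPLE     # the default
    fi
}

mpi_thread_level
```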

Running with the Standalone backend

Standalone mode sets up shared memory regions and forks off one instance of the program per place. The places all run on the local machine. There is one environment variable, "X10_NPLACES", which should be set to the number of places. If it is not set, the runtime defaults to 1 place and prints a warning. The standalone transport has a limit on the size of the data blocks that can be sent (about 512k), because of the shared memory regions. To run, just compile with "-x10rt standalone", set the environment variable, and run your executable. This transport is a good choice if you're running everything on one machine and don't use large messages.
