How Does Distributed Batch Job Processing Work?

The following explains the process chain for the case where a user submits a job from a remote machine in the cluster and the job's steps have been set up to run on a different remote machine, not on the submitting machine or on the machine that hosts the master batch server (lajs). That is, in this scenario, Node A hosts the master batch server, Node B is the machine from which the job is submitted, and Node C is the machine on which the job steps run.

  1. The user submits a job from Node B.
  2. The JS API verifies that distributed batch processing is turned on and checks whether the submitting machine is the master batch server.
  3. Because Node B is not the master batch server in this case, the JS API connects through a socket to the tcpjsd daemon on the master batch server machine (Node A). A sketch of this routing decision appears after this list.
  4. The tcpjsd daemon then forwards the job request to the master batch server (lajs).
  5. The master batch server (lajs) checks the EXECUTABLE table and finds that the first job step is set up to run on Node C. It then calls the tcpexecjob program, which in turn connects through a socket to the tcpexecjobd daemon on Node C. The dispatch sketch after this list illustrates this step.
  6. The tcpexecjobd daemon starts execjob on Node C, which processes the job step.
  7. During the processing of the job step, the execjob program creates the print file and the log file on Node C. Because the $LAWDIR/print folder is linked to the shared file system, these files are available to any node.
  8. After a step completes, the master batch server checks whether the job has additional steps to process and, if so, begins processing the next step. The step-loop sketch after this list shows this behavior.
Illustration: Distributed Batch Process Chain
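
The following is a minimal sketch of the routing decision in steps 2 and 3, assuming hypothetical names (MASTER_NODE, TCPJSD_PORT, submit_job, deliver_to_local_lajs) and a made-up byte-string request format; it is not the actual JS API, only an illustration of the local-versus-remote branch.

    import socket

    MASTER_NODE = "node-a"   # host running the master batch server (lajs)
    TCPJSD_PORT = 9000       # assumed port on which tcpjsd listens

    def submit_job(job_request: bytes, local_node: str, distributed: bool) -> None:
        """Route a job request locally or to tcpjsd on the master node."""
        if not distributed or local_node == MASTER_NODE:
            # Non-distributed or local case: hand the request to lajs directly.
            deliver_to_local_lajs(job_request)
            return
        # Remote case (Node B): open a socket to tcpjsd on Node A; tcpjsd
        # then forwards the request to lajs (step 4).
        with socket.create_connection((MASTER_NODE, TCPJSD_PORT)) as conn:
            conn.sendall(job_request)

    def deliver_to_local_lajs(job_request: bytes) -> None:
        # Placeholder for the local submission path (not covered here).
        raise NotImplementedError("local submission path not shown in this sketch")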
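
The next sketch illustrates steps 5 and 6: the master batch server determines which node runs the step's program and hands the step to tcpexecjobd on that node. The EXECUTABLE_TABLE dictionary, the program names, TCPEXECJOBD_PORT, and the request format are illustrative assumptions, not the real lajs or tcpexecjob internals.

    import socket

    TCPEXECJOBD_PORT = 9001  # assumed port on which tcpexecjobd listens

    # Assumed shape of the EXECUTABLE table: program name -> node that runs it.
    EXECUTABLE_TABLE = {
        "PROGRAM1": "node-c",
        "PROGRAM2": "node-a",
    }

    def dispatch_step(program: str, step_request: bytes) -> None:
        """Send one job step to the node configured for its program."""
        target_node = EXECUTABLE_TABLE[program]  # step 5: find where the step runs
        # tcpexecjob connects through a socket to tcpexecjobd on the target
        # node, which starts execjob there to process the step (step 6).
        with socket.create_connection((target_node, TCPEXECJOBD_PORT)) as conn:
            conn.sendall(step_request)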
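
Finally, a sketch of steps 7 and 8, assuming a hypothetical run_step callable and that the LAWDIR environment variable is set: each step writes its print and log files under $LAWDIR/print on the shared file system, and the next step is not dispatched until the current one completes.

    import os
    from typing import Callable, Sequence

    def process_job(steps: Sequence[str],
                    run_step: Callable[[str, str], None]) -> None:
        """Run the job's steps in order, one at a time."""
        # $LAWDIR/print is linked to the shared file system, so output written
        # there by execjob on Node C is visible from every node (step 7).
        print_dir = os.path.join(os.environ["LAWDIR"], "print")
        for program in steps:
            # run_step returns only when the step has completed, so the next
            # step is not started until then (step 8).
            run_step(program, print_dir)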