The basics of the job system
- What is a job system and why do we need it?
- Specifying resources, output files and notifications
- Starting programs in your job
- Submitting and managing jobs
- Why doesn't my job start immediately? A story about scheduling policies and priorities.
- Job failure after a successful start
- Credit system basics: credits are used on all clusters at KU Leuven (including the Tier-1 system BrENIAC) to control your compute time allocation
- Monitoring memory and CPU usage of programs, which helps you choose the right resource requests when specifying your job requirements
- Worker framework: manages large numbers of small jobs on a cluster. The cluster scheduler isn't meant to deal with many small jobs, because they create a lot of overhead, so it is better to bundle those jobs into larger sets.
- The checkpointing framework can be used to run programs that take longer than the maximum time allowed by the queue. It can break a long job into shorter jobs, saving the state at the end of each one so that the next job automatically resumes from the point where the previous job was interrupted.
- Running jobs on GPU or Xeon Phi nodes: The procedure is not standardised across the VSC, so see the pages for each cluster in the "Available hardware" section of this website.
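
The idea behind checkpointing, as described in the list above, can be illustrated with a minimal sketch. This is not the VSC checkpointing framework itself: the function names `run_chunk` and `run_until_done`, the JSON state file, and the toy workload (summing integers) are all assumptions made for illustration. Each call to `run_chunk` stands in for one short job that resumes from the saved state, does a bounded amount of work, and saves its state again.

```python
import json
import os


def run_chunk(checkpoint_path, total_steps, steps_per_job):
    """Simulate one short job: resume from the checkpoint, do a bounded
    amount of work, and save the state for the next job.

    Hypothetical helper for illustration only.
    """
    # Start from scratch unless a previous job left a checkpoint behind.
    state = {"step": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    # Do at most steps_per_job steps of the toy workload (summing integers).
    end = min(state["step"] + steps_per_job, total_steps)
    for i in range(state["step"], end):
        state["total"] += i
    state["step"] = end

    # Write to a temporary file first, then rename, so an interruption
    # during the write cannot leave a half-written checkpoint.
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, checkpoint_path)

    return state["step"] >= total_steps, state


def run_until_done(checkpoint_path, total_steps, steps_per_job):
    """Chain short jobs until the work is finished.

    On a real cluster each iteration would be a separately submitted job;
    here they simply run one after another in the same process.
    """
    jobs = 0
    done = False
    state = None
    while not done:
        done, state = run_chunk(checkpoint_path, total_steps, steps_per_job)
        jobs += 1
    return jobs, state
```

For example, calling `run_until_done("state.json", total_steps=10, steps_per_job=4)` splits the work over three simulated jobs. In this sketch the state is saved once at the end of each chunk; a program that can be killed at any moment would also need to save its state periodically during the chunk.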