Queue priorities (Hexagon)

We continuously monitor the queue and scheduling system to improve the balance. To help you understand the multi-dimensional priority system used by the scheduler, here is a somewhat simplified view of how Hexagon prioritizes jobs (in decreasing order of priority).

  1. Hexagon was designed and purchased for jobs that can run efficiently on high core counts (128 cores and more). Jobs with higher core counts are therefore prioritized.
  2. Hexagon is owned by several parties (NOTUR, the local university, etc.). A "fairshare" system is configured to ensure that the different parties get their share (on a time-based average) of the resource. Previous usage influences current priority; for example, if NOTUR users have used more than their total share over the last 14 days (time-weighted average), they will get lower priority until usage evens out.
  3. Fairshare is also in place between projects (i.e. nnXXXXk) and users, to ensure that the percentage given in point 2 above is distributed among the users and projects within each group (nnXXXXk/nnYYYYk/local, userA/userB, etc.).
  4. Queue time. The time that a job has been eligible to run (see "showq" output) counts towards higher priority. Note that limits are in place to prevent users from stuffing the queue with jobs that just sit and earn queue time; see below for a list of the limits enforced. A simplified sketch of how these factors might combine is given right after this list.
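
To make the interplay of these factors more concrete, the following is a minimal, hypothetical sketch of how a scheduler could combine job size, fairshare deviation and queue time into one priority number. The weights, the function name and the exact formula are illustrative assumptions, not the actual scheduler configuration on Hexagon.

  # Hypothetical illustration of a multi-factor job priority; the weights
  # and formula are assumptions, NOT Hexagon's actual configuration.
  def job_priority(cores, share_target, share_used, queue_hours,
                   w_size=1.0, w_fairshare=1000.0, w_queue=10.0):
      """Combine three factors into a single priority number.

      cores        -- cores requested (larger jobs rank higher)
      share_target -- the group's configured share of the machine (e.g. 0.50)
      share_used   -- the group's time-weighted usage over the fairshare window
      queue_hours  -- hours the job has been eligible to run
      """
      size_factor = w_size * cores
      # Positive when the group is under its share, negative when over it,
      # so groups that have over-consumed are pushed down the queue.
      fairshare_factor = w_fairshare * (share_target - share_used)
      queue_factor = w_queue * queue_hours
      return size_factor + fairshare_factor + queue_factor

  # Example: a 512-core job from a group that is 5% over its share and has
  # been eligible for 12 hours (roughly 512 - 50 + 120).
  print(job_priority(512, share_target=0.50, share_used=0.55, queue_hours=12))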

Base scheduler rules

  • Ensure high total usage (load) on the system by running jobs out of priority order when doing so will not delay the start of higher-priority jobs ("backfill"); a simplified sketch of this idea follows the list.
  • Use the smallest possible resource that fits the job (that is, unless a job actually needs more memory, nodes with extra memory are kept free for jobs that do).
  • Reserve part of the machine for debugging (less than 128 cores and less than 20 minutes) and short test jobs (less than 1 hour and less than 512 cores). Such jobs are automatically moved to the "debug" and "small" queues, respectively, and given higher priority.
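
To illustrate the backfill rule in the first bullet: a lower-priority job may start early only if it fits in the currently free cores and is predicted to finish before the highest-priority waiting job could start anyway. The sketch below is a simplified, assumed version of that idea, not the scheduler's actual implementation.

  # Simplified sketch of the backfill idea (illustrative assumption, not
  # the scheduler's actual code): a lower-priority job may jump the queue
  # only if running it now cannot delay the highest-priority waiting job.
  def can_backfill(job, free_cores, top_job_start, now=0.0):
      """job is a dict with 'cores' and 'walltime' (hours); top_job_start
      is the earliest time the highest-priority waiting job can start."""
      fits_now = job["cores"] <= free_cores
      # The candidate must finish before the top-priority job is due to start.
      done_in_time = now + job["walltime"] <= top_job_start
      return fits_now and done_in_time

  # Example: 1000 cores are free for the next 6 hours, until a big
  # 2048-core job can start; a 512-core, 4-hour job fits into that gap.
  print(can_backfill({"cores": 512, "walltime": 4},
                     free_cores=1000, top_job_start=6))   # True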

Additional limits per user

  • 2 jobs in eligible-to-run state (as long as the following limits are not exceeded)
  • 4096 cores total in running jobs
  • 8 jobs running on a loaded system
  • 22 jobs running on a non-loaded system

All jobs exceeding these limits are put in the "blocked" state (as shown in the "showq" output). When resources are freed, a job is moved from "blocked" back to "idle"/"eligible-to-run", provided the reason for being blocked was one of these scheduler limits. (There are a number of other reasons why a job may be blocked; use the "checkjob" command to see the actual reason.)
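
As a rough illustration (the numbers mirror the limits above, but the logic is a simplified assumption, not the scheduler's actual code), the decision that keeps a user's excess jobs in the "blocked" state could look like this:

  # Rough illustration of the per-user limits above (assumed logic, not
  # the scheduler's actual implementation).
  def classify_queued_jobs(queued_jobs, running_cores,
                           max_eligible=2, max_total_cores=4096):
      """Split a user's queued jobs (sorted by priority) into 'eligible'
      and 'blocked'.  Each job is a dict with a 'cores' key; running_cores
      is the number of cores the user's running jobs already occupy.
      (The running-job count limits would be checked in the same way.)"""
      eligible, blocked = [], []
      cores = running_cores
      for job in queued_jobs:
          if len(eligible) < max_eligible and cores + job["cores"] <= max_total_cores:
              eligible.append(job)
              cores += job["cores"]
          else:
              blocked.append(job)
      return eligible, blocked

  # Example: a user already running jobs on 3000 cores queues three more.
  queued = [{"cores": 512}, {"cores": 1024}, {"cores": 256}]
  eligible, blocked = classify_queued_jobs(queued, running_cores=3000)
  print(len(eligible), len(blocked))   # 2 eligible, 1 blocked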

Additional info

  • You can see the calculated priorities for the eligible jobs in the output of: showq -i