UNIVERSITY MATHEMATICAL LABORATORY, CAMBRIDGE

Planning Document No. 22
Job Scheduling

The methods and algorithms used to schedule jobs in Mark 12, although working well much of the time, have been found to have inadequacies causing, among other things, some jobs to be inordinately delayed. Further, the major criterion currently used is the size of core needed for the next phase; in the revised system proposed for Mark 13 this figure will not usually be available for the initial scheduling of a job. For these reasons it is necessary to review the scheduling of jobs for Mark 13, and this document suggests a possible algorithm.

There are three distinct stages in the scheduling of a job:
Stage 1. When the system first notices a job.
Stage 2. When the job has arrived, i.e. all paper tape documents and private tapes are loaded. In this state the job is ready.
Stage 3. When a job is waiting for core in order to run.

At stage 1 the job is assigned a number (called the cipher) and a priority. If no ciphers are available the job can either be rejected - to try again later - or be held by the system until one becomes available. The priority determines which of the four queues the job is held in, and is normally set externally.

At stage 2 the job is waiting for an OP base - i.e. a slot in the system that it can be fitted into. It is normally the case that there are more ciphers than slots, and hence there is a scheduling problem.

At stage 3 jobs which are 'active' - i.e. have passed stage 2 and have an OP base - sue for space. Again the demand usually exceeds the supply. These remarks apply mainly to off-line jobs. However, on-line jobs must pass through each of these stages before even the log-in program can run, and it therefore appears that the same mechanism should be used for all jobs.

Stage 1. Assigning priority

The a priori priority of a job should be reasonably static. I would suggest a normal setting of:

    console jobs              3
    priority off-line jobs    2
    normal off-line jobs      0

The priority of off-line jobs would be determined by the tape reader on which the JD is read; reader 0 normally having priority 2, other readers priority 0. The following messages would be provided:

PRIORITY <n> READER <m>

PRIORITY <n> CONSOLE

Jobs which are initiated by the "call JD" mechanism will normally have the priority of the calling job, except that console initiated jobs should have priority 1.
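The stage 1 rules can be sketched as follows, in modern Python. This is purely illustrative: the function name, the 'source' argument, and the whole shape of the code are mine, not the supervisor's.

```python
# A minimal sketch of the stage 1 priority rules described above.
# Names (assign_priority, source values) are illustrative only.

def assign_priority(source, reader=None, calling_priority=None):
    """Return the a priori priority (0-3) for a newly noticed job."""
    if source == "console":
        return 3
    if source == "call-jd":
        # "call JD" jobs normally inherit the caller's priority,
        # except that console-initiated jobs get priority 1
        return 1 if calling_priority == 3 else calling_priority
    # off-line jobs: reader 0 normally gives priority 2, others 0
    return 2 if reader == 0 else 0
```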

Stage 2. Allocating slots

Here the problem is the fair allocation of resources. At this juncture the job is costing the system very little; from here on it is liable to cost the system a lot. It seems to me that an estimate should be made of how much each job will cost, and the available slots filled up with the cheapest.

The quantities available at this point are the following, all derived from the job description.

LIMSTORE, COMP, EXEC, LIMOUT, TAPES
(TAPES is a one bit marker set if the job uses private tapes.)

The first problem is to make a sensible measure out of these quantities. I make the following suggestions.

  1. We do not use TAPES (although we could use it to ensure a better "mix").
  2. We do not use EXEC.
  3. We regard LIMOUT, which is an overall bound to the temporary disc space used by the job, as a "store size", and add it, suitably scaled, to LIMSTORE.
  4. COMP should be translated to a coarse-grained scale; perhaps the following values would be sensible:

    1  if < 1 min
    2  if < 2 min
    3  if < 3 min
    4  if < 5 min
    5  if < 10 min
    6  if < 15 min
    7  if < 30 min
    8  if > 30 min

    Call this value C.

  5. A sensible measure to experiment with is:

    X = C (LIMSTORE + LIMOUT/128)

    The scale factor of 128 on LIMOUT means that 128 blocks of LIMOUT count as 1K of LIMSTORE.
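Suggestions 4 and 5 can be sketched together. The function names here are my own, and the exact boundary treatment (e.g. whether exactly 30 minutes falls in class 7 or 8) is an assumption.

```python
# A sketch of the coarse COMP scale and the trial measure X above.
# comp_class and measure_x are illustrative names, not from the system.
import bisect

# upper bounds in minutes for C = 1..7; anything beyond gets C = 8
COMP_BOUNDS = [1, 2, 3, 5, 10, 15, 30]

def comp_class(comp_minutes):
    """Translate a COMP estimate (minutes) into the coarse value C, 1..8."""
    return bisect.bisect(COMP_BOUNDS, comp_minutes) + 1

def measure_x(comp_minutes, limstore_k, limout_blocks):
    """X = C (LIMSTORE + LIMOUT/128); 128 blocks of LIMOUT count as 1K."""
    return comp_class(comp_minutes) * (limstore_k + limout_blocks / 128)
```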

The second problem is to deal fairly with the four queues: to make sure that priority jobs have priority, but that lower-priority jobs do not hang around for ever. To deal with this one would revise the definition of the measure to be:

M = C (LS + LO/128)(4 - P)

where P is the priority, LS is LIMSTORE, and LO is LIMOUT.

The effect of this is to say that time for a priority 3 job costs less than the time for a priority 2 job, which would appear to be a reasonable thing to say. The parameters in this expression are all easily variable, indeed it really has the form:

M (a,b,c) = (C + a)(LS + LO/b)(c - P)    a,b>0   c>3

I propose that an available slot is allocated to the job which has the smallest measure; if two or more jobs have the same measure, then the highest-priority job will be scheduled.
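The generalised measure and the allocation rule might look like this in modern Python. The Job structure and function names are illustrative; the default parameters a = 0, b = 128, c = 4 recover M = C (LS + LO/128)(4 - P).

```python
# A sketch of stage 2 slot allocation using M(a,b,c) = (C + a)(LS + LO/b)(c - P).
# All names here are illustrative, not from the original supervisor.
from dataclasses import dataclass

@dataclass
class Job:
    c: int    # coarse COMP class, 1..8
    ls: int   # LIMSTORE in K
    lo: int   # LIMOUT in blocks
    p: int    # priority, 0..3

def measure(job, a=0, b=128, c=4):
    return (job.c + a) * (job.ls + job.lo / b) * (c - job.p)

def pick_job(waiting):
    """Fill a free slot with the job of smallest measure;
    on equal measures the higher-priority job wins."""
    return min(waiting, key=lambda j: (measure(j), -j.p))
```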

There still remains the danger of lower-priority jobs sticking for a long time while higher-priority jobs jump the queue. The easiest way of avoiding this appears to be to make the measure decrease with time. Hence the 1 minute routine should keep a count for jobs held in this status, and we can redefine our measure to be R (for resource), where

R = M - kT

where T is the elapsed waiting time, and k some constant, which again can be adjusted to give decent results. The example below suggests the value k = 10 - although I would not place much faith in this figure.

JOB 1 : priority 0, 40k LIMSTORE, 10 minutes : M = 4 x 40 x 5 = 800

JOB 2 : priority 2, 8 k LIMSTORE, 3 minutes : M = 2 x 8 x 3 = 48

Job 2 should naturally run before job 1. If jobs like 2 keep on coming, how long should job 1 wait? With k = 10 (and no other slots available) it would wait about 75 minutes - which seems quite reasonable. To recapitulate: we define a figure R, which measures the resources demanded by the job; we make this decrease linearly with time; and we run those jobs with smallest R.
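The ageing rule and the worked example above can be checked with a short sketch; with M = 800 against a steady stream of jobs of M = 48 and k = 10, job 1 overtakes once 800 - 10T < 48, i.e. after just over 75 minutes. The helper names are mine.

```python
# A sketch of the aged measure R = M - kT and the worked example above.
# aged_measure and overtake_time are illustrative names.

def aged_measure(m, waited_minutes, k=10):
    return m - k * waited_minutes

def overtake_time(m_large, m_small, k=10):
    """Minutes of waiting until the larger job's R falls below the smaller M."""
    return (m_large - m_small) / k
```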

Stage 3. Space scheduling

The only criteria we have here are the position on the queues and the amount of space required.

The algorithm I suggest is to give what space there is to the highest possible job, except that if there is no space for a job at the top of its priority queue, then no lower jobs are offered the space. We thus ensure that large chunks of store will gradually accrete, if required. This is the algorithm that is currently in use, and has been found to work sufficiently well.
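Under one reading of the rule (the queue heads are examined in priority order, and a non-fitting head blocks everything below it), the stage 3 algorithm can be sketched as follows. The data layout and names are my own assumptions, not the supervisor's.

```python
# A sketch of the stage 3 space rule: scan the priority queues from
# highest to lowest; give the space to the first queue head that fits,
# but if a non-empty queue's head does not fit, hold the space back so
# that large chunks of store can accrete.  Names are illustrative.

def allocate_space(queues, free_k):
    """queues: FIFO lists of (job, space_needed_k) pairs, highest priority
    first.  Return the job to receive the space, or None to let it accrete."""
    for queue in queues:
        if not queue:
            continue
        job, needed = queue[0]
        if needed <= free_k:
            return job
        return None  # head too big: no lower job is offered the space
    return None      # nothing waiting
```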

It is clearly imperative that all jobs are allocated space by one routine so that allocation is made to higher jobs first.

External scheduling

Experience has shown that a certain amount of external scheduling is currently needed to ensure a good job mix. There is a need for the ability to put some jobs through quickly, and also to be able to build up a large backlog.

I would like to canvass the following suggestions:

  1. Messages as above to regulate priorities.
  2. Messages to restrict the amount of resources available to jobs.

    This would have the effect of delaying a long job even if there were available slots, hence enabling a backlog to be built up while smaller jobs are put through. Possible formats for these messages are:

    MODE <n>  :  n = 0 small, 1 medium, 2 all jobs (etc.)
    
    SMALL JOBS,  MEDIUM JOBS,  ALL JOBS
    
    RESOURCE LIMIT <n>
    

    Any job whose resource requirement is too large will be held up until it comes into range.

  3. Messages to restrict time or space available

    TIME LIMIT <m> {hrs | mins | secs}

    JOB LIMIT <n> K


    Any job transgressing these bounds to be thrown off. These messages would be useful, for example, for making sure that RPT jobs obeyed the rules.

  4. Message to restrict core space currently available
    STORE SIZE <n>K  (n ≤ 40)
    

    The effect of this will be to hold the space from nK to 40K as available for bufferage only, in order to expedite backlogging.

I would favour implementing the MODE message (which would be the least troublesome) and possibly the STORE SIZE message, but not the others.

Supervisor scheduling / Accounting and Control Interface

There is clearly much common ground between the proposals of this document and those of planning document 15 (Log-in, Accounting and Control). However, as I pointed out above, all jobs must pass through the three scheduling stages in the supervisor before they can run at all, even to enter the log-in program. It is worth noting that the log-in program is capable of exercising a dynamic user control based on the user's past history and also the running history of the system, while the scheduling in the supervisor is essentially a static control based on the parameters of the one job and the status of other jobs at that time. Thus there is a case to be made out for having both systems.

It would appear sensible that jobs that go through the log-in mechanism should initially be set up as low-resource (i.e. high-priority) jobs, and then, when log-in has reset the parameters, the job should pass once more through the supervisor scheduling, in order to regulate fair shares.

I suggest that the feasibility of this be examined; that it possibly be implemented in systems 1 or 2; and that, if it would be too difficult to implement in system 1, we allow on-line jobs an apparently high initial priority, as an expedient.

BL 21.11.66


Copyright © 1966 University of Cambridge Computer Laboratory. Distributed by permission. Thanks to Barry Landy, Roger Needham and David Hartley for giving permission to distribute these documents. Thanks to Barry Landy for lending me the paper document from which this was scanned. Any typographical errors probably arose in the course of OCR.

