OSC:Scheduler

From SOFTICE




Pedagogical Objectives

  • Introduce Kernel Data Structures: struct task_struct
  • Introduce Kernel APIs: sys_nice
  • Big picture:
    • Kernel scheduler-related Data Structures
    • Static priority logic

Developed by:


[Briefing]

struct task_struct

Let's go back to the struct task_struct as it is defined in /include/linux/sched.h and examine some of its fields that are related to scheduling. In this lab we will not discuss the fields that are specific to SMP (Symmetric Multi-Processing, i.e. multi-CPU systems).

        volatile long state;   /* -1 unrunnable, 0 runnable, >0 stopped */

The first field indicates the state of the task. Since Linux implements multi-threading with a one-to-one model (more information about that in lectures), every thread is represented by its own struct task_struct, and the scheduler schedules processes by scheduling their threads individually. This field therefore holds the state of the single thread described by this structure: 0 (TASK_RUNNING) means it is runnable, while positive values such as TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE or TASK_STOPPED mean it is waiting or stopped. The struct thread_struct embedded in the task_struct (cf OSC:Stealth Processes & PCBs) stores another kind of per-thread information: the architecture-specific CPU context saved when the thread is switched out.

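        int prio, static_prio;          /* dynamic and static priority; lower value = higher priority */
        struct list_head run_list;      /* links the task into its runqueue priority list */
        prio_array_t *array;            /* priority array (active or expired) holding the task */
        unsigned long sleep_avg;        /* average sleep time, used to estimate interactivity */
        unsigned long long timestamp, last_ran; /* times of the last scheduling events */
        unsigned long long sched_time; /* sched_clock time spent running */
        int activated;                  /* records how the task was last woken up */

        unsigned long policy;           /* SCHED_NORMAL, SCHED_BATCH, SCHED_FIFO or SCHED_RR */
        unsigned int time_slice, first_time_slice; /* remaining timeslice; flag for the first slice after fork */

#ifdef CONFIG_SCHEDSTATS
        struct sched_info sched_info;   /* scheduler statistics */
#endif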

The fields listed above are related to the scheduler and will be covered in an upcoming laboratory.

Playing nice

One of the first scheduling parameters that comes to mind is usually a process's priority. In Linux, users can run the nice command to lower the priority of their processes, while the superuser can also use it to raise it. To better understand how the kernel handles priority, let's have a look at the sys_nice system call as defined in /kernel/sched.c:

/*
 * sys_nice - change the priority of the current process.
 * @increment: priority increment
 *
 * sys_setpriority is a more generic, but much slower function that
 * does similar things.
 */

The only parameter, increment, specifies the amount by which the process's nice value, and therefore its priority, is to be changed.

asmlinkage long sys_nice(int increment)
{
        int retval;
        long nice;

        /*
         * Setpriority might change our priority at the same moment.
         * We don't have to worry. Conceptually one call occurs first
         * and we have a single winner.
         */
        if (increment < -40)
                increment = -40;
        if (increment > 40)
                increment = 40;

In the above code fragment, increment is clamped to the interval [-40, 40]: any value outside it is replaced by the nearest bound.
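The clamp to ±40 may look arbitrary. Valid nice values span -20 to 19, so an increment of magnitude 40 already drives any valid starting value past the opposite bound; a larger increment cannot change the final result, and clamping presumably just keeps the later addition far from integer overflow. The C sketch below checks that claim. clamp_increment, clamp_nice and clamp_changes_result are hypothetical helpers written for this illustration, not kernel functions.

```c
/* Hypothetical helper: the first clamp performed by sys_nice(). */
static int clamp_increment(int increment)
{
        if (increment < -40)
                increment = -40;
        if (increment > 40)
                increment = 40;
        return increment;
}

/* Hypothetical helper: the second clamp, to the nice range [-20, 19]. */
static long long clamp_nice(long long nice)
{
        if (nice < -20)
                nice = -20;
        if (nice > 19)
                nice = 19;
        return nice;
}

/*
 * Returns 1 if clamping the increment first changes the final nice
 * value for a valid starting nice value, 0 otherwise. The unclamped
 * sum is computed in long long so that it cannot overflow.
 */
static int clamp_changes_result(int start_nice, int increment)
{
        long long unclamped = clamp_nice((long long)start_nice + increment);
        long long clamped = clamp_nice(start_nice + clamp_increment(increment));

        return unclamped != clamped;
}
```

For any start_nice in [-20, 19], clamp_changes_result returns 0 whatever the increment, which is why the first clamp is harmless.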

        nice = PRIO_TO_NICE(current->static_prio) + increment;

The PRIO_TO_NICE macro converts the value stored in the static_prio field of the process descriptor into a value to which we can add our increment parameter. It is also defined in /kernel/sched.c as:

/*
 * Convert user-nice values [ -20 ... 0 ... 19 ]
 * to static priority [ MAX_RT_PRIO..MAX_PRIO-1 ],
 * and back.
 */
#define NICE_TO_PRIO(nice)      (MAX_RT_PRIO + (nice) + 20)
#define PRIO_TO_NICE(prio)      ((prio) - MAX_RT_PRIO - 20)
#define TASK_NICE(p)            PRIO_TO_NICE((p)->static_prio)

With MAX_RT_PRIO defined in /include/linux/sched.h as:

/*
 * Priority of a process goes from 0..MAX_PRIO-1, valid RT
 * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
 * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1. Priority
 * values are inverted: lower p->prio value means higher priority.
 *
 * The MAX_USER_RT_PRIO value allows the actual maximum
 * RT priority to be separate from the value exported to
 * user-space.  This allows kernel threads to set their
 * priority to a value higher than any user task. Note:
 * MAX_RT_PRIO must not be smaller than MAX_USER_RT_PRIO.
 */

#define MAX_USER_RT_PRIO        100
#define MAX_RT_PRIO             MAX_USER_RT_PRIO

#define MAX_PRIO                (MAX_RT_PRIO + 40)

The above code fragments show that, for normal (non-real-time) tasks, the static_prio field ranges from 100 (highest priority, MAX_RT_PRIO) to 139 (lowest priority, MAX_PRIO-1). The default is 120, to which the so-called "nice" value, ranging from -20 to +19, is added. To convert static_prio into a value we can add the increment to, we simply subtract 120, the default static_prio value. This is why a user who calls the nice command with a positive value is in fact lowering the static priority of the targeted process: the static_prio field's value increases.
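As a quick check of this arithmetic, the sketch below copies the macro definitions quoted above (with the values from the 2.6 sched.h excerpt) into a stand-alone C file. For instance, a nice value of 5 maps to static_prio 125: a larger number than the default 120, and therefore a lower priority. The nice_round_trip_ok helper is hypothetical, added only to show that the two macros are inverses over the whole nice range.

```c
/* Values taken from the include/linux/sched.h excerpt above. */
#define MAX_USER_RT_PRIO        100
#define MAX_RT_PRIO             MAX_USER_RT_PRIO
#define MAX_PRIO                (MAX_RT_PRIO + 40)

/* Conversion macros copied from kernel/sched.c. */
#define NICE_TO_PRIO(nice)      (MAX_RT_PRIO + (nice) + 20)
#define PRIO_TO_NICE(prio)      ((prio) - MAX_RT_PRIO - 20)

/*
 * Hypothetical check: every nice value in [-20, 19] survives a round
 * trip and lands inside the static priority range
 * [MAX_RT_PRIO, MAX_PRIO - 1].
 */
static int nice_round_trip_ok(void)
{
        for (int nice = -20; nice <= 19; nice++) {
                int prio = NICE_TO_PRIO(nice);

                if (prio < MAX_RT_PRIO || prio > MAX_PRIO - 1)
                        return 0;
                if (PRIO_TO_NICE(prio) != nice)
                        return 0;
        }
        return 1;
}
```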


Back in /kernel/sched.c, here is what is done with the adjusted "nice" value:

        if (nice < -20)
                nice = -20;
        if (nice > 19)
                nice = 19;

Adding the increment may push the value out of range: for example, a task with static_prio 139 (nice 19) and an increment of -40 yields a nice value of -21. We therefore need to clamp the result again to the valid interval for nice values, [-20 .. +19], which is also the range accepted by the nice command-line tool.
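Leaving the permission and security checks aside, the arithmetic performed by sys_nice so far can be replayed in user space. The requested_nice function below is a hypothetical replica written for illustration; MAX_RT_PRIO is hard-coded to 100 as in the sched.h excerpt. Calling requested_nice(139, -40) walks through the example above: the increment stays at -40, the sum is 19 - 40 = -21, and the second clamp brings it back to -20.

```c
#define MAX_RT_PRIO             100     /* value from include/linux/sched.h */
#define PRIO_TO_NICE(prio)      ((prio) - MAX_RT_PRIO - 20)

/*
 * Hypothetical user-space replica of the arithmetic in sys_nice():
 * given the current static_prio and the requested increment, return
 * the nice value that would be handed to set_user_nice().
 * Permission (can_nice) and security checks are deliberately omitted.
 */
static long requested_nice(int static_prio, int increment)
{
        long nice;

        /* First clamp: bound the increment itself. */
        if (increment < -40)
                increment = -40;
        if (increment > 40)
                increment = 40;

        nice = PRIO_TO_NICE(static_prio) + increment;

        /* Second clamp: bound the resulting nice value. */
        if (nice < -20)
                nice = -20;
        if (nice > 19)
                nice = 19;
        return nice;
}
```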

        if (increment < 0 && !can_nice(current, nice))
                return -EPERM;
Now that we know which nice value the process is requesting, we need to be careful when increment is negative, since that means an actual increase of the process's priority. While any user may lower the priority of their own processes to be "nice" to others, raising it is privileged: in the kernel version quoted here, it is allowed for a process holding the CAP_SYS_NICE capability (normally the superuser) or one whose RLIMIT_NICE resource limit permits the new value. The can_nice function checks that such a request is legitimate; if it is not, sys_nice returns the EPERM error code.
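To make the permission rule concrete, here is a user-space model of the check. It reflects our understanding of can_nice in 2.6 kernels that support RLIMIT_NICE: the requested nice value is first mapped onto the rlimit scale (nice 19 becomes 1, nice -20 becomes 40) and compared with the caller's soft limit. can_nice_model and its parameters are hypothetical stand-ins for the task's RLIMIT_NICE value and its CAP_SYS_NICE capability. Remember that sys_nice only performs this check when increment is negative.

```c
#include <stdbool.h>

/*
 * Hypothetical model of can_nice(): nice_rlimit stands for the caller's
 * RLIMIT_NICE soft limit, has_cap_sys_nice for whether it holds
 * CAP_SYS_NICE.
 */
static bool can_nice_model(int nice, unsigned long nice_rlimit,
                           bool has_cap_sys_nice)
{
        /* Map nice [19 .. -20] onto the rlimit scale [1 .. 40]. */
        int nice_rlim = 20 - nice;

        return (unsigned long)nice_rlim <= nice_rlimit || has_cap_sys_nice;
}
```

With a limit of 0, commonly the default, only a caller with CAP_SYS_NICE can get a priority increase approved: can_nice_model(0, 0, false) is false because 20 > 0.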

        retval = security_task_setnice(current, nice);
        if (retval)
                return retval;

Since modifying the priority of a process is a somewhat sensitive operation, the security_task_setnice function is also invoked. It serves as a hook for any security-related checks that must be performed every time a process's priority is changed; a non-zero return value aborts the call and is passed back to the caller.
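The hook mechanism can be pictured as a table of function pointers that the active security module fills in. The sketch below is a hypothetical, much reduced version of that idea: security_ops_sketch, default_task_setnice, strict_task_setnice and security_task_setnice_sketch are invented names, and the strict policy (refusing nice values below -10) is an arbitrary example, not a real module's rule. As in sys_nice, a non-zero return value is what aborts the operation.

```c
#include <errno.h>

/* Hypothetical, cut-down stand-in for a security operations table. */
struct security_ops_sketch {
        int (*task_setnice)(int pid, int nice);
};

/* Permissive policy: accept every request. */
static int default_task_setnice(int pid, int nice)
{
        (void)pid;
        (void)nice;
        return 0;
}

/* Arbitrary example policy: refuse nice values below -10. */
static int strict_task_setnice(int pid, int nice)
{
        (void)pid;
        return nice < -10 ? -EPERM : 0;
}

static const struct security_ops_sketch default_ops = { default_task_setnice };
static const struct security_ops_sketch strict_ops = { strict_task_setnice };

/* Dispatches to whichever policy is installed; 0 means "allowed". */
static int security_task_setnice_sketch(const struct security_ops_sketch *ops,
                                        int pid, int nice)
{
        return ops->task_setnice(pid, nice);
}
```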

        set_user_nice(current, nice);
        return 0;
}

Finally, the set_user_nice function will modify the priority of the process as defined in
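For a view from user space, the sketch below changes the calling process's own nice value through the C library's nice() function. Note that this does not necessarily reach sys_nice: glibc implements nice() on top of getpriority() and setpriority(), and architectures such as x86-64 provide no nice system call at all, so the request is typically handled by sys_setpriority, the "more generic" function mentioned in the kernel comment at the top of sys_nice. change_own_nice is a hypothetical helper name.

```c
#define _XOPEN_SOURCE 700       /* make nice() visible in <unistd.h> */
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: adds `increment` to the calling process's nice
 * value via the C library. On success returns 0 and stores the new nice
 * value in *new_nice; on failure returns -1 with errno set.
 */
static int change_own_nice(int increment, int *new_nice)
{
        int result;

        errno = 0;              /* -1 is also a valid nice value */
        result = nice(increment);
        if (result == -1 && errno != 0)
                return -1;
        *new_nice = result;
        return 0;
}
```

A positive increment always succeeds for an unprivileged process; a negative one fails with EPERM unless the caller passes the permission check.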


[Solved]

[Exercises]

[Projects]

