lesson7.ppt

Task-Switching
How the x86 processor assists
with context-switching among
multiple program-threads
Program Model
• Programs consist of data and instructions
• Data consists of constants and variables,
which may be ‘persistent’ or ‘transient’
• Instructions may be ‘private’ or ‘shared’
• These observations lead to a conceptual
model for the management of programs,
and to special processor capabilities that
assist in supporting that conceptual model
Conceptual Program-Model
Region            Contents                                    Created
runtime library   Shared Instructions and Data (persistent)
STACK             Private Data (transient)                    during runtime
heap              Private Data (transient)                    during runtime
BSS               Uninitialized Data (persistent)             at compile time
DATA              Initialized Data (persistent)               at compile time
TEXT              Private Instructions (persistent)           at compile time
Task Isolation
• The CPU is designed to assist the system
software in isolating the private portions of
one program from those of another while
they both are residing in physical memory,
while allowing them also to share certain
instructions and data in a controlled way
• This ‘sharing’ includes access to the CPU,
whereby the tasks take turns at executing
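Multi-tasking
[Figure: two tasks resident in memory at the same time]
supervisor-space (ring0):
  IDT (located by IDTR), GDT (located by GDTR)
  TSS 1 and TSS 2 (TR selects the current task's TSS)
user-space (ring3):
  shared runtime library
  Task #1: STACK, heap, BSS, DATA, TEXT
  Task #2: STACK, heap, BSS, DATA, TEXT
CPU registers point into the running task:
  SS and SP (STACK), DS (DATA), CS and IP (TEXT)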
Context-Switching
• The CPU can perform a ‘context-switch’ to
save the current values of all its registers
(in the memory-area referenced by the TR
register), and to load new values into all its
registers (from the memory-area specified
by a new Task-State Segment selector)
• There are four ways to trigger this ‘task-switch’
operation on x86 processors
How to cause a task-switch
• Use an ‘ljmp’ instruction (long jump):
ljmp $task_selector, $0
• Use an ‘lcall’ instruction (long call):
lcall $task_selector, $0
• Use an ‘int-n’ instruction (with a task-gate):
int $0x80
• Use an ‘iret’ instruction (with NT=1):
iret
‘ljmp’ and ‘lcall’
• These instructions are similar – they both
make use of a ‘selector’ for a Task-State
Segment descriptor
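TSS Descriptor-Format (8 bytes, shown as two 32-bit words)

High word:
  bits 31..24   Base[31..24]
  bits 23..21   0 0 0
  bit  20       AVL
  bits 19..16   Limit[19..16]
  bit  15       P
  bits 14..13   DPL
  bit  12       0
  bits 11..8    type
  bits  7..0    Base[23..16]
Low word:
  bits 31..16   Base[15..0]
  bits 15..0    Limit[15..0]

type: 16-bit TSS (0x1 = available or 0x3 = busy) or 32-bit TSS (0x9 = available or 0xB = busy)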
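To make the descriptor layout concrete, here is a minimal C sketch that packs a 32-bit TSS descriptor into one 64-bit value, following the field positions in the format above. The function names `make_tss_descriptor` and `tss_descriptor_is_busy` are hypothetical helpers invented for this illustration; they are not part of the course demo.

```c
#include <assert.h>
#include <stdint.h>

/* Type codes from the slide for a 32-bit TSS. */
enum { TSS32_AVAILABLE = 0x9, TSS32_BUSY = 0xB };

/* Hypothetical helper: pack a TSS descriptor into a 64-bit value,
 * with the low 32-bit word in bits 31..0 and the high word in 63..32.
 *   base  : linear address of the TSS
 *   limit : segment limit (20 bits)
 *   dpl   : descriptor privilege level, 0..3
 *   type  : one of the type codes above                              */
uint64_t make_tss_descriptor(uint32_t base, uint32_t limit,
                             unsigned dpl, unsigned type)
{
    assert(limit <= 0xFFFFF && dpl <= 3 && type <= 0xF);

    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);             /* Limit[15..0]  */
    d |= (uint64_t)(base & 0xFFFF) << 16;        /* Base[15..0]   */
    d |= (uint64_t)((base >> 16) & 0xFF) << 32;  /* Base[23..16]  */
    d |= (uint64_t)type << 40;                   /* type          */
    d |= (uint64_t)dpl << 45;                    /* DPL           */
    d |= (uint64_t)1 << 47;                      /* P = 1         */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;  /* Limit[19..16] */
    d |= (uint64_t)((base >> 24) & 0xFF) << 56;  /* Base[31..24]  */
    return d;
}

/* The two 32-bit type codes differ only in bit 1 of the type field
 * (0x9 versus 0xB), so that bit acts as the 'busy' bit.             */
int tss_descriptor_is_busy(uint64_t d)
{
    return (int)((d >> 41) & 1);
}
```

For example, a TSS at address 0x12345678 with limit 0x67 and DPL 0 packs to 0x1200893456780067 when available and to 0x12008B3456780067 when busy.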
The two TSS formats
• Intel introduced the Task-State Segment in
the 80286 processor (used in IBM-PC/AT)
• The 80286 CPU had a 16-bit architecture
• Later Intel introduced its 80386 processor
which had a 32-bit architecture requiring a
larger and more elaborate format for its
Task-State Segment data-structure
• The 286 TSS is now considered ‘obsolete’
The 80286 TSS format
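22 words, each 16 bits wide

Offset  Field   Kind
   0    link    volatile
   2    sp0     static
   4    ss0     static
   6    sp1     static
   8    ss1     static
  10    sp2     static
  12    ss2     static
  14    IP      volatile
  16    FLAGS   volatile
  18    AX      volatile
  20    CX      volatile
  22    DX      volatile
  24    BX      volatile
  26    SP      volatile
  28    BP      volatile
  30    SI      volatile
  32    DI      volatile
  34    ES      volatile
  36    CS      volatile
  38    SS      volatile
  40    DS      volatile
  42    LDTR    static

‘static’ fields are read by the CPU during a task-switch but never written
‘volatile’ fields are saved by the CPU when it switches away from the task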
The 80386 TSS format
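26 longwords, each 32 bits wide

Offset  Field    Kind
   0    link     volatile
   4    esp0     static
   8    ss0      static
  12    esp1     static
  16    ss1      static
  20    esp2     static
  24    ss2      static
  28    PTDB     static
  32    EIP      volatile
  36    EFLAGS   volatile
  40    EAX      volatile
  44    ECX      volatile
  48    EDX      volatile
  52    EBX      volatile
  56    ESP      volatile
  60    EBP      volatile
  64    ESI      volatile
  68    EDI      volatile
  72    ES       volatile
  76    CS       volatile
  80    SS       volatile
  84    DS       volatile
  88    FS       volatile
  92    GS       volatile
  96    LDTR     static
 100    TRAP (low word), IOMAP (high word)   static

‘static’ fields are read by the CPU during a task-switch but never written
‘volatile’ fields are saved by the CPU when it switches away from the task
‘reserved’ fields: the upper 16 bits of each longword that holds a 16-bit
value (link, ss0, ss1, ss2, ES, CS, SS, DS, FS, GS, LDTR), and all but the
lowest bit of the TRAP word
The I/O permission bitmap follows the TSS at the offset stored in IOMAP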
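The layout above maps naturally onto a C structure. The following is only a sketch: the type name `struct tss32`, the `*_rsv` member names and the helper `tss32_longwords` are invented for illustration, and PTDB is read here as the page-directory base (the value loaded into CR3). The compile-time checks confirm that the offsets agree with the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the 80386 TSS; *_rsv members are reserved. */
struct tss32 {
    uint16_t link, link_rsv;       /*   0: selector of the calling task */
    uint32_t esp0;                 /*   4 */
    uint16_t ss0, ss0_rsv;         /*   8 */
    uint32_t esp1;                 /*  12 */
    uint16_t ss1, ss1_rsv;         /*  16 */
    uint32_t esp2;                 /*  20 */
    uint16_t ss2, ss2_rsv;         /*  24 */
    uint32_t ptdb;                 /*  28: page-directory base (assumed CR3) */
    uint32_t eip;                  /*  32 */
    uint32_t eflags;               /*  36 */
    uint32_t eax, ecx, edx, ebx;   /*  40, 44, 48, 52 */
    uint32_t esp, ebp, esi, edi;   /*  56, 60, 64, 68 */
    uint16_t es, es_rsv;           /*  72 */
    uint16_t cs, cs_rsv;           /*  76 */
    uint16_t ss, ss_rsv;           /*  80 */
    uint16_t ds, ds_rsv;           /*  84 */
    uint16_t fs, fs_rsv;           /*  88 */
    uint16_t gs, gs_rsv;           /*  92 */
    uint16_t ldtr, ldtr_rsv;       /*  96 */
    uint16_t trap;                 /* 100: bit 0 is the debug-trap flag */
    uint16_t iomap;                /* 102: offset of the I/O permission bitmap */
};

/* Compile-time checks against the offsets listed on the slide. */
_Static_assert(sizeof(struct tss32) == 104, "26 longwords");
_Static_assert(offsetof(struct tss32, ptdb) == 28, "PTDB at offset 28");
_Static_assert(offsetof(struct tss32, eip) == 32, "EIP at offset 32");
_Static_assert(offsetof(struct tss32, es) == 72, "ES at offset 72");
_Static_assert(offsetof(struct tss32, ldtr) == 96, "LDTR at offset 96");

/* Size of the structure in 32-bit longwords. */
size_t tss32_longwords(void)
{
    return sizeof(struct tss32) / 4;
}
```

Every member is naturally aligned, so a conforming compiler should insert no padding; the static assertions would fail the build if that assumption did not hold on a given target.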
Which to use: ‘ljmp’ or ‘lcall’?
• Use ‘ljmp’ to switch to a different task in
case you have no intention of returning
• Use ‘lcall’ to switch to a different task in
case you want to ‘return’ to this task later
• The CPU treats ‘ljmp’ and ‘lcall’ differently
in regard to the TSS, GDT and EFLAGS
No Task Reentrancy!
• Since each task has just one ‘save area’
(in its TSS), a task must not be permitted
to be recursively reentered!
• The CPU enforces this prohibition using a
‘busy’ bit within each task’s TSS descriptor
• Whenever the TR register is loaded with a
new selector-value, the CPU checks to be
sure the task isn’t already ‘busy’; if it’s not,
the task is entered, but gets marked ‘busy’
Task-Nesting
• But it’s OK for one task to be nested within
another, and another, and another…
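[Figure: a chain of four nested tasks]
TSS #1 (initial TSS) --lcall--> TSS #2 --lcall--> TSS #3 --lcall--> TSS #4 (current TSS)
Each TSS has a LINK field; the LINK of a nested task refers back
to the TSS of the task that called it
TR refers to the current TSS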
The NT-bit in FLAGS
• When the CPU switches to a new task via
an ‘lcall’ instruction, it sets NT=1 in FLAGS
(and it leaves the old TSS marked ‘busy’)
• The new task can then ‘return’ to the old
task by executing an ‘iret’ instruction (the
old task is still ‘busy’, so returning to it with
an ‘lcall’ or an ‘ljmp’ wouldn’t be possible)
Task-switch Semantics
Field            ljmp effect    lcall effect   iret effect
new busy-bit     changes to 1   changes to 1   stays = 1
old busy-bit     is cleared     stays = 1      is cleared
new NT-flag      is cleared     is set to 1    no change
old NT-flag      no change      no change      is cleared
new LINK-field   no change      new value      no change
old LINK-field   no change      no change      no change
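The table can be exercised with a small software model. The sketch below is not how the CPU is implemented; it simply applies the table's rows to an array of hypothetical `struct task` records (busy bit, NT flag, LINK field) and refuses a switch that would reenter a busy task. All names are invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum switch_kind { SWITCH_LJMP, SWITCH_LCALL, SWITCH_IRET };

/* Hypothetical per-task state tracked by the model. */
struct task {
    bool busy;   /* busy bit in the task's TSS descriptor        */
    bool nt;     /* NT flag in the task's FLAGS image            */
    int  link;   /* LINK field: index of the caller, -1 if none  */
};

/* Apply the table's effects to a switch away from task 'cur'.
 * For ljmp and lcall, 'target' names the new task; for iret the
 * new task is taken from the LINK field of 'cur'.
 * Returns the index of the task that is current afterwards, or -1
 * if the model refuses the switch (the real CPU would fault).     */
int task_switch(struct task *t, int cur, int target, enum switch_kind kind)
{
    if (kind == SWITCH_IRET) {
        if (!t[cur].nt)
            return -1;               /* NT=0: not a task return        */
        target = t[cur].link;
        if (target < 0 || !t[target].busy)
            return -1;               /* caller must still be busy      */
        t[cur].busy = false;         /* old busy-bit is cleared        */
        t[cur].nt = false;           /* old NT-flag is cleared         */
        return target;               /* new busy-bit stays = 1         */
    }

    if (t[target].busy)
        return -1;                   /* no task reentrancy             */

    t[target].busy = true;           /* new busy-bit changes to 1      */
    if (kind == SWITCH_LJMP) {
        t[cur].busy = false;         /* old busy-bit is cleared        */
        t[target].nt = false;        /* new NT-flag is cleared         */
    } else {                         /* SWITCH_LCALL                   */
        t[target].nt = true;         /* new NT-flag is set to 1        */
        t[target].link = cur;        /* new LINK-field gets new value  */
    }
    return target;
}
```

Starting with task 0 busy, an ‘lcall’ to task 1 leaves both tasks busy and links task 1 back to task 0; an ‘lcall’ from task 1 back to task 0 is then refused, while an ‘iret’ from task 1 succeeds and clears task 1's busy bit and NT flag.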
Task-Gate Descriptor
• It is also possible to trigger a task-switch
with a software or hardware interrupt, by
using a Task-Gate Descriptor in the IDT
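Task-Gate Descriptor Format (8 bytes, shown as two 32-bit words)

High word:
  bits 31..16   (unused)
  bit  15       P
  bits 14..13   DPL
  bit  12       0
  bits 11..8    type (=0x5)
  bits  7..0    (unused)
Low word:
  bits 31..16   Task-State Segment Selector
  bits 15..0    (unused)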
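As a rough illustration of the task-gate format, the C sketch below builds the 8-byte gate from a TSS selector and a privilege level. The helper names `make_task_gate` and `task_gate_selector` are hypothetical, and the selector value 0x0028 in the example is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack a task-gate descriptor into a 64-bit
 * value (low 32-bit word in bits 31..0, high word in bits 63..32).
 * Unused fields are left as zero.                                  */
uint64_t make_task_gate(uint16_t tss_selector, unsigned dpl)
{
    assert(dpl <= 3);

    uint64_t d = 0;
    d |= (uint64_t)tss_selector << 16;   /* Task-State Segment Selector */
    d |= (uint64_t)0x5 << 40;            /* type = 0x5 (task gate)      */
    d |= (uint64_t)dpl << 45;            /* DPL                         */
    d |= (uint64_t)1 << 47;              /* P = 1                       */
    return d;
}

/* Recover the TSS selector stored in a task gate. */
uint16_t task_gate_selector(uint64_t d)
{
    return (uint16_t)(d >> 16);
}
```

For example, a ring-0 gate for selector 0x0028 packs to 0x0000850000280000; the same gate with DPL 3, which is what a user-mode ‘int’ instruction would need, packs to 0x0000E50000280000.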
‘Threads’ versus ‘Tasks’
• In some advanced applications, a task can
consist of multiple execution-threads
• Like tasks, threads take turns executing
(and thus require ‘context-switching’)
• CPU doesn’t distinguish between ‘threads’
and ‘tasks’ – context-switching semantics
are the same for both
• Difference lies in ‘sharing’ of data/code
A task with multiple threads
[Figure: one task containing two threads]
supervisor-space (ring0):
  TSS 1, TSS 2 (each thread has its own TSS-segment)
user-space (ring3):
  STACKS (each is thread-private): STACK 1, STACK 2
  heap
  DATA (some shared, some private): DATA 1, DATA 2
  TEXT (some shared, some private): CODE 1, CODE 2
Demo program: ‘twotasks.s’
• We have constructed a simple demo that
illustrates the CPU task-switching ability
• It’s one program, but with two threads
• Everything is in one physical segment, but
the segment-descriptors create a number
of different overlapping ‘logical’ segments
• One task is the ‘supervisor’ thread: it ‘calls’
a ‘subordinate’ thread (to print a message)
A thread could use an LDT
• To support isolation of memory-segments
among distinct tasks or threads, the CPU
allows use of ‘private’ descriptor-tables
• Same format for the segment-descriptors
• But selectors use a Table-Indicator bit
Format of a segment-selector (16-bits)

  bits 15..3   Descriptor-table index field
  bit  2       TI = Table-Indicator (0 = GDT, 1 = LDT)
  bits 1..0    RPL = Requested Privilege-Level
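The selector format can be illustrated with a short C sketch that splits a 16-bit selector into its three fields and reassembles it. The type `struct selector_fields` and both function names are hypothetical, introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded view of a 16-bit segment-selector. */
struct selector_fields {
    unsigned index;  /* descriptor-table index (bits 15..3)    */
    unsigned ti;     /* Table-Indicator: 0 = GDT, 1 = LDT      */
    unsigned rpl;    /* Requested Privilege-Level (bits 1..0)  */
};

/* Split a selector into its index, TI and RPL fields. */
struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1;
    f.rpl   = sel & 3;
    return f;
}

/* Build a selector from its three fields. */
uint16_t encode_selector(unsigned index, unsigned ti, unsigned rpl)
{
    assert(index <= 0x1FFF && ti <= 1 && rpl <= 3);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}
```

For example, the selector 0x002B decodes to index 5 in the GDT with RPL 3, while index 1 in an LDT with RPL 3 encodes to 0x000F.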
LDT descriptors
• Each Local Descriptor Table is described
by its own ‘system’ segment-descriptor in
the Global Descriptor Table
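LDT Descriptor-Format (8 bytes, shown as two 32-bit words)

High word:
  bits 31..24   Base[31..24]
  bits 23..21   0 0 0
  bit  20       AVL
  bits 19..16   Limit[19..16]
  bit  15       P
  bits 14..13   DPL
  bit  12       0
  bits 11..8    type
  bits  7..0    Base[23..16]
Low word:
  bits 31..16   Base[15..0]
  bits 15..0    Limit[15..0]

Type-field: the ‘type’ code for any LDT segment-descriptor is 0x2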
In-class Exercise #1
• In our ‘twotasks.s’ demo, the two threads
will both execute at privilege-level zero
• An enhanced version of this demo would
have the ‘supervisor’ (Thread #1) execute
in ring 0 and the ‘subordinate’ (Thread #2)
execute in ring 3
• Can you modify the demo-program so it
incorporates that suggested improvement?
More enhancements?
• The demo-program could be made much
more interesting if it used more than one
subordinate thread, and if the supervisor
thread took turns repeatedly making calls
to each subordinate (i.e., ‘time-sharing’)
• You can arrange for a thread to be called
more than once by using a ‘jmp’ after the
‘iret’ instruction (to re-execute the thread)
In-class Exercise #2
• Modify the demo so it has two subordinate
threads, each of which prints a message,
and each of which can be called again and
again (i.e., add a jmp-instruction after iret):
begin:                  # entry-point to the thread
        ...
        iret            # return to the calling thread
        jmp begin       # the next call resumes here, so loop back