lesson22.ppt

What’s needed to transmit?
A look at the minimum steps
required for programming our
82573L nic to send packets
Typical NIC hardware
[Diagram: the CPU and main memory (holding a packet buffer) are connected by the bus to the nic, whose TX FIFO and RX FIFO connect through a transceiver to the LAN cable]
Quotation
Many companies do an excellent job of providing information to help customers use their
products... but in the end there's no substitute for real-life experiments: putting together the
hardware, writing the program code, and watching what happens when the code executes.
Then when the result isn't as expected -- as it often isn't -- it means trying something else
or searching the documentation for clues.
-- Jan Axelson, author, Lakeview Research (1998)
Thanks, Intel!☻
• Intel Corporation has kindly posted details
online for programming its family of gigabit
Ethernet controllers – includes our 82573L
Our ‘nictx.c’ module
• We’ve created an LKM which has minimal
functionality – enough to be sure we know
how to ‘transmit’ a raw Ethernet packet –
but we do this in a forward-looking way so
that our source-code can later be turned
into a Linux character-mode device-driver
(once we’ve also seen how to write code
which allows our nic to ‘receive’ packets)
Access to PRO1000 registers
• Device registers are hardware mapped to
a range of addresses in physical memory
• We obtain the location (and the length) of
this memory-range from a BAR register in
the nic device’s PCI Configuration Space
• Then we request the Linux kernel to setup
an I/O ‘remapping’ of this memory-range to
‘virtual’ addresses within kernel-space
Tx-Desc Ring-Buffer
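[Diagram: a circular array of 16-byte descriptor slots, at offsets 0x00, 0x10, ... 0x70 from the base-address (ending at 0x80), with TDH and TDT pointing into the array]
TDBA = base-address of the descriptor array
TDLEN = length of the descriptor array (in bytes)
TDH (head) and TDT (tail) = positions within the array
Shading in the diagram marks each descriptor as either
owned by hardware (nic) or owned by software (cpu)
Circular buffer (128-bytes minimum)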
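The wrap-around arithmetic for such a ring can be sketched in ordinary user-space C. This is only an illustration: the helper names are hypothetical, and it assumes the usual convention that descriptors from the head up to (but not including) the tail belong to the nic.

```c
#include <stdint.h>

#define DESC_SIZE 16u   /* bytes per legacy Tx-descriptor */

/* Number of descriptor slots in a ring that is 'tdlen' bytes long. */
static inline uint32_t ring_count(uint32_t tdlen)
{
    return tdlen / DESC_SIZE;
}

/* Index that follows 'i' in a ring of 'n' slots (wraps back to 0). */
static inline uint32_t ring_next(uint32_t i, uint32_t n)
{
    return (i + 1) % n;
}

/* Slots from head up to (not including) tail: assumed owned by the nic. */
static inline uint32_t ring_hw_owned(uint32_t head, uint32_t tail, uint32_t n)
{
    return (tail + n - head) % n;
}
```

With the minimum 128-byte ring there are 8 slots, so advancing the tail from slot 7 brings it back to slot 0.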
How ‘transmit’ works
[Diagram: a List of Buffer-Descriptors (descriptor0 to descriptor3) in Random Access Memory, each one pointing to its own packet buffer (Buffer0 to Buffer3)]
We set up each data-packet that we want
to be transmitted in a ‘Buffer’ area in ram
We also create a list of buffer-descriptors
and inform the NIC of its location and size
Then, when ready, we tell the NIC to ‘Go!’
(i.e., start transmitting), but let us know
when these transmissions are ‘Done’
Allocating kernel-memory
• Our 82573L device-driver will need to use
a segment of contiguous physical memory
which is cache-aligned and non-pageable
• Such a memory-block can be allocated by
using the kernel’s ‘kzalloc()’ function (and
it can later be deallocated using ‘kfree()’)
• You should use the ‘GFP_KERNEL’ flag
(and we also used the ‘GFP_DMA’ flag)
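One way to carve up such a block is to put the descriptor array first and the packet buffers after it. The sketch below only computes sizes and offsets in user-space C; the names N_DESC and BUF_SIZE and their values are hypothetical choices, not taken from ‘nictx.c’. Inside the module the resulting size would be what gets passed to kzalloc() along with the GFP flags named above.

```c
#include <stddef.h>

/* Hypothetical sizing choices, for illustration only. */
enum { N_DESC = 8, DESC_SIZE = 16, BUF_SIZE = 2048 };

/* Total bytes needed: the descriptor array followed by one buffer each. */
static inline size_t block_size(void)
{
    return (size_t)N_DESC * DESC_SIZE + (size_t)N_DESC * BUF_SIZE;
}

/* Byte-offset of packet buffer 'i' from the start of the block. */
static inline size_t buffer_offset(unsigned i)
{
    return (size_t)N_DESC * DESC_SIZE + (size_t)i * BUF_SIZE;
}
```

Keeping the descriptors at offset 0 means the block's physical address can serve directly as the descriptor base-address.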
NIC registers (for transmit)
enum
{
    E1000_CTRL   = 0x0000,  // Device Control
    E1000_STATUS = 0x0008,  // Device Status
    E1000_TCTL   = 0x0400,  // Transmit Control
    E1000_TDBAL  = 0x3800,  // Tx-Descriptor Base-Address Low
    E1000_TDBAH  = 0x3804,  // Tx-Descriptor Base-Address High
    E1000_TDLEN  = 0x3808,  // Tx-Descriptor queue Length
    E1000_TDH    = 0x3810,  // Tx-Descriptor Head
    E1000_TDT    = 0x3818,  // Tx-Descriptor Tail
    E1000_TXDCTL = 0x3828,  // Tx-Descriptor Control
    E1000_RA     = 0x5400,  // Receive-address Array
};
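These offsets are meant to be added to the virtual base-address of the remapped register range. A minimal user-space sketch of that idea follows; the helper names nic_read() and nic_write() are hypothetical, and an ordinary array stands in for the device registers when trying it out. In a real module the base pointer would come from the kernel's I/O remapping.

```c
#include <stdint.h>

enum {
    E1000_CTRL   = 0x0000,  /* Device Control */
    E1000_STATUS = 0x0008,  /* Device Status */
    E1000_TCTL   = 0x0400,  /* Transmit Control */
    E1000_TDT    = 0x3818,  /* Tx-Descriptor Tail */
};

/* Read the 32-bit register at byte-offset 'reg' from the mapped base. */
static inline uint32_t nic_read(volatile uint8_t *io, unsigned reg)
{
    return *(volatile uint32_t *)(io + reg);
}

/* Write a 32-bit value to the register at byte-offset 'reg'. */
static inline void nic_write(volatile uint8_t *io, unsigned reg, uint32_t val)
{
    *(volatile uint32_t *)(io + reg) = val;
}
```

The ‘volatile’ qualifier keeps the compiler from caching or reordering away accesses that are meant to reach the hardware.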
Device Control (0x0000)
[Bit-field diagram of the 32-bit Device Control register on the 82573L; the remaining bits are reserved (R)]
 bit 31     PHYRST = Phy Reset
 bit 30     VME = VLAN Mode Enable
 bit 28     TFCE = Tx Flow-Control Enable
 bit 27     RFCE = Rx Flow-Control Enable
 bit 26     RST = Device Reset
 bit 20     ADVD3WUC = Advertise Cold Wake Up Capability
 bit 12     FRCDPLX = Force Duplex
 bit 11     FRCSPD = Force Speed
 bits 9..8  SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
 bit 6      SLU = Set Link Up
 bit 2      GIOMD = GIO Master Disable
 bit 0      FD = Full-Duplex
 D/UD = Dock/Undock status
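Building a Device Control value from individual bits might look like the sketch below. The macro and function names are hypothetical; the bit positions are FD = bit 0, SLU = bit 6 and RST = bit 26 (the bit our programming steps toggle for a reset).

```c
#include <stdint.h>

#define CTRL_FD   (1u << 0)    /* Full-Duplex */
#define CTRL_SLU  (1u << 6)    /* Set Link Up */
#define CTRL_RST  (1u << 26)   /* Device Reset */

/* Add 'link up' and 'full duplex' to an existing control value. */
static inline uint32_t ctrl_link_up_full_duplex(uint32_t ctrl)
{
    return ctrl | CTRL_SLU | CTRL_FD;
}

/* Control value that requests a device reset. */
static inline uint32_t ctrl_request_reset(uint32_t ctrl)
{
    return ctrl | CTRL_RST;
}

/* Nonzero while the reset bit is still set in a value read back. */
static inline int ctrl_reset_pending(uint32_t ctrl)
{
    return (ctrl & CTRL_RST) != 0;
}
```

A driver would write the reset-request value to the register, then delay before touching other registers.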
Device Status (0x0008)
[Bit-field diagram of the 32-bit Device Status register on the 82573L; bits not listed are shown as 0]
 bit 31     ? (some undocumented functionality?)
 bit 19     GIO Master EN
 bit 10     PHYRA = PHY Reset Asserted
 bits 9..8  ASDV = Auto-negotiation Speed Detection Value
 bits 7..6  SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
 bit 4      TXOFF = Transmission Paused
 bits 3..2  Function ID (0)
 bit 1      LU = Link Up
 bit 0      FD = Full-Duplex
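Decoding a value read from the Device Status register could be sketched as follows. The helper names are hypothetical; the field positions are FD = bit 0, LU = bit 1 and SPEED = bits 7..6.

```c
#include <stdint.h>

#define STATUS_FD  (1u << 0)   /* Full-Duplex */
#define STATUS_LU  (1u << 1)   /* Link Up */

static inline int status_link_up(uint32_t status)
{
    return (status & STATUS_LU) != 0;
}

static inline int status_full_duplex(uint32_t status)
{
    return (status & STATUS_FD) != 0;
}

/* Link speed in Mbps from the SPEED field; 0 for the reserved encoding. */
static inline unsigned status_speed_mbps(uint32_t status)
{
    static const unsigned mbps[4] = { 10, 100, 1000, 0 };
    return mbps[(status >> 6) & 3];
}
```

A check like status_link_up() is a natural thing to report in a diagnostic pseudo-file.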
Transmit Control (0x0400)
[Bit-field diagram of the 32-bit Transmit Control register on the 82573L; the remaining bits are reserved (R) and shown as 0]
 bit 28       MULR = Multiple Request Support
 bits 27..26  TXCSCMT = TxDescriptor Minimum Threshold
 bit 25       UNORTX = Underrun No Re-Transmit
 bit 24       RTLC = Retransmit on Late Collision
 bit 22       SWXOFF = Software XOFF Transmission
 bits 21..12  COLD = Collision Distance (=0x3F): upper 6-bits in 21..16, lower 4-bits in 15..12
 bits 11..4   CT = Collision Threshold (=0xF)
 bit 3        PSP = Pad Short Packets
 bit 1        EN = Transmit Enable
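A Transmit Control value using the field values shown above (CT=0xF, COLD=0x3F) might be composed as in this sketch. The function name is hypothetical, and always setting PSP is our own choice for the illustration. The enable bit is a separate argument because our programming steps configure the register first and enable the Transmit Engine later.

```c
#include <stdint.h>

#define TCTL_EN          (1u << 1)   /* Transmit Enable */
#define TCTL_PSP         (1u << 3)   /* Pad Short Packets */
#define TCTL_CT_SHIFT    4           /* Collision Threshold, bits 11..4 */
#define TCTL_COLD_SHIFT  12          /* Collision Distance, bits 21..12 */

/* Compose a Transmit Control value from its fields. */
static inline uint32_t tctl_value(uint32_t ct, uint32_t cold, int enable)
{
    uint32_t v = TCTL_PSP
               | ((ct   & 0xFFu)  << TCTL_CT_SHIFT)
               | ((cold & 0x3FFu) << TCTL_COLD_SHIFT);
    if (enable)
        v |= TCTL_EN;
    return v;
}
```

Here tctl_value(0xF, 0x3F, 0) gives 0x0003F0F8, and setting the enable bit afterward gives 0x0003F0FA.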
Tx-Descriptor Control (0x3828)
[Bit-field diagram of the 32-bit Tx-Descriptor Control register; bits not listed are shown as 0]
 bit 24       GRAN
 bits 21..16  WTHRESH (Writeback Threshold)
 bits 13..8   HTHRESH (Host Threshold)
 bits 5..0    PTHRESH (Prefetch Threshold)
“This register controls the fetching and write back of transmit descriptors.
The three threshold values are used to determine when descriptors are
read from, and written to, host memory. Their values can be in units of
cache lines or of descriptors (each descriptor is 16 bytes), based on the
value of the GRAN bit (0=cache lines, 1=descriptors). When GRAN = 1,
all descriptors are written back (even if not requested).” --Intel manual
Recommended for 82573: 0x01010000 (GRAN=1, WTHRESH=1)
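The recommended value can be checked by packing the fields explicitly. This is a small sketch with a hypothetical function name, using GRAN = bit 24, WTHRESH = bits 21..16, HTHRESH = bits 13..8 and PTHRESH = bits 5..0.

```c
#include <stdint.h>

/* Pack the three thresholds and the GRAN bit into a TXDCTL value. */
static inline uint32_t txdctl_value(uint32_t pthresh, uint32_t hthresh,
                                    uint32_t wthresh, int gran)
{
    uint32_t v = (pthresh & 0x3Fu)
               | ((hthresh & 0x3Fu) << 8)
               | ((wthresh & 0x3Fu) << 16);
    if (gran)
        v |= 1u << 24;   /* thresholds counted in descriptors */
    return v;
}
```

With GRAN=1 and WTHRESH=1 (other thresholds zero) this yields 0x01010000, the value recommended for the 82573.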
An observation
• We notice that the 82573L device retains
the values in many of its internal registers
• This fact reduces the programming steps
that will be required to operate our nic on
the anchor cluster machines, since Intel’s
own Linux device driver (‘e1000e.ko’) has
already initialized many nic registers
• But we MAY need to bring ‘eth1’ down!
Using ‘/sbin/ifconfig’
• You can use the ‘/sbin/ifconfig’ command
to find out whether the ‘eth1’ interface has
been brought ‘down’:
$ /sbin/ifconfig eth1
• If it is still operating, you can turn it off with
the (privileged) command:
$ sudo /sbin/ifconfig eth1 down
Programming steps
1) Detect the presence of the 82573L network controller (VENDOR_ID, DEVICE_ID)
2) Obtain the physical address-range where the nic’s device-registers are mapped
3) Ask the kernel to map this address range into the kernel’s virtual address-space
4) Copy the network controller’s MAC-address into a 6-byte array for future access
5) Allocate a block of kernel memory large enough for our descriptors and buffers
6) Ensure that the network controller’s ‘Bus Master’ capability has been enabled
7) Select our desired configuration-options for the DEVICE CONTROL register
8) Perform a nic ‘reset’ operation (by toggling bit 26), then delay until reset completes
9) Select our desired configuration-options for the TRANSMIT CONTROL register
10) Initialize our array of Transmit Descriptors with the physical addresses of buffers
11) Initialize the Transmit Engine’s registers (for Tx-Descriptor Queue and Control)
12) Setup the buffer-contents for an Ethernet packet we want to be transmitted
13) Enable the Transmit Engine
14) Give ‘ownership’ of a Tx-Descriptor to the network controller
15) Install our ‘/proc/nictx’ pseudo-file (for user-diagnostic purposes)
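Step 4 can be illustrated in user-space C. The sketch assumes the first entry of the Receive-address Array is a pair of 32-bit registers: the low register (at E1000_RA) holds MAC bytes 0..3 and the low 16 bits of the next register (at E1000_RA + 4) hold bytes 4..5. The function name is hypothetical.

```c
#include <stdint.h>

/* Unpack a MAC-address from the two register values of RA entry 0.
 * 'ral' is the value at E1000_RA, 'rah' the value at E1000_RA + 4. */
static inline void mac_from_ra(uint32_t ral, uint32_t rah, uint8_t mac[6])
{
    mac[0] = (uint8_t)(ral >> 0);
    mac[1] = (uint8_t)(ral >> 8);
    mac[2] = (uint8_t)(ral >> 16);
    mac[3] = (uint8_t)(ral >> 24);
    mac[4] = (uint8_t)(rah >> 0);
    mac[5] = (uint8_t)(rah >> 8);
}
```

The upper bits of the second register are ignored here, since only its low 16 bits carry address bytes.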
Legacy Tx-Descriptor Layout
 offset   contents (bits 31..0)
 0x0      Buffer-Address low (bits 31..0)
 0x4      Buffer-Address high (bits 63..32)
 0x8      CMD (31..24) | CSO (23..16) | Packet Length in bytes (15..0)
 0xC      special (31..16) | CSS (15..8) | reserved =0 (7..4) | status (3..0)
Buffer-Address = the packet-buffer’s 64-bit address in physical memory
Packet-Length = number of bytes in the data-packet to be transmitted
CMD = Command-field
CSO/CSS = Checksum Offset/Start (in bytes)
STA = Status-field
Suggested C syntax
typedef struct
{
    unsigned long long  base_address;
    unsigned short      packet_length;
    unsigned char       cksum_offset;
    unsigned char       desc_command;
    unsigned char       desc_status;
    unsigned char       cksum_origin;
    unsigned short      special_info;
} TX_DESCRIPTOR;
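It is worth confirming that the compiler lays this structure out exactly as the hardware expects. The sketch below adds compile-time checks and a hypothetical initializer, tx_desc_init(); it assumes a typical ABI in which these fields need no padding.

```c
#include <stddef.h>
#include <string.h>

typedef struct
{
    unsigned long long  base_address;
    unsigned short      packet_length;
    unsigned char       cksum_offset;
    unsigned char       desc_command;
    unsigned char       desc_status;
    unsigned char       cksum_origin;
    unsigned short      special_info;
} TX_DESCRIPTOR;

/* Compile-time checks against the 16-byte legacy layout. */
_Static_assert(sizeof(TX_DESCRIPTOR) == 16, "descriptor must be 16 bytes");
_Static_assert(offsetof(TX_DESCRIPTOR, packet_length) == 8,  "length at 0x8");
_Static_assert(offsetof(TX_DESCRIPTOR, desc_command)  == 11, "CMD at 0xB");
_Static_assert(offsetof(TX_DESCRIPTOR, desc_status)   == 12, "status at 0xC");
_Static_assert(offsetof(TX_DESCRIPTOR, special_info)  == 14, "special at 0xE");

/* Clear a descriptor, then record its buffer's address and packet length. */
static inline void tx_desc_init(TX_DESCRIPTOR *d,
                                unsigned long long phys_addr,
                                unsigned short length)
{
    memset(d, 0, sizeof *d);
    d->base_address  = phys_addr;
    d->packet_length = length;
}
```

If a compiler ever inserted padding, the build would fail here rather than the nic silently misreading descriptors.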
TxDesc Command-field
 bit:   7     6     5      4            3    2    1      0
        IDE   VLE   DEXT   reserved=0   RS   IC   IFCS   EOP
EOP = End Of Packet (1=yes, 0=no)
IFCS = Insert Frame CheckSum (1=yes, 0=no) – provided EOP is set
IC = Insert CheckSum (1=yes, 0=no) as indicated by CSO/CSS fields
RS = Report Status (1=yes, 0=no)
DEXT = Descriptor Extension (1=yes, 0=no) use ‘0’ for Legacy-Mode
VLE = VLAN-Packet Enable (1=yes, 0=no) – provided EOP is set
IDE = Interrupt-Delay Enable (1=yes, 0=no)
TxDesc Status field
 bit:   3            2    1    0
        reserved=0   LC   EC   DD
DD = Descriptor Done
this bit is written back after the NIC processes the descriptor
provided the descriptor’s RS-bit was set (i.e., Report Status)
EC = Excess Collisions
indicates that the packet has experienced more than the
maximum number of collisions (as defined by
the TCTL.CT field) and therefore was not transmitted.
(This bit is meaningful only in HALF-DUPLEX mode.)
LC = Late Collision
indicates that Late Collision has occurred while operating in
HALF-DUPLEX mode. Note that the collision window size
is dependent on the SPEED: 64-bytes for 10/100-Mbps, or
512-bytes for 1000-Mbps.
Bit-mask definitions
enum {
    DD   = (1<<0),  // Descriptor Done
    EC   = (1<<1),  // Excess Collisions
    LC   = (1<<2),  // Late Collision

    EOP  = (1<<0),  // End Of Packet
    IFCS = (1<<1),  // Insert Frame CheckSum
    IC   = (1<<2),  // Insert CheckSum as per CSO/CSS
    RS   = (1<<3),  // Report Status
    DEXT = (1<<5),  // Descriptor Extension
    VLE  = (1<<6),  // VLAN packet
    IDE  = (1<<7)   // Interrupt-Delay Enable
};
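For a packet that fits in a single buffer, a plausible command byte combines EOP, IFCS and RS, and completion is then detected by testing DD in the status byte. The sketch below is one such combination, not necessarily the one used in ‘nictx.c’; the function names are hypothetical.

```c
#include <stdint.h>

enum {
    DD   = (1<<0),  /* status: Descriptor Done */
    EOP  = (1<<0),  /* command: End Of Packet */
    IFCS = (1<<1),  /* command: Insert Frame CheckSum */
    RS   = (1<<3),  /* command: Report Status */
};

/* Command byte for a one-descriptor packet whose status we want reported. */
static inline uint8_t tx_command_single_packet(void)
{
    return (uint8_t)(EOP | IFCS | RS);
}

/* Nonzero once the nic has written back the Descriptor Done bit. */
static inline int tx_done(uint8_t desc_status)
{
    return (desc_status & DD) != 0;
}
```

Without RS in the command byte, the DD bit would not be written back, so tx_done() would never report completion.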
Ethernet packet layout
• Total size normally can vary from 64 bytes
up to 1518 bytes (unless ‘jumbo’ packets
and/or ‘undersized’ packets are enabled)
• The NIC expects a 14-byte packet ‘header’
and it appends a 4-byte CRC check-sum
 offset   field
 0        destination MAC address (6-bytes)
 6        source MAC address (6-bytes)
 12       Type/length (2-bytes)
 14       the packet’s data ‘payload’ goes here (usually varies from 46 to 1500 bytes)
 (end)    Cyclic Redundancy Checksum (4-bytes)
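Filling a transmit buffer according to this layout might look like the following user-space sketch. The function name build_frame() and its parameters are hypothetical. The 4-byte CRC is left out on purpose, since the NIC appends it.

```c
#include <stdint.h>
#include <string.h>

enum { MAC_LEN = 6, HDR_LEN = 14 };

/* Write header and payload into 'buf'; returns the byte-count (without CRC).
 * The caller must provide room for HDR_LEN + len bytes. */
static inline size_t build_frame(uint8_t *buf,
                                 const uint8_t dst[MAC_LEN],
                                 const uint8_t src[MAC_LEN],
                                 uint16_t type_or_length,
                                 const void *payload, size_t len)
{
    memcpy(buf, dst, MAC_LEN);               /* offset 0: destination MAC */
    memcpy(buf + MAC_LEN, src, MAC_LEN);     /* offset 6: source MAC */
    buf[12] = (uint8_t)(type_or_length >> 8);   /* offset 12: high byte first */
    buf[13] = (uint8_t)(type_or_length & 0xFF);
    memcpy(buf + HDR_LEN, payload, len);     /* offset 14: payload */
    return HDR_LEN + len;
}
```

The returned count is what would go into the descriptor's Packet Length field. A very short payload leaves the frame under the 64-byte minimum, which is where the PSP (Pad Short Packets) option in Transmit Control comes in.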
In-class exercises
• Modify the code in our ‘nictx.c’ module so
that it will transmit more than just one raw
packet when you install it into the kernel
• Can you also modify the ‘module_exit()’
function so that it will transmit a packet
before it disables the ‘Transmit Engine’?