088768 - ARCHITETTURE AVANZATE DEI CALCOLATORI (Advanced Computer Architectures)
Academic Year 2013/2014
http://home.deib.polimi.it/silvano/aac.htm
Prof. Cristina Silvano
email: cristina.silvano@polimi.it
Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB)
Politecnico di Milano
Goals of the AAC course
Provide an overview of the most recent and advanced computer architectures
Introduce the basic micro-architectural mechanisms found in modern microprocessor architectures
Provide the reasoning behind the adoption of advanced computer architectures
Cristina Silvano – Politecnico di Milano
Advanced Computer Architectures:
IBM Blue Gene P Supercomputer
Advanced Computer Architectures:
Smart phones
Advanced Computer Architectures:
Intel® Core™ i7-3770T Processor
(Ivy Bridge, up to 3.70 GHz)
160 mm² die @ 22 nm
1.40 billion transistors
# of Cores: 4
# of Threads: 8
Clock Speed: 2.5 GHz
Max Turbo Frequency: 3.7 GHz
Intel® Smart Cache: 8 MB
Instruction Set: 64-bit
Instruction Set Extensions: SSE4.1/4.2, AVX
Embedded Options Available: No
Lithography: 22 nm
Max TDP: 45 W
Recommended Customer Price: TRAY: $294.00
Max Memory Size: 32 GB
Memory Types: DDR3-1333/1600
# of Memory Channels: 2
Max Memory Bandwidth: 25.6 GB/s
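The 25.6 GB/s figure can be checked from the other entries in the specification. The sketch below assumes a 64-bit (8-byte) data bus per DDR3 channel, which is not stated on the slide, and the function name is purely illustrative.

```python
def peak_bandwidth_gb_s(channels: int, transfers_per_s: float, bus_bytes: int = 8) -> float:
    """Peak theoretical memory bandwidth in GB/s (1 GB = 1e9 bytes).

    bus_bytes=8 assumes a 64-bit data bus per channel (assumption, not on the slide).
    """
    return channels * transfers_per_s * bus_bytes / 1e9


# DDR3-1600 performs 1600 mega-transfers per second on each channel.
dual_channel_ddr3_1600 = peak_bandwidth_gb_s(channels=2, transfers_per_s=1600e6)
print(f"Dual-channel DDR3-1600: {dual_channel_ddr3_1600:.1f} GB/s")
```

This is a peak value: sustained bandwidth in real workloads is lower.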
ARM Cortex-A8 core processor in Apple A4 System-on-Chip
Based on the ARMv7 architecture
It’s a dual-issue in-order execution design
The Apple A4 at 1 GHz (45 nm, manufactured by Samsung from March 2010 to present), a System-on-Chip that combines an ARM Cortex-A8 and a PowerVR GPU, is in the:
• Original iPad: April 2010
• iPhone 4: June 2010 (Black; GSM), February 2011 (Black; CDMA), April 2011 (White; GSM & CDMA)
• iPod Touch (4th generation): September 2010 (Black model), October 2011 (White model)
• Apple TV (2nd generation): September 2010
ARM Cortex-A9 MP core processor in Apple A5 System-on-Chip
Based on the ARMv7 architecture
It’s a dual-issue out-of-order execution design
The Apple A5 at 1 GHz (45 nm to 32 nm, manufactured by Samsung from March 2011 to present), a System-on-Chip that combines a dual-core ARM Cortex-A9 with NEON SIMD accelerator and a dual-core PowerVR GPU, is in the:
• iPad 2: March 2011 (A5 dual-core, 45 nm); March 2012 (A5 dual-core, 32 nm)
• iPhone 4S (A5 dual-core, 45 nm): October 2011
• Apple TV 3rd generation (A5 single-core, 32 nm): March 2012
• iPod Touch 5th generation (A5 dual-core, 32 nm): October 2012
• iPad Mini (A5 dual-core, 32 nm): November 2012
Apple A6 System-on-Chip
The Apple A6 SoC was introduced in September 2012 for the iPhone 5
Apple states that it is up to twice as fast and has up to twice the graphics power of its predecessor, the Apple A5
The A6 uses a 1.3 GHz custom Apple-designed ARMv7-based dual-core CPU, called Swift, and an integrated triple-core PowerVR SGX 543MP3 GPU.
The A6 chip for iPhone 5 incorporates 1 GB of LPDDR2-1066 RAM and provides double the memory capacity of the iPhone 4S while increasing the theoretical memory bandwidth from 6.4 GB/s to 8.5 GB/s.
Apple A6 System-on-Chip
ARMv7s ISA dual core
Triple-core PowerVR SGX 543MP3 GPU
1 MB L2 cache
1.3 GHz
32 nm Samsung
96.71 mm² (22% smaller than A5)
Moore’s Law (1965) says that the number of transistors on a processor will double every 18 to 24 months
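As a rough illustration of what a fixed doubling period implies, the sketch below projects a transistor count forward in time with N(t) = N0 * 2^(t/T). The function name and the 10-year horizon are illustrative assumptions; the starting count of 1.4 billion is the figure quoted earlier for the Intel Core i7-3770T.

```python
def projected_transistors(n0: float, years: float, doubling_months: float) -> float:
    """Project a transistor count assuming one doubling every `doubling_months`."""
    doublings = years * 12 / doubling_months
    return n0 * 2 ** doublings


start = 1.4e9  # transistors, from the i7-3770T slide
for months in (18, 24):
    projected = projected_transistors(start, years=10, doubling_months=months)
    print(f"doubling every {months} months: {projected:.2e} transistors after 10 years")
```

The gap between the two results shows how sensitive long-term projections are to the assumed doubling period.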
Stopper: Max. Clock Freq. Wall
Chip density continues to increase ~2x every 2 years
Clock speed does not
Expose parallelism at a coarser level than ILP
Source: Intel, Microsoft (Sutter) and Stanford (Olukotun, Hammond)
Stopper: On-Chip Temperature Wall
Paradigm shift : Multi-core architectures
ARM 9 area by technology node:
180 nm: 11.8 mm²
130 nm: 5.2 mm²
90 nm: 2.6 mm²
65 nm: 1.4 mm²
Source: STMicroelectronics
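The ARM 9 areas above can be compared with a first-order scaling rule in which area shrinks with the square of the feature size. That rule is an assumption added here for illustration, not something stated on the slide, and the helper name is hypothetical.

```python
# Areas in mm^2 reported on the slide, keyed by technology node in nm.
measured_mm2 = {180: 11.8, 130: 5.2, 90: 2.6, 65: 1.4}


def ideal_area_mm2(ref_area_mm2: float, ref_node_nm: float, node_nm: float) -> float:
    """Area predicted by quadratic scaling from a reference node (first-order model)."""
    return ref_area_mm2 * (node_nm / ref_node_nm) ** 2


for node_nm, area in measured_mm2.items():
    ideal = ideal_area_mm2(measured_mm2[180], 180, node_nm)
    print(f"{node_nm:>3} nm: slide {area:5.1f} mm2, quadratic scaling {ideal:5.2f} mm2")
```

The freed area is what makes it possible to place many cores on one die.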
Intel 80 core
NVIDIA Fermi GPU
NVIDIA Tesla GPU
Kepler GK110 Architecture
• 7.1B Transistors
• 15 SMX units (2880 cores)
• >1TFLOP FP64
• 1.5MB L2 Cache
• 384-bit GDDR5
• PCI Express Gen3
Dark Silicon Problem
DARK SILICON: the fraction of the chip that cannot be used because of the power budget
Processor frequency is affected by technology effects (e.g. the threshold voltage Vth)
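A minimal sketch of the dark silicon idea follows. It assumes that each core dissipates only switching power, P = activity * C * Vdd^2 * f, and ignores leakage. All numeric values and function names are hypothetical and chosen only to illustrate the calculation.

```python
def dynamic_power_w(activity: float, capacitance_f: float, vdd_v: float, freq_hz: float) -> float:
    """Switching power of one core: P = activity * C * Vdd^2 * f (leakage ignored)."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


def dark_fraction(n_cores: int, core_power_w: float, budget_w: float) -> float:
    """Fraction of cores that must stay powered off to respect the power budget."""
    active = min(n_cores, int(budget_w // core_power_w))
    return 1.0 - active / n_cores


# Hypothetical numbers, not taken from the slides.
core_w = dynamic_power_w(activity=0.2, capacitance_f=5e-9, vdd_v=1.0, freq_hz=3e9)
print(f"per-core power: {core_w:.2f} W")
print(f"dark fraction of a 64-core chip with a 45 W budget: {dark_fraction(64, core_w, 45.0):.2f}")
```

Lowering the frequency or the supply voltage reduces the per-core power and therefore the dark fraction, at the cost of per-core performance.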
AAC Course Schedule
Schedule: First Semester 2013-2014 (FALL 2013)
WEDNESDAY 10.15 - 12.15, Location: L.26.11 Leonardo Campus
THURSDAY 10.15 - 12.15, Location: L.26.16 Leonardo Campus
Contact Information
Office hours for students:
Tuesday 10.00 - 11.00 at DEIB, Via Ponzio 34/5, first floor. Internal phone number: 3692 (it is better to send an e-mail to arrange an appointment).
Main Contact:
Students can contact Prof. Cristina Silvano by e-mail (cristina.silvano@polimi.it), using the subject line: AAC COURSE Milano, Your_Surname, Your_Name, Your_POLIMI_ID_NUMBER
Please use your POLIMI student e-mail account: name.surname@mail.polimi.it
AAC Teaching Assistants


Prof. Giovanni Agosta
e-mail (giovanni.agosta@polimi.it)
Prof. Gerardo Pelosi
e-mail (gerardo.pelosi@polimi.it)
AAC Course Info
Teaching Activity: The course is worth 5 CFU and is organized into 30 hours of lectures and 20 hours of written/tool-based exercises that put into practice the concepts presented during the lectures.
Prerequisites: Basic concepts of logic design and computer architecture.
AAC Final Exam
FINAL EXAM:
The final exam consists of a written exam. For each written exam, a maximum score of 33 points is assigned: up to 15 points for the solution of the exercise part and up to 18 points for answering the theory part.
It is possible to ask the instructor for an OPTIONAL project. The project must be completed by January 31st, 2014 (firm deadline). The project awards an additional score of up to 5 points. The additional points given by the project are added to the score of the written exam only if the score of the written exam is sufficient (>= 18).
AAC Teaching Material

Additional information in slides and papers available through the course webpage:
http://home.dei.polimi.it/silvano/AAC.htm
• If you're using MOZILLA FIREFOX AS WEB BROWSER, please use the SAVE AS option to save the PDF FILE on your laptop, so that the PDF SLIDES are displayed and printed correctly.
Reference Book: "Computer Architecture, A Quantitative Approach", John Hennessy, David Patterson, Morgan Kaufmann, Fourth Edition.
Support for the international students
AAC course is offered in Italian
Teaching materials (slides/papers/textbook) available in English
Final exam can be done in English
Teaching support available in English
Please note that international students can attend the HPPS course (High Performance Processors and System) held by prof. Donatella Sciuto during the Second Semester 2013 - 2014. The HPPS course is offered entirely in English. The objectives and program of the AAC course are aligned with those of the HPPS course.
March 2013
Overview of the AAC topics

How to increase performance while decreasing the design cost?
• RISC: Reduced Instruction Set Computer
• Pipeline
Can we gain more?
• Branch prediction
• Instruction Level Parallelism (ILP)
• Multithreading
• Multiprocessors
Performance still does not scale?
• Memory hierarchy
• Cache organization
Main lectures topics (1)
Review of basic computer architecture definitions and components (Central Processing Unit, Memory System, Input/Output Interfaces, Communication System)
Basic performance evaluation metrics of computer architectures
Memory hierarchy: Basic and advanced concepts. Multi-level caches. Performance evaluation, optimisation techniques.
Central Processing Unit: the RISC approach (Reduced Instruction Set Computer).
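For the memory hierarchy topic listed above, a common performance metric is the average memory access time (AMAT = hit time + miss rate * miss penalty). The sketch below applies it to one and two cache levels. The function names and the example numbers are illustrative assumptions, not values from the course material.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time for one cache level, in clock cycles."""
    return hit_time + miss_rate * miss_penalty


def amat_two_level(l1_hit: float, l1_miss_rate: float,
                   l2_hit: float, l2_local_miss_rate: float,
                   memory_penalty: float) -> float:
    """AMAT with an L2 cache: the L1 miss penalty is the AMAT seen at L2."""
    l1_miss_penalty = amat(l2_hit, l2_local_miss_rate, memory_penalty)
    return amat(l1_hit, l1_miss_rate, l1_miss_penalty)


# Hypothetical parameters, in clock cycles.
print("L1 only:", amat(hit_time=1, miss_rate=0.05, miss_penalty=100))
print("L1 + L2:", amat_two_level(1, 0.05, 10, 0.2, 100))
```

With these example numbers the second cache level hides most of the main memory latency, which is the usual motivation for multi-level caches.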
Main lectures topics (2)

Techniques for performance optimization:
• Pipelining: The problem of hazards: structural, control and data
hazards; Optimization techniques to solve the problem of
hazards
• Branch prediction techniques: Static and dynamic branch
prediction techniques
• Speculative execution
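As one concrete instance of the dynamic branch prediction techniques listed above, the sketch below models a table of 2-bit saturating counters indexed by the branch address. The table size, the indexing function, and the initial counter value are assumptions made for illustration.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, table_size: int = 16):
        self.table_size = table_size
        self.counters = [1] * table_size  # start in "weakly not taken" (assumption)

    def _index(self, pc: int) -> int:
        # Drop the two low address bits (word-aligned instructions assumed).
        return (pc >> 2) % self.table_size

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor: TwoBitPredictor, pc: int, outcomes: list) -> float:
    """Fraction of correct predictions for one branch over a sequence of outcomes."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(outcomes)


# A loop branch: taken 9 times, then not taken on exit, repeated 3 times.
loop_outcomes = ([True] * 9 + [False]) * 3
print(f"accuracy: {accuracy(TwoBitPredictor(), 0x400, loop_outcomes):.2f}")
```

Because the counter needs two consecutive mispredictions to change direction, a single loop exit does not cause a misprediction when the loop is entered again.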
Sequential vs. Pipelining Instruction Execution
Sequential execution (10 ns per instruction):
I1: IF ID EX MEM WB
I2:                 IF ID EX MEM WB
…
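The difference between the two execution styles can be quantified with a small timing model. The sketch assumes five stages of equal length, so the 10 ns per instruction shown above corresponds to 2 ns per stage, and it assumes an ideal pipeline with no hazards or stalls. The function names are illustrative.

```python
def sequential_time_ns(n_instructions: int, stages: int = 5, stage_ns: float = 2.0) -> float:
    """Each instruction runs all stages before the next one starts."""
    return n_instructions * stages * stage_ns


def pipelined_time_ns(n_instructions: int, stages: int = 5, stage_ns: float = 2.0) -> float:
    """Ideal pipeline: fill time of `stages` cycles, then one instruction per cycle."""
    return (stages + n_instructions - 1) * stage_ns


for n in (2, 10, 1000):
    seq = sequential_time_ns(n)
    pipe = pipelined_time_ns(n)
    print(f"{n:>4} instructions: sequential {seq:.0f} ns, pipelined {pipe:.0f} ns, "
          f"speedup {seq / pipe:.2f}")
```

For long instruction sequences the speedup approaches the number of stages. Hazards keep real pipelines below that bound.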
Main lectures topics (3)

Instruction Level Parallelism (ILP):
• Static and dynamic scheduling;
• Superscalar architectures;
• VLIW (Very Long Instruction Word) architectures;
Instruction Level Parallelism:
Example of 2-issue processor
Time (one clock cycle = 2 ns per column) →
I1   IF  ID  EX  MEM WB
I2   IF  ID  EX  MEM WB
I3       IF  ID  EX  MEM WB
I4       IF  ID  EX  MEM WB
I5           IF  ID  EX  MEM WB
I6           IF  ID  EX  MEM WB
I7               IF  ID  EX  MEM WB
I8               IF  ID  EX  MEM WB
I9                   IF  ID  EX  MEM WB
I10                  IF  ID  EX  MEM WB
IPC = Instructions Per Clock = 2
CPI = Clocks Per Instruction = 0.5
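The ideal figures for the 2-issue example can be reproduced with a short calculation. The sketch assumes a 5-stage pipeline, a 2 ns clock cycle, and no dependences between instructions issued together. The function names are illustrative.

```python
import math


def superscalar_cycles(n_instructions: int, issue_width: int = 2, stages: int = 5) -> int:
    """Cycles to complete n instructions on an ideal in-order multi-issue pipeline."""
    return stages + math.ceil(n_instructions / issue_width) - 1


def ideal_cpi(issue_width: int) -> float:
    """Steady-state clocks per instruction when every issue slot is filled."""
    return 1 / issue_width


cycles = superscalar_cycles(10)
print(f"10 instructions: {cycles} cycles, {cycles * 2} ns at 2 ns per cycle")
print(f"ideal CPI = {ideal_cpi(2)}, ideal IPC = {1 / ideal_cpi(2)}")
```

CPI = 0.5 is a steady-state bound: over only 10 instructions the pipeline fill time still dominates the total.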
Beyond ILP: Multithreading
Threads: Independent sequences of instructions
…
Single-threaded program
Multi-threaded program
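A minimal sketch of a multi-threaded program follows: the work is split into independent instruction sequences that run as separate threads and are joined at the end. It uses Python's standard threading module only to show the programming model. In the standard CPython interpreter, CPU-bound threads do not run in parallel, so no speedup should be expected from this example. The function names are illustrative.

```python
import threading


def partial_sum(values: list, results: list, slot: int) -> None:
    """Thread body: sum one chunk and store the result in its own slot."""
    results[slot] = sum(values)


def threaded_sum(values: list, n_threads: int = 2) -> int:
    """Split `values` into chunks and sum each chunk in a separate thread."""
    chunk = (len(values) + n_threads - 1) // n_threads
    results = [0] * n_threads
    threads = [
        threading.Thread(target=partial_sum,
                         args=(values[i * chunk:(i + 1) * chunk], results, i))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every independent sequence has finished
    return sum(results)


print(threaded_sum(list(range(1, 101))))
```

Each thread writes to a different slot of `results`, so the threads share no written data and need no lock.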
Main lectures topics (4)

Beyond ILP:
• Multithreading (Thread Level Parallelism – TLP)
• Multiprocessors and multicore systems: taxonomy,
topologies, communication management, memory
management, cache coherency protocols, examples of
architectures
• System-on-Chip and Network-on-Chip architectures;
Digital Signal Processors; Stream processors and
vector processors; Graphics Processors
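To give a flavour of the cache coherency protocols mentioned above, the sketch below tracks the state of a single memory line across several caches under a write-invalidate MSI protocol (Modified, Shared, Invalid). It is a simplified model chosen for illustration: data transfers, bus arbitration, and evictions are not modelled, and the class and method names are hypothetical.

```python
class MSILine:
    """State of one memory line in each of n caches under the MSI protocol."""

    def __init__(self, n_caches: int):
        self.state = ["I"] * n_caches

    def read(self, cache: int) -> None:
        if self.state[cache] == "I":
            # Read miss: a Modified copy elsewhere is written back and downgraded.
            for other in range(len(self.state)):
                if other != cache and self.state[other] == "M":
                    self.state[other] = "S"
            self.state[cache] = "S"
        # Read hit in S or M: no state change.

    def write(self, cache: int) -> None:
        if self.state[cache] != "M":
            # Write miss or upgrade: invalidate every other copy.
            for other in range(len(self.state)):
                if other != cache:
                    self.state[other] = "I"
            self.state[cache] = "M"
        # Write hit in M: no state change.


line = MSILine(n_caches=2)
for op, cache in [("read", 0), ("read", 1), ("write", 0), ("read", 1), ("write", 1)]:
    getattr(line, op)(cache)
    print(f"{op} by cache {cache}: {line.state}")
```

The invariant to check is that at most one cache holds the line in state M, and that no cache holds it in S while another holds it in M.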