The final exam will be given on Monday, December 20th from
3:00-6:00 PM in DCC308. The final will be closed book, no notes or
computers are allowed.
An IA32 Instruction reference will be distributed with the
test, as will a description of the Y86 instruction
set and HCL reference, and figure 4.21.
Roughly 50% of the test will be on material covered in class
since test #2 (Y86 instruction set, Y86 sequential implementation, HCL
general pipelining and some Y86 pipeline issues, optimization and memory. Below are a list
of topics from these last sections.
The comprehensive part of the test will include material
described here and from test#1 and
test#2. You should expect lots of
short questions for this part of the test.
Topics covered since Test #2
| Topic |
What to know |
What to expect on the test |
|
| Processor Architecture - Sequential |
Y86 Instruction Set
HCL
Memory and Clocking
Stages of Y86 Instructions:
- Fetch
- Decode
- Execute
- Memory
- Write Back
- PC Update
HCL for Y86 Sequential Implementation
|
Be able to rewrite some IA32 code segments using only
the Y86 instruction set
Be able to describe some simple combinational circuits
using HCL.
Be able to describe the operation of simple memory elements
(registers), specifically how a value is read from a register and
how a new value is stored in a register (what signals are necessary,
when do things happen, etc.)
Be able to describe what happens during each of the 6 stages
for any Y86 Instruction, including new instructions (same idea as
the homework assignment).
Given Figure 4.21 and a list of controls, derive the HCL
expressions that describe each control to support some set
of instructions (or modify a given set of HCL expressions).
|
|
| Pipelining and a Y86 Pipeline |
Pipelining and it's impact on performance.
Pipeline registers
Data Hazards
Control Hazards
Y86 Pipeline
- conditional branching issues
- why is RET special?
|
Given a description of a sequential implementation of an
instruction set, and a pipelined version, be able to discuss the
potential performance improvements.
Be able to discuss data and control hazards (general issues).
Be able to identify data and control hazards in a sequence of
Y86 instructions (assuming the pipeline described in the Text).
Be able to discuss ways to (attempt to) avoid stalling the pipeline given a
a sequence of Y86 instructions (forwarding).
|
|
| Code Optimization |
Loop inefficiencies.
C function calls (and why compilers don't
attempt to reduce the number of function calls).
Avoiding memory references
|
Be able to predict the relative performance of
two code segments (C or assembly language).
Given some C/IA32 code, be able to make some
improvments and discuss the impact on performance.
Given a simple loop, be able to do some (simple) loop
un-rolling
Given some code, be able to discuss what optimizations
are possible, and which if these could be handled by a
compiler.
|
|
| Measuring Execution Time |
user, system and elapsed time.
Timer Interrupts
Interval Counting
Cycle Counting
Caching, system load and measuring time
K-best measurement scheme
|
Be able to define user, system and elapsed time.
Be able to compare using interval counting vs. cycle
counting to measure execution time.
Understand how the system load and/or caching can effect
execution time measurements. Be able to discuss how to predict
execution time even in environments where cache and load effects
are not under your control.
Be able to describe/simulate the k-best measurement scheme
for making execution time measurements.
|
|
| Memory Hierarchy and Caching |
Basic concepts related to the access time of
DRAM, SDRAM and hard disks.
General issues related to memory hierarchies
Principles of Temporal and Spatial Locality
Cache Architecture
- Blocks and Slots
- Direct Mapping
- Set-Associative
- Tag and Valid bits
- Replacement Policy
- Write Policy
Cache Friendly Code
- Memory access patterns
- The "Memory Mountain" (sect 6.6.1)
|
Be able to describe the timings involved in accessing
SDRAM, DRAM and hard disk.
Be able to describe the principles of locality, and
why these matter to cache designers
Be able to determine the size of a cache (total number of
bits of storage required) given a cache design.
Be able to compare two versions of some code in terms of
memory access patterns and which would be better if a cache is
used.
Be able to discuss the tradeoffs of implementing various
cache designs (direct-mapped vs set assoc., replacement and write policies).
Be able to make some predictions about the performance of
code given specific cache designs (code like the code used to produce the
"memory mountain".
Be able to describe how changes to the pattern of memory
accesses can have a dramatic impact on performance (look at lab 12!)
|
|
| Virtual Memory |
- Physical vs. Virtual Addresses
- Motivation:
- Caching of pages
- Memory Management (linking,loading,sharing)
- Protection
- Address Translation (Page Tables)
- Translation Lookaside Buffer
- Multi-level Page Tables
|
Be able to determine page table sizes, address sizes,
etc. Look at practice problems 10.1 and 10.2
Understand what a page fault is, and how it is
resolved.
Be able to discuss how temporal and spatial locality are
exploted by Virtual Memory
Be able to discuss why linking and loading of programs is
simplified by Virtual Memory
Understand how and why Virtual Memory provides memory
protection (so one process can't read the memory used by another).
Be able to describe a page table, including where it is
physically located, what the indicies are and what is held in the
page table.
Be able to describe the need for TLB, and talk about the
perfomance implications of having a TLB or not.
Be able to discuss the advantages/disadvantages of caching
virtual addresses vs. caching physical addresses.
|
Practice Problems
Practice problem 4.3 (Y86 Instruction Set)
Practice Problem 4.8 (HCL)
Practice Problems 4.9,4.10,4.11,4.12,4.13 (Y86 Instructions)
Practice Problem 4.14,4.15,4.16,4.18 (HCL)
Practice Problem 4.21 (Pipelining)
Practice Problem 4.22 (Pipeline Performance)
Identify the hazards in the following code, and show any
stall cycles that need to be inserted in order to avoid all the hazards
irmovl $100,%edx
addl %edx,%eax
rmmovl %eax,-12(%ebp)
addl %eax,%eax
jle foo
rrmovl %edx,%eax
foo:
Problem 5.14 (Loop Unrolling)
Problem 5.19 (Performance & Optimization)
Practice Problem 6.3 (Disk Access Time)
Practice Problem 6.5 (Spatial Locality)
Practice Problem 6.6 (Cache Design)
Practice Problems 6.15,16,17 (also in Lecture Notes)
Practice Problems 6.18,19 (Memory Mountain)
Practive Problems 9.3, 9.4 (Timing Measurements)
Practice Problems 10.1, 10.2 (Virtual memory)