Pipelining(1)

Notice

Recent Posts

Recent Comments

Link

Tags more

Archives

관리 메뉴

수정입니다

Pipelining(1) 본문

전공/컴퓨터구조

Pipelining(1)

nongdamgom 2023. 12. 28. 12:34

Pipelining

overlapping execution

MIPS Pipeline

IF : Instruction fetch from memory
ID : Instuction decode & register read
EX : Execute operation or calculate address
MEM : Access memory operand
WB : Write result back to register

ex)

R-format : IF-ID-EX- X - WB
SW : IF-ID-EX-MEM - X -
LW : IF-ID-EX-MEM-WB => slowest

Pipelined -

- one stage must finish one clock -> clock sycle -> slowest stage

더 짧은 stage -> 동작 대기...

if all stages are balanced (all take the same time)

Time between instrctions(pipelined)

= Time between instrctions(nonpipelined) / number of stages

if not -> speedup is less

Speedup due to increased throughput.

Latency (time for each instruction) does not decrease

(it may actually increase...)

(속도가 빨라지는 건, 시간당 처리량이 많아져서지 instruction자체의 latency가 줄어드는건 아니라는 뜻)

MIPS ISA designed for pipelining => simple & easier...

All instruction are 32-bits (fetch&decode를 one cycle에)
Few and regular insturction formats(decode&read regi를 one cycle에)
load/store addressing(calculate address ->3rd stage, access memory -> 4th stage)
alignment of memory operands(memory access를 one cycle에)

Hazards => makes pipe-lining harder(or slower)

Structure hazards
Data hazard
Control hazard

Structure Hazards

conflict for use of a resource(Hardware)

만약에 memory가 1개 뿐이라면, instruction / data 두가지의 memory에 접근하는걸 한 cycle에서 진행할 수 없게 됨 -> stall , bubble (waiting for noe cycle and do nothing)

==> Hence, pipelined data paths require separate instruction/data memories.

(걍 메모리 두개 쓰면 된다는 뜻)

Data Hazards

an instruction depends on completion of data access by a previous instruction.

ex)

add $s0, $t0, $t1

sub $t2, $s0, $t3

왜 문제냐? 일단 이 instruction은 아래의 pipline을 탄다.

IF ID EX MEM WB --- (1)

IF ID EX MEM WB ---(2)

우선

(1)IF 에서 add 명령을 fetch후

(1)ID에서 regi $t0랑 $t1을 읽고 나서,

(1)EX 에서 두 register을 add 값이 계산 되고,

(1)MEM은 쉬고(R type이니까)

(1)WB stage에 와서야, 새로운 $s0의 값($t0값 + $t1값)이 저장된다.

그러나 사실은

(1)ID에서 $t0, $t1을 읽으면서 (2)IF에서 sub 명령을 fetch 한다.~~(그것이 파이프라인이니까)~~

즉, (2)ID에서는 $s0와 $t3를 읽어야 하는데, (2)ID를 할 시점에는 아직 $s0의 값이 갱신된 상태가 아니라서 old한 $s0의 값이 들어오고, incorrect answer을 도출하게 된다.

따라서 맞는 답을 내놓기 위해서는

IF ID EX MEM WB --- (1)

B B B B B --- stall

IF ID EX MEM WB ---(2)

B(bubble) 이와같은 wasting time이 생기게 된다 (stall)

* 틈새 질문 - WB과 과 ID가 같은 사이클이라고 해도, 뭘 먼저하는지는 어떻게 판단하나요?

= stage 반 갈라서 write는 왼쪽, read는 오른쪽에서 함(read속도가 매우 빨라 쓰자마자 읽기 가능, 즉 write -> read 순서로 진행하므로 $s0가 갱신 된 후 register를 읽을 수 있다.)

그러나? 우리는 stall을 최대한 줄여야한다.

속도 빠르게 하려고 pipeline을 쓰는건데, 기다릴 수는 없음

Forwarding (aka Bypassing)

use result when it is computed

stall을 없앨 수 있는 방법

add instruction에서 $s0에 저장 될 때까지 기다리지 말고,

EX단계에서 계산 된 값을, sub instruction으로 바로 넘기자!

IF ID EX MEM WB --- (1)

IF ID EX MEM WB ---(2)

즉, (1)EX 에서 계산한 $t0 + $t1의 값을 (2)EX의 한쪽 regi 값으로 전달하는 것.

sub $t2, $s0, $t3 --> sub $t2, ($t0 + $t1), $t3 이런 느낌

* 이렇게 inst들을 쭉 늘어놓는건 conceptual한 그림이고,

실제로는 어느 cct 지점에서, 그 시점에 돌고 있는 애들끼리 지지고 볶는거(multi clock cycle)

Load-Use Data Hazard

can't always avoid stalls by forwarding

ex)

lw $s0, 20($t1)

sub $t2, $s0, $t3

뭐가 문제일까, 앞에 것과 비슷하게 맞는 $s0를 가져올 수 없다는 점이 있다.

그럼 이것도 앞에거랑 비슷하게 계산 값을 넘겨주면 되는거 아니냐?!

IF ID EX MEM WB --- (1)

IF ID EX MEM WB ---(2)

(물론 이것도 그냥 stall 2개 써서 할 수도 있으나 그건.. 안좋다)

앞 예시와 차이점은, 이 예시는 lw를 사용해서 발생한 문제라는 것이다.

$t1의 20번째 주소에 있는 값을 가져오려면, EX에서 주소값을 계산 후, MEM 에서 그 주소값에 위치한 값을 읽어야 한다.

즉, 적어도 (1)MEM이 끝난 후에야 값을 얻을 수 있는데,

(2)EX 계산 전 그 값이 필요하기 때문에, stall을 피할 수 없게 된다.

IF ID EX MEM WB --- (1)

B B B B B --- stall

IF ID EX MEM WB ---(2)

(1)MEM에서 읽은 값을 (2)EX에 넘겨주면 stall 하나로 해결 할 수 있다.

* 물론 이것도 forwarding 이긴 하다. (without forwarding-> wait two cycle(two stall))

하지만 한 사이클을 반드시 기다려야 한다 -> 하드웨어적으로 해결 못함

Code scheduling to avoid stalls

Reorder code to avoid use of load result in the next instruction

그니까 애초에 명령 순서를 내가 hazard가 일어나지 않게 최대한 효율적으로 짜란 얘기

(lw 바로 뒤에 add 나 sub -> lw 먼저 몰아서 하고 나중에 받아오기 같은 식으로)

instruction sequence 계산 방법

: number of instruction + (number of stage - 1) + number of stall

Control Hazards

Branch determines flow of control
fetching next instruction depends on branch outcome

branch는 ALU에서 register 두개의 연산 결과가 0이면 1을 ,1이면 0을 zero에서 출력한다.

(두개의 register를 빼서, 0이면 0전달 -> 결과적으로 1이 zero에서 출력,

즉, 둘이 같으면 branch target address를 한다..... 아 갑자기 헷갈린다)

stall on Branch

Wait until branch outcome determined before fetching next instruction

그니까 ALU 연산 결과에 따라 next inst가 바뀌어서, 결정 될 때까지 기다려야함.

* 이 과정은 3 stall을 발생한다고 처음에 그랬는데 어떤 로직에 의해(?) 아무튼 1stall만 소비하고 next instruction을 결정할 수 있다고 함. 책이 그렇대 몰라 설명 자세히 안해줌

Branch predicton

Only stall if prediction is wrong

In MIPS pipeline

Can predict branches not taken
fetch instruction after branch, with no delay.

그니까 일단 무조건 branch 안하고 일단 다음줄 inst를 실행하는거

맞으면 stall 없이 가는거고 ... 50% 확률이지만, 100% stall이 발생하는 것보단 낫다는 논리

More-Realistic Branch Prediction

Static branch prediction

Based on typical branch behavior(what programmer usually do)
그니까 상식적으로 반복문 쓸 때, 반복문 안에 내용이 훨씬 많게 조건을 주지, 반대로는 안할거라는 얘기.. 즉 일반적인 행동 기반으로 예측
loop -> backward branches (for문 밖으로 branch하는 예측) taken
if -> forward branches(if문 body로 branch하는 예측) not taken

Dynamic branch prediction

Hardware measures actual branch behavior
Assume future behavior will continue the trend
각 conditional branch의 과거 기반으로 예측, 하드웨어에서 이걸 기억하고 있다가 트렌드를 살려서 예측한다...

Pipeline Summary

Pipelining improves performance by increasing insturction throughput

executes multiple instructions in parallel
each instruction has the same latency

Subject to hazards

Structure, data(write-use, load-use), control

Instruction set design(ISA) affects complexity of pipeline implementation

'전공 > 컴퓨터구조' 카테고리의 다른 글

Memory Hierarchy(4) - virtual memory (0)	2023.12.28
Memory Hierarchy(3) - Cache_2, Hamming code (0)	2023.12.28
Memory Hierarchy(2) - Cache_1 (0)	2023.12.28
Memory Hierarchy(1) - Disk access time (0)	2023.12.28
Pipelining(2) (0)	2023.12.28

'전공/컴퓨터구조' Related Articles

수정입니다

Pipelining(1) 본문

Pipelining(1)

'전공 > 컴퓨터구조' 카테고리의 다른 글

티스토리툴바