[Principles of Computer Organization] Exam Review Notes

2020-01-12 18:55:15

Tags: DMA, computer, interrupt, cache, notes, exam prep, mapped, bit, CPU

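Short-answer questions:

1. Pros and cons of hardwired control versus microprogrammed control

This is really the hardware-versus-software trade-off: the software-like approach (microprogramming) costs less and is more flexible, but it is not as fast as hardware.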

What are the advantages and disadvantages of hardwired and microprogrammed control?

Solution:

The main advantage of hardwired control is fast operation. The disadvantages include: higher cost, inflexibility when changes or additions are to be made, and longer time required to design and implement such units.

Microprogrammed control is characterized by low cost and high flexibility. Lower speed of operation becomes a problem in high-performance computers.

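2. Differences between DMA and the interrupt-driven method:

What is the difference between the DMA and interrupt-driven methods? (Analyze from two angles: (1) When should the CPU respond to a DMA request or an interrupt request? (2) What work does the CPU need to do when it acknowledges a DMA request or an interrupt request?)

Two aspects:

When the CPU responds:

DMA: when the DMA controller has completed its transfer and sends an interrupt signal

Interrupt: when an I/O device raises an interrupt signal and that device's interrupt is not masked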

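1. In interrupt-driven I/O, an interrupt request is raised once the data buffer register is full, and the CPU then services the interrupt.

2. DMA transfers data in blocks and asks the CPU for interrupt handling only after the whole requested block has been transferred, which greatly reduces the number of interrupts the CPU must service. The transfer itself needs no CPU intervention and executes no CPU instructions (although the controller does share the memory bus with the CPU); the whole operation runs under the control of a "DMA controller". Apart from a little work at the start and end of the transfer, the CPU is free to do other work while the transfer is in progress, so for most of the time the CPU and the I/O operate in parallel, which greatly improves the efficiency of the whole system. Interrupt-driven I/O, by contrast, is a program switch: every operation has to save and restore context, there are many interrupts, the CPU spends considerable time servicing them, and a high interrupt rate can also lead to data loss. DMA still has to use an interrupt, though, because otherwise the CPU could not learn that the transfer has finished: when the transfer ends, the controller signals the CPU with an interrupt and the CPU handles it. This is what saves so much CPU time.

3. With interrupts, data travels from the device to the CPU and then to memory, or the reverse.

With DMA, data goes directly from the device into memory, or directly from memory to the device.

The DMA controller sends an interrupt request to the CPU, and the CPU runs an interrupt-service routine to finish off the DMA operation: checking that the data written to main memory is correct, testing whether an error occurred during the transfer (and branching to a diagnostic routine if so), and deciding whether to use DMA again for further data blocks.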
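To put a number on "far fewer interrupts", here is a minimal sketch under simplified assumptions that are not part of the original notes: the device interrupts once each time a buffer of `buffer_bytes` fills, the DMA controller interrupts once per completed block of `block_bytes`, and the function name and parameters are made up for illustration.

```python
import math

def interrupts_per_transfer(total_bytes: int, buffer_bytes: int, block_bytes: int) -> dict:
    """Count the CPU interrupts needed to move total_bytes under each scheme."""
    # Interrupt-driven I/O: one interrupt each time the device buffer fills.
    interrupt_driven = math.ceil(total_bytes / buffer_bytes)
    # DMA: one interrupt per completed block.
    dma = math.ceil(total_bytes / block_bytes)
    return {"interrupt_driven": interrupt_driven, "dma": dma}

# Hypothetical numbers: a 4 KiB transfer, a 1-byte device buffer, one 4 KiB DMA block.
counts = interrupts_per_transfer(total_bytes=4096, buffer_bytes=1, block_bytes=4096)
print(counts)  # {'interrupt_driven': 4096, 'dma': 1}
```

Each of those interrupts also costs a context save and restore, which is why the gap matters in practice.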

3. Subroutines versus interrupt-service routines; pros and cons of the hardwired control unit

What is the difference between a subroutine and an interrupt-service routine?

Solution: A subroutine is called by a program instruction to perform a function needed by the calling program.

An interrupt-service routine is initiated by an event such as an input operation or a hardware error. The function it performs may not be at all related to the program being executed at the time of interruption. Hence, it must not affect any of the data or status information relating to that program.

2. What are the advantage(s) and disadvantage(s) of hardwired control unit?

Solution: The main advantage of hardwired control is fast operation. The disadvantages include: higher cost, inflexibility when changes or additions are to be made, and longer time required to design and implement such units.

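4. Cache write policies:

For a write operation in a cache memory system, what is the difference between write back and write through?

Write-back: the write goes only to the cache; main memory is updated only when the block is replaced.

Write-through: the cache and main memory are updated at the same time.

Pros and cons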
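One side of the trade-off is memory traffic. The sketch below counts main-memory writes for the two policies; it is a toy model, not something from the notes: it assumes a cache with a single line, write-allocate on a write miss, and a final flush, and the function name is hypothetical.

```python
def memory_writes(trace, policy):
    """Count main-memory writes for a one-line cache.

    trace: list of ("R" or "W", block_number) accesses.
    policy: "write-through" or "write-back".
    """
    cached_block = None   # block currently held in the single cache line
    dirty = False         # write-back only: line modified since it was loaded
    writes = 0
    for op, block in trace:
        if block != cached_block:
            # Miss: the old block is replaced; a dirty block is written back first.
            if policy == "write-back" and dirty:
                writes += 1
            cached_block, dirty = block, False
        if op == "W":
            if policy == "write-through":
                writes += 1      # every write also goes to memory
            else:
                dirty = True     # defer the memory update until replacement
    # Flush at the end so both policies leave memory up to date.
    if policy == "write-back" and dirty:
        writes += 1
    return writes

trace = [("W", 0)] * 10 + [("R", 1), ("W", 1), ("W", 1)]
print(memory_writes(trace, "write-through"))  # 12
print(memory_writes(trace, "write-back"))     # 2
```

Write-back wins on traffic when a block is written repeatedly; the price is that memory is temporarily stale and replacement logic must track a dirty bit.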

5. The problem with the ripple-carry adder, and the solution

Explain the drawback of the ripple-carry full adder when you need to design a 64-bit CPU core, and give a solution for the drawback. (Just give the name of the circuit. You don't need to draw the solution circuit.)
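The notes leave this question unanswered. The usual textbook answer, given here as background rather than as the author's own wording, is that each stage must wait for the carry from the stage below it, so the worst-case delay grows linearly with the word length; the standard named remedy is the carry-lookahead adder. The sketch below only illustrates the drawback: it adds bit by bit and reports how many consecutive stages a carry rippled through. The function name and return format are made up for illustration.

```python
def ripple_carry_add(a: int, b: int, width: int = 64):
    """Add two unsigned integers one bit at a time, like a ripple-carry adder.

    Returns (sum, carry_out, longest_chain), where longest_chain is the longest
    run of consecutive stages that produced a carry into the next stage.
    """
    carry = 0
    result = 0
    chain = longest_chain = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i          # full-adder sum bit
        carry = (x & y) | (carry & (x ^ y))     # full-adder carry-out
        chain = chain + 1 if carry else 0
        longest_chain = max(longest_chain, chain)
    return result, carry, longest_chain

# Worst case for 64 bits: the carry generated at bit 0 ripples through every stage.
print(ripple_carry_add(2**64 - 1, 1))   # (0, 1, 64)
print(ripple_carry_add(5, 3, width=8))  # (8, 0, 3)
```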

6. Differences between DRAM and SRAM

What is the difference between DRAM and SRAM, in terms of characteristics such as speed, size, cost and application?

DRAM

Dense, cheap, slow

SRAM

Less dense, expensive, fast

7. Overflow expression
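The notes give only the heading here. A common formulation for n-bit two's-complement addition, offered as a reminder rather than as the course's official answer, is that overflow occurs when both operands have the same sign and the sum has the opposite sign; equivalently, when the carry into the sign bit differs from the carry out of it. A minimal sketch (the function name is hypothetical):

```python
def add_with_overflow(a: int, b: int, n: int = 8):
    """Add two n-bit two's-complement bit patterns; return (sum_pattern, overflow)."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    s = (a + b) & mask
    sa, sb, ss = a >> (n - 1), b >> (n - 1), s >> (n - 1)   # sign bits
    # Overflow = sa.sb.(not ss) + (not sa).(not sb).ss
    overflow = (sa & sb & (ss ^ 1)) | ((sa ^ 1) & (sb ^ 1) & ss)
    return s, bool(overflow)

print(add_with_overflow(127, 1))        # (128, True):  +127 + 1 does not fit in 8 bits
print(add_with_overflow(0xFF, 1))       # (0, False):   -1 + 1 = 0
print(add_with_overflow(0x80, 0x80))    # (0, True):    -128 + -128 does not fit
```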

8. Describing the interrupt process

Describe the process of interrupt processing.

Solution:

9. Characteristics of RISC

10.

1. About pipelining: ① hazards, ② branch penalty, ③ implementing a pipeline

11.

https://blog.csdn.net/sxhelijian/article/details/72356083

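Answer:

This can be done by writing a suitable program in assembly language. The key steps are:

1. Change the entry address that is used after the interrupt

2. Define the new interrupt routine

3. Load the new routine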

12.

A page fault is caused by a reference to a VM word that is not in physical (main) memory.

Servicing a Page Fault

Make space in memory by writing physical page to disk

Page Frames

Replacement policy?

Load page

Loading pages could waste processor time, use DMA

DMA allows processor to do something else

OS updates the process's page table

Desired data is in memory for process to resume

In short: make room first, then load the page, then update the page table.
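The three steps can be sketched as a toy demand-paging model. The notes leave the replacement policy as an open question, so FIFO is used here purely as an assumption; the class and attribute names are hypothetical, and the disk or DMA transfer itself is not modelled.

```python
from collections import OrderedDict

class PagedMemory:
    """Toy demand-paging model with FIFO replacement (an assumed policy)."""

    def __init__(self, num_frames: int):
        self.num_frames = num_frames
        self.page_table = OrderedDict()   # virtual page -> frame, oldest first
        self.faults = 0

    def access(self, page: int) -> int:
        if page in self.page_table:
            return self.page_table[page]      # page is resident: no fault
        self.faults += 1                      # page fault
        if len(self.page_table) < self.num_frames:
            frame = len(self.page_table)      # a free frame is still available
        else:
            # Step 1: make room by evicting the oldest resident page.
            _victim, frame = self.page_table.popitem(last=False)
        # Step 2: load the page into the frame (transfer not modelled here).
        # Step 3: update the page table so the faulting access can resume.
        self.page_table[page] = frame
        return frame

mem = PagedMemory(num_frames=2)
for page in [1, 2, 1, 3, 1]:
    mem.access(page)
print(mem.faults)             # 4
print(list(mem.page_table))   # [3, 1]
```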

End-of-chapter exercises:

Chapter 8

8.2. Each column address strobe causes 8 × 4 = 32 bytes to be transferred.

(a) Latency = 5 clock cycles or 12.5 ns

Total time = 5 + 8 = 13 clock cycles, or 32.5 ns.

(b) A second column strobe is needed to transfer the second burst of 32 bytes. Therefore:

Latency = 5 clock cycles or 12.5 ns

Total time = 5 + 8 + 2 + 8 = 23 clock cycles, or 57.5 ns.

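1. Memory latency: the time taken to transfer the first word of a block (from the assertion of RAS to the start of the first word's transfer).

2. Burst signal: https://blog.csdn.net/zhong_ethan/article/details/84966615; the burst length determines how many blocks are transferred per CAS.

3. RAS and CAS each have a delay, and the two delays differ. The first burst incurs both delays; later bursts incur only the CAS delay.

4. The number of bits that can be transferred in parallel is the size of one block.

5. The M in MHz stands for 10^6.

6. Total time = RAS delay + CAS delay + data transfer time (usually one clock cycle per block), then repeat the CAS delay and the transfer time for each additional burst.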
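The timing rule in note 6 can be checked against the 8.2 answer with a few lines of arithmetic. This is only a sketch: the parameter names are made up, the first-burst latency of 5 cycles and the 2-cycle delay before the second burst are taken from the solution above, and the 2.5 ns cycle time is inferred from "5 clock cycles or 12.5 ns".

```python
def burst_transfer_cycles(num_bursts, first_latency=5, burst_length=8, next_cas_delay=2):
    """Return (latency, total) in clock cycles for num_bursts back-to-back bursts."""
    total = first_latency + burst_length                         # first burst: full latency
    total += (num_bursts - 1) * (next_cas_delay + burst_length)  # later bursts: CAS delay only
    return first_latency, total

CYCLE_NS = 2.5  # inferred from "5 clock cycles or 12.5 ns"

for bursts in (1, 2):
    latency, total = burst_transfer_cycles(bursts)
    print(bursts, latency * CYCLE_NS, total, total * CYCLE_NS)
# 1 12.5 13 32.5
# 2 12.5 23 57.5
```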

8.3. A 16M module can be structured as 16 rows, each containing eight 1M × 4 chips. A 24-bit address is required. Address lines A19~0 should be connected to all chips. Address lines A23~20 should be connected to a 4-bit decoder to select one of the 16 rows.

The idea is easy to follow, but it is still worth drawing the circuit by hand once; that is how the implementation details stick.

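8.8. Each block contains 128 bytes, thus requiring a 7-bit Word field. There are 16 sets, requiring a 4-bit Set field. The remaining 21 bits of the address constitute the tag field.

S:

1. Check carefully whether the question is in words or in bytes! Whether the address space is word-addressable or byte-addressable affects how the fields are encoded.

2. Read the question carefully and work out exactly which quantity you are asked to compute.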
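The field widths in 8.8 follow mechanically from the block size and the number of sets. A small sketch, assuming a byte-addressable memory, power-of-two sizes, and a 32-bit address (implied by 7 + 4 + 21); the function name is hypothetical:

```python
def cache_address_fields(address_bits: int, block_bytes: int, num_sets: int):
    """Return (tag_bits, set_bits, offset_bits) for a byte-addressable memory."""
    offset_bits = (block_bytes - 1).bit_length()   # log2 for power-of-two sizes
    set_bits = (num_sets - 1).bit_length()
    tag_bits = address_bits - set_bits - offset_bits
    return tag_bits, set_bits, offset_bits

print(cache_address_fields(32, block_bytes=128, num_sets=16))  # (21, 4, 7)
```

Passing `num_sets=1` gives the fully associative case (no set field); for a direct-mapped cache, pass the number of cache blocks as `num_sets`.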

8.10. For the first loop, the contents of the cache are as indicated in Figures 8.21 through 8.23. For the second loop, they are as follows.

a) Direct-mapped cache

b) Associative-mapped cache

c) Set-associative-mapped cache

In all 3 cases, all elements are overwritten before they are used in the second loop. This suggests that the LRU algorithm may not lead to good performance if used with arrays that do not fit into the cache. Performance can be improved by introducing some randomness in the replacement algorithm.

1. Refer to the figures used in the answer.

8.10. The two least-significant bits of an address, A1-0, specify a byte within a 32-bit word. For a direct-mapped cache, bits A4-2 specify the block position in the cache. For a set-associative-mapped cache, bit A2 specifies the set.

(a) Direct-mapped cache

(b) Associative-mapped cache

S:

1. "32-bit word" means that one word is 32 bits long.

2. State explicitly what each address bit is used for.

(a) Direct-mapped cache

(b) Associative-mapped cache

8.12. The two least-significant bits of an address, A1~0, specify a byte within a 32-bit word. For a direct-mapped cache, bits A4~3 specify the block position. For a set-associative-mapped cache, bit A3 specifies the set.

(a) Direct-mapped cache

(b) Associative-mapped cache

S:

8.22

(a) The maximum number of bytes that can be stored on this disk is 24×14000×400×512 = 68.8×10^9 bytes.

(b) The data transfer rate is (400×512×7200)/60 = 24.58×10^6 bytes/s.

(c) We need 9 bits to identify a sector, 14 bits for a track, and 5 bits for a surface. Thus, using a 32-bit word b31…b0, a possible scheme is to use bits b8~0 for sector, b22~9 for track, and b27~23 for surface identification. Bits b31~28 would not be used.

S:

1. Understand the physical meaning of each quantity.
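The 8.22 numbers can be reproduced directly. In this sketch the variable names are made up, the five parameters are read off the solution above (24 surfaces, 14000 tracks per surface, 400 sectors per track, 512 bytes per sector, 7200 rpm), and `pack_address` follows the bit layout suggested in part (c).

```python
surfaces, tracks, sectors, sector_bytes, rpm = 24, 14000, 400, 512, 7200

capacity = surfaces * tracks * sectors * sector_bytes   # bytes on the whole disk
transfer_rate = sectors * sector_bytes * rpm / 60       # bytes/s: one track per revolution

def bits_needed(count: int) -> int:
    """Smallest number of bits that can number `count` distinct items."""
    return (count - 1).bit_length()

def pack_address(surface: int, track: int, sector: int) -> int:
    """Pack into a 32-bit word: b8~0 sector, b22~9 track, b27~23 surface."""
    return (surface << 23) | (track << 9) | sector

print(capacity)        # 68812800000, about 68.8 x 10^9
print(transfer_rate)   # 24576000.0, about 24.58 x 10^6
print(bits_needed(sectors), bits_needed(tracks), bits_needed(surfaces))  # 9 14 5
```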

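Chapter 9

Instruction design:

Counting combinations

1. n registers means log2(n) register-address bits

Addressing modes

1. Relative addressing: EA = (PC) + X

2. Indexed addressing: EA = (R) + X

3. Standard approach:

First determine the width of each field: opcode, addressing mode, operand X

Draw the field layout

For each mode, give the expression for EA and its range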
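The approach above can be sketched as follows. Everything concrete here is a hypothetical example, not a question from the notes: a 32-bit instruction word, 64 opcodes, 4 addressing modes and 16 registers, with whatever bits remain going to X.

```python
def field_widths(word_bits: int, num_opcodes: int, num_modes: int, num_registers: int) -> dict:
    """Width of each instruction field; the leftover bits go to the operand X."""
    def bits(n: int) -> int:
        return (n - 1).bit_length()           # log2, rounded up
    opcode, mode, reg = bits(num_opcodes), bits(num_modes), bits(num_registers)
    return {"opcode": opcode, "mode": mode, "register": reg,
            "X": word_bits - opcode - mode - reg}

def effective_address(mode: str, x: int, pc: int = 0, r: int = 0) -> int:
    """EA for the two modes listed in the notes."""
    if mode == "relative":
        return pc + x      # EA = (PC) + X
    if mode == "indexed":
        return r + x       # EA = (R) + X
    raise ValueError(f"unknown mode: {mode}")

print(field_widths(32, num_opcodes=64, num_modes=4, num_registers=16))
# {'opcode': 6, 'mode': 2, 'register': 4, 'X': 20}
print(hex(effective_address("relative", -8, pc=0x1000)))  # 0xff8
print(hex(effective_address("indexed", 12, r=0x2000)))    # 0x200c
```

With a 20-bit signed X, relative addressing would reach from (PC) - 2^19 to (PC) + 2^19 - 1; whether X is signed is itself something the question has to state.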

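Cache:

Determining the field widths:

1. Work out how many bits a main-memory address has, from the addressing unit and the capacity information.

2. Then split the address according to the mapping scheme:

Direct-mapped: TAG, cache position, OFFSET

Fully associative: TAG, OFFSET

Set-associative: TAG, SET, OFFSET

Hit rate:

When the addresses within one block are accessed sequentially, only the first read of the first word misses; all the remaining accesses are hits.

Direct mapping:

Consider how the data's location in memory relates to the number of positions (sets) in the cache.

Details:

Note that the number of ways is the number of blocks within one set.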

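Long calculation problems:

1. Preprocessing: first convert all decimal numbers to binary.

2. Input information:

Identify the key quantities:

What unit the machine addresses by (byte or word)

How large the memory is (how many address bits in total)

How large each block is (OFFSET)

How large the cache is

Which mapping scheme is used

Using these quantities, split every address into its fields.

3. Analysis and calculation:

Tables to maintain:

The hexadecimal-to-binary table [used when first filling in POS/SET]

The address / position / pass table, or the address / SET / pass table [for quick lookup and record-keeping]

The table of current cache contents [shows the current state]

The LRU priority matrix [records which block is least recently used]

Procedure:

Check whether the POS/SET that the accessed address maps to holds that address (TAG).

If it does, mark ✔ in the HIT column and update the LRU priority matrix.

If it does not, mark ✗ in the HIT column, load the block (with or without a replacement, as needed), and update the LRU matrix.

No replacement needed: write the block into the cache table and update the LRU matrix.

Replacement needed: overwrite the victim in the cache and update the LRU matrix.

Note:

1. For a repeated address sequence, a regular pattern emerges after the second pass.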
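The hand procedure above can be cross-checked with a short simulation. This is a minimal sketch, not the method used in the course: the function name and row format are made up, an `OrderedDict` per set stands in for the LRU priority matrix, and byte addresses are assumed. Use `ways=1` for a direct-mapped cache and `num_sets=1` for a fully associative one.

```python
from collections import OrderedDict

def simulate_cache(addresses, num_sets, ways, block_bytes):
    """Return a list of (address, set_index, tag, hit) rows, using LRU replacement."""
    sets = [OrderedDict() for _ in range(num_sets)]   # per set: tags, LRU first
    rows = []
    for addr in addresses:
        block = addr // block_bytes
        index, tag = block % num_sets, block // num_sets
        lines = sets[index]
        hit = tag in lines
        if hit:
            lines.move_to_end(tag)           # now the most recently used
        else:
            if len(lines) == ways:
                lines.popitem(last=False)    # replace the least recently used tag
            lines[tag] = None
        rows.append((addr, index, tag, hit))
    return rows

# Two passes over three blocks in a 2-way fully associative cache:
# every access misses, because LRU evicts each block just before it is reused.
rows = simulate_cache([0, 4, 8] * 2, num_sets=1, ways=2, block_bytes=4)
print([hit for *_, hit in rows])   # [False, False, False, False, False, False]
```

This is the same effect the 8.10 solution points out: LRU performs poorly on a loop over an array that does not fit in the cache.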

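Cache blocks

Effect of block size

1. Too large

Depending on how much locality the accesses have, this can help (fewer misses) or hurt (much of the fetched data is never used).

A miss takes longer, because more data has to be transferred.

2. Too small

By the same logic, this can help or hurt.

The miss penalty is smaller.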

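Differences between an interrupt-service routine and a subroutine:

1. Cause:

A subroutine is called by the current program to perform a function that the program needs.

An interrupt-service routine is triggered by an event such as an input operation.

2. Relation to the current program:

Subroutine: closely related

Interrupt: quite possibly unrelated, so it should not affect the current program

Multi-level interrupts

1. With a single interrupt line: implemented by manipulating the interrupt-enable flags

2. With multiple interrupt lines: a priority queue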

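Relating instruction execution to the hardware datapath model, and showing it in a table

1. Details:

An inter-stage register that is assigned in one stage only shows the new value in the next stage; during the current stage it still holds the previous data. This follows from the clocking: an assignment takes effect only at the end of a clock cycle.

Use * to mark data left over from the previous instruction.

For a single instruction, ignore any further changes after an update that pipelined overlap would cause. Pipelining is not considered here, but such changes must be considered when instructions overlap in a pipeline.

The register table at the start of a cycle is not the same as the one at the end of the cycle!

If the register address fields in the instruction set are not in fixed positions, and you still want to use the 5-step model

Two solutions:

1. Do it in step 2: lengthen the clock cycle so that the instruction can be fully decoded and the register file (RF) read.

2. Do it in step 3: by then RA/RB can certainly be read and passed on to the ALU; the clock cycle must be lengthened here as well.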

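Multi-instruction pipeline

1. Converting between hexadecimal and binary is error-prone, especially when arithmetic is involved; work carefully.

2. Once there are no more instructions, treat the register values as updated to unknown.

3. In IR, just write the instruction's mnemonic.

4. Remember that PC is incremented by 4, not by 1.

Without operand forwarding

Using stalls:

1. A dependent instruction must wait for the data to be ready, and then one more clock cycle for the result to be stored.

2. A stall also stretches the operations of the instructions below it in the same time slots (they wait), to keep everything in step.

With operand forwarding

1. The instruction timing diagrams all follow the same template; only the forwarding arrows differ.

2. Check whether the question states the forwarding rules; if it does not, generally assume that forwarding is possible between any of the inter-stage registers.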
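The effect of stalls versus forwarding on the cycle count can be sketched with a very simplified model. All of it is an assumption for illustration, not the course's datapath: five stages (Fetch, Decode, Compute, Memory, Write) of one cycle each, registers read in Decode, only ALU-type instructions, a result that without forwarding becomes readable in the cycle after Write, and with forwarding an ALU result passed straight to the next instruction's Compute stage. The function name and instruction format are made up.

```python
def total_cycles(instructions, forwarding: bool) -> int:
    """instructions: list of (dest, [sources]). Returns cycles until the last Write."""
    ready = {}        # register -> first cycle in which Decode may use it
    prev_decode = 1   # so that the first instruction decodes in cycle 2
    for dest, sources in instructions:
        decode = prev_decode + 1                      # in order, one Decode per cycle
        for src in sources:
            decode = max(decode, ready.get(src, 0))   # stall until the operand is usable
        # Forwarding: the consumer may decode one cycle behind the producer.
        # No forwarding: the consumer decodes only in the cycle after the producer's Write.
        ready[dest] = decode + 1 if forwarding else decode + 4
        prev_decode = decode
    return prev_decode + 3   # Compute, Memory and Write follow the last Decode

# Hypothetical pair: the second instruction reads R2, which the first one writes.
program = [("R2", ["R3"]), ("R9", ["R2"])]
print(total_cycles(program, forwarding=False))  # 9  (three stall cycles)
print(total_cycles(program, forwarding=True))   # 6  (no stall)
```

Loads followed immediately by a dependent instruction would still need a stall even with forwarding; this sketch does not model that case.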

Source: https://www.cnblogs.com/YiXinLiu617/p/12183437.html
