White Papers

The Design of a Highly Efficient Multitasking Adapter Chip

This paper describes a highly efficient multitasking I/O adapter chip design* that supports up to 255 concurrent requests and 14,000 scatter and gather segments with minimal request-processing overhead. The design can easily be adapted to different system busses such as PCI, VL, EISA, and ISA, as well as to peripheral busses such as SCSI, 1394, and IDE.

The Multitasking I/O Performance of Mainframe Computers

I/O bottlenecks are common in PCs because the most popular operating systems, DOS and Windows, are single-tasking operating systems: they can start only one task at a time. Because only one task is processed at a time, only one I/O operation can be started. Since a typical disk drive can perform about 80 random reads or writes per second, a computer running an operating system like DOS or Windows can get at most 80 disk I/Os per second.

The mainframe folks understood this problem thirty years ago. They developed multitasking operating systems such as MVS from IBM and UNIX from AT&T. A multitasking operating system can start several tasks, each of which can initiate an I/O operation. When several I/O operations are directed to separate devices, they proceed in parallel, so I/O throughput can double or triple with two or three disk drives.

1. Porting of Mainframe I/O Design to Personal Computers

A multitasking I/O adapter must have three important attributes:

1. receiving multiple I/O requests quickly
2. minimizing the data transfer time on the PC system bus
3. managing multiple I/O requests by sending them to I/O devices quickly

Using VLSI technology, it is possible today to integrate a very high-performance I/O controller into a single integrated circuit (IC) and sell it for less than $20. Advanced System Products, Inc. (AdvanSys) has developed such a highly integrated multitasking adapter chip.

(DIAGRAM)

The ASC-1000 chip connects a PC system bus, PCI in this case, to a peripheral bus, SCSI.
It has local memory where the adapter's I/O control microcode and outstanding I/O requests are stored. A very fast RISC processor is responsible for initiating requests on the peripheral bus and performing the necessary protocol handshakes. The FIFO absorbs the speed difference between the system and peripheral busses. The DMA function performs bus-master data transfers. The PCI and SCSI interfaces are specific to the particular system and peripheral busses. Finally, the chip allows additional external memory to be added if it becomes necessary.

The most distinctive feature of this chip is that the PC processor itself can access the chip's local RAM. A device driver running on the PC processor stores new requests in the chip's local memory even while the chip is busy working with an I/O device. The device driver can therefore queue multiple requests at once, and the more I/O requests are queued, the more flexibility the chip has in processing them concurrently. External RAM can be used to store even more concurrent requests. Direct access to local memory allows the chip to receive new requests very quickly: 25 microseconds for the ASC-1000 vs. 500 microseconds for a conventional chip. The DMA function transfers I/O data over the system bus in burst mode without intervention from the PC processor.

2. 255 Concurrent Requests and 14,000 Scatter/Gather Segments

Each I/O request occupies a 64-byte RAM area, also known as a Command Description Block or CDB, in which the I/O request is specified. A typical disk drive needs less than 500 microseconds to receive a new request but about 12 milliseconds to move its disk arm to the right position before data transfer. During these 12 milliseconds, the device releases the peripheral bus, an action known as disconnecting from the bus. The free bus is then available for starting another request to another I/O device.
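To see why disconnecting matters, here is a back-of-envelope model in Python. It is a sketch under stated assumptions, not AdvanSys's analysis: it treats 0.5 ms as the bus time for handing a request to a drive, 12 ms as the seek during which a disconnected drive leaves the bus free, and assumes a data phase of about 1 ms. It ignores bus arbitration delays and any queuing or sorting inside the drive, so it gives an optimistic upper bound rather than a prediction of measured results. The names `CMD_MS`, `SEEK_MS`, `DATA_MS`, and `ops_per_second` are hypothetical.

```python
# Back-of-envelope model of several drives sharing one SCSI bus.
# All timings are assumptions based on the figures quoted in the text.

CMD_MS = 0.5    # bus time to hand a request to a drive
SEEK_MS = 12.0  # arm movement; the bus is free only if the drive disconnects
DATA_MS = 1.0   # assumed bus time for the data phase after reconnecting


def ops_per_second(n_drives: int, disconnect: bool) -> float:
    """Optimistic upper bound on aggregate requests per second."""
    request_ms = CMD_MS + SEEK_MS + DATA_MS  # one request, start to finish
    if not disconnect:
        # The bus is held for the whole request, so requests to different
        # drives cannot overlap: extra drives add nothing.
        return 1000.0 / request_ms
    # With disconnect, each drive works independently (one request at a time)
    # and only the command and data phases occupy the shared bus.
    drive_limit = n_drives * 1000.0 / request_ms
    bus_limit = 1000.0 / (CMD_MS + DATA_MS)
    return min(drive_limit, bus_limit)


for n in (1, 2, 7, 10):
    print(n, round(ops_per_second(n, False)), round(ops_per_second(n, True)))
```

Under these assumptions, a bus that is never released stays at about 74 requests per second no matter how many drives are attached. With disconnection, throughput grows with the number of drives (about 148 with two, about 519 with seven) until the bus itself saturates at about 667 requests per second.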
Many advanced SCSI devices, known as SCSI-2 devices, can accept multiple requests, which they queue and sort internally. When a device is ready to transfer data, it reconnects to the peripheral bus. Most data transfers take less than one millisecond. The ASC-1000 chip can hold 31 concurrent requests internally; additional external memory is needed for up to 255 concurrent requests.

Modern personal computers use a design concept called virtual memory. Application programs are mapped to non-contiguous physical memory pages only when they are active; hence, the memory occupied by the application programs is virtual. Virtual memory is very important to multitasking operating systems because not all tasks are active at the same time; some of them are waiting for I/O operations to complete. Instead of giving each application program its own dedicated memory, the system divides memory into equal-size pages and gives them to active tasks on demand. As an application program demands more memory, additional pages are allocated. Since the memory pages are allocated at different times, they are typically non-contiguous. Therefore, when writing to an I/O device, data must be gathered from different memory pages; when reading from an I/O device, data must be scattered to different memory pages.

Without scatter and gather functions, reading or writing each memory page requires a separate I/O request. Most conventional adapters, if they support such a function at all, allow only a small number of memory pages to be specified. The ASC-1000 chip allows the whole local memory to be used for a memory page list of up to 14,000 scatter/gather pages.

3. Results: 600% I/O Throughput Improvement

There are three attributes one should look for in an I/O adapter:

1. how fast a request can be started,
2. how fast data transfer takes place, and
3. how many concurrent tasks the controller can start.
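The first two attributes can be separated with simple arithmetic: the time per operation is the reciprocal of the measured rate, and subtracting the bus transfer time leaves the per-request overhead. The Python sketch below assumes the 10 MHz SCSI bus moves one byte per cycle (10 million bytes per second, which matches the 13.1 ms figure for 128 Kbytes) and that 1 Kbyte is 1,024 bytes. `BUS_BYTES_PER_S` and `per_op_breakdown` are hypothetical names, not part of any AdvanSys software; the inputs are the measured rates reported below.

```python
# Sketch: split a measured operation rate into bus-transfer time and
# per-request overhead. Assumes a 10 MHz, 8-bit (10 MB/s) SCSI bus.

BUS_BYTES_PER_S = 10_000_000  # assumption: 10 MHz x 1 byte per transfer


def per_op_breakdown(ops_per_s: float, bytes_per_op: int):
    """Return (total, transfer, overhead) per operation, in microseconds."""
    total_us = 1e6 / ops_per_s
    transfer_us = bytes_per_op / BUS_BYTES_PER_S * 1e6
    return total_us, transfer_us, total_us - transfer_us


# Attribute 1: request start-up cost (512-byte reads served from the drive buffer)
print([round(x) for x in per_op_breakdown(1896, 512)])

# Attribute 2: transfer speed (128-Kbyte reads served from the drive buffer)
print([round(x) for x in per_op_breakdown(72, 128 * 1024)])

# Sustained bus throughput for the 128-Kbyte test, in millions of bytes/s
print(round(72 * 128 * 1024 / 1e6, 2))
```

With these assumptions, 1,896 operations per second works out to about 527 microseconds per operation, of which about 51 microseconds is bus transfer and roughly 476 microseconds is overhead. The 128-Kbyte test leaves roughly 780 microseconds of handshake overhead per operation and moves about 9.44 million bytes per second.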
The first attribute can be measured by repeatedly reading a single sector from a disk drive. The reads transfer the same 512 bytes of data repeatedly from the disk drive's buffer, so the test measures the I/O request start-up and handshake times. Using a 50 MHz Intel 486 processor and a Maxtor MXT-540S SCSI drive, the ASC-1000 chip completes 1,896 operations per second, or about 527 microseconds per operation. Since the 512-byte data transfer takes about 51 microseconds on a 10 MHz SCSI bus, the execution overhead of each request is roughly 475 microseconds.

The speed of data transfer can be measured by repeatedly reading the same 128 Kbytes of data from the buffer. Using the same PC and disk drive, the ASC-1000 chip completes 72 operations per second, equivalent to 13.9 milliseconds per operation. Since the 10 MHz SCSI bus needs 13.1 milliseconds to transfer 128 Kbytes of data, the overhead for SCSI handshaking is about 800 microseconds. Transferring 128 Kbytes 72 times per second means the SCSI bus is moving about 9.44 MB per second, close to the limit of the bus itself.

Finally, to measure the degree of multitasking of the ASC-1000 chip, a random read and write test is used. With one disk drive, the test measured 105 operations per second; with two disk drives, 207 operations per second; and with seven disk drives, 679 operations per second, about 6.5 times the single-drive figure.

4. Conclusion

The ASC-1000 multitasking function clearly demonstrates the power of concurrent I/O operations, which were very dear to the hearts of mainframe folks. With increasing processor power and faster storage devices, the PC industry has long awaited faster I/O transfer technology. The ASC-1000 design will enable a PC to deliver the same I/O performance as workstations and minicomputers. In fact, with multiple ASC-1000 chips in one system, a PC with multiple processors could even be as fast as a mainframe.
* This design has three pending patent applications: 08/111191, 08/111192, and 08/111193.