This is a pseudo-mirror of www.bores.com DSP intro at http://www.bores.com/courses/intro/program/7_mops.htm [sic] as per the policy of this site.
This mirror was created from the source on 03/08/2000 at 22:46 Central Time.
Commercial information, time-dependent information, and all the damn JAVA scripts have been removed. All "htm" extensions have been changed to the proper "html" extensions. It has also been edited for coherence, proper linkage, and better document flow.
Copied without permission.
The development of efficient assembly language code shows how efficient a DSP processor can be: each assembler instruction is performing several useful operations. But it also shows how difficult it can be to program such a specialised processor efficiently.
    C code                                        DSP32C assembler
    temp = (*c_ptr++) * (*x_ptr--);               a1 = *r3++ * *r4--
    for (k = 1; k < N-1; k++)                     do 0,r1
        temp = temp + (*c_ptr++) * (*x_ptr--);    a1 = a1 + *r3++ * *r4--
    *y_ptr++ = temp;                              *r2++ = a1
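To make the C column concrete, here is a self-contained sketch that computes FIR output samples with the same pointer idiom: coefficients walked forwards with c_ptr++, input history walked backwards with x_ptr--. It is an illustration under stated assumptions, not code from the course. The function names fir_sample and fir_block are hypothetical; N is taken to be the number of filter taps, so this loop runs while k < N and every coefficient contributes one product; and the input buffer is assumed to hold one spare element below the oldest sample, so that the final x_ptr-- still points inside the array.

```c
#include <assert.h>

/* Hypothetical helper: one output sample of an N-tap FIR filter.
 *   c      : the N coefficients c[0] .. c[N-1]
 *   x_last : points at the newest input sample; the N-1 older samples are
 *            assumed to lie immediately below it in memory, with at least
 *            one spare element below the oldest, because stepping a pointer
 *            before the first element of an array is undefined in C.      */
float fir_sample(const float *c, const float *x_last, int N)
{
    assert(N >= 1);
    const float *c_ptr = c;       /* walks forwards through the coefficients */
    const float *x_ptr = x_last;  /* walks backwards through the inputs      */

    /* The first product initialises the accumulator ... */
    float temp = (*c_ptr++) * (*x_ptr--);

    /* ... then N-1 multiply-accumulates. Each pass is one multiply, one add,
     * two memory reads and two pointer updates.                            */
    for (int k = 1; k < N; k++)
        temp = temp + (*c_ptr++) * (*x_ptr--);

    return temp;
}

/* Hypothetical wrapper: filters n_out samples. x must hold N + n_out
 * values; x[0] is the spare element, and output i uses x[i+1] .. x[N+i].
 * Results are stored with the same *y_ptr++ idiom as the listing.       */
void fir_block(const float *c, int N, const float *x, float *y, int n_out)
{
    float *y_ptr = y;
    for (int i = 0; i < n_out; i++)
        *y_ptr++ = fir_sample(c, &x[N + i], N);
}
```

For example, with coefficients {1, 2, 3} and input {0, 1, 2, 3, 4} (x[0] being the spare element), fir_block produces the two outputs 10 and 16. Each pass of the loop performs the same work that the DSP32C column writes as the single instruction a1 = a1 + *r3++ * *r4--.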
Bear in mind that we use DSP processors to do specialised jobs fast. If cost is no object, then it may be permissible to throw away processor power through inefficient coding; but in that case we would perhaps be better advised to choose an easier processor to program in the first place. A sensible reason to use a DSP processor is to perform DSP either at lowest cost or at highest speed. In either case, wasting processor power means more hardware is needed, which makes the system more expensive and so makes the final product more expensive. In a sane world, that would lose sales to a better-designed competing product.
One example shows how essential it is to make sure a DSP processor is programmed efficiently:
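The diagram shows a single assembler instruction from the Lucent DSP32C processor. This instruction does a lot of things at once:

- two floating-point operations (a multiply and an add);
- four memory accesses;
- three pointer increments;
- one floating-point register update.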
All of these operations can be done in one instruction. This is how the processor can be made fast. But if we don't use any of these operations, we are throwing away the potential of the processor and may be slowing it down drastically. Consider how this instruction can be translated into MIPS or Mflops.
The processor runs with an 80 MHz clock. But, to achieve four memory accesses per instruction, it uses a modified von Neumann memory architecture that requires it to divide the system clock by four, resulting in an instruction rate of 20 MIPS. If we go into manic marketing mode, we can have fun working out ever higher MIPS or MOPS ratings as follows:
    80 MHz clock
    20 MIPS                                              =  20 MOPS
    but two floating-point operations per instruction    =  40 Mflops  =  40 MOPS
    and four memory accesses per instruction             =  80 MOPS
    plus three pointer increments per instruction        =  60 MOPS
    plus one floating-point register update per instr.   =  20 MOPS
    ======================================================================
    making a grand total MOPS rating of:                   200 MOPS
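The tally above is simple arithmetic: the clock is divided by four to get the instruction rate, each per-instruction operation count is multiplied by that rate, and the products are summed. The sketch below reproduces it; the names op_count, dsp32c_ops, mips_rating and mops_rating are hypothetical, and the counts are the ones listed in the tally.

```c
#include <assert.h>
#include <stddef.h>

/* One line of the marketing tally: an operation type and how many of
 * them a single DSP32C instruction can perform (counts from the text). */
struct op_count {
    const char *name;
    int per_instruction;
};

static const struct op_count dsp32c_ops[] = {
    { "floating-point operations",       2 },
    { "memory accesses",                 4 },
    { "pointer increments",              3 },
    { "floating-point register updates", 1 },
};

#define DSP32C_N_OPS (sizeof dsp32c_ops / sizeof dsp32c_ops[0])

/* Instruction rate in MIPS when each instruction takes
 * clocks_per_instruction cycles of a clock_mhz clock. */
int mips_rating(int clock_mhz, int clocks_per_instruction)
{
    assert(clocks_per_instruction > 0);
    return clock_mhz / clocks_per_instruction;
}

/* "Manic marketing" MOPS: every operation the instruction could perform,
 * multiplied by the instruction rate, summed over all operation types.  */
int mops_rating(int mips, const struct op_count *ops, size_t n_ops)
{
    int total = 0;
    for (size_t i = 0; i < n_ops; i++)
        total += ops[i].per_instruction * mips;
    return total;
}
```

mips_rating(80, 4) gives 20, and mops_rating(20, dsp32c_ops, DSP32C_N_OPS) gives 40 + 80 + 60 + 20 = 200; counting only the first entry (the floating-point operations) gives the 40 Mflops figure.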
Which exercise serves to illustrate three things:
Of course, we did not include in the MOPS rating (as some manufacturers do) the possibility of DMA on the serial port and parallel port, with all the associated increments of DMA address pointers; and if we had multiple comm ports, each with DMA, we could go really wild...
Apart from a cheap laugh at the expense of marketing, there is a very serious lesson to be drawn from this exercise. Suppose we only did adds with this processor: then the Mflops rating falls from a respectable 40 Mflops to a pitiful 20 Mflops. And if we also don't use the memory accesses or the pointer increments, we cut the MOPS rating from 200 MOPS to 20 MOPS.
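The same arithmetic shows how quickly the delivered rating collapses when code leaves the processor's resources idle. In this sketch, a program is described by how many of each operation it actually keeps busy per instruction. The struct ops_used and the three functions are hypothetical names; the peak figures are the 2, 4, 3 and 1 from the tally, and the instruction rate is the 20 MIPS derived earlier.

```c
#include <assert.h>

/* Hypothetical description of how many of each resource the code actually
 * uses per instruction (DSP32C peak: 2 flops, 4 memory accesses,
 * 3 pointer increments, 1 register update).                             */
struct ops_used {
    int flops;
    int mem_accesses;
    int ptr_increments;
    int reg_updates;
};

static int ops_per_instruction(struct ops_used u)
{
    return u.flops + u.mem_accesses + u.ptr_increments + u.reg_updates;
}

/* Mflops actually delivered at a given instruction rate (in MIPS). */
int effective_mflops(int mips, struct ops_used u)
{
    return mips * u.flops;
}

/* MOPS actually delivered at a given instruction rate (in MIPS). */
int effective_mops(int mips, struct ops_used u)
{
    return mips * ops_per_instruction(u);
}

/* Integer percentage of the peak operation count that the code really uses. */
int utilisation_percent(struct ops_used u, struct ops_used peak)
{
    int peak_ops = ops_per_instruction(peak);
    assert(peak_ops > 0);
    return 100 * ops_per_instruction(u) / peak_ops;
}
```

With all four resources in use this gives 40 Mflops and 200 MOPS; with adds only ({1, 0, 0, 0}) it gives 20 Mflops and 20 MOPS, which is 10% of the peak operation count.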
It is very easy indeed to write very inefficient DSP code. Luckily it is also quite easy, with a little care, to write very efficient DSP code.
Last updated: 13th January 1997 (http://www.bores.com/courses/intro/program/7_mops.htm)