Post by mk
Hi,
Does anyone have any examples of using AWARITH_MULTP, AWARITH_TREE and
AWARITH_ADD in a design? I am trying to generate a multiply-add tree
where only one CPA is used, but the documentation is not very clear on
how to connect the modules. I know that the outputs of the N MULTPs
should go through a TREE and finally come out of an ADD, but the
connectivity is a little confusing.
Thanks.
Hi,
I'm a Synopsys user, so I can't tell exactly for Cadence, but the math
should remain the same.
The most efficient way is to size each MULTP on its own, if possible. Don't
try to shorten your code by using arrays of resized operands: you'll end up
with MULTPs that are too big, and the synthesis tool may fail to reduce their
size afterwards. Another note in the Synopsys documentation is that the
smallest operand or constant should be the second operand for maximal
efficiency (small difference for small size differences, noticeable for
constants).
The next step is to sum all outputs. You first need to know the maximum sum of
products your system can generate, since that fixes the final width. Then
resize the (registered) MULTP outputs to that final width and pass them
through the tree. The tree leaves you with 2 terms; add those in the ADD
(the only carry-propagate adder in the design) and you have the final
result.
With the Synopsys MULTP there is one caveat: the partial product terms are
not width1 + width2 in size, but width1 + width2 + 2. This is very important
when you need to resize!
Let me give an example:
Suppose you want to calculate the following unsigned expression:
RES = a[4:0] * b[2:0] + c[3:0] * d[4:0] + e[5:0] * 5
The ranges for the results are the following:
MULTP(a,b) : 8 (= 5 + 3) bits (10 if Synopsys)
MULTP(c,d) : 9 (= 4 + 5) bits (11 if Synopsys)
MULTP(e,5) : 9 (= 6 + 3) bits (11 if Synopsys)
RES, min   : 0 (all unsigned, no offsets)
RES, max   : 997 (= 31 * 7 + 15 * 31 + 63 * 5)
RES        : 10 (= ceil(log2(997 + 1))) bits, unsigned
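The same bookkeeping can be done in a few lines of Python. This is only a sketch of the arithmetic above, not anything taken from the tool documentation: the helper names (bits_unsigned, multp_width) and the guard_bits parameter are made up for illustration, and guard_bits=2 simply encodes the "width1 + width2 + 2" rule mentioned for Synopsys.

```python
def bits_unsigned(max_value):
    """Number of bits needed to hold an unsigned value up to max_value."""
    return max(1, max_value.bit_length())

def multp_width(w1, w2, guard_bits=0):
    """Width of each MULTP output; guard_bits=2 models the Synopsys caveat."""
    return w1 + w2 + guard_bits

# Operand widths from the example expression (all unsigned).
a_w, b_w, c_w, d_w, e_w = 5, 3, 4, 5, 6
k = 5                      # the constant multiplier
k_w = bits_unsigned(k)     # 3 bits

widths = {
    "MULTP(a,b)": multp_width(a_w, b_w),
    "MULTP(c,d)": multp_width(c_w, d_w),
    "MULTP(e,5)": multp_width(e_w, k_w),
}

# Largest value of each operand is 2**width - 1.
res_max = ((2**a_w - 1) * (2**b_w - 1)
           + (2**c_w - 1) * (2**d_w - 1)
           + (2**e_w - 1) * k)
res_w = bits_unsigned(res_max)

print(widths, res_max, res_w)
```

The final width comes from the maximum value of the whole sum, not from adding up the individual product widths, which would overestimate it.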
The hardware would then look as follows (with R? an optional register if
you want to pipeline):
     +------------+                     +---+
[a>->o in1   out1 o->[R?]->[resize 10]->o   |
[b>->o in2   out2 o->[R?]->[resize 10]->o C |
     +------------+                     | O |
      MULTP(5*3,U)                      | N |
                                        | C |  +----------+        +---+
     +------------+                     | A |  |     out1 o->[R?]->o A |
[d>->o in1   out1 o->[R?]->[resize 10]->o T o->o in       |        | D o->[RES>
[c>->o in2   out2 o->[R?]->[resize 10]->o E |  |     out2 o->[R?]->o D |
     +------------+                     | N |  +----------+        +---+
      MULTP(5*4,U)                      | A |   TREE(10,5*)
                                        | T |
     +------------+                     | E |
[e>->o in1   out1 o->[R?]->[resize 10]->o   |
(5)->o in2   out2 o->[R?]->[resize 10]->o   |
     +------------+                     +---+
      MULTP(6*3,U)
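To check the connectivity on paper, here is a small behavioural model of the diagram in Python. It is a sketch under assumptions, not the vendor components: multp, tree, resize and mac are hypothetical names, the carry-save terms are idealised (plain 3:2 compressors on unbounded integers, without the two extra bits of the Synopsys MULTP), and the optional registers are left out.

```python
def csa(x, y, z):
    """3:2 compressor on whole words: x + y + z == s + c, no carry propagation."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c

def tree(terms):
    """Reduce any number of addends to two carry-save terms."""
    terms = list(terms)
    while len(terms) > 2:
        s, c = csa(terms.pop(), terms.pop(), terms.pop())
        terms += [s, c]
    while len(terms) < 2:
        terms.append(0)
    return terms[0], terms[1]

def multp(a, b, b_width):
    """Unsigned multiplier that stops before the final adder (two outputs)."""
    partial_products = [(a << i) if (b >> i) & 1 else 0 for i in range(b_width)]
    return tree(partial_products)

def resize(x, width):
    """Resize an unsigned term to the final width (keeps the low bits)."""
    return x & ((1 << width) - 1)

def mac(a, b, c, d, e, width=10):
    """RES = a*b + c*d + e*5 with a single carry-propagate addition."""
    terms = []
    # Smaller operand second, as in the diagram: (a,b), (d,c), (e,5).
    for x, y, y_width in ((a, b, 3), (d, c, 4), (e, 5, 3)):
        out1, out2 = multp(x, y, y_width)
        terms += [resize(out1, width), resize(out2, width)]  # the CONCAT inputs
    t1, t2 = tree(terms)              # TREE: six terms down to two
    return resize(t1 + t2, width)     # ADD: the only CPA

print(mac(31, 7, 15, 31, 63))  # all operands at maximum gives 997
```

Each MULTP contributes both of its outputs, so the TREE sees six 10-bit terms here, and the only place where a carry actually ripples is the final t1 + t2.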
Since I don't know the Cadence tools, I don't know what this Synopsys
feature would translate to. In Synopsys, there is an option called "Module
Compiler" which will do a magnificent job on plain arithmetic operators (it
will build csa trees for you, optimize constants, merge the operands). This
comes at a price though: formal verification of such transformations is hard
to impossible (depends on how much is merged and how big the operands are).
Now, if you don't meet timing, you can resort to the MULTP-TREE solution
with registers from above. You could also decide to live on the wild side
and place some registers in your code and let the tool retime them for you
(so far, we haven't gotten stable results from this, so it's tricky; in
theory, formal verification should be able to cope with this, but we haven't
tried it yet).
I hope this helps.
Kind regards,
Alvin Andries.