
Magic Layout and other files to eventually become a dotools tutorial.

fa -  one-bit full adder cell implemented as compound gates,
	based on Figure 8.3(b) of Weste & Eshraghian, Principles of CMOS
	VLSI Design, 2nd edition.

add8 - assembly of fa cells to build an 8-bit ripple-carry adder

Outline of tutorial text

Designing a block of logic with the dotools.

- Decide what you want to build.  Sketch a transistor-level schematic.
For this tutorial, the schematic is taken from Weste & Eshraghian, page
517.

- Do the layout in magic for the smaller cells.  Here, layout for a
full-adder cell is in fa.mag.

- Even before you've finished the layout, you may want to start
simulating the logic with irsim.  After completing the carry portion
of fa.mag only, I extracted it and started simulating.  We'll create the
input file fa.carry.in to check just the carry part.  Notice how it specifies
a set of inputs and outputs, and the expected output values for all possible
combinations of input values.

- Right away, create a Makefile by copying and modifying an existing
dotools Makefile.  You'll be running doirsim (and soon, dospice) a
lot.  The example Makefiles all require GNU make (gmake).  Notice how
the pattern rule below allows running any vector set against the "fa"
cell.  This is just one of the reasons for abandoning all inferior
makes in favor of gmake.

fa.%.ok: fa.%.in fa.sim
	$(DOIRSIM) fa -t $* -G $(DOIRSIM_FLAGS)

- Check the cell's logic by typing "gmake fa.carry.ok".

- Finish the logic, and verify the whole thing.  fa.sc.in checks both
the sum and carry outputs of the fa cell.
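
The "expected output values for all possible combinations of input
values" mentioned above are just the full-adder truth table.  Here is a
minimal Python sketch that generates it, which can be handy when
writing the vectors by hand.  It is not part of the dotools and does
not emit the .in file syntax; the function names are made up for this
example.

```python
from itertools import product


def full_adder(a, b, cin):
    """Return (sum, carry_out) for one-bit inputs a, b and cin."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a | b))  # carry is the majority of the inputs
    return s, cout


def truth_table():
    """List (a, b, cin, sum, carry_out) for all 8 input combinations."""
    return [(a, b, c) + full_adder(a, b, c)
            for a, b, c in product((0, 1), repeat=3)]


if __name__ == "__main__":
    print("A B C | S Cout")
    for a, b, c, s, cout in truth_table():
        print(f"{a} {b} {c} | {s} {cout}")
```

With only three inputs there are just 8 rows, which is why exhaustive
testing is practical for the fa cell.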

At this point, I notice that the layout is logically correct, but very
inefficient space-wise.  As I rearrange things to reduce the area, I
periodically extract and resimulate.  It only takes a few seconds to
verify that I haven't screwed anything up.  In practice, you may want
to defer any space-optimization until later, after you've verified
that the thing is fast enough.

Next, I assemble a row of the fa cells into a multi-bit adder.  I've
arranged that the carry input and output of the cell appear on the
left and right sides, respectively, so we just abut the cells together.

Now, we create add8.short.in to test the 8-bit ripple carry adder on a
few cases.  All you have to do is dream up enough tests to be sure
that your logic is perfect.  Since we've exhaustively tested the
full-adder cell, we don't really need to test everything here, but the
ideal test (best coverage in fewest vectors) is always a difficult
problem, which we defer to later.
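
When dreaming up test cases, a small software reference model helps
with working out the expected sums.  The sketch below chains eight
one-bit full adders the same way the add8 layout abuts fa cells.  It is
only an illustration in Python (the names ripple_add and full_adder are
invented here), not something the dotools provide.

```python
def full_adder(a, b, cin):
    """Return (sum, carry_out) for one-bit inputs."""
    return a ^ b ^ cin, (a & b) | (cin & (a | b))


def ripple_add(a, b, cin=0, width=8):
    """Add two width-bit integers one bit at a time, LSB first.

    Returns (sum, carry_out), with sum truncated to `width` bits.
    """
    carry = cin
    total = 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


if __name__ == "__main__":
    # A few cases worth including in a short vector set:
    for a, b in [(0x00, 0x00), (0xFF, 0x01), (0x55, 0xAA), (0x7F, 0x01)]:
        total, cout = ripple_add(a, b)
        print(f"{a:#04x} + {b:#04x} = {total:#04x} carry {cout}")
```

A case like 0xFF + 0x01 is worth having because it forces the carry to
ripple through every bit position.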


So, the logic works.  But is it fast enough?  We'll use the HSPICE
analog simulator and the dospice tool to help us find out.  Unlike
logic simulation, "fast enough" depends on what else drives the
inputs to our adder and what downstream inputs our logic is expected
to drive.  Since we haven't built the rest of the system, we'll guess,
and then come back later.

Suppose that our adder is to come right after a register built out of
standard flip-flops. 

Add some ETFO timing specifications to the inputs and outputs in
add8.short.in.    For the inputs, we'll use "ETFO=4+0,f,etff"
which means:
	"Fanout of 4, 0ns extra delay, falling-edge signal named etff"

We note that each of the A and B inputs to the full adder cell must drive
8 width-8 transistors.  We'll guess that the standard flip-flop's output
has width-16 transistors, so this is a fanout of 4.  
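
The fanout guess above is simple arithmetic: total transistor width
being driven, divided by the width of the driving transistors.  A tiny
Python sketch, assuming that ratio is what the fanout number is meant
to capture (the function name is made up for this example):

```python
def fanout(load_count, load_width, driver_width):
    """Ratio of total driven gate width to driver transistor width."""
    return (load_count * load_width) / driver_width


if __name__ == "__main__":
    # 8 width-8 transistors driven by a (guessed) width-16 flip-flop output
    print(fanout(8, 8, 16))
```

If the real flip-flop output turns out to be smaller than width-16, the
fanout number goes up and the input edges get slower.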

For the outputs, we'll assume that they will drive a fanout of 2, and
we'll allow an extra nanosecond of propagation delay.


After adding these rules to our Makefile:

%.cmd: %.in
	in2cazm -P $^ > $@

add8.%.W.ok: add8.%.cmd add8.fa
	$(DOSPICE) add8 W -t $* -c -G $(DOSPICE_FLAGS)


We can simply say "gmake add8.short.W.ok".  Lots goes on behind the
scenes here.  First, a command file add8.short.cmd is built, which uses
an abstract spice-like command syntax.  (Take a look at the file.)
Then, dospice does the following:
 - convert add8.short.cmd into actual spice syntax by substituting
   in actual waveforms for the inputs, adjusting them according to the
   ETFO fanout and delay parameters.
 - HSPICE is run, producing output waveforms in add8.short.W (cazm-like
   tabular format, usable with sigview) and add8.short.W.tr0 (hspice's
   goofy format, usable with gsi and metawaves)
 - a tool called cazm2out samples the analog waveforms back into digital
   bits, and notes whether the signals transitioned on time.
   "On time" is defined as the time that an equivalent input waveform
   would have transitioned to its final high or low value relative to
   the system clock.   This produces the file add8.short.W.out.
 - a tool called outchk compares add8.short.W.out with the expected
   digital values in add8.short.in, and produces add8.short.log.
   If the digital values are correct, and there were no timing
   violations,  the .log file is renamed to add8.short.ok.
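
To give a feel for the "sample the analog waveform back into bits"
step, here is a rough Python sketch.  It is NOT the cazm2out algorithm,
just one plausible way to do it: treat anything above a threshold as a
1, and find the last time the waveform crossed the threshold.  The
threshold value, the waveform format, and all function names are
assumptions made for this illustration.

```python
def sample_bit(waveform, t, threshold):
    """Digital value at time t, from the last sample at or before t.

    waveform is a list of (time, volts) pairs sorted by time.
    Returns None if t is before the first sample.
    """
    value = None
    for time, volts in waveform:
        if time > t:
            break
        value = 1 if volts > threshold else 0
    return value


def last_crossing(waveform, threshold):
    """Time of the final threshold crossing (linear interpolation)."""
    crossing = None
    for (t0, v0), (t1, v1) in zip(waveform, waveform[1:]):
        if (v0 - threshold) * (v1 - threshold) < 0:
            crossing = t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
    return crossing


def on_time(waveform, threshold, deadline):
    """True if the signal never crosses, or settles by the deadline."""
    crossing = last_crossing(waveform, threshold)
    return crossing is None or crossing <= deadline


if __name__ == "__main__":
    # Made-up rising edge: times in ns, a 5 V swing, threshold at half swing
    wave = [(0.0, 0.0), (1.0, 0.0), (2.0, 5.0), (3.0, 5.0)]
    print(sample_bit(wave, 3.0, 2.5), last_crossing(wave, 2.5))
```

The point is only that a data value can be right while the crossing
time is late, which is exactly the kind of failure reported as a timing
violation.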


In this case, I've picked wildly optimistic timing numbers (even for
the rather hot hpcmos14 process), and I get a slew of timing violations,
but the data values are correct.  It takes about 6.5ns for bit S7 to
change after a change in B0 that causes the carry to ripple all the way
down.  If we're going to feed these outputs right into a set of
flip-flops, and the system clock is 100MHz, this is probably good
enough, so we can just make the timing numbers more realistic and
we're done.  For example, using ETFO=2+8,f,etff for the Sum and carry
outputs makes it pass, and is probably realistic.
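
The "probably good enough" judgment above is a back-of-the-envelope
budget: a 100MHz clock has a 10ns period, and the worst-case ripple
takes about 6.5ns of it.  A small Python sketch of that arithmetic
follows.  It ignores flip-flop setup time, clock skew, and anything
else a real budget must include, and the function names are invented
for this example.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(freq_mhz, delay_ns):
    """Time left in one clock period after the given delay."""
    return clock_period_ns(freq_mhz) - delay_ns


if __name__ == "__main__":
    print(clock_period_ns(100))   # period of the 100MHz system clock
    print(slack_ns(100, 6.5))     # what is left after the B0 -> S7 ripple
    print(6.5 / 8)                # rough average delay per bit position
```

The per-bit figure is only an average; the first and last stages do
not behave quite like the ones in the middle.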

More likely, you'll need to change something to meet the timing
specification.  Let's try speeding things up a little.  First, we'll
set up to run dospice on the full adder cell alone.  The critical
path through fa is the carry-in to carry-out delay; we'll create a
separate .in file called fa.ccrit.in to exercise just this path.

One of the hardest parts of this whole business is knowing which
transistors to resize to affect the speed of the whole
circuit.  We'll take some cues from Weste & Eshraghian, and increase
the size of the C transistors in the carry gate to overcome the effects of
stray capacitance.  We'll also decrease the size of the C transistors
in the sum gate to reduce their loading on the carry path.

We notice that the transition in which exactly one of A and B is high
and a falling carry-in makes carry-out fall is slow.  We'll try making
the pfet on C in the carry gate 50% larger.

Each iteration of edit-extract-dospice on the fa cell with only a few
critical-path vectors takes only a few seconds.  When we think we've
got things fast enough, we rerun add8.short.W.ok and see how we did
overall.

-----------------------------------------------------------------------------

Doverilog 

Doverilog is a tool for testing a verilog module against vectors
contained in a .in file.

The simplest usage of doverilog is illustrated with the command
"gmake aoi.vok" which typically ends up running this command:
	
	doverilog --verilog verilog --nopli -v aoi aoi.v

This simulates the verilog module aoi.v using the vectors in the file
aoi.in.  Note that the same vectors can be applied to extracted layout
using a command like "gmake aoi.ok" or "gmake aoi.W.ok".

The command-line option "--verilog verilog" specifies to use the
underlying verilog simulator named "verilog", which is assumed to take
command-line options in the same syntax as Cadence Verilog-XL.  The
"--nopli" option specifies not to use special verilog PLI routines for
reading the .in file vectors, but instead to generate a verilog "runtest"
module that contains verilog code to apply each vector.

In doverilog, input signals (in the "inputs:" section of the .in file)
must be input ports to the verilog module.  This is not true of
outputs and monitored signals; they can be internal wires inside the
verilog module and need not be module ports.


Verilog modules containing bus ports require special care in the .in
file.  Dospice and doirsim run on flat transistor netlists, where
there is no such thing as a bus, simply netnames that may contain [
and ] characters to suggest sets of wires that are related somehow.
But in verilog, busses are a different kind of object.

This is illustrated in files bus2.v and bus2.in.  Note this line:
	inputs:
	iBus1[{1:0}] vport=iBus1

The {1:0} still means textual expansion, and the square brackets
aren't special.  This declares a set of inputs iBus1[1] and iBus1[0].
When applied to dospice, these are the netnames for the inputs.  But
doverilog parses the additional attribute "vport=iBus1" which means that 
in verilog, this set of signals is in fact connected through a bus port named
iBus1.   
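
The textual expansion of {1:0} can be pictured with a few lines of
Python.  This is only a sketch of the idea, not the dotools parser: the
handling of ascending ranges and of more than one {m:n} group in a name
is a guess, and the function name is invented here.

```python
import re

# Matches a {first:last} group such as {1:0}
_RANGE = re.compile(r"\{(\d+):(\d+)\}")


def expand(name):
    """Expand {first:last} groups in a signal name into a list of names.

    Square brackets get no special treatment; they are ordinary
    characters in the resulting netnames.
    """
    match = _RANGE.search(name)
    if match is None:
        return [name]
    first, last = int(match.group(1)), int(match.group(2))
    step = 1 if last >= first else -1
    names = []
    for i in range(first, last + step, step):
        expanded = name[:match.start()] + str(i) + name[match.end():]
        names.extend(expand(expanded))  # handle any remaining groups
    return names


if __name__ == "__main__":
    print(expand("iBus1[{1:0}]"))
```

For dospice and doirsim, the expanded names are all there is.  For
doverilog, the vport attribute is what ties those names back to a
single bus port.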


The vport attribute may also be specified for outputs, but is not required.
Both of these are legal:

	outputs:
	oBus1[{1:0}]  vport=oBus1
and
	outputs:
	oBus1[{1:0}]

They generate slightly different verilog instantiations of the module
under test; the former connects to the port, while the latter extracts
the outputs using hierarchical references.


The doverilog option "--powernets" declares two signals VDD and GND as
supply1 and supply0, and connects them to module ports on the
instance.  This corresponds to a local convention for certain cells. 

Doverilog currently requires that the module under test have a clock
input port.  By default, this is called Clk.  Someday this restriction
should be removed.

The -C or --clock options can be used to change the name of the clock
port.  Multiple -C options can be used to specify multiple clock input
ports that are all to receive the same clock.



-----------------------------------------------------------------------------

Still to be described:

- hand-crafting your .cmd file.  Often necessary for "analog" circuits.
see docs/spicepp.txt for a description of what the spicepp preprocessor
can do for you.

- ".loads" files

- monte-carlo analysis

-----------------------------------------------------------------------------

Some additional very simple gates in the examples directory:

aoi - and/or/invert compound gate
	aoi.in intentionally sets a too-tight ETFO constraint on the output.

inv - simple inverter

nand - simple nand gate

nandx1 - simple nand gate, broken in an "interesting" way, to illustrate
	the test-vector generation problem.
	compare nandx1.bad.in and nandx1.good.in
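
To see why test-vector generation is a problem at all, here is a
Python sketch of a broken nand gate that passes one vector set and
fails another.  The fault modeled here is HYPOTHETICAL (a missing
pull-up on input B, so the output floats and keeps its old value when
A=1, B=0); it is not necessarily what is wrong with nandx1, and the
vector lists are not the contents of nandx1.bad.in or nandx1.good.in.

```python
def nand(a, b, prev=None):
    """Correct nand gate; prev is ignored."""
    return 0 if (a and b) else 1


def faulty_nand(a, b, prev):
    """Hypothetical fault: no pull-up transistor on input B.

    With A=1, B=0 nothing drives the output, so it keeps its
    previous value instead of being pulled high.
    """
    if a and b:
        return 0
    if not a:
        return 1
    return prev


def passes(gate, vectors, initial=1):
    """Apply vectors in order; True if every output matches a real nand."""
    out = initial
    for a, b in vectors:
        out = gate(a, b, out)
        if out != nand(a, b):
            return False
    return True


# Both sets cover all four input combinations; only the order differs.
weak = [(0, 0), (0, 1), (1, 0), (1, 1)]
strong = [(0, 0), (0, 1), (1, 1), (1, 0)]

if __name__ == "__main__":
    print(passes(faulty_nand, weak), passes(faulty_nand, strong))
```

Both vector sets apply every input combination, yet only the second
one catches the fault, because it visits A=1, B=0 right after the
output has been driven low.  Covering all input values is not the same
as covering all the ways a circuit can be broken.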
	
