Commit 77f435f

simplify implementation and remove pvt dependency
Signed-off-by: dsengupta0628 <dsengupta@precisioninno.com>
1 parent bbd00e1 commit 77f435f

6 files changed

Lines changed: 85 additions & 73 deletions

docs/contrib/3DIC.md

Lines changed: 35 additions & 19 deletions
Original file line number | Diff line number | Diff line change
@@ -307,22 +307,27 @@ create_clock -name clk -period 1.0 [get_pins -of_objects [get_nets clk_top]]
307307
now produces a constrained `Path Group: clk` setup check identical to
308308
the inner-CK form. Two fixes were required on top of Stage 7:
309309

310-
**Fix A — Synthesize a LibertyCell per chip-master with a self-arc per
311-
chip-bump port.** `dbNetwork::makeTopCellForChip` builds the chip-master
312-
Cell *as* a `LibertyCell` (via `LibertyBuilder`) on a private
313-
`LibertyLibrary`. For each chip-bump bterm it creates a `LibertyPort`
314-
and a combinational self-arc (`lc->makeTimingArcSet(lp, lp, ...)`).
315-
`Graph::makeInstanceEdges` consumes the arc and creates:
316-
317-
- `bump_load → bump_bidir_drvr` (forward combinational, traversable by
318-
`SearchAdj`)
319-
- `bump_bidir_drvr → bump_load` (flagged `isBidirectInstPath`, filtered
320-
by default predicate but harmless)
321-
322-
The forward arc closes the gap between the load and driver vertices of
323-
the BIDIRECT chip-bump pin, so a clock arrival seeded on the load (or
324-
the driver) fans out via the regular wire-edge model. Zero arc delay
325-
preserves the inner-CK anchor's reported slack.
310+
**Fix A — Chip-bump pins report BIDIRECT; no synthesized timing arc.**
311+
`dbNetwork::makeTopCellForChip` builds one synthetic **`ConcreteCell`**
312+
per chip master (via the public `makeCell`/`makePort`), with a `Port`
313+
per chip-bump bterm. It has no `LibertyCell` binding, so
314+
`libertyPort(chip_bump_pin)` is null and `dbNetwork::direction(pin)`
315+
falls through to its `PortDirection::bidirect()` branch.
316+
317+
Because the pin is BIDIRECT, `Graph::makePinVertices` allocates **two**
318+
vertices for it — a load and a driver — and `create_clock` seeds the
319+
clock arrival on **both** (`Search::findClockVertices` /
320+
`seedClkArrivals` insert both). The driver vertex then fans out into the
321+
chiplet body through the fat-net wire-edge model (see Stage 7), so the
322+
clock reaches each chiplet's internal CK. **No instance/timing arc is
323+
synthesized** — the load and driver vertices do not need to be joined by
324+
an edge, since both are seeded directly.
325+
326+
(Historical note: an earlier revision synthesized a `LibertyCell` with a
327+
zero-delay combinational self-arc per bump via OpenSTA's private
328+
`LibertyBuilder`. That was dropped — the private header is not part of
329+
OpenSTA's public include API, and the arc turned out to be unnecessary
330+
for propagation. See the wire-load section for its QoR side effect.)
326331

327332
**Fix B — Per-block discriminator in `getDbNwkObjectId`.** Each chiplet
328333
`dbBlock` numbers its iterms/bterms/insts/nets from 1. Without
@@ -353,11 +358,22 @@ STA does **not** build a per-segment drvr → load chain. `Graph::makeWireEdgesF
353358

354359
1. **Cross-chiplet data edge has no intermediate bump-pin hop.** The wire edge `chipA/buf/Z → chipB/buf/A` is created in one shot when `chipA/q_bump` (the BIDIRECT driver iterated from chipA's chip_inst pin walk) is processed. The visitor descends into chipA's inner q dbNet (yielding `chipA/buf/Z`, `chipA.q_bterm`) and chipB's inner d dbNet (yielding `chipB/buf/A`, `chipB.d_bterm`). `FindNetDrvrLoads` then classifies — `chipA/buf/Z` joins `drvrs`, `chipB/buf/A` joins `loads` — and the pairwise loop emits the cross-chip wire edge.
355360

356-
2. **Every BIDIRECT chip-bump appears in BOTH drvrs and loads.** `isDriver` and `isLoad` both return true for `direction == bidirect`. So `chipA/q_bump` and `chipB/d_bump` show up as loads in the aggregated set too, and `chipA/buf/Z` gets extra outgoing wire edges to their **load-side vertices** in addition to the real load `chipB/buf/A`. These edges are harmless for path search — the bump load vertex's only outgoing arc is the synthesized `load → bidir_drvr` self-arc, which then re-enters the same chipnet, and `SearchPred` (forward search) only emits paths via the `bidir_drvr_vertex`. No spurious paths form.
361+
2. **Every BIDIRECT chip-bump appears in BOTH drvrs and loads.** `isDriver` and `isLoad` both return true for `direction == bidirect`. So `chipA/q_bump` and `chipB/d_bump` show up as loads in the aggregated set too, and `chipA/buf/Z` gets extra outgoing wire edges to their **load-side vertices** in addition to the real load `chipB/buf/A`. These edges are harmless for path search — a bump load vertex has no outgoing arc (it is a dead-end load), and `SearchPred` (forward search) only emits paths via the `bidir_drvr_vertex`. No spurious paths form.
357362

358-
3. **Wire-delay calc sees fanout count.** STA's default wire-load lookup uses fanout count (or summed pin caps; chip-bump LibertyPort cap defaults to 0). Every distinct load on a fat net contributes to the count. If two loads dedup on identity (`PinSet`/`NetSet` sort by `id()` — see the per-block discriminator section above), the fanout under-counts and wire delay drops by one tier.
363+
3. **Wire-delay calc sees fanout count.** STA's default wire-load lookup uses fanout count (or summed pin caps; chip-bump port cap defaults to 0). Every distinct load on a fat net contributes to the count. If two loads dedup on identity (`PinSet`/`NetSet` sort by `id()` — see the per-block discriminator section above), the fanout under-counts and wire delay drops by one tier.
359364

360-
The Stage 7 golden `slack 0.83` was produced under exactly that under-count: cross-block iterm/bterm/net id collisions in `visited_drvrs` / `visited_nets` silently dropped some chip-side loads. After the `block_disc_` fix the load set is correctly sized, fanout grows, and the wire edge `chipA/buf/Z → chipB/buf/A` picks up an extra +0.01 ns, landing at `slack 0.82`. The regen captures the physical-topology-correct value.
365+
The `block_disc_` fix is what keeps the three cross-block loads on the
366+
`chipA/buf/Z` fat net distinct — `chipB/buf/A` (the far CMOS input) plus
367+
the two physical microbumps `chipA/q_bump` and `chipB/d_bump`. Fanout = 3,
368+
giving the constrained `slack 0.83`. Verify with
369+
`report_checks -fields {fanout}`: the `chipA/buf/Z` row shows `Fanout 3`.
370+
371+
An earlier revision reported `slack 0.82` (fanout 4). That extra load was
372+
an artifact of the now-removed synthesized self-arc (Fix A history): its
373+
`load → bidir_drvr` edge let a bump's load vertex **re-enter** the chipnet
374+
during `visitConnectedPins`, so the same bond was counted twice. A bond is
375+
a single physical load, so `0.83` / fanout 3 is the physically-correct
376+
value; `0.82` was a one-tier over-count.
361377

362378
## term(Pin*) history (Stage 4–6.5)
363379

docs/contrib/3DIC_TODO.md

Lines changed: 18 additions & 17 deletions
@@ -210,13 +210,14 @@ TSV, microbump) contribute non-trivial RC.
210210
211211
### Problem
212212
213-
v1 always synthesizes a stub `LibertyCell` per chiplet master with a
214-
zero-delay self-arc per chip-bump port. That works when the chiplet
215-
ships only as DEF + bump map. When the chiplet vendor instead ships an
216-
**Extracted Timing Model (ETM)** — a real `.lib` whose `cell` matches
217-
the chiplet name and whose ports match the chip-bump bterm names —
218-
the stub is wrong: it hides the ETM's real clock-to-q, setup/hold,
219-
and internal arcs.
213+
v1 builds a stub **`ConcreteCell`** (no `LibertyCell` binding) per
214+
chiplet master, with a plain `Port` per chip-bump bterm and no timing
215+
arc — cross-chiplet delay is cell logic plus (zero) bond wire delay only.
216+
That works when the chiplet ships only as DEF + bump map. When the
217+
chiplet vendor instead ships an **Extracted Timing Model (ETM)** — a real
218+
`.lib` whose `cell` matches the chiplet name and whose ports match the
219+
chip-bump bterm names — the stub is wrong: it hides the ETM's real
220+
clock-to-q, setup/hold, and internal arcs.
220221
221222
3DBlox already supports this via `external.liberty_file:` under
222223
`ChipletDef:`. dbSta just doesn't consult it.
@@ -229,9 +230,9 @@ and internal arcs.
229230
LibertyCell* etm = network_->findLibertyCell(master->getName());
230231
if (etm) {
231232
chip_master_cells_[master] = reinterpret_cast<Cell*>(etm);
232-
continue; // skip stub synthesis
233+
continue; // skip stub ConcreteCell synthesis
233234
}
234-
// no ETM — fall through to LibertyBuilder stub.
235+
// no ETM — fall through to the public makeCell/makePort stub.
235236
```
236237
- The ETM `LibertyCell`'s `LibertyPort`s must match the chip-bump
237238
bterm names. Validate via a sanity pass:
@@ -241,15 +242,15 @@ and internal arcs.
241242
}
242243
```
243244
244-
2. **Suppress the BIDIRECT direction override when ETM is present.**
245-
- `dbNetwork::direction(chip_bump_pin)` currently returns BIDIRECT
246-
unconditionally so wire-edge formation runs on every bump. With an
247-
ETM, port direction should come from the ETM's `LibertyPort`
248-
(which encodes the real INPUT/OUTPUT/INOUT semantics). The ETM
249-
model itself drives clock-edge propagation through real arcs, no
250-
BIDIRECT trick needed.
245+
2. **Suppress the BIDIRECT direction fallback when ETM is present.**
246+
- For a stub `ConcreteCell` chip-bump pin, `libertyPort(pin)` is null,
247+
so `dbNetwork::direction(pin)` returns BIDIRECT (dual vertices, both
248+
clock-seeded — how v1 propagates a clock through a bump). With an
249+
ETM the bump pin resolves to a real `LibertyPort`, so `direction`
250+
returns its INPUT/OUTPUT/INOUT and the ETM's own arcs drive
251+
clock-edge propagation; no BIDIRECT fallback is taken.
251252
252-
3. **Skip the `chip_bump_lib_` private LibertyLibrary entirely** when
253+
3. **Skip the `chip_master_lib_` stub Library entirely** when
253254
every chiplet master has an ETM.
254255
255256
4. **Diagnostic.**

src/dbSta/include/db_sta/dbNetwork.hh

Lines changed: 6 additions & 5 deletions
@@ -516,11 +516,12 @@ class dbNetwork : public ConcreteNetwork
516516
// so iterms/bterms/insts/nets from different chiplet blocks (each
517517
// numbered from 1) don't collide in NetSet/PinSet keys.
518518
odb::PtrMap<odb::dbBlock, uint32_t> block_disc_;
519-
// Synthetic LibertyLibrary for chip-master Cells. Each carries a
520-
// self-arc per chip-bump port so Graph::makeInstanceEdges builds the
521-
// internal load<->bidir_drvr edges that let create_clock anchors on
522-
// chip-bump pins propagate into the chiplet body.
523-
LibertyLibrary* chip_bump_lib_ = nullptr;
519+
// Synthetic (non-Liberty) Library owning the chip-master Cells built in
520+
// makeTopCellForChip. The Cells have no LibertyCell binding; a clock
521+
// anchored on a chip-bump pin propagates because the pin is BIDIRECT
522+
// (dual load/driver vertices, both clock-seeded) and the driver fans out
523+
// via the fat-net wire-edge model — no synthesized timing arc.
524+
Library* chip_master_lib_ = nullptr;
524525
Instance* top_instance_;
525526
Cell* top_cell_ = nullptr;
526527
std::set<dbNetworkObserver*> observers_;

src/dbSta/src/dbNetwork.cc

Lines changed: 15 additions & 24 deletions
@@ -61,7 +61,6 @@ Recommended conclusion: use map for concrete cells. They are invariant.
6161
#include <vector>
6262

6363
#include "dbEditHierarchy.hh"
64-
#include "liberty/LibertyBuilder.hh"
6564
#include "odb/PtrSetMap.h"
6665
#include "odb/db.h"
6766
#include "odb/dbObject.h"
@@ -80,8 +79,6 @@ Recommended conclusion: use map for concrete cells. They are invariant.
8079
#include "sta/PortDirection.hh"
8180
#include "sta/Search.hh"
8281
#include "sta/StringUtil.hh"
83-
#include "sta/TimingArc.hh"
84-
#include "sta/TimingRole.hh"
8582
#include "sta/VertexId.hh"
8683
#include "utl/Logger.h"
8784
#include "utl/algorithms.h"
@@ -1008,45 +1005,39 @@ void dbNetwork::makeTopCellForChip(dbChip* chip)
10081005
Library* old_lib = library(top_cell_);
10091006
deleteLibrary(old_lib);
10101007
}
1011-
if (chip_bump_lib_) {
1012-
deleteLibrary(reinterpret_cast<Library*>(chip_bump_lib_));
1013-
chip_bump_lib_ = nullptr;
1008+
if (chip_master_lib_) {
1009+
deleteLibrary(chip_master_lib_);
1010+
chip_master_lib_ = nullptr;
10141011
}
10151012
chip_master_cells_.clear();
10161013
const char* design_name = chip->getName();
10171014
Library* top_lib = makeLibrary(design_name, "");
10181015
top_cell_ = makeCell(top_lib, design_name, false, "");
1019-
// Per-master LibertyCell with a self-arc per chip-bump port. Without
1020-
// the self-arc Graph::makeInstanceEdges builds no load<->bidir_drvr
1021-
// edge for the BIDIRECT bump pin, blocking create_clock propagation
1022-
// through the chip-bump.
1023-
chip_bump_lib_ = makeLibertyLibrary("3dic_chip_bump_lib", "");
1024-
LibertyBuilder builder(debug_, report_);
1016+
// One plain Cell per chip master with a Port per chip-bump bterm,
1017+
// backing cell()/port()/name()/direction() for chip-inst and chip-bump
1018+
// pins. No LibertyCell, so chip-bump pins read as BIDIRECT and a clock
1019+
// anchored on one propagates through the fat-net wire model on its own
1020+
// (no timing arc, no OpenSTA-private LibertyBuilder).
1021+
chip_master_lib_ = makeLibrary("3dic_chip_master_lib", "");
10251022
for (dbChipInst* chip_inst : chip->getChipInsts()) {
10261023
dbChip* master = chip_inst->getMasterChip();
10271024
if (master == nullptr || chip_master_cells_.contains(master)) {
10281025
continue;
10291026
}
1030-
LibertyCell* lc
1031-
= builder.makeCell(chip_bump_lib_, master->getName(), "3dic-synth");
1032-
Cell* master_cell = reinterpret_cast<Cell*>(lc);
1027+
Cell* master_cell
1028+
= makeCell(chip_master_lib_, master->getName(), false, "");
10331029
chip_master_cells_[master] = master_cell;
10341030
for (odb::dbChipRegion* region : master->getChipRegions()) {
10351031
for (odb::dbChipBump* bump : region->getChipBumps()) {
10361032
odb::dbBTerm* bterm = bump->getBTerm();
10371033
if (bterm == nullptr) {
10381034
continue;
10391035
}
1040-
LibertyPort* lp = builder.makePort(lc, bterm->getConstName());
1041-
lp->setDirection(dbToSta(bterm->getSigType(), bterm->getIoType()));
1042-
registerConcretePort(reinterpret_cast<Port*>(lp));
1043-
auto attrs = std::make_shared<TimingArcAttrs>();
1044-
attrs->setTimingType(TimingType::combinational);
1045-
lc->makeTimingArcSet(
1046-
lp, lp, nullptr, TimingRole::combinational(), attrs);
1036+
Port* port = makePort(master_cell, bterm->getConstName());
1037+
setDirection(port, dbToSta(bterm->getSigType(), bterm->getIoType()));
1038+
registerConcretePort(port);
10471039
}
10481040
}
1049-
lc->finish(false, report_, debug_);
10501041
}
10511042
}
10521043

@@ -1092,7 +1083,7 @@ void dbNetwork::clear()
10921083
chip_bump_vertex_ids_.clear();
10931084
block_to_chip_inst_.clear();
10941085
block_disc_.clear();
1095-
chip_bump_lib_ = nullptr;
1086+
chip_master_lib_ = nullptr;
10961087
}
10971088

10981089
Instance* dbNetwork::topInstance() const

src/dbSta/test/3dic_cross.ok

Lines changed: 6 additions & 6 deletions
@@ -32,10 +32,10 @@ Path Type: max
3232
0.09 0.09 ^ chipA/ff/Q (DFF_X1)
3333
0.01 0.09 v chipA/inv/ZN (INV_X1)
3434
0.02 0.11 v chipA/buf/Z (BUF_X1)
35-
0.02 0.14 v chipB/buf/Z (BUF_X1)
36-
0.01 0.15 ^ chipB/inv/ZN (INV_X1)
37-
0.00 0.15 ^ chipB/ff/D (DFF_X1)
38-
0.15 data arrival time
35+
0.02 0.13 v chipB/buf/Z (BUF_X1)
36+
0.01 0.14 ^ chipB/inv/ZN (INV_X1)
37+
0.00 0.14 ^ chipB/ff/D (DFF_X1)
38+
0.14 data arrival time
3939

4040
1.00 1.00 clock clk (rise edge)
4141
0.00 1.00 clock network delay (ideal)
@@ -45,9 +45,9 @@ Path Type: max
4545
0.97 data required time
4646
---------------------------------------------------------
4747
0.97 data required time
48-
-0.15 data arrival time
48+
-0.14 data arrival time
4949
---------------------------------------------------------
50-
0.82 slack (MET)
50+
0.83 slack (MET)
5151

5252

5353
Summary 2 / 2 (100% pass)

src/dbSta/test/3dic_cross.tcl

Lines changed: 5 additions & 2 deletions
@@ -32,8 +32,11 @@ check "bridge spans both chiplets" {
3232
} {chipA chipB}
3333

3434
# Anchor the clock on the chip-bump pins of clk_top — the natural form
35-
# users will write. Propagation through the BIDIRECT bump relies on the
36-
# synthesized LibertyCell self-arc built in makeTopCellForChip.
35+
# users will write. Each chip-bump pin reports BIDIRECT (dbNetwork::
36+
# direction), so Graph::makePinVertices gives it both a load and a driver
37+
# vertex and create_clock seeds the arrival on both. The driver vertex then
38+
# fans out into the chiplet body via the fat-net wire-edge model, so the
39+
# clock reaches each chiplet's internal CK with no synthesized timing arc.
3740
create_clock -name clk -period 1.0 \
3841
[get_pins -of_objects [get_nets clk_top]]
3942
report_checks -path_delay max -group_path_count 4
