Bug 270 - investigate nmigen clock gating
Summary: investigate nmigen clock gating
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code
Version: unspecified
Hardware: PC Linux
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks:
 
Reported: 2020-03-28 14:14 GMT by Luke Kenneth Casson Leighton
Modified: 2020-03-28 16:20 GMT
CC List: 2 users

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Comment 1 Luke Kenneth Casson Leighton 2020-03-28 14:34:31 GMT
> The principle is that you save power by not clocking the parts of the circuit
> that don't have to do any computing. I think this could be a more
> general way to only enable the stages in your pipeline who actually 
> are doing computation.

ok so if i understand this correctly:

* the clock still runs at 1600 MHz
* the clock runs a cyclic shift-register of length equal to the
  number of stages, at 1600 MHz.
* only every *alternate* one of those elements in the shift register
  is enabled (or, if you want full speed, all of them).
* through EnableInserter each stage is clocked by a *different* bit
  in the shift-register
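
The scheme in the list above can be sketched as a small behavioural model. This is plain Python, not nmigen: it does not use EnableInserter, and the names `enable_ring`, `rotate` and `simulate` are made up for illustration. It assumes pass-through stages, an even number of stages for the alternating pattern, and items that are never `None` (which is used here to mark an empty slot).

```python
def enable_ring(num_stages, alternate=True):
    """Initial shift-register contents: one enable bit per pipeline stage."""
    if alternate:
        # Assumption: an even stage count, so the 1010... pattern stays
        # alternating when it wraps around the ring.
        if num_stages % 2:
            raise ValueError("alternate pattern needs an even number of stages")
        return [i % 2 == 0 for i in range(num_stages)]
    return [True] * num_stages


def rotate(ring):
    """One clock tick of the cyclic shift register: bit i moves to i + 1."""
    return ring[-1:] + ring[:-1]


def simulate(num_stages, items, alternate=True):
    """Push items through pass-through stages; return (outputs, ticks, latches)."""
    ring = enable_ring(num_stages, alternate)
    regs = [None] * num_stages      # None models an empty slot (bubble)
    pending = list(items)
    outputs, ticks, latches = [], 0, 0
    while len(outputs) < len(items):
        prev = list(regs)           # all registers sample the previous state
        for i, enabled in enumerate(ring):
            if not enabled:
                continue            # stage not enabled: register holds its value
            if i == 0:
                regs[i] = pending.pop(0) if pending else None
            else:
                regs[i] = prev[i - 1]
            latches += 1            # count register updates as a rough activity measure
        if ring[-1] and regs[-1] is not None:
            outputs.append(regs[-1])
        ring = rotate(ring)
        ticks += 1
    return outputs, ticks, latches
```

Calling `simulate(4, ["a", "b", "c"])` and then the same with `alternate=False` shows the trade-off: both deliver the items in order, but the alternating ring accepts a new item only every other tick while updating half as many stage registers per tick.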

> That said I think this feature does not fit in the MVP scope of the October
> prototype so that chip should IMO not use clock gating nor the pass-through
> register feature from the original discussion. 

no, i agree, and, more to the point, we don't need it for the 180nm ASIC
(except perhaps to test the concept).

one thing that we have is, the use of OO python has the entirety of the
stages themselves *completely* separated firmly behind a general-purpose
API, where the construction of pipelines, from those stages, using entirely
different pipeline techniques, is *literally* a one-line change.

so we could conceivably do the *entire* suite of pipelines - convert them
to use this clock gating technique - *literally* in well under a day,
after first experimenting with EnableInserter and a quick and simple unit
test.

re-running the IEEE754 FP unit tests on the other hand... *sigh* :)
Comment 2 Staf Verhaegen 2020-03-28 16:20:53 GMT
(In reply to Luke Kenneth Casson Leighton from comment #1)
> > The principle is that you save power by not clocking the parts of the circuit
> > that don't have to do any computing. I think this could be a more
> > general way to only enable the stages in your pipeline who actually 
> > are doing computation.
> 
> ok so if i understand this correctly:
> 
> * the clock still runs at 1600 MHz
> * the clock runs a cyclic shift-register of length equal to the
>   number of stages, at 1600 MHz.
> * only every *alternate* one of those elements in the shift register
>   is enabled (or, if you want full speed, all of them).
> * through EnableInserter each stage is clocked by a *different* bit
>   in the shift-register

Correct, the clock is the pipeline clock. In theory other parts of the CPU could, for example, run at half the clock frequency. This would then naturally result in a new operation being committed at most every other cycle.

I did not test it, but EnableInserter should work in simulation and on FPGA. Depending on the FPGA you likely won't see the full power improvement, as I think the enabling is implemented as an enable input to each FF and not by gating parts of the clock tree. It will still guarantee that the outputs of the FFs don't change.
As said, implementing clock gating for ASICs will not be a simple task.
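
The difference between an enable input on each FF and a gated clock can be illustrated with a toy model. This is plain Python with hypothetical class names (`EnableFF`, `GatedClockFF`), not nmigen and not a power model: the `clock_edges` counter is only a crude stand-in for how often the clock pin of the flip-flop toggles.

```python
class EnableFF:
    """Flip-flop with a clock-enable input: the clock pin toggles every cycle."""

    def __init__(self):
        self.q = 0
        self.clock_edges = 0

    def tick(self, d, en):
        self.clock_edges += 1       # clock reaches the FF regardless of en
        if en:
            self.q = d              # output changes only when enabled
        return self.q


class GatedClockFF:
    """Flip-flop behind a clock gate: it sees no edge while the gate is closed."""

    def __init__(self):
        self.q = 0
        self.clock_edges = 0

    def tick(self, d, gate):
        if gate:
            self.clock_edges += 1   # clock edge only passes when the gate is open
            self.q = d
        return self.q


def compare(data, enables):
    """Drive both models with the same inputs; return traces and edge counts."""
    a, b = EnableFF(), GatedClockFF()
    trace_a = [a.tick(d, en) for d, en in zip(data, enables)]
    trace_b = [b.tick(d, en) for d, en in zip(data, enables)]
    return trace_a, trace_b, a.clock_edges, b.clock_edges
```

Both models produce the same output trace for the same inputs, which matches the point that the FF outputs are guaranteed not to change; only the number of clock edges reaching the flip-flop differs.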