186 – Create decoder for SOC: Power ISA

Bug 186 - Create decoder for SOC: Power ISA

Summary: Create decoder for SOC: Power ISA

Status:	RESOLVED FIXED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Specification (show other bugs)
Version:	unspecified
Hardware:	PC Windows

Importance:	--- enhancement
Assignee:	Luke Kenneth Casson Leighton

URL:

Depends on:	211 189 261 271
Blocks:	22 383
	Show dependency tree / graph

Reported:	2020-02-24 17:25 GMT by Michael Nolan
Modified:	2021-04-20 14:41 BST (History)
CC List:	4 users (show)

See Also:	187 212 269
NLnet milestone:	NLNet.2019.10.043.Wishbone
total budget (EUR) for completion of task and all subtasks:	500
budget (EUR) for this task, excluding subtasks' budget:	500
parent task for budget allocation:	383
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:	lkcl = {amount=200,paid=2020-10-21} donated={amount=200,paid=2020-12-06} jacob={amount=100,paid=2020-12-09}

Attachments
libvirt xml file (648 bytes, text/xml) 2020-03-26 05:40 GMT, Luke Kenneth Casson Leighton	Details
copy of python libvirt domain start example (1.27 KB, text/x-python) 2020-03-26 05:41 GMT, Luke Kenneth Casson Leighton	Details
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Michael Nolan 2020-02-24 17:25:49 GMT

Needs to handle POWER instructions

https://libre-riscv.org/3d_gpu/architecture/decoder/

Comment 1 Michael Nolan 2020-02-24 17:33:57 GMT

The POWER ISA includes instructions to load and store multiple registers to/from memory (`lmw` and `smw`). 

`lmw rt, D(rA)` loads registers rt, r(t+1), r(t+2), ..., r31 from subsequent addresses starting at rA + signed(D). `smw` behaves similarly but for stores

These instructions, if implemented, would significantly complicate the decoder by forcing it to either generate $31-n$ load or store ops while stalling instruction fetch. Alternatively it could generate a single op that after taking a trip through the pipeline would be modified (i.e. incrementing rt) and being placed back into the issue queue. I suspect the latter would not play nicely with the OOO machinery because it wouldn't place a reservation on a register until it was time to store that register, so subsequent instructions could modify the registers before they are stored.

Comment 2 Michael Nolan 2020-02-24 17:39:39 GMT

(In reply to Michael Nolan from comment #1)
> The POWER ISA includes instructions to load and store multiple registers
> to/from memory (`lmw` and `smw`). 
> 

A similar situation exists with the load and store string instructions which are currently "Phased Out" (I assume we wouldn't be implementing these, or emulating them with an OS trap)

Comment 3 Luke Kenneth Casson Leighton 2020-02-24 18:04:12 GMT

(In reply to Michael Nolan from comment #1)
> The POWER ISA includes instructions to load and store multiple registers
> to/from memory (`lmw` and `smw`). 
> 
> `lmw rt, D(rA)` loads registers rt, r(t+1), r(t+2), ..., r31 from subsequent
> addresses starting at rA + signed(D). `smw` behaves similarly but for stores

ah funnily enough this is basically identical to LD.MULTI which is covered
by the out-of-order execution engine and by SimpleV.

> These instructions, if implemented, would significantly complicate the
> decoder by forcing it to either generate $31-n$ load or store ops while
> stalling instruction fetch. Alternatively it could generate a single op that
> after taking a trip through the pipeline would be modified (i.e.
> incrementing rt) and being placed back into the issue queue. I suspect the
> latter would not play nicely with the OOO machinery because it wouldn't
> place a reservation on a register until it was time to store that register,
> so subsequent instructions could modify the registers before they are stored.

the total opposite is true: it plays very nicely with the OoO engine and
plays merry hell with an in-order design.

no stalling is required.

all that happens is: we translate this into a SimpleV instruction
(VL=32) and hit the SV hardware-loop system with it.

Comment 4 Luke Kenneth Casson Leighton 2020-02-24 18:31:20 GMT

(In reply to Luke Kenneth Casson Leighton from comment #3)
> (In reply to Michael Nolan from comment #1)

> no stalling is required.

okok that's not quite true :)

if the internal multi-issue width is say only 4 64-bit wide, then yes,
the SimpleV VL-hardware-loop engine (which takes over at this point,
the lmw decode is "done" by that point) will have to issue 4 instructions
at a time.

and, if there are not enough LOAD/STORE Function Units "completed" yet,
then _yes_, the issue engine has to stall.

however this is just day-to-day run-of-the-mill as far as the 6600 engine
is concerned.

if there are no Function Units free to accept the current instruction,
then there are simply too many outstanding instructions being executed
for it to be possible to drop more into the Dependency Matrices.

so... *then* you stall.  that's how it worked back in 1966, the only difference
being now that we will have a multi-issue engine and so can accept more
instructions per clock.

Comment 5 Michael Nolan 2020-02-24 18:34:10 GMT

(In reply to Luke Kenneth Casson Leighton from comment #4)
> (In reply to Luke Kenneth Casson Leighton from comment #3)
> > (In reply to Michael Nolan from comment #1)
> 
> > no stalling is required.
> 
> okok that's not quite true :)
> 
> if the internal multi-issue width is say only 4 64-bit wide, then yes,
> the SimpleV VL-hardware-loop engine (which takes over at this point,
> the lmw decode is "done" by that point) will have to issue 4 instructions
> at a time.
> 
> and, if there are not enough LOAD/STORE Function Units "completed" yet,
> then _yes_, the issue engine has to stall.

Huh ok. So there's a SimpleV translator between the decoder and the dependency matrices/issue queue?

Comment 6 Luke Kenneth Casson Leighton 2020-02-24 18:42:18 GMT

(In reply to Michael Nolan from comment #5)

> Huh ok. So there's a SimpleV translator between the decoder and the
> dependency matrices/issue queue?

yes.  that's how SimpleV works - it's what it is.

think about it: SimpleV is a hardware for-loop.  the Program Counter
pauses, and a *SUB-EXECUTION* loop occurs (for i in range(VL))

however each output from that sub-execution loop goes...

... into the instruction issue.

ta-daaa :)

therefore, lmw can quite literally be replaced with a SimpleV
"SVPrefix" instruction.  SVP{VL=32}.LD r1, r2#D.

done.

Comment 7 Michael Nolan 2020-02-24 19:20:26 GMT

Oh no. RLWINM & friends is going to be fun to do with the partitioned shifter. Maybe it warrants its own module?

Comment 8 Luke Kenneth Casson Leighton 2020-02-24 19:42:06 GMT

(In reply to Michael Nolan from comment #7)
> Oh no. RLWINM & friends is going to be fun to do with the partitioned
> shifter. Maybe it warrants its own module?

deefinitely.  there is actually a way to "construct" it from *two*
PartitionedShifters:

* create an "inversion" (bitwise inversion) of both the mask *and* A-input
* run through the SAME (existing) DynamicPartitionedShifter

this is how you do "right shift"

then, you basically do a hybrid combination of left-shift ORed with
right-shift, where the right-shift part is done (32-B) and the left-shift
part is (B)

however... because of the dynamic partitioning, obviously that's (16-B)
for 16-bit-wide partitions etc. etc.

now, we *could* hypothetically do that as micro-ops, exactly as we discussed
a few days ago:

* put the data first through the left-shifter and store it as a partial result
* put the data next through the left-shifter (after inversion)
* do the OR (and ANDing) operation as a third phase

i am tempted to suggest trying that, first, rather than writing a
"special dedicated RLWINM" operation.

do you want to give that a shot?  see if it's first possible to create a
Dynamic Right-Shifter out of a Dynamic Left-Shifter?

Comment 9 Jacob Lifshay 2020-02-24 19:42:48 GMT

(In reply to Luke Kenneth Casson Leighton from comment #6)
> therefore, lmw can quite literally be replaced with a SimpleV
> "SVPrefix" instruction.  SVP{VL=32}.LD r1, r2#D.

One thing to consider: lmw and similar is probably defined to read the address register before writing to any registers, whereas SimpleV may not be defined that way (but probably should be), this matters when the load overwrites the address register part way through.

Comment 10 Michael Nolan 2020-02-24 19:54:50 GMT

(In reply to Luke Kenneth Casson Leighton from comment #8)

> * put the data first through the left-shifter and store it as a partial
> result
> * put the data next through the left-shifter (after inversion)
> * do the OR (and ANDing) operation as a third phase
> 
> i am tempted to suggest trying that, first, rather than writing a
> "special dedicated RLWINM" operation.
> 
> do you want to give that a shot?  see if it's first possible to create a
> Dynamic Right-Shifter out of a Dynamic Left-Shifter?

I can. We're definitely going to need a right shift instruction regardless of how we do rlwinm. Should I create a separate bug report for that or update progress under either this one or 173?

Comment 11 Michael Nolan 2020-02-24 19:56:51 GMT

(In reply to Jacob Lifshay from comment #9)
> 
> One thing to consider: lmw and similar is probably defined to read the
> address register before writing to any registers, whereas SimpleV may not be
> defined that way (but probably should be), this matters when the load
> overwrites the address register part way through.

The POWER ISA declares a LMW like that (or a LMW including register 0) to be invalid because of this. Glad you brought it up though

Comment 12 Luke Kenneth Casson Leighton 2020-02-24 20:23:15 GMT

(In reply to Jacob Lifshay from comment #9)
> (In reply to Luke Kenneth Casson Leighton from comment #6)
> > therefore, lmw can quite literally be replaced with a SimpleV
> > "SVPrefix" instruction.  SVP{VL=32}.LD r1, r2#D.
> 
> One thing to consider: lmw and similar is probably defined to read the
> address register before writing to any registers, whereas SimpleV may not be
> defined that way (but probably should be), this matters when the load
> overwrites the address register part way through.

that's handled by the LD Function Unit.  Reservations are put on the
registers (and on the memory address... *when calculated*).

you need to read mitch alsup's book chapters to fully understand the process.

and yes, the Dependency Matrix will notice that there is a write hazard
on the address being overwritten, and will correctly "hold up" the
instruction that depends on the register.

from what you're saying, michael, this is "too complicated" for POWER ISA
to cope with, so they declare it "invalid"?  if so this does not surprise me
at all because you absolutely cannot have an ISA that specifically depends
on an OoO execution model.

Comment 13 Luke Kenneth Casson Leighton 2020-02-24 20:24:03 GMT

(In reply to Michael Nolan from comment #10)

> I can. We're definitely going to need a right shift instruction regardless
> of how we do rlwinm. Should I create a separate bug report for that or
> update progress under either this one or 173?


yes make it a separate one because this bugreport is about instruction
decode, not ALU implementation.  do cross-reference this bugreport though.

Comment 14 Jacob Lifshay 2020-02-24 20:43:22 GMT

(In reply to Luke Kenneth Casson Leighton from comment #12)
> (In reply to Jacob Lifshay from comment #9)
> > (In reply to Luke Kenneth Casson Leighton from comment #6)
> > > therefore, lmw can quite literally be replaced with a SimpleV
> > > "SVPrefix" instruction.  SVP{VL=32}.LD r1, r2#D.
> > 
> > One thing to consider: lmw and similar is probably defined to read the
> > address register before writing to any registers, whereas SimpleV may not be
> > defined that way (but probably should be), this matters when the load
> > overwrites the address register part way through.
> 
> that's handled by the LD Function Unit.  Reservations are put on the
> registers (and on the memory address... *when calculated*).

My issue is not that it can't be implemented correctly in HW, but that having the instruction switch the address used halfway through makes it much harder to use in a compiler due to the inputs being overwritten part way through execution.

It's the difference between load1 and load2 in:

int regs[];

// what I think SimpleV should be defined to do
void load1(int address_reg, int dest_reg, int N)
{
    int address = regs[address_reg];
    for(int i = 0; i < N; i++)
        regs[dest_reg + i] = *(int *)(address + sizeof(int) * i);
}

// what SimpleV is currently defined to do
void load2(int address_reg, int dest_reg, int N)
{
    for(int i = 0; i < N; i++)
        regs[dest_reg + i] = *(int *)(regs[address_reg] + sizeof(int) * i);
}

where load2 overwrites the address register halfway through so all the memory accesses after that use the newly loaded value as the base address.

Comment 15 Luke Kenneth Casson Leighton 2020-02-24 20:49:22 GMT

(In reply to Jacob Lifshay from comment #14)

> My issue is not that it can't be implemented correctly in HW, but that
> having the instruction switch the address used halfway through makes it much
> harder to use in a compiler due to the inputs being overwritten part way
> through execution.

oh good point.  hmmm, that makes much more sense as to why it would be
considered invalid to do.

Comment 16 Jacob Lifshay 2020-02-24 20:51:28 GMT

(In reply to Jacob Lifshay from comment #14)
> My issue is not that it can't be implemented correctly in HW, but that
> having the instruction switch the address used halfway through makes it much
> harder to use in a compiler due to the inputs being overwritten part way
> through execution.
> 
> It's the difference between load1 and load2 in:

note that switching to load1 will require a CSR to store the address if the instruction is interrupted in the middle because the address register may have already been overwritten.

The load1 semantics should apply to all instructions with scalar/subvector inputs and vector outputs, where all scalar inputs are read before any outputs are written. vector (VL-length) inputs are still read and written one subvector/scalar at a time so we won't need giant temporary buffers.

Comment 17 Luke Kenneth Casson Leighton 2020-02-24 21:21:01 GMT

(In reply to Jacob Lifshay from comment #14)

> // what I think SimpleV should be defined to do

(... it's not)

> void load1(int address_reg, int dest_reg, int N)
> {
>     int address = regs[address_reg];
>     for(int i = 0; i < N; i++)
>         regs[dest_reg + i] = *(int *)(address + sizeof(int) * i);
> }
> 
> // what SimpleV is currently defined to do

(... it's not)

> void load2(int address_reg, int dest_reg, int N)
> {
>     for(int i = 0; i < N; i++)
>         regs[dest_reg + i] = *(int *)(regs[address_reg] + sizeof(int) * i);
> }

it has support for both.  the first is called 'indirect mode', the
second is 'stride mode'.

Comment 18 Luke Kenneth Casson Leighton 2020-02-24 21:26:48 GMT

(In reply to Jacob Lifshay from comment #16)
> (In reply to Jacob Lifshay from comment #14)
> > My issue is not that it can't be implemented correctly in HW, but that
> > having the instruction switch the address used halfway through makes it much
> > harder to use in a compiler due to the inputs being overwritten part way
> > through execution.
> > 
> > It's the difference between load1 and load2 in:
> 
> note that switching to load1 will require a CSR to store the address if the
> instruction is interrupted in the middle because the address register may
> have already been overwritten.

the modifications by mitch alsup to the 6600 system do not require that to
happen.

specifically: the "shadow" mechanism.

the "shadow" mechanism allows for "damaging" consequences to be *calculated*
but *prevented and prohibited* from actually hitting memory, register file,
or other critical non-reversible resource.

if there is anything that could raise an exception which could
cause "damage", it raises a "shadow".

ONLY when the result of the operation is GUARANTEED to complete is the
shadow released.

there *is* no possibility, therefore, of damage occurring by an instruction being "interrupted in the middle".

the operation therefore either completes atomically... or its results discarded entirely, and, because the shadow is also cast across *subsequent* instructions as well, all instructions in sequence after the one being discarded are also discarded.

Comment 19 Jacob Lifshay 2020-02-24 21:38:56 GMT

(In reply to Luke Kenneth Casson Leighton from comment #17)
> (In reply to Jacob Lifshay from comment #14)
> 
> > // what I think SimpleV should be defined to do
> 
> (... it's not)
> 
> > void load1(int address_reg, int dest_reg, int N)
> > {
> >     int address = regs[address_reg];
> >     for(int i = 0; i < N; i++)
> >         regs[dest_reg + i] = *(int *)(address + sizeof(int) * i);
> > }
> > 
> > // what SimpleV is currently defined to do
> 
> (... it's not)
> 
> > void load2(int address_reg, int dest_reg, int N)
> > {
> >     for(int i = 0; i < N; i++)
> >         regs[dest_reg + i] = *(int *)(regs[address_reg] + sizeof(int) * i);
> > }
> 
> it has support for both.  the first is called 'indirect mode', the
> second is 'stride mode'.

umm...

strided mode is conventionally where the addresses loaded follow the sequence base + i * increment where increment can be much bigger than sizeof(int)

indirect mode is conventionally where the addresses are taken from successive vector elements so each address can be independently specified.

what load2 is is where the address register is a scalar that doesn't change (address_reg is never reassigned), but regs[address_reg] is assigned to halfway through, changing the address used for subsequent loads. This is what's specified for compact loads (for lack of a better term).

Comment 20 Luke Kenneth Casson Leighton 2020-02-24 22:04:33 GMT

ehhm.. ehmehmehm.... i am getting confused, sorry, i was looking for certain patterns, spotted half of what i expected to be there and thought it was the whole lot.

load2 is... oh wait! it's... yes they are both strided, yes i see the address computation is in the loop.

ok yes got it, yes load2 is how it's done, with the Dependency Matrices protecting both the memory reads (from writes) *and* protecting the register reads and writes... in full sequence.

Comment 21 Luke Kenneth Casson Leighton 2020-02-27 17:49:20 GMT

ok the right-shift looks great, michael, i was pleasantly surprised that the bit-
swap technique worked so well, particularly for the dynamic-partitioning.

can i make a suggestion, of adding tables of the ISA here:
https://libre-riscv.org/openpower/isatables/

i have found that it helps enormously.  basically, Appendix F from
Power Version 2.07B?

Comment 22 Luke Kenneth Casson Leighton 2020-02-29 02:48:05 GMT

um... :)
https://github.com/antonblanchard/chiselwatt/blob/master/src/main/scala/Control.scala

Comment 23 Luke Kenneth Casson Leighton 2020-02-29 02:50:04 GMT

and, err...
https://github.com/antonblanchard/chiselwatt/blob/master/src/main/scala/Instructions.scala

err anyone got 5 minutes to write a python script to turn that into nmigen? :)

Comment 24 Luke Kenneth Casson Leighton 2020-02-29 08:22:49 GMT

https://github.com/antonblanchard/microwatt/blob/master/decode1.vhdl

y'know... autogenerating these from a straight csv file which is included directly into the wiki via the "data" tag should be pretty easy...

Comment 25 Luke Kenneth Casson Leighton 2020-02-29 18:23:25 GMT

https://libre-riscv.org/openpower/isatables/
using anton's decode1.vhdl, just for laughs i extracted the major opcodes
into a csv file.

Comment 26 Michael Nolan 2020-02-29 19:50:32 GMT

(In reply to Luke Kenneth Casson Leighton from comment #25)
> https://libre-riscv.org/openpower/isatables/
> using anton's decode1.vhdl, just for laughs i extracted the major opcodes
> into a csv file.

I've got something in http://git.libre-riscv.org/?p=soc.git;a=blob;f=src/decoder/power_major_decoder.py;h=0e0f1f22e5f9d1a92d8362f49cdb6b62c42b0ec4;hb=HEAD

I had to modify your table a little bit to remove the spaces after the commas. Would it break the display if I did the same thing to the wiki one?

Comment 27 Luke Kenneth Casson Leighton 2020-02-29 20:09:43 GMT

(In reply to Michael Nolan from comment #26)
> (In reply to Luke Kenneth Casson Leighton from comment #25)
> > https://libre-riscv.org/openpower/isatables/
> > using anton's decode1.vhdl, just for laughs i extracted the major opcodes
> > into a csv file.
> 
> I've got something in
> http://git.libre-riscv.org/?p=soc.git;a=blob;f=src/decoder/
> power_major_decoder.py;h=0e0f1f22e5f9d1a92d8362f49cdb6b62c42b0ec4;hb=HEAD

that was quick and pretty ridiculously easy, wasn't it? :)

> I had to modify your table a little bit to remove the spaces after the
> commas. Would it break the display if I did the same thing to the wiki one?

heck no, the whole point is to be able to do wget on the wiki version
(rather than have two out-of-date copies)

Comment 28 Michael Nolan 2020-02-29 20:14:27 GMT

(In reply to Luke Kenneth Casson Leighton from comment #27)
oder.py;h=0e0f1f22e5f9d1a92d8362f49cdb6b62c42b0ec4;hb=HEAD
> 
> that was quick and pretty ridiculously easy, wasn't it? :)

yup

> heck no, the whole point is to be able to do wget on the wiki version
> (rather than have two out-of-date copies)

It doesn't look like I have write access to the wiki git, and I can't figure out how to edit the CSV on the wiki.  What do I need to do to update that one?

Comment 29 Luke Kenneth Casson Leighton 2020-02-29 20:27:45 GMT

(In reply to Luke Kenneth Casson Leighton from comment #27)

> heck no, the whole point is to be able to do wget on the wiki version
> (rather than have two out-of-date copies)

ok bear in mind i just added the heading "comment" which was missing.
also added minor_19.

btw remember that in python int("0b100", 2) will convert to the value 4
so we can leave the values in minor_19 in binary, even detect the bit-length

all the minor_xx tables are exactly the same format as the major one.

Comment 30 Luke Kenneth Casson Leighton 2020-02-29 20:29:46 GMT

Comment 31 Luke Kenneth Casson Leighton 2020-02-29 20:31:03 GMT

(In reply to Michael Nolan from comment #28)

> It doesn't look like I have write access to the wiki git,

you do now.  git clone gitolite3@git.libre-riscv.org:922/libreriscv.git
i think.  check HDL_workflow.

Comment 32 Michael Nolan 2020-02-29 20:40:15 GMT

I'm moving the enums for each field in the table to a separate file. Would it be acceptable to do a `from power_enums import *` in the decoder and testbench?

Comment 33 Michael Nolan 2020-02-29 20:43:33 GMT

(In reply to Luke Kenneth Casson Leighton from comment #31)
> (In reply to Michael Nolan from comment #28)
> 
> > It doesn't look like I have write access to the wiki git,
> 
> you do now.  git clone gitolite3@git.libre-riscv.org:922/libreriscv.git
> i think.  check HDL_workflow.

Thanks, updated.

Comment 34 Luke Kenneth Casson Leighton 2020-02-29 21:12:19 GMT

(In reply to Michael Nolan from comment #32)
> I'm moving the enums for each field in the table to a separate file.

great.

> Would it be acceptable to do a `from power_enums import *` in the decoder and
> testbench?

no, it's really not a good idea.  grep'd "class power_enum.py >f ",
did a bit of vim voodoo, created this:

from soc.decode.power_enums import (Function,
InternalOp,
In1Sel,
In2Sel,
In3Sel,
OutSel,
LdstLen,
RC,
CryIn,
)

Comment 35 Luke Kenneth Casson Leighton 2020-02-29 21:15:16 GMT

hmm we need to sort out the soc namespace, move the entire subdirectories
to something that goes "from soc.decoder.power_decoder import xxxx"

btw relative imports are generally to be avoided: full import namespaces
otherwise things get messed up.

Comment 36 Luke Kenneth Casson Leighton 2020-02-29 21:39:09 GMT

like the http get in the decode, michael. i would have used a Makefile and wget, httplib works just as well.

Comment 37 Luke Kenneth Casson Leighton 2020-02-29 21:56:43 GMT

btw for RISCV we don't need to do one, as there is at least one migen RISCV processor out there.
https://github.com/lambdaconcept/minerva/blob/master/minerva/units/decoder.py

dang that took far too long to find!

entire search space "riscv nmigen" is polluted with irrelevant items.

anyway found it eventually.

so.

question for you, illustrating why "import *" is problematic.

i have never seen the function "matcher" before.

can you tell me, immediately, without doing a google search, if it is a standard python function, or if it is part of nmigen?

it's impossible to tell, isn't it?

even if it was a keyword, the very fact that there is even *one* import * DESTROYS all confidence because nmigen *might* have an overriden function named "matcher".

anyway.

with some adaptation and checking the license file first, we should be able to use that code.

Comment 38 Luke Kenneth Casson Leighton 2020-02-29 21:59:35 GMT

https://github.com/lambdaconcept/minerva/blob/master/minerva/units/decoder.py#L140

found it :)

Comment 39 Jacob Lifshay 2020-03-01 03:20:21 GMT

(In reply to Luke Kenneth Casson Leighton from comment #27)
> (In reply to Michael Nolan from comment #26)
> > I had to modify your table a little bit to remove the spaces after the
> > commas. Would it break the display if I did the same thing to the wiki one?
> 
> heck no, the whole point is to be able to do wget on the wiki version
> (rather than have two out-of-date copies)

Wouldn't it be better to just link the wiki to the source code in soc.git (or other repo)? that way you can build the SoC from source without needing to connect to our server. We should try to have all the required code in the code repos rather than the wiki repo.

Alternatively, if ikiwiki can manage autoupdating it, have soc be a submodule of the wiki git repo.

Comment 40 Luke Kenneth Casson Leighton 2020-03-01 10:39:17 GMT

(In reply to Jacob Lifshay from comment #39)
> (In reply to Luke Kenneth Casson Leighton from comment #27)
> > (In reply to Michael Nolan from comment #26)
> > > I had to modify your table a little bit to remove the spaces after the
> > > commas. Would it break the display if I did the same thing to the wiki one?
> > 
> > heck no, the whole point is to be able to do wget on the wiki version
> > (rather than have two out-of-date copies)
> 
> Wouldn't it be better to just link the wiki to the source code in soc.git
> (or other repo)? that way you can build the SoC from source without needing
> to connect to our server. We should try to have all the required code in the
> code repos rather than the wiki repo.

good point... is it easy to do?

> Alternatively, if ikiwiki can manage autoupdating it, have soc be a
> submodule of the wiki git repo.

hmm.. hmm... 

tempted to suggest "nuts to it" and just have a copy, marking it as
"canonical source: http://....."

Comment 41 Luke Kenneth Casson Leighton 2020-03-01 14:32:27 GMT

i saw you added minor_30.csv (great), i added 58 and 62.
https://github.com/antonblanchard/microwatt/blob/master/decode1.vhdl

hmm i think perhaps we could do instead with a class rather than
explicit members of PowerDecoder, i moved them into PowerOps.
perhaps later PowerOps might best be derived from Record, although
whitequark is being quite intransigent about how Record should be
created (and used), which causes us quite a lot of problems in using
it.  leaving that aside and moving on...

when doing nop, attn and SIM_CONFIG, these are explicit single-ops
that need assignment in a derivative of PowerDecoder

https://github.com/antonblanchard/microwatt/blob/master/decode1.vhdl#L391

in VHDL, "&" appears not to be the logical "AND" operator, it appears to be
"append together".  line 375 for example.

Comment 42 Luke Kenneth Casson Leighton 2020-03-01 16:23:18 GMT

hiya michael,
http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/004661.html

ok the minor_31 opcode is just absolutely ridiculous, so i took a look
at pages 1385-1386 of Version 2.07B and noticed that there are patterns
in the columns.  add occupies the entire LSB "01010" column for example.

sooOooo... what i propose is a two-level hierarchy as follows:

* pre-process the rows by
* first identifying the first 5 bits as keys (let's say N are identifed)
* splitting the rows into N groups (by 5-bit key)

then it becomes possible to create a case-of-case-statements, each with
a separate PowerDecoder(), "automatically" from those groups.

in fact, i suspect that it *might* be possible to specify how to decode
*all* of the instructions in this fashion, with some sort of "specification"
that says "these bits get decoded with this table", then "these bits get
decoded with this table" etc. etc. in a cascade.

i am quite tempted to suggest instead to split minor_31.csv into separate minor_31_5LSBs.csv files rather than do it automatically.

what's your thoughts there?

Comment 43 Luke Kenneth Casson Leighton 2020-03-01 23:08:52 GMT

ok i committed some experimental code (if False:) which gives
the general idea, where PowerDecoder would iterate recursively
through sub-encoders that have the bit-width specified that
they are to look into.

minor_19 i altered from anton's original so that the "19_valid"
check is not needed.  however, again, like 31, because it is
10 bits wide, it now needs a "pre-processing" stage, taking
the bottom 5 bits, first, and creating a list-of-lists which
act as *another* level of recursive hierarchy.

unfortunately for Power ISA, it looks like they have something
like four, five or even six levels of these decoders, i.e.
5 separate areas of the opcode that you must decode sequentially
in order to work out the "true" operation.

if you try to "flatten" any of those, you end up with the horrible
129-bit OR gates (!)

anyway.  do let me know what you think, michael.

Comment 44 Jacob Lifshay 2020-03-01 23:23:41 GMT

(In reply to Luke Kenneth Casson Leighton from comment #43)
> if you try to "flatten" any of those, you end up with the horrible
> 129-bit OR gates (!)

That may not be that big of a problem, since yosys should change that to a 4-level tree of 3 or 4-input OR gates.

One other option is to use one or more ROMs for parts of the decoding process, since they are optimized for those cases where you need dense lookup tables. Something like a 7-bit address would work fine.

Comment 45 Luke Kenneth Casson Leighton 2020-03-01 23:57:53 GMT

(In reply to Jacob Lifshay from comment #44)
> (In reply to Luke Kenneth Casson Leighton from comment #43)
> > if you try to "flatten" any of those, you end up with the horrible
> > 129-bit OR gates (!)
> 
> That may not be that big of a problem, since yosys should change that to a
> 4-level tree of 3 or 4-input OR gates.

when it comes to synthesis: true. you can tell however that this ISA was not "designed", it organically grew.



> One other option is to use one or more ROMs for parts of the decoding
> process, since they are optimized for those cases where you need dense
> lookup tables. Something like a 7-bit address would work fine.

i think... these are effectively Consts being assigned in a suite of Case statements, i have no idea if a ROM lookup would be more efficient, the address decode tree might actually be more gates, whereas these minor opcodes, they are a sparsely populated table with a considerable number of blanks, and a number of "early outs" if you know what i mean.

so i suspect that a full address lookup cascade would be a lot more gates.

i kinda like the code compactness of the Switch and Case system that Michael created, loading from csv files.  it's kinda hilarious - reminds me of when i took the .txt file version of Microsoft's SMB Spec and literally spewed out c code structs and associated network parser from it :)

Comment 46 Michael Nolan 2020-03-02 15:37:40 GMT

(In reply to Luke Kenneth Casson Leighton from comment #43)
> ok i committed some experimental code (if False:) which gives
> the general idea, where PowerDecoder would iterate recursively
> through sub-encoders that have the bit-width specified that
> they are to look into.
> 
> 
> anyway.  do let me know what you think, michael.

So I tried this in 8af7da, and it works... Current minor_31_decoder (split 5 bits and 5 bits):
   Number of cells:                787
Longest topological path in top (length=16)


Old way: 
   Number of cells:                972
Longest topological path in top (length=15)

So it's a little smaller, but a little deeper

Comment 47 Luke Kenneth Casson Leighton 2020-03-02 16:21:13 GMT

(In reply to Michael Nolan from comment #46)

> So I tried this in 8af7da, and it works... 

excellent!

> Current minor_31_decoder (split 5
> bits and 5 bits):
>    Number of cells:                787
> Longest topological path in top (length=16)

ok that's surprisingly not a lot smaller, i would have expected it to drop
quite a lot.

should work great on minor_19 as well.

anyway: with the ability to do "sub-decoding" the idea that i had was to
literally chain the entire lot together, in a tree of decoders, specified
at the top level like in that "if False:" block.

the idea being, we just have to write a tree-like array of specifications,
pass that into *one* PowerDecoder, and it will (recursively) pass the next
part of the spec-array to more PowerDecoder instances.

otherwise we need to set that up "by hand"... with a top-level hand-coded
Switch / Case statement, which, once done, i guarantee you will go,
"hmm, this looks pretty much exactly like a PowerDecoder instance" :)

you want to have a go at that?

Comment 48 Luke Kenneth Casson Leighton 2020-03-03 18:44:46 GMT

hiya michael i've just spotted that minor_19.csv isn't quite ready yet,
i'm working on getting it up to scratch so that it's in a position
to do the hierarchical tree idea.  i may just "cheat" and create
a minor_19_00000.csv table.

Comment 49 Michael Nolan 2020-03-04 15:45:55 GMT

(In reply to Luke Kenneth Casson Leighton from comment #47)

> 
> the idea being, we just have to write a tree-like array of specifications,
> pass that into *one* PowerDecoder, and it will (recursively) pass the next
> part of the spec-array to more PowerDecoder instances.
> 
> otherwise we need to set that up "by hand"... with a top-level hand-coded
> Switch / Case statement, which, once done, i guarantee you will go,
> "hmm, this looks pretty much exactly like a PowerDecoder instance" :)
> 
> you want to have a go at that?

I have something like that semi-working in b2bb2b, but it doesn't include extra.csv or minor_19_00000. 

(In reply to Luke Kenneth Casson Leighton from comment #48)
> hiya michael i've just spotted that minor_19.csv isn't quite ready yet,
> i'm working on getting it up to scratch so that it's in a position
> to do the hierarchical tree idea.  i may just "cheat" and create
> a minor_19_00000.csv table.

I get what extra.csv is for, though I'm not entirely sure how to integrate it with the decoder. What is minor_19_00000.csv for though?

Comment 50 Luke Kenneth Casson Leighton 2020-03-04 16:43:29 GMT

(In reply to Michael Nolan from comment #49)
> (In reply to Luke Kenneth Casson Leighton from comment #47)
> 
> > 
> > the idea being, we just have to write a tree-like array of specifications,
> > pass that into *one* PowerDecoder, and it will (recursively) pass the next
> > part of the spec-array to more PowerDecoder instances.
> > 
> > otherwise we need to set that up "by hand"... with a top-level hand-coded
> > Switch / Case statement, which, once done, i guarantee you will go,
> > "hmm, this looks pretty much exactly like a PowerDecoder instance" :)
> > 
> > you want to have a go at that?
> 
> I have something like that semi-working in b2bb2b, 

nice, will take a look.

> but it doesn't include
> extra.csv or minor_19_00000. 

ok.


> I get what extra.csv is for, though I'm not entirely sure how to integrate
> it with the decoder.

yeah it is effectively at the same "level" as the major opcodes, and needs to be done.. mmmm... before them?

in effect it is extra case statements that need to go into major.csv.

the only "problem" is, the widths are different.  therefore, we need to have *two* switch statements.

i *think* this is doable by making all args of PowerDecode into lists, disabling the "default" switch, and initialising the "default" at the beginning.

> What is minor_19_00000.csv for though?

just as extra.csv is to be added in at the same level as major.csv, minor_19_00000 is likewise needing to be added to minor_19 ad a 2nd set of switches, but with *different* widths.

hence the idea of making PowerDecode args into lists then putting a for-loop around the switch generation.

what do you think?

Comment 51 Luke Kenneth Casson Leighton 2020-03-04 16:50:00 GMT

so a *list* of subdecoders.  comments inline. git pull.

Comment 52 Michael Nolan 2020-03-04 17:17:35 GMT

(In reply to Luke Kenneth Casson Leighton from comment #50)
> (In reply to Michael Nolan from comment #49)
> > (In reply to Luke Kenneth Casson Leighton from comment #47)
> > 
> > > 
> > > the idea being, we just have to write a tree-like array of specifications,
> > > pass that into *one* PowerDecoder, and it will (recursively) pass the next
> > > part of the spec-array to more PowerDecoder instances.
> > > 
> > > otherwise we need to set that up "by hand"... with a top-level hand-coded
> > > Switch / Case statement, which, once done, i guarantee you will go,
> > > "hmm, this looks pretty much exactly like a PowerDecoder instance" :)
> > > 
> > > you want to have a go at that?
> > 
> > I have something like that semi-working in b2bb2b, 
> 
> nice, will take a look.
> 
> > but it doesn't include
> > extra.csv or minor_19_00000. 
> 
> ok.
> 
> 
> > I get what extra.csv is for, though I'm not entirely sure how to integrate
> > it with the decoder.
> 
> yeah it is effectively at the same "level" as the major opcodes, and needs
> to be done.. mmmm... before them?
> 
> in effect it is extra case statements that need to go into major.csv.
> 
> the only "problem" is, the widths are different.  therefore, we need to have
> *two* switch statements.
> 
> i *think* this is doable by making all args of PowerDecode into lists,
> disabling the "default" switch, and initialising the "default" at the
> beginning.
> 
> > What is minor_19_00000.csv for though?
> 
> just as extra.csv is to be added in at the same level as major.csv,
> minor_19_00000 is likewise needing to be added to minor_19 ad a 2nd set of
> switches, but with *different* widths.
> 
> hence the idea of making PowerDecode args into lists then putting a for-loop
> around the switch generation.
> 
> what do you think?

I see now. So instead of passing in the bit selection and csv name, we'd pass in a list of bit selections and csvs, and it would cascade to the next csv if the first one doesn't match?

Comment 53 Luke Kenneth Casson Leighton 2020-03-04 17:36:14 GMT

whoops i realised that it's the arguments to PowerDecoder that need to
be passed in as a *list*, via either an object... actually, Subdecoder
itself would almost do, if it also had "width".

however i'd suggest creating something new called "Decoder".

Comment 54 Luke Kenneth Casson Leighton 2020-03-04 17:46:29 GMT

(In reply to Michael Nolan from comment #52)

> I see now. So instead of passing in the bit selection and csv name, we'd
> pass in a list of bit selections and csvs,

yes.  and that includes width, and the *list* becomes the (almost only)
argument to PowerDecoder.

the width per PowerDecoder list element (a new namedtuple called Decoder?)
has to be added because it specifies the width of the Switch statement.

> and it would cascade to the next
> csv if the first one doesn't match?

or, just... they're all done in parallel, and we assume that there are no overlaps.  actually, hmmm, probably a cascade is safer, althought it would introduce _another_ gate delay *sigh*.

i think we can get away with assuming that the list-of-matches are unique
even though they would be in separate switch statements.

Comment 55 Luke Kenneth Casson Leighton 2020-03-04 19:13:50 GMT

hm even subdecoders needs to be one of the arguments to the
"Decoder" namedtuple.  i put in a Decoder namedtuple in
commit 7da4ec503badf

Comment 56 Luke Kenneth Casson Leighton 2020-03-05 03:10:40 GMT

doh.  no on width being part of Decoders namedtuple. yes however on subdecoders...
frickin 3am am just gonna do it, csn you check it when you have a mo?

Comment 57 Luke Kenneth Casson Leighton 2020-03-05 03:59:59 GMT

oof.  ok.  couldn't help it, was staring at the code instead of sleep.
sorry, michael :)  can you do a review?

Comment 58 Luke Kenneth Casson Leighton 2020-03-05 12:32:25 GMT

nngggh mmoooooo wtf why is the Power manual saying major opcodes are in bits 0-5 however microwatt and chiselwatt all specify the fields in *reverse* order??
moooo?
https://github.com/antonblanchard/microwatt/blob/master/decode1.vhdl
majorop := unsigned(f_in.insn(31 downto 26));

page 15 of Power V2.07B

0      6    11    16     21    31
  OPCD   RT    RA    ///    XO   /
  OPCD   RT    RA    RB     XO   /

Comment 59 Michael Nolan 2020-03-05 14:06:11 GMT

(In reply to Luke Kenneth Casson Leighton from comment #58)
> nngggh mmoooooo wtf why is the Power manual saying major opcodes are in bits
> 0-5 however microwatt and chiselwatt all specify the fields in *reverse*
> order??
> moooo?
> https://github.com/antonblanchard/microwatt/blob/master/decode1.vhdl
> majorop := unsigned(f_in.insn(31 downto 26));
> 
> page 15 of Power V2.07B
> 
> 0      6    11    16     21    31
>   OPCD   RT    RA    ///    XO   /
>   OPCD   RT    RA    RB     XO   /

Right, endianness. Ugh.

Comment 60 Luke Kenneth Casson Leighton 2020-03-05 14:39:52 GMT

(In reply to Michael Nolan from comment #59)

> > page 15 of Power V2.07B
> > 
> > 0      6    11    16     21    31
> >   OPCD   RT    RA    ///    XO   /
> >   OPCD   RT    RA    RB     XO   /
> 
> Right, endianness. Ugh.

ah, maaan - it's byte-endian, isn't it?  so it's not quite as straightforward as bit-reversal.  *sigh*.

i'm tempted to suggest complying with what's in the spec rather than
following exactly what anton did - then again that implies a lot of
work.

or having a close comparative look at chiselwatt.

Comment 61 Jacob Lifshay 2020-03-05 15:46:33 GMT

We should check the spec to see what its conventions are -- some use bit 0 as MSB, some as LSB. Also, the memory interface may or may not byte swap.

Comment 62 Luke Kenneth Casson Leighton 2020-03-05 16:24:40 GMT

(In reply to Jacob Lifshay from comment #61)
> We should check the spec to see what its conventions are -- some use bit 0
> as MSB, some as LSB. Also, the memory interface may or may not byte swap.

i *think* it's a combination of these two things, from V2.07B, P35, section
1.3.2 "Notation":

- Bits in instructions, fields, and bit strings are
  numbered from left to right, starting with bit 0

- The leftmost bit of a sequence of bits is the
  most significant bit of the sequence.

which is about as obtuse and technically correct as it's possible to get.
having *separate* numbering for the ordering of the bits then specifying
that the bits are ordered in MSB??

wtf?? :)

Comment 63 Luke Kenneth Casson Leighton 2020-03-05 18:03:43 GMT

https://libre-riscv.org/openpower/isatables/fields/
(accessible as the straight .txt file from the git repo)

i literally cut/paste that from the pdf (xpdf) and tidied it up.
it would have been much better to have the source of the original
document, particularly if it was in .tex format, but oh well.

i've also added the "form" onto the end of the CSV files. some of
them are missing because i think they're from POWER 3.0 (whoops)

Comment 64 Jacob Lifshay 2020-03-05 18:07:34 GMT

(In reply to Luke Kenneth Casson Leighton from comment #63)
> i literally cut/paste that from the pdf (xpdf) and tidied it up.
> it would have been much better to have the source of the original
> document, particularly if it was in .tex format, but oh well.

They might be willing to give us the source, try asking :)

Comment 65 Luke Kenneth Casson Leighton 2020-03-05 18:11:06 GMT

make sure to do form-lookup for:

* attn
* sim_cfg

done (found in V3.0B)

* bperm
* cmpeqb
* darn
* dzbz
* extswsli
* mcrxrx
* modud
* moduw
* modsd
* modsw
* setb

Comment 66 Luke Kenneth Casson Leighton 2020-03-05 18:11:32 GMT

(In reply to Jacob Lifshay from comment #64)

> They might be willing to give us the source, try asking :)

true! :) 

(mailing list server's offline at the moment though)

Comment 67 Luke Kenneth Casson Leighton 2020-03-05 22:06:08 GMT

ok i've written a parser for the fields.txt file, the idea being that
it will create a series of objects that can be walked inside the code
to access fields by an appropriate name, depending on the context
(the instruction, basically).

those data structures should not need to go actually into the HDL
(i don't think), they are there to be used in the appropriate code
(the load function and so on) or perhaps as a case/switch statement
like in PowerDecoder to get the appropriate register entries and
other fields.

have to see how it goes, i think.

Comment 68 Luke Kenneth Casson Leighton 2020-03-06 10:19:41 GMT

*face-palm*.  i went to all the trouble of putting the fields into
a text file yesterday, thinking that there was missing information,
wrote a parser for it... then realised that the 3.0B specification
had everything that's needed.

example:

EO (11:15)
    Expanded opcode field
    Formats: VX, X, XX2

that was the information i thought was missing (the Formats).
the idea was: parse the entirety of the opcode-format tables,
extract the Formats that they apply to.  then in the nmigen
code, it's possible to create structures where we can do:

(start, end) Formats.VX.EO[0:1]

but... duhhh, that's already in section 1.7 of 3.0B duhhh, you
can see above.

*sigh*.

well, at least we have a consistency checker.

Comment 69 Luke Kenneth Casson Leighton 2020-03-06 21:32:52 GMT

ok i now have the other section of the specification decoded (1.7 Instruction Fields in 3.0B, 1.6.8 Instruction Fields in 2.07B)

this basically will allow us to construct python objects that, when used, will allow the following:

FormX.FRSp[0:-1]

this will return bits 6 thru 10 of the instruction (*sigh* actually bits 31-6 thru 31-10) because all the indices are appropriately recorded in memory.

this even works on multi-bit fields such as this:

dc,dm,dx (25,29,11:15)
    Immediate fields that a
    Data Class Mask.
    Formats: XX2

which would be accessed as:

FormXX2.dc_dm_dx[0:7] where 

FormXX2.dc_dm_dx[0] would return bit 25 (okok 31-25)
FormXX2.dc_dm_dx[1] would return bit 29 (okok 31-29)

and so on.

it's still a bit of work to do.

i think this will be a heck of a lot easier than messing about trying to memorise bit-field numbering, constantly having to refer back to the PDF spec.

Comment 70 Luke Kenneth Casson Leighton 2020-03-08 15:31:26 GMT

https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/decoder/power_decoder2.py;h=34ac66836870324a55840d890ffc741a98845fb5;hb=HEAD

ok made a start on the register / immediate / spr / etc. decoder.  it will
need careful double-checking (help appreciated) not just against anton's
original work (decode2.vhdl and other files), against the PDF V3.0B manual
as well.

Comment 71 Luke Kenneth Casson Leighton 2020-03-08 18:18:07 GMT

https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/decoder/power_decoder2.py;h=5d706d9386e2f65e666d4a06408d43c1547c3913;hb=HEAD#l315

pulling it all together, some pieces still missing, needs a bit of rework:
classes that return "register num plus register valid" as an object, that
sort of thing, otherwise there's a lot of things to copy, laboriously.

Comment 72 Luke Kenneth Casson Leighton 2020-03-08 20:31:37 GMT

(In reply to Luke Kenneth Casson Leighton from comment #71)

> classes that return "register num plus register valid" as an object, that

sorted.

Comment 73 Michael Nolan 2020-03-18 15:49:56 GMT

Huh, apparently the register numbers need to be bit reversed in our decoder.

Comment 74 Luke Kenneth Casson Leighton 2020-03-18 16:07:16 GMT

(In reply to Michael Nolan from comment #73)
> Huh, apparently the register numbers need to be bit reversed in our decoder.

eeeverything needs to be bit-reversed. see 1.3.2 "notation"

- Bits in instructions, fields, and bit strings are
  numbered from left to right, starting with bit 0
- The leftmost bit of a sequence of bits is the
  most significant bit of the sequence.

*gibber*.

so i put the reversing into.. err... where was it...
SignalBitRange.__getitem__() - you can see (i just commented it)

if you can double-check that, it would be really good.  i tend
to get these things wrong.

ultimately, we'll find out pretty damn quick when it comes to
executing actual code.

Comment 75 Michael Nolan 2020-03-18 16:12:33 GMT

(In reply to Luke Kenneth Casson Leighton from comment #74)
> (In reply to Michael Nolan from comment #73)
> > Huh, apparently the register numbers need to be bit reversed in our decoder.
> 
> eeeverything needs to be bit-reversed. see 1.3.2 "notation"

Well yes... But I thought we did that already, evidently not. 

> if you can double-check that, it would be really good.  i tend
> to get these things wrong.

Ok

Comment 76 Jacob Lifshay 2020-03-18 17:14:12 GMT

From what I understand, it's just the way bits are numbered that's reversed, the MSB is still the MSB. So, just the field order is reversed, individual fields are not since they are treated as a multi-bit number and the MSB is still the MSB (and other bits also keep their place in a number).

Example:
Power bit order:
bit # 0 1 2 3 4 5 6 7
      0 1 0 1 1 1 1 0

fields: A:0-3, B:4, C:5-7

field A 0-3: 0101 = 5
field B 4: 1 = 1
field C 5-7: 110 = 6

switched bit numbering order:

nmigen/RISC-V bit order:
bit # 7 6 5 4 3 2 1 0
      0 1 0 1 1 1 1 0

fields: C:0-2, B:3, A:4-7

fields have same value but different order:
field C 0-2: 110 = 6
field B 3: 1 = 1
field A 4-7: 0101 = 5

Comment 77 Luke Kenneth Casson Leighton 2020-03-18 19:27:22 GMT

(In reply to Jacob Lifshay from comment #76)
> From what I understand, it's just the way bits are numbered that's reversed,
> the MSB is still the MSB. So, just the field order is reversed, individual
> fields are not since they are treated as a multi-bit number and the MSB is
> still the MSB (and other bits also keep their place in a number).

ngggh mindbender.  so by doing width-idx-1 it has reversed the bitorder of each bit in the field.

ye gods.

can you and michael take a look at this properly, suggest reviewing the pearpc and gem5 source code, and chiselwatt.

Comment 78 Jacob Lifshay 2020-03-18 19:31:31 GMT

In any case, what matters is that the instructions are decoded correctly.

Some example disassembly from objdump (edited to convert decimal to hex):

test.o:     file format elf64-powerpcle


Disassembly of section .text:

0000000000000000 <f>:
   0:   01 00 20 3d     addis   r9,0,1
   4:   45 23 29 61     ori     r9,r9,0x2345
   8:   d2 49 84 7c     mulld   r4,r4,r9
   c:   d2 19 64 7c     mulld   r3,r4,r3
  10:   20 00 80 4e     bclr    20,lt

Note how the immediate values for ori and addis are not bit-reversed (they are in little-endian byte order, though). I haven't checked, but I'm quite sure the register fields are also not bit-reversed.

Comment 79 Michael Nolan 2020-03-18 20:51:20 GMT

Ok, so I've got an instruction, for example: 
add 4, 16, 7
which gets assembled to:
0x7c903a14
or:
value:   0111 1100 1001 0000 0011 1010 0001 0100
field:    oooo oott ttta aaaa       mmm mmmm mmm
number:  0123 4567 8901 2345 6789 0123 4567 8901

o: Opcode = 31
m: Opcode minor = 266/0b0100001010 - ADD - form XO
t: RT = 0b00100 = 4

This all looks right, right?

Somehow the decoder is getting write_reg=8 (write_reg should be RT) out of this, and I'm not sure how. 0b00100 is 4 regardless of bit ordering...

I double checked, and pdecoder.dec.RT is 
SignalBitRange([(0, 6), (1, 7), (2, 8), (3, 9), (4, 10)])
which seems right. Any idea of where to look?

Comment 80 Luke Kenneth Casson Leighton 2020-03-18 21:23:43 GMT

rright.  ok.  so i think, to "fix" this, in SignalBitRange, the slice indices - start, end (and direction) all need to be reversed *as well*.

oink.

Comment 81 Luke Kenneth Casson Leighton 2020-03-18 21:28:29 GMT

https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/power_fieldsn.py;h=a5e03a114ec54d915deb13f20c99e8ab59078dc5;hb=HEAD#l31

try subtracting start and end from from len-1, and sign inverting step, at line 31

Comment 82 Michael Nolan 2020-03-18 21:38:08 GMT

(In reply to Michael Nolan from comment #79)
> Ok, so I've got an instruction, for example: 
> add 4, 16, 7
> which gets assembled to:
> 0x7c903a14
> or:
> value:   0111 1100 1001 0000 0011 1010 0001 0100
> field:    oooo oott ttta aaaa       mmm mmmm mmm
> number:  0123 4567 8901 2345 6789 0123 4567 8901
> 
> o: Opcode = 31
> m: Opcode minor = 266/0b0100001010 - ADD - form XO
> t: RT = 0b00100 = 4
> 
> This all looks right, right?
> 
> Somehow the decoder is getting write_reg=8 (write_reg should be RT) out of
> this, and I'm not sure how. 0b00100 is 4 regardless of bit ordering...
> 
> I double checked, and pdecoder.dec.RT is 
> SignalBitRange([(0, 6), (1, 7), (2, 8), (3, 9), (4, 10)])
> which seems right. Any idea of where to look?

Oh I see. It's bit reversing the whole word

Comment 83 Michael Nolan 2020-03-18 22:13:35 GMT

(In reply to Luke Kenneth Casson Leighton from comment #81)
> https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/
> power_fieldsn.py;h=a5e03a114ec54d915deb13f20c99e8ab59078dc5;hb=HEAD#l31
> 
> try subtracting start and end from from len-1, and sign inverting step, at
> line 31

It's giving a key error because the index (31) isn't in the "dict" (I printed out the object and it gave SignalBitRange([(0, 31)]), and I assume it treats that as {0: 31}).

What's the purpose of line 1?
                k = OrderedDict.__getitem__(self, t)
                res.append(self.signal[k]) # reverse-order here
I think it'd work ok if I just grabbed the bit from the signal directly, like so:
                res.append(self.signal[t])

Comment 84 Luke Kenneth Casson Leighton 2020-03-18 23:20:08 GMT

(In reply to Michael Nolan from comment #83)
> (In reply to Luke Kenneth Casson Leighton from comment #81)
> > https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/
> > power_fieldsn.py;h=a5e03a114ec54d915deb13f20c99e8ab59078dc5;hb=HEAD#l31
> > 
> > try subtracting start and end from from len-1, and sign inverting step, at
> > line 31
> 
> It's giving a key error because the index (31) isn't in the "dict" (I
> printed out the object and it gave SignalBitRange([(0, 31)]), and I assume
> it treats that as {0: 31}).
> 
> What's the purpose of line 1?
>                 k = OrderedDict.__getitem__(self, t)
>   

gets the remapping of what bit you want in the field onto what ACTUAL bit you want in the instruction.


              res.append(self.signal[k]) # reverse-order here
> I think it'd work ok if I just grabbed the bit from the signal directly,
> like so:
>                 res.append(self.signal[t])

nope.

that will get you bit 0 of the instruction when you request bit 0 ofa **FIELD** in the instruction.

so, apologies it was reverse the order of t not k

Comment 85 Luke Kenneth Casson Leighton 2020-03-18 23:27:01 GMT

line 32
for t in range(lenself - stop - 1, lenself - start - 1)

except this is wrong because bits are not stepped properly


so it is probably

l = list(range(start, stop, step)
l = l.reverse() # probably
for t in l:
   t = len(self) - t - 1
   carry on

for the else bit just do the t sub-from-self before lookup, too

Comment 86 Luke Kenneth Casson Leighton 2020-03-19 00:21:06 GMT

really like the gnu gas test idea. could you also plan for covering BE as well as LE?
but... later, we have enough brainmelting going on

Comment 87 Luke Kenneth Casson Leighton 2020-03-19 00:24:42 GMT

(In reply to Michael Nolan from comment #82)

> Oh I see. It's bit reversing the whole word

yes.  that is what width-k-1 is for.

 and then for the field which is local that needs to be reversed *back*.
this is what we are missing.

k is into the instruction (0..31)
t is into the FIELD and just to make life rreally interesting it can cover MULTIPLE fields.

joy.

Comment 88 Luke Kenneth Casson Leighton 2020-03-19 03:00:55 GMT

michael, fired up laptop brielfly when stopped git commited field-index reverse please check it, should work.

Comment 89 Luke Kenneth Casson Leighton 2020-03-19 06:56:07 GMT

ok i fixed some dumb off-by-one errors in decode_fieldsn.py and the gas example works (yay!)  am tempted to suggest throwing a ton more random stuff at it.

Comment 90 Michael Nolan 2020-03-19 15:29:55 GMT

(In reply to Luke Kenneth Casson Leighton from comment #89)
> ok i fixed some dumb off-by-one errors in decode_fieldsn.py and the gas
> example works (yay!)  am tempted to suggest throwing a ton more random stuff
> at it.

Yep, looks like you fixed the register thing. 
I'll get to adding more instructions later today

Comment 91 Luke Kenneth Casson Leighton 2020-03-19 15:33:27 GMT

(In reply to Michael Nolan from comment #90)

> Yep, looks like you fixed the register thing. 

dang that was a good guess

> I'll get to adding more instructions later today

i love this idea.  btw did we raise a specific bugreport for using gnu-as?

Comment 92 Michael Nolan 2020-03-19 17:11:21 GMT

(In reply to Luke Kenneth Casson Leighton from comment #91)

> 
> i love this idea.  btw did we raise a specific bugreport for using gnu-as?

Yes: http://bugs.libre-riscv.org/show_bug.cgi?id=212

Comment 93 Luke Kenneth Casson Leighton 2020-03-21 15:44:32 GMT

y'know what, michael... you see in soc experiments comp6600 there is the beginnings of a Simulator?

i have an idea, it might be worthwhile to run with that to its logical obvious implication, namely to actually have a simulator, in soc, in python, which interprets the InternalOps.

we then have a reference implementation which we can document properly etc. with links to the PDF.

also it means we have to write the code and so understsnd it.

speed is not crucial as the nmigen simulator will be screamingly slow. we can use cocotb later.

also at some point we can drop-in power-gem5.

what does everyone think?

Comment 94 Michael Nolan 2020-03-21 18:59:45 GMT

(In reply to Luke Kenneth Casson Leighton from comment #93)
> y'know what, michael... you see in soc experiments comp6600 there is the
> beginnings of a Simulator?
> 
> i have an idea, it might be worthwhile to run with that to its logical
> obvious implication, namely to actually have a simulator, in soc, in python,
> which interprets the InternalOps.
> 
> we then have a reference implementation which we can document properly etc.
> with links to the PDF.
> 
> also it means we have to write the code and so understsnd it.
> 
> speed is not crucial as the nmigen simulator will be screamingly slow. we
> can use cocotb later.
> 
> also at some point we can drop-in power-gem5.
> 
> what does everyone think?

Yeah, seems reasonable. This wouldn't be cycle accurate or anything right, just a simulator for our eventual backend?

Comment 95 Luke Kenneth Casson Leighton 2020-03-21 20:10:32 GMT

(In reply to Michael Nolan from comment #94)

> Yeah, seems reasonable. This wouldn't be cycle accurate or anything right,

well, cycle accurate just means that everything on each clock is exactly as it would be after one clock

it's actually harder to do a nonaccurate simulator as it involves JIT translation etc.

cycle accurate means doing TLBs in the simulator, too, and that means we have something to test the hardware against, there, too.

> just a simulator for our eventual backend?

yes.  however adding a power frontend on it as we found ain't so bad.  can use the same code.

if you look here you can see how basic the test simulator is
https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/experiment/score6600.py;h=babce2439233f84c262c08795b873a6167197ec9;hb=HEAD#l859

adjusting that so it uses InternalOp is... er... not hard? :)

Comment 96 Michael Nolan 2020-03-23 20:12:13 GMT

(In reply to Luke Kenneth Casson Leighton from comment #95)

> yes.  however adding a power frontend on it as we found ain't so bad.  can
> use the same code.
> 
> if you look here you can see how basic the test simulator is
> https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/experiment/score6600.
> py;h=babce2439233f84c262c08795b873a6167197ec9;hb=HEAD#l859
> 
> adjusting that so it uses InternalOp is... er... not hard? :)

Luke, looks like you and I are doing different things...? It looks to me like score6600 is actual hardware, not a simulator. I have basic simulator I'm working on here https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/internalop_sim.py;h=813dd19c40e21283d000fd597074a2cb38595a8a;hb=e95b4372c5163bb6f753c92d8445d4af7f117457 - I took some of the elements from score6600 (like the register and memory file) and modified them to get a basic simulator, which I'm testing along side a real power machine. Were you going for something different...?

Comment 97 Luke Kenneth Casson Leighton 2020-03-23 21:28:58 GMT

(In reply to Michael Nolan from comment #96)
> (In reply to Luke Kenneth Casson Leighton from comment #95)
> 
> > yes.  however adding a power frontend on it as we found ain't so bad.  can
> > use the same code.
> > 
> > if you look here you can see how basic the test simulator is
> > https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/experiment/score6600.
> > py;h=babce2439233f84c262c08795b873a6167197ec9;hb=HEAD#l859
> > 
> > adjusting that so it uses InternalOp is... er... not hard? :)
> 
> Luke, looks like you and I are doing different things...? It looks to me
> like score6600 is actual hardware, not a simulator.

i'd literally just moved RegSim and MemorySim into their own folder, you put internalop_sim.py in the same location.

they should be pretty much identical in function.


> I have basic simulator
> I'm working on here
> https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/
> internalop_sim.py;h=813dd19c40e21283d000fd597074a2cb38595a8a;
> hb=e95b4372c5163bb6f753c92d8445d4af7f117457 - I took some of the elements
> from score6600 (like the register and memory file) and modified them to get
> a basic simulator, which I'm testing along side a real power machine. Were
> you going for something different...?

take a look at sim.py it is near identical to what you did :)

although you have, i think, also added a decoder on it?  so what you wrote can take actual power instructions, is that right?

sim.py "tracks" what the real hardware does.  so the idea is that after running the exact same instructions through each, and waiting for the instruction queue to indicate "no longer busy processing", we can do a memory-memory compare and a regfile-regfile compare and they should be *exactly* the same.

i wrote a function which does exactly that, one for mem and one for regfile.

unfortunately because this is a multiissue  OoO design we cannot take a snapshot at a given time and say "yeah this will be exactly the same", you have to pause, wait for outstanding ops, *then* snapshot.

that "pause" means unfortunately that it affects internal state, because FUs are emptying rather than being filled with new operations.

but, it is the best we can do.

oh btw, one really important thing to remember: at sone point the sim will do dynamic SIMD just like in the PartitionedSignal.  code to reuse in InternalOpSim has already been written... in the PartitionedSignal unit tests!

and, actually, hm, we could probably just lift that code wholesale into a class then rework the psig unit test to use it, *and* use that same code here.

Comment 98 Luke Kenneth Casson Leighton 2020-03-24 01:28:01 GMT

https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/test_sim.py;hb=HEAD

y'know... the comment "checked this against qemu" got me thinking. this looks like it could be used to actually run any of:

* qemu
* the simulator
* the score6600 engine
* pearpc
* gem5

i know qemu can be run in bare metal mode, console only, for example.  it would not be cycle accurate however would give us a jumpstart on running really basic programs.

Comment 99 Luke Kenneth Casson Leighton 2020-03-24 01:49:41 GMT

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/virtualization_administration_guide/sect-vish-dump

crashdump --memory only would get a .elf file with memory and registers.

executing a bare metal .bin file under qemu with a small memory would give something not too massive to parse

https://github.com/eliben/pyelftools/blob/master/examples/dwarf_decode_address.py

hmm it seems there are at least 2 different python elf parsers.

the only thing we have to watch, these are big dependencies we're pulling in (i'm on a mobile 4G connection). luckily i have qemu already installed

Comment 100 Jacob Lifshay 2020-03-24 06:14:55 GMT

one issue with qemu is that it doesn't properly implement everything -- for example, it doesn't appear to implement the FPSCR.FR bit (I've been reading its source as an additional reference for simple-soft-float).

Comment 101 Luke Kenneth Casson Leighton 2020-03-24 08:50:51 GMT

(In reply to Jacob Lifshay from comment #100)
> one issue with qemu is that it doesn't properly implement everything -- for
> example, it doesn't appear to implement the FPSCR.FR bit

yuck

> (I've been reading
> its source as an additional reference for simple-soft-float).

ok so when it comes to FP we can't expect that to work.  by then we should be a long way forward at least with running something.

Comment 102 Luke Kenneth Casson Leighton 2020-03-25 12:49:36 GMT

(In reply to Luke Kenneth Casson Leighton from comment #98)
> https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/test_sim.
> py;hb=HEAD
> 
> y'know... the comment "checked this against qemu" got me thinking. this
> looks like it could be used to actually run any of:
> 
> * qemu

what do you think, michael: does firing off qemu to run those same
(tiny) binaries sound practical?

Comment 103 Michael Nolan 2020-03-25 16:21:08 GMT

(In reply to Luke Kenneth Casson Leighton from comment #102)
> (In reply to Luke Kenneth Casson Leighton from comment #98)
> > https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/test_sim.
> > py;hb=HEAD
> > 
> > y'know... the comment "checked this against qemu" got me thinking. this
> > looks like it could be used to actually run any of:
> > 
> > * qemu
> 
> what do you think, michael: does firing off qemu to run those same
> (tiny) binaries sound practical?

It doesn't seem like a bad idea, especially since we don't have another power simulator to compare to yet. Getting the instructions/data into qemu is pretty easy, just compile and link a bare metal kernel. I've got a folder with my manual method for this already, and I can add it to the git repo. 
However, getting the data (registers/memory) out of qemu might be a bit harder. Probably the best way would be to use the gdb socket that qemu makes available for reading registers and memory. I know GDB is written with python in mind, so it *should* be doable to control it from python, but I haven't investigated that yet.

Comment 104 Luke Kenneth Casson Leighton 2020-03-25 17:17:09 GMT

just taking a closer look at python-libvirt, and trying to find the examples i saw which turn qemu params into xml files

also this:

https://www.programcreek.com/python/example/28110/libvirt.open
canCreateVM - is able to run virsh, which will get a coredump
(including regfile as well as memory).

see earlier comments, yes do drop the qemu example into a subfolder.

be with you in a sec, just investigating.

Comment 105 Michael Nolan 2020-03-25 17:27:03 GMT

(In reply to Luke Kenneth Casson Leighton from comment #104)

> see earlier comments, yes do drop the qemu example into a subfolder.

It's now uploaded here: https://git.libre-riscv.org/?p=soc.git;a=tree;f=src/soc/simulator/qemu_test;h=50cde50b45e9af13b3c070c8b9618a956f63d3b5;hb=40c5ec1f0df4bed1d78b1d0d019b399450d87ced

Comment 106 Luke Kenneth Casson Leighton 2020-03-25 17:29:51 GMT

https://libvirt.org/drvqemu.html

shows how to convert qemu to xml if you have some args already ("virsh domxml-from-native")

Comment 107 Luke Kenneth Casson Leighton 2020-03-25 19:43:24 GMT

currently attempting to get python-libvirt to "start" launch.sh.
wooow is this painful.  obtuseness beyond obtuseness.

/usr/bin/qemu-system-ppc64 \
-name guest=Linux-0.2,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-6-Linux-0.2/master-key.aes \
-machine powernv9,accel=tcg,usb=off,dump-guest-core=off \
-m 1024 \
-overcommit mem-lock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid ce1326f0-a9a0-11e3-a5e2-0800200c9a66 \
-display none \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=33,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-boot strict=on \
-kernel /home/lkcl/testdisk.raw \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
2020-03-25T19:38:07.721499Z qemu-system-ppc64: error creating device tree: (fdt_property_string(fdt, "system-id", buf)): FDT_ERR_BADSTATE
2020-03-25 19:38:07.758+0000: shutting down, reason=failed


https://bugzilla.redhat.com/show_bug.cgi?id=1624539

nggggggh.

attempting to downgrade qemu-system-ppc so that the older version of libfdt can be used

apt-get install libfdt1=1.4.2-1 qemu-system-ppc=1:3.1+dfsg-8+deb10u3 qemu-system=1:3.1+dfsg-8+deb10u3 qemu-system-misc=1:3.1+dfsg-8+deb10u3

Comment 108 Luke Kenneth Casson Leighton 2020-03-25 19:58:30 GMT

https://bugzilla.redhat.com/show_bug.cgi?id=1585991

woooow....

Comment 109 Michael Nolan 2020-03-25 20:03:28 GMT

I'm going to try using https://sourceware.org/gdb/wiki/PythonGdb to see if I can't get something working that way.

Comment 110 Michael Nolan 2020-03-25 21:20:35 GMT

I've got a rudimentary qemu-gdb interface working in qemu.py: https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/qemu.py;h=70e29b0d76b65227a06d3ba3bd95315cf5af37e8;hb=3f7e6c771c7f8df65e67e6dc350964a54aee8066

It's able to load a kernel into qemu, set up a breakpoint at the start of it, single step it, and dump the register contents.

Comment 111 Luke Kenneth Casson Leighton 2020-03-26 05:40:30 GMT

deep breath....

install dependencies (saves messing about) with
apt-get build-dep qemu

git clone from here:

[remote "origin"]
    url = https://github.com/legoater/qemu.git
    fetch = +refs/heads/*:refs/remotes/origin/*

git checkout this branch:
* origin/powernv-next

configure/build with this:
./configure --enable-system --target-list=ppc64-softmmu,ppc64-linux-user,ppc64le-linux-user

edit hw/ppc/pnv.c and comment out these lines:

        /*
    if (qemu_uuid_set) {
        _FDT((fdt_property_string(fdt, "system-id", buf)));
    }
    */

run python3 domstart.py (files to be attached shortly)

dump the state to a file:

virsh # dump Linux-0.2 mem.hex --memory-only --crash


objdump the results with this:

objdump -x mem.hex

which gives the following (including regfile and memory):

architecture: powerpc:common64, flags 0x00000000:

start address 0x0000000000000000

Program Header:
    NOTE off    0x00000000000000b0 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**0
         filesz 0x0000000000000710 memsz 0x0000000000000710 flags ---
    LOAD off    0x00000000000007c0 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**0
         filesz 0x0000000040000000 memsz 0x0000000040000000 flags ---

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 note0         00000710  0000000000000000  0000000000000000  000000b0  2**0
                  CONTENTS, READONLY
  1 .reg/0        00000180  0000000000000000  0000000000000000  00000134  2**2
                  CONTENTS
  2 .reg          00000180  0000000000000000  0000000000000000  00000134  2**2
                  CONTENTS
  3 .reg2/0       00000108  0000000000000000  0000000000000000  000002d0  2**2
                  CONTENTS
  4 .reg2         00000108  0000000000000000  0000000000000000  000002d0  2**2
                  CONTENTS
  5 load1         40000000  0000000000000000  0000000000000000  000007c0  2**0
                  CONTENTS, ALLOC, LOAD, READONLY

and thus can be parsed with python elf libraries mentioned in previous
comments.

Comment 112 Luke Kenneth Casson Leighton 2020-03-26 05:40:53 GMT

Created attachment 47 [details]
libvirt xml file

Comment 113 Luke Kenneth Casson Leighton 2020-03-26 05:41:30 GMT

Created attachment 48 [details]
copy of python libvirt domain start example

obtained from here:
https://github.com/libvirt/libvirt-python/tree/master/examples

Comment 114 Luke Kenneth Casson Leighton 2020-03-26 05:54:13 GMT

actually, i like what you wrote a whooole lot better, michael:
https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/qemu.py;h=c9b1b573dd22d3da0adb0415a0a6b8bb005c99c6;hb=4b50a170c413f47e0527c843be49babc493c684b

the only modification i needed:

set architecture powerpc:common64
target remote localhost:1234
layout asm
b *0x20000000
c

and to use debian package "gdb-multilib" which *sigh* had to come from
debian/unstable.

building cross-compilers etc. from source has always stressed me out :)

if we do that, we need a proper documented reproducible process.

how did you get powerpc64-linux-gnu-gdb?

Comment 115 Luke Kenneth Casson Leighton 2020-03-26 11:02:24 GMT

i enabled export PYTHONTRACEMALLOC=25
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONTRACEMALLOC

/usr/lib/python3.7/subprocess.py:858: ResourceWarning: subprocess 1551551 is still running
  ResourceWarning, source=self)
Object allocated at (most recent call last):
  File "simulator/test_sim.py", lineno 106
    unittest.main()
  File "/usr/lib/python3.7/unittest/main.py", lineno 101
    self.runTests()
  File "/usr/lib/python3.7/unittest/main.py", lineno 271
    self.result = testRunner.run(self.test)
  File "/usr/lib/python3.7/unittest/runner.py", lineno 176
    test(result)
  File "/usr/lib/python3.7/unittest/suite.py", lineno 84
    return self.run(*args, **kwds)
  File "/usr/lib/python3.7/unittest/suite.py", lineno 122
    test(result)
  File "/usr/lib/python3.7/unittest/suite.py", lineno 84
    return self.run(*args, **kwds)
  File "/usr/lib/python3.7/unittest/suite.py", lineno 122
    test(result)
  File "/usr/lib/python3.7/unittest/case.py", lineno 663
    return self.run(*args, **kwds)
  File "/usr/lib/python3.7/unittest/case.py", lineno 615
    testMethod()
  File "simulator/test_sim.py", lineno 58
    self.run_test_program(program, [1, 2, 3, 4])
  File "simulator/test_sim.py", lineno 95
    with run_program(prog) as q:
  File "/home/lkcl/src/libreriscv/soc/src/soc/simulator/qemu.py", lineno 54
    q = QemuController(program.binfile.name)
  File "/home/lkcl/src/libreriscv/soc/src/soc/simulator/qemu.py", lineno 16
    stdin=subprocess.PIPE)
simulator/test_sim.py:58: ResourceWarning: unclosed file <_io.BufferedWriter name=4>
  self.run_test_program(program, [1, 2, 3, 4])


this gets rid of it:

        self.qemu_popen.stdout.close()
        self.qemu_popen.stdin.close()

except one of the qemus is still running, thissays you have
to call qemu_popen.communicate():
https://docs.python.org/3/library/subprocess.html

fixed with a quick commit.

Comment 116 Luke Kenneth Casson Leighton 2020-03-26 11:10:39 GMT

i've also added pygdbmi to the dependencies in setup.py and added a way
to get hold of and compile up powerpc64 gdb.  michael (and anyone else)
could you take a look?
https://git.libre-riscv.org/?p=soc.git;a=blob;f=README.md;hb=HEAD

Comment 117 Michael Nolan 2020-03-26 14:13:20 GMT

(In reply to Luke Kenneth Casson Leighton from comment #115)

> 
> except one of the qemus is still running, thissays you have
> to call qemu_popen.communicate():
> https://docs.python.org/3/library/subprocess.html
> 
> fixed with a quick commit.

Oh, thank you! I was struggling to figure out where this was coming from.

Comment 118 Michael Nolan 2020-03-26 14:14:13 GMT

(In reply to Luke Kenneth Casson Leighton from comment #116)
> i've also added pygdbmi to the dependencies in setup.py and added a way
> to get hold of and compile up powerpc64 gdb.  michael (and anyone else)
> could you take a look?
> https://git.libre-riscv.org/?p=soc.git;a=blob;f=README.md;hb=HEAD

Yep, that looks right

Comment 119 Michael Nolan 2020-03-26 14:19:40 GMT

>coooool.
>
>confirmed able to duplicate that, here.
>
>i did encounter KeyError, register-values not present but it is spurious
>and unreliably occurring.

Huh. It sounds like that's from the register read code, where it reaches in to a largeish dict to grab the register values. I'll have to see if I can replicate it on my end.

>
>i added an assert to investigate and, sigh, was no longer seeing failures
>even after running 20 times.
>
>the other really useful addition would be memory-dump / memory location
>reading.

That shouldn't be too hard to add. Similar to the register comparison code, it'll need to only do it to specific locations as comparing all of memory would not only take too long, but disagree because qemu loads the kernel and some other stuff into memory and the simulator does not. 

>btw change of topic, i noticed, just like in the original RegSim, OP_ADD is
>not joined by OP_SUB because Anton chose to do "add1 to op1" as well as
>"invert op1" in the InternalOp csv.
>
>therefore we need to pass the whole of the decode row to the
>execute_alu_sim function, rather than just the enum InternalOp value.

Ah ok.

Comment 120 Luke Kenneth Casson Leighton 2020-03-26 14:43:54 GMT

(In reply to Michael Nolan from comment #118)
> (In reply to Luke Kenneth Casson Leighton from comment #116)

> >the other really useful addition would be memory-dump / memory location
> >reading.
> 
> That shouldn't be too hard to add. Similar to the register comparison code,

yes just the right voodoo gdb incantation.  i think it allows byte word dword selection.

> it'll need to only do it to specific locations as comparing all of memory
> would not only take too long, but disagree because qemu loads the kernel and
> some other stuff into memory and the simulator does not. 

urk :)

> 
> >btw change of topic, i noticed, just like in the original RegSim, OP_ADD is
> >not joined by OP_SUB because Anton chose to do "add1 to op1" as well as
> >"invert op1" in the InternalOp csv.
> >
> >therefore we need to pass the whole of the decode row to the
> >execute_alu_sim function, rather than just the enum InternalOp value.
> 
> Ah ok.

yeh :)

should be very straightforward (you want to do it?) a separate if elif elif, "if op1.value == Op1.ONE op1 += 1"

can skip carry for now, add as TODO.

then also if row['op1neg'] op1 ^= 0xffffffffffffffff

something like that.

or op1 = (~op1) & 0xffffffffffffffff

that _should_ be sufficient to get sub, subi and neg all working, with not a lot of effort.

Comment 121 Michael Nolan 2020-03-26 14:56:30 GMT

(In reply to Luke Kenneth Casson Leighton from comment #120)

> 
> should be very straightforward (you want to do it?) a separate if elif elif,
> "if op1.value == Op1.ONE op1 += 1"
> 
> can skip carry for now, add as TODO.
> 
> then also if row['op1neg'] op1 ^= 0xffffffffffffffff
> 
> something like that.
> 
> or op1 = (~op1) & 0xffffffffffffffff
> 
> that _should_ be sufficient to get sub, subi and neg all working, with not a
> lot of effort.

Working now. Need to sort out setting the carry and overflow flags at some point though, that'll be more fun...

Comment 122 Luke Kenneth Casson Leighton 2020-03-26 16:52:54 GMT

(In reply to Michael Nolan from comment #121)

> > that _should_ be sufficient to get sub, subi and neg all working, with not a
> > lot of effort.
> 
> Working now.

yay!  oh wait...

F
======================================================================
FAIL: test_sub (__main__.DecoderTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "simulator/test_sim.py", line 98, in test_sub
    self.run_test_program(program, [1, 2, 3, 4, 5])
  File "simulator/test_sim.py", line 105, in run_test_program
    qemu_register_compare(simulator, q, reglist)
  File "simulator/test_sim.py", line 111, in qemu_register_compare
    simulator.regfile.assert_gpr(reg, qemu_val)
  File "/home/lkcl/src/libreriscv/soc/src/soc/simulator/internalop_sim.py", line 74, in assert_gpr
    assert reg_val == val, msg
AssertionError: reg r3 got 10000000000004444, expecting 4444

----------------------------------------------------------------------
Ran 5 tests in 15.696s

FAILED (failures=1)

how many bits is that
0x 1 0000 0000 0000 4444

ha, that's amusing.  the result is a roll-over from 64-bit arithmetic

> Need to sort out setting the carry and overflow flags at some
> point though, that'll be more fun...

yehyeh.  a SPR storing the carry, first.  take a look at pearpc,
it should show a way to do it (that's easy to interpret)
https://github.com/sebastianbiallas/pearpc

Comment 123 Luke Kenneth Casson Leighton 2020-03-26 16:57:47 GMT

i hacked masking the write to take out the upper 1 in write_reg.

Comment 124 Michael Nolan 2020-03-26 17:15:44 GMT

(In reply to Luke Kenneth Casson Leighton from comment #122)

> how many bits is that
> 0x 1 0000 0000 0000 4444
> 
> ha, that's amusing.  the result is a roll-over from 64-bit arithmetic
> 
> > Need to sort out setting the carry and overflow flags at some
> > point though, that'll be more fun...
> 
> yehyeh.  a SPR storing the carry, first.  take a look at pearpc,
> it should show a way to do it (that's easy to interpret)
> https://github.com/sebastianbiallas/pearpc

Ooops, some of my experiments with carry must have snuck in, sorry. Speaking of, I'm getting some weird behavior with carry on qemu. If I run some instructions that generate a carry or overflow, then have gdb read the xer register, it returns 0. But if I execute a `mfxer r5` or the like and print out that register, it gives me the correct value for xer. I'm trying to confirm whether this is a bug with qemu or not.

Comment 125 Luke Kenneth Casson Leighton 2020-03-26 17:41:30 GMT

(In reply to Michael Nolan from comment #124)
> (In reply to Luke Kenneth Casson Leighton from comment #122)
> 
> > how many bits is that
> > 0x 1 0000 0000 0000 4444
> > 
> > ha, that's amusing.  the result is a roll-over from 64-bit arithmetic
> > 
> > > Need to sort out setting the carry and overflow flags at some
> > > point though, that'll be more fun...
> > 
> > yehyeh.  a SPR storing the carry, first.  take a look at pearpc,
> > it should show a way to do it (that's easy to interpret)
> > https://github.com/sebastianbiallas/pearpc
> 
> Ooops, some of my experiments with carry must have snuck in, sorry. 

no worries

> Speaking
> of, I'm getting some weird behavior with carry on qemu. If I run some
> instructions that generate a carry or overflow, then have gdb read the xer
> register, it returns 0. But if I execute a `mfxer r5` or the like and print
> out that register, it gives me the correct value for xer. I'm trying to
> confirm whether this is a bug with qemu or not.

hmmm.  if it's a bug in qemu it's one that, i suspect, would have been detected
a loooong time ago.  but... you never know...

let's look at the spec:

p70 3.0B "addc":

RT <= (RA) + (RB)

The sum (RA) + (RB) is placed into register RT.
Special Registers Altered:
    CA CA32
    CR0                                       (if Rc=1)
    SO OV OV32                               (if OE=1)

ok so SO (summary overflow), OV (overflow) and OV32 (overflow 32-bit)
all get set, in XER.  these are listed on p45 3.2.2

however that's not the "carry" flag: that's *CR0*.  which is in... errr...
the Condition Register p45 2.3.1

let's check the pearpc source (yes it's only 32-bit).  ppc_alu.c
whereisitwhereisit...  ok!

https://github.com/sebastianbiallas/pearpc/blob/master/src/cpu/cpu_generic/ppc_alu.cc#L56

ahh, then as a *separate* concern, if you see the implementation of addcx,
that updatex XER_CA *and* updates CR0:
https://github.com/sebastianbiallas/pearpc/blob/master/src/cpu/cpu_generic/ppc_alu.cc#L86

Comment 126 Luke Kenneth Casson Leighton 2020-03-26 17:51:08 GMT

here's ppc_update_cr0:

https://github.com/sebastianbiallas/pearpc/blob/master/src/cpu/cpu_generic/ppc_opc.h#L26

#define CR_CR0_LT (1<<31)
#define CR_CR0_GT (1<<30)
#define CR_CR0_EQ (1<<29)
#define CR_CR0_SO (1<<28)

* clears the CR-CR0 bits
* result is zero: sets bit 29
* result top bit is set (indicating -ve as signed): sets bit 31
* otherwise (+ve signed): sets bit 30
* after that (in *addition*): if the XER SO bit was set, set bit 28 of CR0

interestingly, then, it looks like you have to check the XER flag *before*
checking the carry.  and XER is set based on... err... comparing operand A
with the result.  how odd.
https://github.com/sebastianbiallas/pearpc/blob/master/src/cpu/cpu_generic/ppc_alu.cc#L86

Comment 127 Luke Kenneth Casson Leighton 2020-03-26 17:56:17 GMT

oh - i just spotted this in p30, section 2.3.1.  it seems to concur with
ppc_update_cr0 *except* of course, it's been adapted for 32/64-bit modes.

if (64-bit mode)
   then M <= 0
   else M <= 32
if       (target_register)[M:63] < 0 then c <= 0b100  # bit 31
else if (target_register)[M:63] > 0 then c <= 0b010   # bit 30
else                                      c <= 0b001  # bit 29
CR0 <= c || XERSO                                     # bit 28

Comment 128 Michael Nolan 2020-03-26 18:01:26 GMT

(In reply to Luke Kenneth Casson Leighton from comment #125)

> hmmm.  if it's a bug in qemu it's one that, i suspect, would have been
> detected
> a loooong time ago.  but... you never know...
> 

> ok so SO (summary overflow), OV (overflow) and OV32 (overflow 32-bit)
> all get set, in XER.  these are listed on p45 3.2.2
> 

Sorry, was actually overflow that I was checking: 
	li 1, 1
	sldi 1, 1, 63
	or 2, 1, 1
	addco. 3, 2, 1
	mfxer 5

That should add -0x80000000000000000 to itself and overflow, and the "o" means that OV and SO should actually be set. Like I said before, running it in qemu and reading the xer register gives a result of 0, and reading r5 gives a result of 0xe0000000.

> however that's not the "carry" flag: that's *CR0*.  which is in... errr...
> the Condition Register p45 2.3.1

It does also list CA and CA32, which should correspond to the flags in xer as well.

Comment 129 Luke Kenneth Casson Leighton 2020-03-26 19:39:46 GMT

https://github.com/qemu/qemu/blob/master/target/ppc/translate.c#L950

eurrrgh... macros-on-macros-on-macros... *sigh*.

ok so we're looking at a trail that goes through GEN_HANDLER, GEN_OPCODE, GEN_INT_ARITH_ADD but ultimately lands at geo_op_addic which in turn
lands in gen_op_arith_add.

here's where addco sets the flags compute_ca=1, compute_ov=1 *but* sets
add_ca=0 whereas addeo *does* set all three (line #926)

https://github.com/qemu/qemu/blob/master/target/ppc/translate.c#L924

the flag "cpu_ca" comes in globally (initialised at line 132).

looking at the spec, the difference between addco and addeo is supposed to
be:

addco

RT <= (RA) + (RB)
The sum (RA) + (RB) is placed into register RT.

addeo

RT <= (RA) + (RB) + CA
The sum (RA) + (RB) + CA is placed into register RT.

hmmm need to look at microwatt, will do that in a minute.

Comment 130 Luke Kenneth Casson Leighton 2020-03-26 20:29:41 GMT

https://github.com/antonblanchard/microwatt/blob/master/ppc_fx_insns.vhdl#L110
ppc_adde():
return 65-bit RA + 65-bit RB + carry

https://github.com/antonblanchard/microwatt/blob/master/ppc_fx_insns.vhdl#L140
ppc_add():
return 64-bit RA + 64-bit RB

hmm ppc_add() doesn't appear to be used (at all) in execute1.vhdl

decode_input_carry(ic, XERC) is:
if ic=ZERO return 0
if ic=ONE return 1
if ic=CA return XERC.ca

returning XERC.ca might be a significant difference, there... ah!  yes,
if you look in the "cry_in" column, adde and addeo are supposed to take
cry_in=CA *not* cry_in=ZERO.

that's likely to be it.

Comment 131 Luke Kenneth Casson Leighton 2020-03-26 20:34:34 GMT

btw don't be afraid to commit unit tests that don't "quotes work quotes".

when it comes to unit tests, the whole idea is to have
"things-that-demonstrate-a-need-to-fix-code" rather than "have everything
absolutely perfect".

so do go ahead and commit some unit tests that actually break, but you
expect them to pass at some point in the future.

i say that because in the past i noticed that you commented out some unit
tests which "failed".  just leave them to fail, as a reminder.  also
there's a way to tag them with bugreports (with decorators).

i'll document this in HDL_workflow.

Comment 132 Jacob Lifshay 2020-03-26 21:41:16 GMT

(In reply to Luke Kenneth Casson Leighton from comment #131)
> btw don't be afraid to commit unit tests that don't "quotes work quotes".
> 
> when it comes to unit tests, the whole idea is to have
> "things-that-demonstrate-a-need-to-fix-code" rather than "have everything
> absolutely perfect".
> 
> so do go ahead and commit some unit tests that actually break, but you
> expect them to pass at some point in the future.
> 
> i say that because in the past i noticed that you commented out some unit
> tests which "failed".  just leave them to fail, as a reminder.  also
> there's a way to tag them with bugreports (with decorators).
> 
> i'll document this in HDL_workflow.

Actually, I'd decorate them with something like:
import unittest

@unittest.skip("not working yet")
def test_fn():
    body-here

or

@unittest.expectedFailure # FIXME: not working yet
def test_fn():
    body-here


That way, once I add support for GitLab CI we will get a message when one of the commits fails any of the tests, allowing us to more quickly detect that something unexpectedly broke. If it's filled with a whole bunch of tests that we know will fail, then that makes the CI nearly worthless because it will always be filled with a bunch of failures obscuring the ones we care about.

Comment 133 Tobias Platen 2020-04-09 14:46:31 BST

I've tried to run the tests that use GDB, they still fail.

Traceback (most recent call last):
  File "test_sim.py", line 98, in test_sub
    self.run_tst_program(program, [1, 2, 3, 4, 5])
  File "test_sim.py", line 105, in run_tst_program
    qemu_register_compare(simulator, q, reglist)
  File "test_sim.py", line 110, in qemu_register_compare
    qemu_val = qemu.get_register(reg)
  File "/path/to/soc/src/soc/simulator/qemu.py", line 35, in get_register
    res = self.gdb.write('-data-list-register-values x {}'.format(num))
  File "/usr/local/lib/python3.7/dist-packages/pygdbmi/gdbcontroller.py", line 247, in write
    timeout_sec=timeout_sec, raise_error_on_timeout=raise_error_on_timeout
  File "/usr/local/lib/python3.7/dist-packages/pygdbmi/gdbcontroller.py", line 285, in get_gdb_response
    "Did not get response from gdb after %s seconds" % timeout_sec
pygdbmi.gdbcontroller.GdbTimeoutError: Did not get response from gdb after 1 seconds

Comment 134 Luke Kenneth Casson Leighton 2020-04-09 15:01:31 BST

(In reply to Tobias Platen from comment #133)
> I've tried to run the tests that use GDB, they still fail.

working fine here tobias - you'll need to work out what's missing,
what's going on, by using appropriate print / debug statements.
when you find out please do document it.

Comment 135 Yehowshua 2020-05-02 05:00:41 BST

We shouldn't be pulling the the ISA from the wiki.
This is because I can edit the Wiki and break the codebase.

Comment 136 Luke Kenneth Casson Leighton 2020-05-02 06:28:25 BST

if someone starts doing that i will ban them, recover the wiki with a git reversion, then add a password to the frontend to prevent it happening again.

ikiwiki was specifically chosen because of the use of git as a backend.

i prefer to trust people.

Comment 137 Luke Kenneth Casson Leighton 2020-08-18 12:59:59 BST

closing this, completed (POWER9, not doing RISCV).
EUR 200 michael, EUR 200 lkcl

Comment 138 Luke Kenneth Casson Leighton 2020-08-18 13:04:14 BST

increasing to 500 because EUR 100 for jacob's design insights