Bug 72 - verilog to nmigen converter (full or partial) needed
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code
Version: unspecified
Hardware: PC Linux
Importance: --- enhancement
Assignee: Tobias Platen
URL: https://git.libre-soc.org/?p=sv2nmige...
Depends on: 147
Blocks:
Reported: 2019-04-22 17:29 BST by Luke Kenneth Casson Leighton
Modified: 2022-06-16 14:23 BST (History)
NLnet milestone: NLnet.2019.02.012
total budget (EUR) for completion of task and all subtasks: 1000
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation: 191
child tasks for budget allocation: 147
The table of payments (in EUR) for this task; TOML format:


Description Luke Kenneth Casson Leighton 2019-04-22 17:29:14 BST
a *lot* of verilog code is being manually converted to nmigen,
it's getting boring.  time to write a tool that helps.  simplest
one initially is just a straight string/pattern-matcher...
more sophisticated (later) can use python-ply and find a BNF
lex/yacc format for it [python-ply can auto-convert c-style
lex/yacc BNF into a stub python module]

it needs to be a language translator, one of the features (requirements)
of which is to preserve as much of the original sv structure as
possible (code comments, code order etc.)
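
a minimal sketch of the "straight string/pattern-matcher" idea, assuming only two patterns are handled (continuous `assign` statements and `//` comments). the function name `convert_line` and the choice of `m.d.comb` as the nmigen target are illustrative assumptions, not code from sv2nmigen. unrecognised lines are kept as comments so that the original code order survives:

```python
import re

# Hypothetical patterns: only continuous assignments and comments are handled.
ASSIGN_RE = re.compile(r'^(\s*)assign\s+(\w+)\s*=\s*(.+?);\s*(//.*)?$')
COMMENT_RE = re.compile(r'^(\s*)//\s?(.*)$')


def convert_line(line):
    """Translate one line of verilog, keeping comments and ordering."""
    m = COMMENT_RE.match(line)
    if m:
        return "%s# %s" % (m.group(1), m.group(2))
    m = ASSIGN_RE.match(line)
    if m:
        indent, lhs, rhs, trailing = m.groups()
        out = "%sm.d.comb += %s.eq(%s)" % (indent, lhs, rhs)
        if trailing:
            out += "  # " + trailing[2:].strip()
        return out
    if not line.strip():
        return line
    # anything not recognised is left as a comment for manual conversion
    return "# TODO (unconverted): " + line.strip()


source = """\
// handshake
assign ready_o = valid_i; // combinatorial
always_ff @(posedge clk) begin
"""
for src_line in source.splitlines():
    print(convert_line(src_line))
```

the right-hand side is copied verbatim, so verilog-only operators would still need attention by hand.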


https://git.libre-soc.org/?p=sv2nmigen.git;a=shortlog
Comment 1 Jacob Lifshay 2019-04-24 01:08:32 BST
what do you think of writing a tool in Rust that uses yosys to convert the verilog to ilang, then yosys prints out json, which we can use the serde library to parse and then print python and finally use a python code formatter to format it?
If we do it that way, we can have yosys, serde, and the python formatter do all the hard parts
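
A rough sketch of the last two steps of that pipeline (parse the JSON, print python), written in Python with the standard `json` module rather than Rust and serde. The `SAMPLE` text is hand-written in the general shape of `yosys write_json` output and is an assumption for illustration; it was not produced by running yosys, and `ports_to_python` is a hypothetical helper name.

```python
import json

# Hand-written sample in the general shape of `yosys write_json` output
# (an assumption for illustration; not produced by running yosys here).
SAMPLE = """
{
  "modules": {
    "fsm": {
      "ports": {
        "prefetch_i": {"direction": "input", "bits": [2]},
        "in_len_i": {"direction": "input", "bits": [3, 4, 5, 6, 7, 8, 9, 10]}
      }
    }
  }
}
"""


def ports_to_python(text):
    """Emit one python class per module, one Signal per port."""
    lines = []
    for modname, module in json.loads(text)["modules"].items():
        lines.append("class %s:" % modname)
        lines.append("    def __init__(self):")
        for name, port in module.get("ports", {}).items():
            width = len(port["bits"])      # widths arrive as lists of net ids
            lines.append("        self.%s = Signal(%d)  # %s"
                         % (name, width, port["direction"]))
    return "\n".join(lines)


print(ports_to_python(SAMPLE))
```

Note that port widths arrive here as plain bit counts, so any parameter names used in the original declarations would not appear in the generated code.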
Comment 2 Luke Kenneth Casson Leighton 2019-04-24 01:49:59 BST
been up investigating this overnight (sigh.. :) )

if the system verilog syntax wasn't monstrously large, writing
a parser from scratch would be a reasonable proposition.

if yosys was up to the task, i agree it would be a good idea to use it.

i considered ilang as an intermediary, however yosys has a habit
of destroying if-elif-elif constructs and replacing them with
casez statements, splitting out individual variables into their
own parallelised state machine, and much other weirdness that would
make it virtually impossible to recognise the output.


not only that: this is a file from e.g. the axi_rab rtl:

yosys> read_verilog fsm_expand.sv 
1. Executing Verilog-2005 frontend.
Parsing Verilog input from `fsm_expand.sv' to AST representation.
Lexer warning: The SystemVerilog keyword `logic' (at fsm_expand.sv:21) is not recognized unless read_verilog is called with -sv!
fsm_expand.sv:21: ERROR: syntax error, unexpected TOK_ID, expecting ',' or '=' or ')'

it's using *Cadence* system verilog syntax, which even iverilog does not
fully support, because Cadence violates the verilog standard
and iverilog conforms to it.

yosys doesn't stand a chance: it is missing far too much of system verilog.


i've used python-ply a couple of times: dabeaz wrote an example that was capable
of actually understanding yacc-formatted files, searching for their BNF
strings, and outputting an actual python-ply program that python-ply could
understand.

it's a two-step process that is in no way fully automated, however i have
an 8 *THOUSAND* line LALR parser - that i didn't write - that's
come *directly* out of icarus verilog source code (parse.y pushed through
yply.py).

generated in under a second, needed review, found some bugs in parse.y,
fixed, moved on.


currently i am munging the icarus verilog lex file into python (that
was this morning's successful task)

so that's phase (1)


the phase after that, i have had a lot of success in the past using
lib2to3's AST code, as it was designed to include whitespace (where
the standard python AST library does not), and has some additional
nice features including pattern-matching node-visitors that are
extremely comprehensive and also extremely well-documented.

what's particularly important about lib2to3's AST code is that it
has a dead-accurate pretty-printer.  after all, it *was* written to
do python2-to-python3 and vice-versa code-conversion.


phase 2 is to replace all of the print statements in the auto-generated
code with python lib2to3 AST statements.


phase 3 - which can be done as-and-when - will be to create some
node-visitors that *MODIFY* the python AST, searching for the kinds
of patterns that are expected in verilog, however are silly to keep
in nmigen.

one example is the practice by the eth-zurich team of following a
convention varname_n, varname_q then assigning the varname_n to
the initial input, over-riding it later, and then in a sync block
assigning varname_q <= varname_n - something like that, at least.

lib2to3's pattern-matching node-visitor/walker is a good match for
removing (reworking) the AST to be much more along the lines of
nmigen conventions.
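
a very rough sketch of what such a rework could do, assuming the varname_n / varname_q convention is as described above (the description itself is only approximate). it works on a hypothetical flat list of (domain, target, expression) tuples instead of a real lib2to3 tree, ignores conditions entirely, and `fold_n_q` is an invented name:

```python
import re

# Hypothetical flat form of the parsed code: (domain, target, expression).
statements = [
    ("comb", "state_n", "state_q"),   # default: hold the current value
    ("comb", "state_n", "WAIT"),      # later override
    ("sync", "state_q", "state_n"),   # register update in the sync block
]


def fold_n_q(stmts):
    """Fold each name_n / name_q pair into a single register `name`."""
    # a register is recognised by a sync assignment  name_q <= name_n
    regs = {t[:-2] for d, t, e in stmts
            if d == "sync" and t.endswith("_q") and e == t[:-2] + "_n"}

    def rename(expr):
        # reads of name_q become reads of the register itself
        return re.sub(r"\b(\w+)_q\b",
                      lambda m: m.group(1) if m.group(1) in regs else m.group(0),
                      expr)

    out = []
    for domain, target, expr in stmts:
        base = target[:-2]
        if base in regs and target.endswith("_q"):
            continue                  # the name_q <= name_n copy disappears
        if base in regs and target.endswith("_n"):
            if expr == base + "_q":
                continue              # holding the value is implicit
            out.append("m.d.sync += %s.eq(%s)" % (base, rename(expr)))
        else:
            out.append("m.d.%s += %s.eq(%s)" % (domain, target, rename(expr)))
    return out


print("\n".join(fold_n_q(statements)))
```

a real version would have to carry the surrounding if/case conditions along with each assignment, which is where a pattern-matching tree walker earns its keep.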



the alternative is to write a simple line-by-line code-converter doing
basic pattern-matching.  i've done that before when converting massive
amounts of java to python.  two weeks to convert 20,000 lines of code,
i was very very bored by the end :)


i'm going to give this maybe... 2-3 days, max, to see if it's a
viable approach (python-ply plus lib2to3's AST+pretty-printer).

if it's not making very *very* rapid progress, i'll re-evaluate
the "string-matcher" version as a way to remove *most* of the
drudge work.  from experience though, i know that such line-by-line
string-matchers are "WORN" - write once, read never :)

irony is, normally this would be considered a major, major software
project in its own right.  it's a good candidate for a NLnet milestone,
which is why i'd like to take it a bit more seriously than just
a "string-matcher"
Comment 3 Jacob Lifshay 2019-04-24 01:55:28 BST
(In reply to Luke Kenneth Casson Leighton from comment #2)
> yosys> read_verilog fsm_expand.sv 
> 1. Executing Verilog-2005 frontend.
> Parsing Verilog input from `fsm_expand.sv' to AST representation.
> Lexer warning: The SystemVerilog keyword `logic' (at fsm_expand.sv:21) is
> not recognized unless read_verilog is called with -sv!
> fsm_expand.sv:21: ERROR: syntax error, unexpected TOK_ID, expecting ',' or
> '=' or ')'
did you try `read_verilog -sv fsm_expand.sv`?
Comment 4 Luke Kenneth Casson Leighton 2019-04-24 08:06:15 BST
hm good point, let's see...

yosys> read_verilog -sv fsm_expand.sv 
1. Executing Verilog-2005 frontend.
Parsing SystemVerilog input from `fsm_expand.sv' to AST representation.
fsm_expand.sv:59: ERROR: syntax error, unexpected TOK_TYPEDEF

line 59:

  typedef enum logic           {IDLE, WAIT} state_t;

so that would be information lost (iverilog supports the typedef keyword)

yosys> help read_verilog

    -sv
        enable support for SystemVerilog features. (only a small subset
        of SystemVerilog is supported)



yosys> read_verilog -sv load_unit.sv 
1. Executing Verilog-2005 frontend.
Parsing SystemVerilog input from `load_unit.sv' to AST representation.
load_unit.sv:22: ERROR: syntax error, unexpected TOK_ID, expecting ',' or '=' or ')'

(there's a type there):
    input  lsu_ctrl_t                lsu_ctrl_i,

if i put the import back in (which iverilog barfs on) yosys still barfs:
https://github.com/steveicarus/iverilog/issues/102

import ariane_pkg::*; // <<----

module load_unit (

this is a known issue in icarus verilog... and i can *guarantee* it will
be easier to fix that in the python-ply BNF than it will be to try to
patch the (c-based) iverilog source code, first.


so...

* yosys isn't up to the job and it would be months possibly years
  until any feature requested is added to support *Cadence* undocumented
  systemverilog features

* iverilog likewise would take months to add the same... and the developer
  would need to go in a different direction anyway

* yosys would destroy valuable information, performing hardware-suitable
  topological translations

* we need a language *translator* where yosys is a language *compiler*.

  a language translator's focus is to preserve as much of the original
  language's features (such as code comments, structure, order of
  the original code and so on)

* extracting the BNF syntax from iverilog is done already (automated)

* modifying the BNF syntax will be a heck of a lot easier without the
  primary purpose of either yosys or iverilog being in the way
Comment 5 Luke Kenneth Casson Leighton 2019-04-24 08:24:57 BST
any other ideas? (will update comment 1 to clarify the requirement
to preserve as much of the original code as possible)
Comment 6 Jacob Lifshay 2019-04-24 08:35:33 BST
we might be able to use slang: https://github.com/MikePopoloski/slang
there is a godbolt-style website for trying it out: http://sv-lang.com

If we have to write the systemverilog parser ourselves, I think it would be less work to just manually translate the systemverilog code if we have less than 10kloc or so of code to translate.
Comment 7 Luke Kenneth Casson Leighton 2019-04-24 08:52:52 BST
https://github.com/MikePopoloski/slang/blob/master/scripts/grammar.txt
https://github.com/MikePopoloski/slang/blob/master/scripts/syntax_gen.py

interesting! good find!

i like the approach, split out the BNF into straight text files and write
a syntax/grammar-generator that spews out code-fragments.  it reminds me
of the approach i took with the direct python-webkit bindings.

mike has however jumped direct to c++.  that would be the cut-off point
for adaptation (extraction of grammar.txt, syntax_gen.py etc)
unless slang can cope / be a basis in c++ for some form of intermediary
translation... removing features of Cadence systemverilog...

tried out sv-lang.com, i appear to have crashed it, whoops ;)

typed in "import ariane_pkg::*;" and it reaaally didn't like it :)
Comment 8 Luke Kenneth Casson Leighton 2019-04-24 09:18:30 BST
(In reply to Jacob Lifshay from comment #6)

> If we have to write the systemverilog parser ourselves, I think it would be
> less work to just manually translate the systemverilog code if we have less
> than 10kloc or so of code to translate.

i'm starting to get RSI again, so "less typing" is a high priority.

ariane's 16,000 lines, axi_rab is 6,000 - both include some valuable
worked examples of axi4.  it easily takes me... a day to do 300-400
lines of verilog / sv manual translation...

we still have the jon dawson IEEE754 code to do (FCVT from 32-64 and 64-32)...

as a subproject, just for those alone it's easily justifiable on the
time it would save.
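
the estimate can be worked through using only the figures quoted above (the jon dawson IEEE754 code is left out because no line count is given for it):

```python
# Figures quoted in the comment above.
lines = {"ariane": 16000, "axi_rab": 6000}
rate_low, rate_high = 300, 400      # lines translated by hand per day

total = sum(lines.values())
days_best = total / rate_high       # at the fastest quoted rate
days_worst = total / rate_low       # at the slowest quoted rate
print("%d lines: %.0f to %.0f days of manual translation"
      % (total, days_best, days_worst))
```

that is roughly 55 to 73 working days of typing for the two codebases alone.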
Comment 9 Luke Kenneth Casson Leighton 2019-04-24 16:23:05 BST
https://git.libre-riscv.org/?p=sv2nmigen.git;a=summary

that's where it's at, so far.  the code's an absolute dog's dinner-looking
mess, however, incredibly, it actually walked one of axi_rab.sv's files

i've had to comment out much of the lexer for a first iteration, just to
get it up and running, rapidly.  some of that will have consequences
such as disabling the lexer's ability to detect types and imports,
which can be reintroduced incrementally.

also the timestamp recognition isn't working yet, plus the number
formats need some regex's / conversion (c code from the lexer replaced
with python that does the same job) etc. etc.
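
a possible starting point for the number formats, as a sketch only: it assumes sized and unsized literals of the form <width>'<base><digits>, ignores the signed marker, and rejects x/z digits outright. `parse_verilog_number` is a hypothetical name, not a function from the repository:

```python
import re

NUM_RE = re.compile(r"^(\d+)?'([sS])?([bBoOdDhH])([0-9a-fA-F_]+)$")
BASES = {"b": 2, "o": 8, "d": 10, "h": 16}


def parse_verilog_number(text):
    """Return (value, width) for literals like 8'hFF; width may be None."""
    text = text.replace(" ", "")
    if text.isdigit():
        return int(text), None            # plain unsized decimal
    m = NUM_RE.match(text)
    if m is None:
        raise ValueError("unsupported literal: %r" % text)
    width, _signed, base, digits = m.groups()
    value = int(digits.replace("_", ""), BASES[base.lower()])
    return value, (int(width) if width else None)


for literal in ["1'b0", "8'hFF", "32'd1_000", "'h10", "42"]:
    print(literal, parse_verilog_number(literal))
```

the (value, width) pair is enough to emit either a plain python integer or a sized constant, whichever the surrounding expression needs.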
Comment 10 Luke Kenneth Casson Leighton 2019-04-25 15:00:11 BST
grep "def p_" parse_sv.py | wc
   1095    2190   31280

*shocked*!!  that's one mmmmaaaaasive number of parser states!

luckily it is not necessary to do all of them.  UDP can be entirely
skipped, for example.  also, many of them will be incredibly
simple: "return one of the things that came through from a previous
state".

i've done a couple of the states, to see what they look like.
this is "lpvalue '=' expression ';'":

expr = Node(syms.expr_stmt, [p[1], Leaf(token.EQUAL, p[2]), p[3] ])
p[0] = expr

p[1] needed doing (the lpvalue), p[3] likewise, and it comes out like this:

Node(expr_stmt, [Leaf(1, 'port1_accept_SN'), Leaf(22, '='),
                 Leaf(2, "1:'b0")])

and when, instead of doing repr, str is used, the lib2to3 "Node" class *already*
has the capability to print out the python code:

'port1_accept_SN=1:'b0'

which is, apart from the spaces, and that i haven't completed the
number-system, is exactly what's needed.


so the AST gets recursively constructed, from the leaf-nodes up,
end-result, python code!
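
the printing behaviour can be illustrated without lib2to3 itself. the `Leaf` and `Node` classes below are small stand-ins written for this sketch (an assumption, loosely mimicking the shape of lib2to3's tree classes, which is deprecated in recent python releases); the literal "0" stands in for the not-yet-converted verilog number:

```python
class Leaf:
    """A token plus the whitespace/comments that preceded it."""
    def __init__(self, type, value, prefix=""):
        self.type, self.value, self.prefix = type, value, prefix

    def __str__(self):
        return self.prefix + self.value


class Node:
    """An interior node: printing it prints its children in order."""
    def __init__(self, type, children):
        self.type, self.children = type, children

    def __str__(self):
        return "".join(str(child) for child in self.children)


# token numbers as in python's token module (matching the repr above)
NAME, NUMBER, NEWLINE, EQUAL = 1, 2, 4, 22

expr = Node("expr_stmt", [Leaf(NAME, "port1_accept_SN"),
                          Leaf(EQUAL, "=", prefix=" "),
                          Leaf(NUMBER, "0", prefix=" ")])
stmt = Node("simple_stmt", [expr, Leaf(NEWLINE, "\n")])
print(str(stmt), end="")
```

because whitespace lives in each leaf's prefix, the missing spaces are a matter of setting prefixes when the leaves are created, not of post-processing the output.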
Comment 11 Luke Kenneth Casson Leighton 2019-04-25 17:54:43 BST
> Hendrik Boom hendrik@topoi.pooq.com via lists.libre-riscv.org 
> 5:44 PM (1 minute ago)

> > *shocked*!!  that's one mmmmaaaaasive number of parser states!

> It makes me suspect that either the language isn't well-designed,
> or that the grammar formalism isn't a good match for the language.

to be fair, i used an auto-conversion tool (dabeaz yply.py) which
instead of keeping groups of ORed BNF syntax-rules together (which
would appear to keep numbers down), the tool split them out as
individual functions.

this does make the code less costly to write (no need to test
the length of the list of tokens), however it kiiinda gives
the false impression that the *syntax* is faulty.

udp_initial_expr_opt
    : '=' expression { $$ = $2; }
    |                { $$ = 0; }
    ;

becomes:

def p_udp_initial_expr_opt_1(p):
    '''udp_initial_expr_opt : '=' expression '''
    print('udp_initial_expr_opt_1', list(p))
    p[0] = p[2]

def p_udp_initial_expr_opt_2(p):
    '''udp_initial_expr_opt :  '''
    print('udp_initial_expr_opt_2', list(p))
    # { $$ = 0; }

yes that was me working out that "{ $$ = $2 }" can be global/search/replaced
with "p[0] = p[2]".

that's about 20% of the 1,000 rules done, right there.
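
the search/replace can be sketched as a small function. this covers only trivial actions of the kind shown above; `yacc_action_to_python` is a hypothetical helper, and actions containing real c++ code would still need translating by hand:

```python
import re


def yacc_action_to_python(action):
    """Turn a simple yacc action such as '{ $$ = $2; }' into python."""
    body = action.strip().strip("{}").strip().rstrip(";")
    body = body.replace("$$", "p[0]")
    # $1, $2, ... become p[1], p[2], ...
    return re.sub(r"\$(\d+)", r"p[\1]", body)


print(yacc_action_to_python("{ $$ = $2; }"))
print(yacc_action_to_python("{ $$ = 0; }"))
```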
Comment 12 Luke Kenneth Casson Leighton 2019-04-25 17:55:54 BST
module fsm
  #(
    parameter AXI_M_ADDR_WIDTH = 40,
    parameter AXI_S_ADDR_WIDTH = 32,
    parameter AXI_ID_WIDTH     = 8,
    parameter AXI_USER_WIDTH   = 6
  )

-->

class fsm:
    def __init__(AXI_M_ADDR_WIDTH=40, AXI_S_ADDR_WIDTH=32, AXI_ID_WIDTH=8, AXI_USER_WIDTH=6):

woo!
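
the same transformation as a regex-only sketch, assuming one `parameter NAME = VALUE` per entry as in the header above. `module_header_to_class` is a hypothetical helper, not the parser-based code in the repository, and it adds the `self` argument that a python method needs:

```python
import re

PARAM_RE = re.compile(r"parameter\s+(\w+)\s*=\s*([^,\s)]+)")

HEADER = """\
module fsm
  #(
    parameter AXI_M_ADDR_WIDTH = 40,
    parameter AXI_S_ADDR_WIDTH = 32,
    parameter AXI_ID_WIDTH     = 8,
    parameter AXI_USER_WIDTH   = 6
  )
"""


def module_header_to_class(text):
    """Turn a verilog module header into a class plus __init__ signature."""
    name = re.search(r"module\s+(\w+)", text).group(1)
    params = PARAM_RE.findall(text)
    args = ", ".join(["self"] + ["%s=%s" % (k, v) for k, v in params])
    return "class %s:\n    def __init__(%s):" % (name, args)


print(module_header_to_class(HEADER))
```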
Comment 13 Luke Kenneth Casson Leighton 2019-04-25 22:52:37 BST
parameters sort-of done:

    input  logic                        prefetch_i,
    input  logic [AXI_S_ADDR_WIDTH-1:0] in_addr_i,
    input  logic     [AXI_ID_WIDTH-1:0] in_id_i,
    input  logic                  [7:0] in_len_i,


class fsm:
    def __init__(AXI_M_ADDR_WIDTH=40,
                 AXI_S_ADDR_WIDTH=32,
                 AXI_ID_WIDTH=8,
                 AXI_USER_WIDTH=6):
        self.prefetch_i = Signal() # input
        self.out_addr_i = Signal(AXI_M_ADDR_WIDTH) # input
        self.in_id_i = Signal(AXI_ID_WIDTH) # input
        self.in_len_i = Signal(8) # input

i am however wondering if the use of python AST is interfering
with the pace at which this code could be written, or whether
it could turn out to be useful.

it's actually really hard to tell if information would be lost
by choosing to drop down to ad-hoc data structures and plain
text-strings.

i am starting to get used to examining the c++ code (from
icarus verilog parse.y) and using it for not just guidance
but as *actual code* that works, after some form of regular
pattern-match substitution from c++ to python.

this is extremely weird to involve *three* simultaneous
languages... python, verilog, and c++ ....
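
for comparison, this is roughly what the "plain text-strings" alternative would look like for the port declarations above. it is a sketch under stated assumptions: only `input`/`output` ports of type `logic` with a `[msb:0]` range are recognised, and `port_to_signal` and `width_of` are invented names:

```python
import re

PORT_RE = re.compile(
    r"^\s*(input|output)\s+logic\s*"
    r"(?:\[\s*(.+?)\s*:\s*0\s*\])?\s*(\w+)\s*,?\s*$")


def width_of(msb):
    """'N-1' -> 'N', '7' -> '8'; anything else is kept as '(msb)+1'."""
    if msb.endswith("-1"):
        return msb[:-2]
    if msb.isdigit():
        return str(int(msb) + 1)
    return "(%s)+1" % msb


def port_to_signal(line):
    """Convert one port declaration line, or return None if unrecognised."""
    m = PORT_RE.match(line)
    if m is None:
        return None
    direction, msb, name = m.groups()
    arg = width_of(msb) if msb else ""
    return "self.%s = Signal(%s) # %s" % (name, arg, direction)


for decl in ["    input  logic                        prefetch_i,",
             "    input  logic [AXI_S_ADDR_WIDTH-1:0] in_addr_i,",
             "    input  logic                  [7:0] in_len_i,"]:
    print(port_to_signal(decl))
```

this works for the simple cases, but a custom type such as lsu_ctrl_t in place of `logic` falls straight through, which is the kind of information a grammar-driven approach keeps hold of.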
Comment 14 Yehowshua 2020-01-06 19:15:57 GMT
I recently came across pyverilog which will parse verilog (not sure how robust its sv capabilities are).

https://github.com/PyHDI/Pyverilog

Why are we converting verilog into nMigen?
Comment 15 Yehowshua 2020-01-06 19:16:48 GMT
Pyverilog spits out a nice and tasty python AST too.
Comment 17 Luke Kenneth Casson Leighton 2020-01-06 20:36:08 GMT
(In reply to Yehowshua from comment #14)
> I recently came across pyverilog which will parse verilog(not sure how
> robust its sv capabilities are).
> 
> https://github.com/PyHDI/Pyverilog

interesting.  i started with python-ply because i know it is extremely good (i do a *lot* of language translation), so what they have done, is already taken care of.

as i know lex syntax from my time in university i added the required sv support (which was primarily the ability to use types in module interface declarations) very quickly.

> Why are we converting verilog into nMigen?

because the code being targeted for conversion (the ariane project) requires significant modification, and, more than that, absolutely nobody in the software libre world - because it is specifically and critically dependent on a *proprietary* verilog toolchain - can use it.
Comment 18 Tobias Platen 2020-01-22 16:52:38 GMT
I have tried pyverilog. It seems to work for the Xilinx dialect, but it seems to fail for the SystemVerilog dialect that ariane uses.
Comment 19 Luke Kenneth Casson Leighton 2020-01-22 22:44:06 GMT
(In reply to Tobias Platen from comment #18)
> I have tried pyverilog. It seems to work for the Xilinx dialect, but it
> seems to fail for the SystemVerilog dialect that ariane uses.

yes.  it is a mentor graphics augmented nonstandard SV that allows structs in the module parameters.

i had to modify the parser to get it to work.

actually it is incredibly sensible what they did, otherwise module declarations can have hundreds of parameters, which is extremely tedious and error-prone.
Comment 20 Luke Kenneth Casson Leighton 2020-02-17 22:07:47 GMT
Tobias: we need to do FP Exception Flags and rounding.  however it's
sufficiently complex, after looking at various implementations, that
i think it's probably best if we use sv2nmigen on Hardfloat-1.zip
http://www.jhauser.us/arithmetic/HardFloat.html

can you take a look at HardFloat_rawFN.v and add support for "parameters"?

i found that i had to modify HardFloat_rawFN.v as follows:
module recFNToRawFN # (par

note the extra space in between recFNToRawFN and # and (

also i had to remove the "includes" (because i believe they're
pre-processed) and i am not sure about support for "`define".
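
until the lexer copes with these, the two manual edits could be automated as a pre-pass. this is only a sketch: `prepass` is a hypothetical function, the module name and include file in the example are made up, and dropping `include lines (rather than expanding them) loses whatever they define:

```python
import re


def prepass(source):
    """Hypothetical clean-up pass run before the parser sees the file."""
    out = []
    for line in source.splitlines():
        if line.lstrip().startswith("`include"):
            continue          # includes are dropped, as was done by hand
        # 'name#(' -> 'name # (' so that '#' and '(' are separated by spaces
        line = re.sub(r"(\w)\s*#\s*\(", r"\1 # (", line)
        out.append(line)
    return "\n".join(out)


# made-up input, for illustration only
example = '`include "consts.vi"\nmodule example#(parameter W = 3) ();'
print(prepass(example))
```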

can you take a look at that and we'll assign a new bugreport under here
plus some budget for it?

this will be a lot more reliable than trying to write an exception/rounder
from scratch.
Comment 21 Luke Kenneth Casson Leighton 2020-03-24 10:50:57 GMT
actually the top priority is mulRecFN, however it needs some of the other
macros / functions to work.

see mulRecFN.v at the link in comment #20. note that there is no online git repo, only a .zip file.
Comment 22 Luke Kenneth Casson Leighton 2020-03-24 14:24:22 GMT
another couple of pieces of code which need "parameterisation":
https://ascslab.org/research/briscv/index.html

L1cache.v and Lxcache.v - the cache-coherence protocol used there
looks particularly good, and it would be nice to be able to see
this code in nmigen