Bug 659 - SVP64 demo / unit test of CRM data-dependent ffirst mode implementing insertion sort
Summary: SVP64 demo / unit test of CRM data-dependent ffirst mode implementing inserti...
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code (show other bugs)
Version: unspecified
Hardware: Other Linux
: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks: 952 953
  Show dependency treegraph
 
Reported: 2021-07-21 11:03 BST by Luke Kenneth Casson Leighton
Modified: 2024-02-25 02:45 GMT (History)
1 user (show)

See Also:
NLnet milestone: NLnet.2022-08-051.OPF
total budget (EUR) for completion of task and all subtasks: 2000
budget (EUR) for this task, excluding subtasks' budget: 2000
parent task for budget allocation: 953
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2021-07-21 11:03:09 BST
it should be possible to do this using 1<<r3 predicate
for selection and insertion, data-dependent failfirst
to do the inner loop (Reverse REMAP schedule) and a compare
xchg instruction.

failfirst will drop out with VL set to the failed element,
getvl can copy this into r3 and use 1<<r3 as the predicate
to insert key_item.

excluding LDs it should be possible to do in registers
in about 10 instructions.

def insertion_sort(array):
    for i in range(1, len(array)):
        key_item = array[i]
        j = i - 1
        while j >= 0 and array[j] > key_item:
            array[j + 1] = array[j]
            j -= 1
        array[j + 1] = key_item
    return array
Comment 1 Luke Kenneth Casson Leighton 2021-07-21 13:15:40 BST
sublists are small enough to do with insertionsort
medians can use a (new) treereduce mode, with a predicate
of 0b00100 or 1<<r3 r3=2
low/high can use Vector cmp and VCOMPRESS with CR.gt/lt
k can be computed from cntlz

sublists once calculated can be part of timsort.

def median_of_medians(A, i):

    sublists = [A[j:j+5] for j in range(0, len(A), 5)]
    medians = [sorted(sublist)[len(sublist)/2] for sublist in sublists]
    if len(medians) <= 5:
        pivot = sorted(medians)[len(medians)/2]
    else:
        pivot = median_of_medians(medians, len(medians)/2)

    low = [j for j in A if j < pivot]
    high = [j for j in A if j > pivot]

    k = len(low)
    if i < k:
        return median_of_medians(low,i)
    elif i > k:
        return median_of_medians(high,i-k-1)
    else: #pivot = k
        return pivot

#Here are some example lists you can use to see how the algorithm works
#A = [1,2,3,4,5,1000,8,9,99]
#B = [1,2,3,4,5,6]
#print median_of_medians(A, 0) #should be 1
#print median_of_medians(A,7) #should be 99
#print median_of_medians(B,4) #should be 5
Comment 2 Luke Kenneth Casson Leighton 2021-07-21 13:20:47 BST
this should be possible to do with vectorised CMP creating
a Vector of CRs.  followed by 3 CR-Predicated VCOMPRESSed
mvs, one with CR.eq, one with CR.gt, the other with CR.lt

from random import randint

def quicksort(array):
    if len(array) < 2:
        return array
    low, same, high = [], [], []
    pivot = array[randint(0, len(array) - 1)]
    for item in array:
        if item < pivot:
            low.append(item)
        elif item == pivot:
            same.append(item)
        elif item > pivot:
            high.append(item)
    return quicksort(low) + same + quicksort(high)
Comment 3 Luke Kenneth Casson Leighton 2021-07-21 16:35:46 BST
actual core of timsort looks very similar to a butterfly
schedule. note that min_run is picked to ensure merges are
as close to equal size power 2 as possible.

def timsort(array):
    min_run = 32
    n = len(array)
    for i in range(0, n, min_run):
        insertion_sort(array, i, min((i + min_run - 1), n - 1))
    size = min_run
    while size < n:
        for start in range(0, n, size * 2):
            midpoint = start + size - 1
            end = min((start + size * 2 - 1), (n-1))
            merged_array = merge(
                left=array[start:midpoint + 1],
                right=array[midpoint + 1:end + 1])
            array[start:start + len(merged_array)] = merged_array
        size *= 2

    return array
Comment 4 Jacob Lifshay 2021-09-23 02:43:38 BST
changing milestone to match parent task for budget allocation
Comment 5 Luke Kenneth Casson Leighton 2022-10-05 20:20:03 BST
inverting the order:

def insertion_sort(array):
    lim = len(array)-1
    for i in range(lim,-1,-1):
        key_item = array[i]
        j = i + 1
        while j <= lim and array[j] > key_item:
            array[j - 1] = array[j]
            j += 1
        array[j - 1] = key_item
    return array

    ctr = alen-1
    li r10, 1 # prepare mask
    sld r10, alen, r10
    addi r10, r10, -1 # all 1s. must be better way
loop:
    setvl r3, ctr
    sv.mv/m=1<<r3 key, *array   # get key item
    sld r10, 1  # shift in another zero MSB
    sv.cmp/ff=GT/m=~r10 *0, *array, key # stop cmp at 1st GT fail
    sv.mv/m=GT *array-1, *array    # after cmp and ffirst
    getvl r3
    sub r3, 1                   # reduce by one
    sv.mv/m=1<<r3 *array, key   # put key into array
    bc 16, loop                 # dec CTR, back around
Comment 6 Luke Kenneth Casson Leighton 2022-10-07 00:24:49 BST
https://github.com/thatsOven/pdqsort-python/blob/main/pdqsort.py
Comment 7 Luke Kenneth Casson Leighton 2024-02-25 02:45:12 GMT
(In reply to Luke Kenneth Casson Leighton from comment #0)
> it should be possible to do this using 1<<r3 predicate
> for selection and insertion, 

or with unary masks like in maxloc. see bug #676