2024 01 03

    Today I'd like to check out the RP2040's PIO via the UART example, and see how fast we can send UART frames between two devices, as a preliminary speed test.

    So, we're wired up; next we hit the scope and fire up some test code: display (?), UART PIO, blinky, etc...

    The first thing I'm learning here, as shown in the pio rx and pio tx examples, is that each state machine can only do one half of a UART: TX or RX, not both, as we are accustomed to with a UART peripheral. The RP2040 has 2 PIO blocks with 4 state machines each, so its 8 state machines make at most 4 full-duplex UARTs. Since we are interested in building our little router thing, we would max out at 6 total UARTs: 2x the-og-peripheral, and 4x PIO.

    The second thing I'm learning is that working with PIO in C is not that simple: we write a PIO program (uart_tx.pio) and then run the PIO assembler (pioasm) to generate uart_tx.pio.h, which we include in our sketch. There is an online pioasm instance. Doing this on Windows is a bit of a pain, and it means two steps before we can upload code, but it's not a major-major roadblock.

    So, as for reasonable goals for today, I should basically just try to throw-and-catch a block, really simple-like, to test baseline perf.

    Well actually, fk it, I will use the online pioasm for the time being...

    UART PIO TX'ing

    And we're up with a test. Next I'll find the BAUD limit and check whether it is affected by changes to f_cpu...

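    This should be simple: the example's init code sets the PIO clock divider from the system clock and the target baud rate (the TX program spends 8 PIO cycles per bit):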

    float div = (float)clock_get_hz(clk_sys) / (8 * baud);

    So the ceiling should be clk/8 (a divider of 1), roughly 16 Mbit/s at the default clock, and indeed we can see things working up to 15 Mbit/s. To get to 30 Mbit/s, we can crank the CPU to 250 MHz (250 MHz / 8 = 31.25 Mbit/s).

    Scope traces were captured at 1, 2.5, 10, 15, 25 and 30 Mbit/s. [images]

    But this is not terribly interesting. We want to see that we can catch words fast enough: the ISR on the RX side is normally where we hit our limits.

    UART PIO RX'ing with PIO Example

    So, I should spin up an RX line now and see about firing an interrupt there... I'll get one chip TX'ing at a fixed rate and then start in on chip no. 2.

    Doing this with the blocking example catches some bytes, but not many (at only 1 Mbaud). Perhaps it only latches when we happen to catch the byte in time, i.e. if we poll just as the last bit has arrived; or perhaps we only sometimes catch it in time, and then the state machine isn't receiving the next bytes, etc. etc.

    So, the interrupt version... works at a similar quality: some bytes are captured, many are not, and it tends to happen in phases. In the traces below, CH1 goes low then high whenever a new byte is loaded into the TX chip, CH2 is the UART line (TX/RX), and CH4 toggles whenever the UART RX IRQ fires.

    [Scope captures of the RX IRQ test (2 images): CH1 TX byte load, CH2 UART line, CH4 RX IRQ toggle]

    So, this is all kind of bad news for our project, and I suspect I would have to dig into the PIO depths to figure out what's going wrong, which I don't really have the time for at the moment. Slowing it down to check whether this is a chunking error or something else doesn't help... it's the same even at 115200 BAUD.

    So - for troubleshooting, I am perhaps missing some pin config? But that looks to be handled in the example's setup.

    Not totally sure, but I'm going to move on and try Earle's PIO-based software serial (from his arduino-pico core), which I have some prior experience with, and which I also found some bugs in (?) IIRC.

    UART PIO RX'ing with Earle's Software Serial PIO

    the earle commit
    the earle pio_uart.h
    the earle pio softwareserial.h
    the earle pio softwareserial.cpp

    So - let's try this out.

    OK, I have this up and running: I'm streaming at a fixed BAUD and counting the ratio of bytes we miss. The µCs count missed and correctly received bytes (by reading monotonic sequence numbers) and also check the intervals between transmissions...

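    | Rate (Mbit/s) | Misses (errors / total bytes) | Expected byte time (us) | Avg. measured byte time (us) |
    |---|---|---|---|
    | 0.1 | nil | 100 | 110.2 |
    | 0.5 | nil | 20 | 22.2 |
    | 1.0 | 0.025 (1/40) (!) | 10 | 13.4 |
    | 2.5 | 0.497 | 4 | 11.0 |
    | 5.0 | 0.755 | 2 | 11.0 |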

    So, I'm not convinced that I haven't fuxd anything here... It seems wild that we would have such a performance bottleneck, and there are a few red flags; namely, we have a lower bound of about 11 us between transmits, which would also explain the increasing error rate at 1 Mbit/s and above (where the per-byte period is 10 us or less).


    Not in Flash

    I'm also noticing this pattern:

    void __not_in_flash_func(my_irq_handler)(void) { /* ... */ }  // handler name is illustrative

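    Wrapped around some handlers. The macro places the function in SRAM instead of running it from flash via XIP, so a time-critical handler can't stall on a flash cache miss. I wonder if this is a missing step in a lot of this code... It's discussed in this forum post and also shows up in the SDK here, for the fast shit. Maybe important...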

    This also points to a larger red flag for me about the system... maybe this is not actually the microcontroller for hardo realtime stuff...