2023-12_ethernet-real-data-rates.md
2023 12 27
OK, I'd like to do something similar to this prior test and to the usb-rates in this repo, but using the W5500-EVB-Pico, an RP2040 Pico with a WIZnet Ethernet chip bolted onto the end of it.
For these tests I am interested in maximum data ingest rates and timing histograms.
So let's see, this notes that we need to call Ethernet.init(17) to set the CS pin properly, but otherwise we can use the Arduino Ethernet library straight up, which is nice.
To connect, we can also wrap the wiz up in a websocket - maybe - or just write transmission-loss-catching UDP link layer stuff, TBD.
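If I go the UDP route, the loss-catching part could start as small as a sequence number on every datagram. Below is a minimal Python sketch of the receive side. It assumes a made-up header: a 4-byte little-endian counter in front of each payload. `frame` and `LossTracker` are hypothetical names, not from any library. The sketch only *detects* gaps; it does no retransmission and ignores 32-bit wraparound:

```python
import struct

# Hypothetical header: 4-byte little-endian sequence number, then payload.
HDR = struct.Struct("<I")

def frame(seq: int, payload: bytes) -> bytes:
    """Prefix a payload with its sequence number (sender side)."""
    return HDR.pack(seq & 0xFFFFFFFF) + payload

class LossTracker:
    """Counts missing and late datagrams from their sequence numbers (receiver side)."""

    def __init__(self) -> None:
        self.expected = None  # next sequence number we expect to see
        self.received = 0
        self.lost = 0         # sequence numbers skipped over
        self.late = 0         # arrived after a higher number (reordered / duplicated)

    def ingest(self, datagram: bytes) -> bytes:
        (seq,) = HDR.unpack_from(datagram, 0)
        self.received += 1
        if self.expected is not None and seq < self.expected:
            # already counted as lost (or a duplicate): don't rewind the window
            self.late += 1
        else:
            if self.expected is not None and seq > self.expected:
                self.lost += seq - self.expected
            self.expected = seq + 1
        return datagram[HDR.size:]

# usage: pretend datagram 2 shows up late
t = LossTracker()
for s in (0, 1, 3, 4, 2):
    t.ingest(frame(s, b"data"))
print(t.received, t.lost, t.late)  # 5 1 1
```

Note that a late packet is still counted in `lost` (it was missing when 3 arrived). A real link layer would want to reconcile that, or request a resend.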
There's also, ofc, some messy earle core stuff - they use a different Ethernet stack, it seems. I'm gunsta try the Arduino library first.
There is also some confusion about max frame sizes (make sure to read to the end of that post) which I should just test myself.
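For reference while testing, here are the textbook numbers. With a standard 1500-byte MTU, the biggest UDP payload that goes out without IP fragmentation is 1472 bytes. Whatever the W5500 or the Arduino library actually allows on top of that (socket buffer sizes, etc.) is the part I still need to test. This sketch only covers standard Ethernet II / IPv4 / UDP header arithmetic:

```python
# Standard Ethernet II / IPv4 / UDP sizes, in bytes
ETH_MTU = 1500      # max IP packet carried in one Ethernet frame
ETH_HEADER = 14     # dst MAC + src MAC + EtherType
ETH_FCS = 4         # trailing CRC
ETH_MIN_FRAME = 64  # short frames get padded up to this (incl. FCS)
IPV4_HEADER = 20    # minimum IPv4 header (no options)
UDP_HEADER = 8

def max_udp_payload(mtu: int = ETH_MTU) -> int:
    """Largest UDP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - UDP_HEADER

def frame_bytes(udp_payload: int) -> int:
    """Ethernet frame size (excluding preamble / inter-frame gap) for a UDP payload."""
    raw = udp_payload + UDP_HEADER + IPV4_HEADER + ETH_HEADER + ETH_FCS
    return max(ETH_MIN_FRAME, raw)

print(max_udp_payload())               # 1472
print(frame_bytes(max_udp_payload()))  # 1518
```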
2023 12 28
OK, this is looking alive-ish? I should see about getting a UDP ping with Python and then I can record the setup / etc.
Ethernet Hookup Guide
- Laptop and Device are both hooked up to a small switch
- The switch is not hooked up to any other higher-level internet devices
- SET the Ethernet interface (in the laptop OS) to use a fixed IP, e.g. I am using 192.168.1.178
- SET the same interface to use Subnet Mask 255.255.255.0
- Assign a static IP in Arduino, e.g. 192.168.1.177
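Here is a quick sanity check for the addressing above. With mask 255.255.255.0, both ends need to share the first three octets. Otherwise the laptop won't treat the board as directly reachable on that interface. This small sketch uses Python's standard `ipaddress` module; `same_subnet` is just a hypothetical helper name:

```python
import ipaddress

def same_subnet(ip_a: str, ip_b: str, netmask: str) -> bool:
    """True if ip_b falls inside the network that ip_a/netmask defines."""
    net = ipaddress.ip_interface(f"{ip_a}/{netmask}").network
    return ipaddress.ip_address(ip_b) in net

print(same_subnet("192.168.1.178", "192.168.1.177", "255.255.255.0"))  # True
print(same_subnet("192.168.1.178", "192.168.2.177", "255.255.255.0"))  # False
```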
OK, now I can ping a message down and up, god bless... speed testy time.
Ping-Polling Speed Test
AFAIK, UDP can source and sink packets at either end of the pipe - it's just a straight IP datagram. It seems like the normal pattern is to use it "transactionally" (i.e. send-req-get-res), and since UDP itself doesn't guarantee delivery, I would need to bootstrap those guarantees myself. I suspect that I will ultimately want TCP here to get those tasty delivery guarantees.
So, ping times look good: centered around 500us, but this initial packet exchange is mad small:
Let's try it with some increasing size packets:
This code seems to fail at 1024 bytes per packet, and keep in mind that these are echo times now - not the same as previous tests. This is a single packet down, a flip, and a packet back up.
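For the ping/echo measurements above, here's roughly the shape of a client. This is a minimal sketch that assumes the board echoes each datagram straight back. The function names, the 100 ms timeout and the sizes are my placeholders, not the script behind the plots. It sends one packet, waits for the echo and records the round trip. The sweep stops once nothing comes back at a given size, which is where a limit like the 1024-byte failure would show up:

```python
import socket
import statistics
import time

def echo_rtts(sock, addr, size, count=100, timeout=0.1):
    """Ping-pong `count` datagrams of `size` bytes off an echo peer, one at a time.

    Returns (rtts_us, failures): round-trip times in microseconds for echoes that
    came back intact, and how many timed out or came back wrong.
    """
    sock.settimeout(timeout)
    payload = bytes(i & 0xFF for i in range(size))
    rtts, failures = [], 0
    for _ in range(count):
        t0 = time.perf_counter()
        sock.sendto(payload, addr)
        try:
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            failures += 1
            continue
        # note: a late echo from an earlier timeout would also match here;
        # good enough for a sketch, a sequence number would fix it
        if data != payload:
            failures += 1
            continue
        rtts.append((time.perf_counter() - t0) * 1e6)
    return rtts, failures

def size_sweep(sock, addr, sizes, count=100):
    """Median RTT (us) and failure count per packet size; stops at the first
    size where nothing came back at all."""
    results = {}
    for size in sizes:
        rtts, failures = echo_rtts(sock, addr, size, count)
        results[size] = (statistics.median(rtts) if rtts else None, failures)
        if failures == count:
            break
    return results
```

Against the board this would be something like `size_sweep(socket.socket(socket.AF_INET, socket.SOCK_DGRAM), ("192.168.1.177", 8888), [64, 128, 256, 512, 1024])`, using the addresses from the hookup guide.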
So, I want to see, for 64 bytes, what the turn-around time is in embedded (I'll watch the CS line and flip a debug pin as well), and I suppose I should measure something in the python as well.
Then I should see about flow control options, and how to just straight dump data upstream / downstream.
Embedded Turnaround Time
OK, I instrumented this a little, looking at SPI lines and a debug pin on the scope:
- the SPI looks to be running at 12.5MHz, that's nice and fast (but could probably be increased?)
- transactions (packet-in-out) take ~ 400us each in embedded time, so most of our bottleneck is just there
- it looks to be non-blocking: SPI is operational even outside of our calls to the thing
So if we see ~800us (first plot in the quartet above) average round-trip time, and 400us of that is in the embedded hardware, we know that improving the embedded side would be worthwhile. It perhaps also suggests there aren't gobs of improvement to be had, since even a zero-time embedded side would only halve the round trip (i.e. going fast is turning out to be difficult).
The WIZnet datasheet says the SPI can go up to 80MHz, but it seems that this is limited by the Arduino library.
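Here's a back-of-envelope for what the clock rate alone buys us. Clocking a 64-byte payload plus its frame header with zero gaps takes about 43us at 12.5MHz and under 7us at 80MHz. The sketch assumes the W5500's SPI frame layout (a 16-bit address plus 1 control byte ahead of the data, per its datasheet). It ignores the extra register reads/writes a real UDP send needs, so it's a floor, not a prediction. Against a ~400us transaction, it suggests the clock rate alone isn't where most of the time goes. If I'm reading the Arduino Ethernet library right, `w5100.h` sets `SPI_ETHERNET_SETTINGS` to 14MHz. That this comes out as 12.5MHz on the RP2040 is my guess, not verified.

```python
W5500_SPI_HEADER = 3  # assumed: 16-bit offset address + 8-bit control phase per SPI frame

def spi_clock_us(n_bytes: int, sck_hz: float) -> float:
    """Time to clock n_bytes over SPI with no gaps between bytes, in microseconds."""
    return n_bytes * 8 / sck_hz * 1e6

def payload_write_us(payload: int, sck_hz: float) -> float:
    """Floor for one burst write of `payload` bytes into the W5500 TX buffer."""
    return spi_clock_us(W5500_SPI_HEADER + payload, sck_hz)

for hz in (12.5e6, 80e6):
    print(f"{hz / 1e6:5.1f} MHz: {payload_write_us(64, hz):6.2f} us")
```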
How to Improve
On closer inspection, this thing looks like it's blocking, but it calls yield() internally. Basically, I don't want to fuck with this too much, for real.
I'll admit, actually, I'm a little stumped. I was expecting Ethernet to be a magic bullet, but we are up against a very similar limit, and the troubles seem to be in these hidden layers.
Things to try would include... taking the Ethernet library offline (into-repo), fiddling it up to that 80MHz and instrumenting it with some amount of non-blocking flow-control action (this is, actually, probably the move). But I could also, e.g., bust out an RPi and see how fast I can un-frame a UART packet into Python there... that might be, after all, the answer - or SPI.
So, for quicksies, and to settle this current debate, I should try not-pinging with this code, just straight up receiving hella UDP upstream...
Non-Pinging Speed Tests
- set up the Arduino so that, after one packet rx (to get an IP to tx back to), it just free-form writes packets up north, and occasionally prints rates to the OLED
- set up an async (?) version of the Python receiver, and collect them data
So... this does improve the rate substantially:
I am seeing on the scope that each transmit takes about 350us, which is reflected in the histogram above... so, while I am also wondering how much of this is to do with the ethernet-to-usb adapter I am using on the laptop side, I am basically convinced that the bottleneck here is the 12MHz SPI.
However, that would mean ~10 of the ~12 Mbit/s of SPI bandwidth is going to overhead. But, also looking at the scope, I see significant air gaps between each byte on the SPI line, around 2us per byte, which makes this plausible. Each byte itself only takes ~0.8us to clock out, so we already lose ~2/3rds of the bus time to those air gaps, and then there is whatever WIZ-to-micro overhead to contend with as well.
So, scope trace: blue arrows highlight the length of one packet-write in embedded, with SPI transactions on CH2 (CS) and CH3 (CLK) - this shows me it's blocking writes, basically, and the width of the write is essentially the same as we measure in the histogram above.
Then we zoom in to see all of these "air gaps" in the SPI CLK line (CH3) - no data is transferred without a CLK pulse, so, this is big overhead loss, and IDK WTF the MCU is doing in these gaps, but it ain't efficient.
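Here are rough numbers on the overhead estimate above, as a sketch. The function names are mine, and I'm reading "around 2us per byte" as the byte-to-byte period on the SPI line. 64 bytes every ~350us is only ~1.46 Mbit/s of payload. Since 8 clocks at 12.5MHz take 0.64us, the clock is toggling only about a third of each 2us byte slot. It also says what getting past 5 Mbit/s would take with 64-byte packets: one packet roughly every 100us.

```python
def payload_mbps(pck_bytes: int, period_us: float) -> float:
    """Payload rate if one packet leaves every period_us (bits per us == Mbit/s)."""
    return pck_bytes * 8 / period_us

def spi_busy_fraction(byte_period_us: float, sck_hz: float) -> float:
    """Fraction of each byte period the SPI clock is actually toggling."""
    return (8 / sck_hz * 1e6) / byte_period_us

def period_for_mbps(pck_bytes: int, mbps: float) -> float:
    """Packet period (us) needed to sustain a payload rate of mbps."""
    return pck_bytes * 8 / mbps

print(f"payload rate: {payload_mbps(64, 350):.2f} Mbit/s")      # 1.46
print(f"SPI busy: {spi_busy_fraction(2.0, 12.5e6):.0%}")         # 32%
print(f"period for 5 Mbit/s: {period_for_mbps(64, 5):.1f} us")  # 102.4
```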
This is the relevant CPP and Python...
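```cpp
void loop() {
  // just tx, forever, see if we break flow control
  // (EUDP, txIP, txPort, replyBuffer, txCount and DEBUG_PIN are set up elsewhere;
  //  txIP / txPort come from the first packet we received)
  digitalWrite(DEBUG_PIN, HIGH);  // scope marker: start of one packet-write
  EUDP.beginPacket(txIP, txPort);
  EUDP.write(replyBuffer, 64);    // 64-byte payload
  EUDP.endPacket();               // per the scope, this sequence blocks for ~350us
  txCount ++;
  digitalWrite(DEBUG_PIN, LOW);   // scope marker: end of write
}
```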
Then I am rx'ing that in a python asyncio structure...
```python
import socket, time, asyncio
import numpy as np
from plot_stamps import plot_stamps  # plotting helper from this repo

# Arduino's network settings
arduino_ip = '192.168.1.177'
arduino_port = 8888

# Create a UDP socket
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# bind the socket to our local addr and port;
# 0.0.0.0 (INADDR_ANY) means: listen on all local interfaces
sock.bind(('0.0.0.0', 8888))

# test stats
stamp_count = 10000
stamps = np.zeros(stamp_count)
stamp_counter = 0
pck_len = 64

# ingest packets to test app, returns True once all stamps are collected
def handle_pck(data):
    global stamp_counter
    if stamp_counter >= stamp_count:
        return True
    stamps[stamp_counter] = time.perf_counter() * 1e6  # arrival time, us
    stamp_counter += 1
    if stamp_counter >= stamp_count:
        plot_stamps(stamps, stamp_count, pck_len)
        return True
    return False

# let's setup an async receiver:
async def receiver(sck, handler):
    loop = asyncio.get_running_loop()
    while True:
        # await the datagram instead of blocking the event loop in sck.recv()
        data = await loop.sock_recv(sck, 1024)
        if handler(data):
            break

async def main():
    # assumed: any one datagram kicks the Arduino into tx mode,
    # since it replies to whatever address the first packet came from
    sock.sendto(b'go', (arduino_ip, arduino_port))
    # asyncio's sock_* methods need a non-blocking socket
    sock.setblocking(False)
    task = asyncio.create_task(receiver(sock, handle_pck))
    await asyncio.gather(task)

asyncio.run(main())
```
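`plot_stamps` isn't shown here. As a hypothetical stand-in (not its actual contents), this is one way to turn the stamp array into a rate and an inter-arrival histogram with NumPy. It assumes the stamps are arrival times in microseconds, as `handle_pck` records them:

```python
import numpy as np

def summarize_stamps(stamps_us, pck_len, bins=50):
    """Inter-arrival stats, payload rate and histogram from arrival stamps (us)."""
    stamps_us = np.asarray(stamps_us, dtype=float)
    deltas = np.diff(stamps_us)                 # time between consecutive packets, us
    span_us = stamps_us[-1] - stamps_us[0]
    # N stamps bracket N-1 intervals, so count N-1 packets' worth of payload
    mbps = len(deltas) * pck_len * 8 / span_us  # bits per us == Mbit/s
    counts, edges = np.histogram(deltas, bins=bins)
    return {
        "mean_us": float(deltas.mean()),
        "median_us": float(np.median(deltas)),
        "mbps": mbps,
        "hist": (counts, edges),
    }

# usage with synthetic stamps: one 64-byte packet every 350 us
s = summarize_stamps(np.arange(10) * 350.0, 64)
print(s["median_us"], round(s["mbps"], 2))  # 350.0 1.46
```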
But the take-away is that... we can probably do this (this being getting-past 5MBit/sec) if we take our wiznet code seriously and crank the SPI rates.
However, I would like also to test my suspicion about the raspberry-pi-spi uplink into some python code... I might take an aside down that path, before returning here to see about crikety crankity-ing the SPI rates in a copy-pasta'd ethernet library implementation.