Quality time with a Raspberry Pi
Thanks to flaushy I got access to a Raspberry Pi: a Model B with 256 MB of RAM and Ethernet, running Debian 6 (gcc 4.4.5, soft-float).
I ran some benchmarks and got a feel for the system. Only server/command-line use was considered.
All benchmarks were run in the default configuration: 192 MB of RAM for the system and 64 MB reserved for the GPU.
As usual for these ARM boards, SD cards suck, and the readers suck. The Pi is no exception: expect 70-80% of CPU time to be spent in iowait (waiting for the card) if you dare to touch it.
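For the record, the iowait share is easy to sample without extra tools; a minimal probe of /proc/stat:

```shell
# The "cpu" line of /proc/stat lists cumulative jiffies as:
#   user nice system idle iowait irq softirq ...
# so the sixth field is time spent waiting on io.
awk '/^cpu /{print "iowait jiffies:", $6}' /proc/stat
```

Sample it twice over an interval and diff the counters to get a percentage.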
For this reason all the benches were done in tmpfs (ram).
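A scratch area like the one used here takes a couple of lines to set up (mount point and size below are my own choices; most distros also ship a ready-made tmpfs at /dev/shm):

```shell
# Option 1: reuse the stock tmpfs mount.
mkdir -p /dev/shm/bench

# Option 2: a dedicated mount with an explicit size cap (needs root).
# mkdir -p /mnt/bench
# mount -t tmpfs -o size=128m tmpfs /mnt/bench
```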
The Pi’s slow CPU showed up in particular in the heavy share of time spent in sys, 5-30% for most tasks. This means two things: the CPU is slow, but also that the device would greatly benefit from an optimized kernel.
Whether the custom Debian spin from the Foundation’s site has a well-optimized kernel I can’t say. The kernel image was rather big at 3.8 MB, but the only module it had loaded was fuse.
The CPU is an ARMv6 Broadcom part running at 700 MHz; the specific core is arm1176jzf-s.
Supported features from cpuinfo:
swp half thumb fastmult vfp edsp java tls
It has 128 KB of L2 cache, but as far as I know there is no way to detect whether it is enabled. There were no benchmark differences after adding "enable_l2cache=1" to /boot/config.txt and rebooting.
Comparing the core to my Phenom II X6, depending on the benchmark, it is about 1/20th core-vs-core: 1/4th of the frequency (700 MHz vs 2.8 GHz) and 1/5th the IPC (instructions per cycle). I don’t have any devices of the same class around for comparison.
I could not find the memory spec for the integrated chip, but I measured that it could be written to at about 137 MB/s with a 4 KB block size (75 MB/s at a 1 MB block size). That would put it roughly at PC100 SDRAM level; note the figures include tmpfs overhead.
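The measurement itself was nothing fancy; a scaled-down sketch of a dd write probe into tmpfs (paths and sizes here are my own placeholders):

```shell
# Write the same amount with a small and a large block size;
# dd reports the achieved rate on stderr when it finishes.
dd if=/dev/zero of=/dev/shm/ddtest bs=4k count=4096   # 16 MB in 4 KB writes
dd if=/dev/zero of=/dev/shm/ddtest bs=1M count=16     # 16 MB in 1 MB writes
rm -f /dev/shm/ddtest
```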
One use-case I’m quite interested in is a small-scale server: NAS, www, ftp. I recall reading that the Ethernet on the Pi is connected via USB internally. This is a rather bad sign: USB as a protocol has terrible CPU overhead.
To measure the ethernet, and the ethernet only, another machine in the LAN was set up to serve zeros up as fast as possible.
nc -l -p 7777 < /dev/zero
On the upside, the Pi reached 11.7 MB/s. At 94% of the theoretical max, the speed is good and comparable to many common ethernet cards.
The downside was that the cpu use was huge: 50% sys!
This was only using the Ethernet fully. Any protocol and server overhead would come on top of that, and since the Ethernet is shared with the USB bus, if your storage is on USB, expect bad performance. Using a more advanced file system, such as ext4, jfs or xfs, would only exacerbate this.
In short: don’t use for a server/nas.
The box needs blobs from Broadcom to boot, use the GPU, and for some other smaller functions.
The only supported RAM splits are 128/128, 192/64, 224/32. You can’t give all the RAM to the CPU.
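With the firmware of the time, the split was chosen by copying one of the shipped loader images over start.elf and rebooting; a sketch, with filenames as shipped on the boot partition (treat the exact names as an assumption):

```shell
# Pick the 224 MB ARM / 32 MB GPU split; the arm128_ and arm192_
# variants select the other two splits the same way. Needs root.
cp /boot/arm224_start.elf /boot/start.elf
reboot
```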
Testing the optimization of the shipped GCC (4.4.5), I measured how long it took to compress 1 GB of zeros with gzip.
time dd if=/dev/zero bs=1M count=1024 | gzip -9 > /dev/null
The times of "-march=armv5 -O2", "-march=armv6 -O3", and "-mcpu=arm1176jzf-s -O3", in order of increasingly targeted optimization, were identical to within a second or two, well within variance: 2 min 50 s.
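The procedure per flag set was simply rebuild-and-retime; a sketch, wrapped in a function (gzip-src stands in for an unpacked gzip source tree, so the paths are assumptions):

```shell
# Rebuild gzip with each set of flags, then time an identical workload.
bench_flags() {
    for flags in "-march=armv5 -O2" "-march=armv6 -O3" "-mcpu=arm1176jzf-s -O3"
    do
        make -C gzip-src clean
        make -C gzip-src CFLAGS="$flags"
        echo "== $flags =="
        time dd if=/dev/zero bs=1M count=1024 | gzip-src/gzip -9 > /dev/null
    done
}
```

Call bench_flags from the directory holding the source tree.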
As a rough yardstick, the same task on the Phenom took 8.8 s, using an older (not particularly optimized) binary.
I can only conclude that a GCC this old doesn’t benefit from targeted optimizations here. I’ve read that Linaro’s GCC 4.7 has seen more work on ARMv6; it would be interesting to try that later. Compiling GCC on the Pi itself, however, is understandably something I don’t want to do.
Comparison against another ARM board
openssl speed rsa4096
gave the Pi a score of 1.2 signs/sec.
Comparing it to soft-float results from Phoronix:
Pandaboard ES (dual Cortex-A9), 2.7
Atom N270, 2.8
Phoronix also measured hard-float:
Pandaboard ES, 4.6
Given these results, the Pi did surprisingly well against the soft-float Pandaboard.
Web server tests
Finally, I compared three web servers. All were benchmarked in their default configurations, apart from the port number and document directory, and all served the same static page.
All were under siege over localhost, with ten clients hammering away:
siege -c10 -t 10S -b localhost:1234
The tested versions were:
- Busybox httpd, git 576b1d3c417ddea79481063401837ec0bdb91658
- Monkey, git 484f819cf5a65d8f26add14243c8ffcee6293cc1
- nginx 1.2.0
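For completeness, the servers were brought up on the shared port and docroot roughly like this (the busybox flags are standard; nginx and Monkey were driven by config files changed only for port and docroot, and the paths below are my placeholders):

```shell
# busybox httpd: -f = foreground, -p = port, -h = document root.
busybox httpd -f -p 1234 -h /dev/shm/www &

# nginx: stock config except "listen 1234;" and "root /dev/shm/www;".
nginx -c /dev/shm/nginx.conf
```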
All tested servers successfully completed the test. Results:
Of these, busybox is a forking server (a new process for each connection), while nginx and monkey use an event-driven model.
I found the Pi to be fairly unsuitable for the uses I’m interested in. It may run fine as a media player or a teach-yourself-coding box, but it has some serious limitations for low-powered server use.
While it’s cheap on its own, perf/$ is not that good. Performance per watt is decent, given a maximum TDP of 3.5W.
For myself, I look forward to the Rhombus Tech A10 board. Until next time ;)