Bit of a techie post, this. Mainly for my records, although it might help anyone who has a very similar problem.
I have had some fun over the last week or so becoming a bit more familiar with the software side of the Raspberry Pi. Having reproduced our hackathon system and added a rather nicer front end (see a future post for nice video etc.), I decided it was time that I exercised the Pi a bit more. So, I thought I would get a web camera connected up to it and allow people to control the house via the web and observe the change via a video stream. This was all very good and, after a bit of tinkering and some really good hints from this post from SirLagz, and this (which seems to augment the former), I had the following working:
- Compile ffmpeg and ffserver for the Pi (done on the Pi – don’t do that – use a cross-compiler)
- Get ffmpeg to grab video from the camera (a Microsoft LifeCam VX-1000 that was lying around)
- Get ffserver to stream video out to a nominated port in mjpeg format
This was all running on the Raspbian OS install as provided on the NOOBS SD card (I’d run an apt-get update and an apt-get upgrade previously). This all worked fine and I was able to present a nice interface. Happy in that knowledge, I left it over the weekend and went home.
When I returned on Monday, all was not well. The web page was no longer working. I ssh’d to the Pi no problem and began trying to find the problem. When I couldn’t even use the tab key to autocomplete directory names, I realised that the Pi’s file system was absolutely full. No disk space left at all. A quick look at /var/log/syslog showed it to be rather large (~3GB!), and I then discovered that kern.log was almost exactly the same size. I looked through them and discovered that at some point over the weekend, this message had started to be output roughly every 10 microseconds!
gspca_main: ISOC data error: [11] len=0, status=-71
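For my records, here is a minimal Python sketch of the check I did by hand: find the biggest files in a log directory and count how many lines of one log contain the offending message. The function names and the default of five files are just my choices for illustration; only the message text and the /var/log paths come from the story above.

```python
import os


def largest_files(directory, top=5):
    """Return up to `top` (size_in_bytes, path) pairs, biggest first."""
    sizes = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            sizes.append((os.path.getsize(path), path))
    sizes.sort(reverse=True)
    return sizes[:top]


def count_matches(path, needle):
    """Count the lines in `path` that contain the text `needle`."""
    count = 0
    # errors="replace" so stray undecodable bytes in a log cannot abort the scan
    with open(path, "r", errors="replace") as log:
        for line in log:
            if needle in line:
                count += 1
    return count
```

On the Pi this would be called as largest_files("/var/log") and count_matches("/var/log/syslog", "gspca_main: ISOC data error"), probably with sudo, since the logs are not normally readable by every user.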
Ouch. I truncated the logs and got the thing going again, but wanted to diagnose the bug. So, I googled a bit, and the consensus was that this was a bug in some kernel driver code and the way to overcome it was to upgrade the kernel. I checked, and my current kernel version was 3.6.11+. I used rpi-update, which brought the kernel version up to 3.10.19+. Unfortunately, this made the situation worse – the error line appeared immediately as I started the server, rendering last week’s work useless.
I decided to downgrade the kernel again and try to debug more methodically. I used rpi-update’s ability to go back to a specific GitHub Raspberry Pi firmware commit and went back to the latest version of 3.6.11+ before the version bump. Good, I thought, back to square one. But no, the problem was still there as soon as I started the server. Next, I thought I would go back even further, so picked an arbitrary 3.6.11+ commit somewhere around May (unfortunately I didn’t record exactly which one). This was even worse – the Pi wouldn’t boot at all now!
Nightmare – I didn’t really want to lose all the work on getting ffmpeg compiled and set up, but as I was prototyping I hadn’t backed it up (d’oh – have now). So, I thought I’d have a go at repairing it. Having never worked at a low level (i.e. swapping kernels) before, I would have to learn a bit.
A look at the SD card in my laptop revealed that /boot was empty. I tried copying /boot.bak into /boot and over the files in the boot partition and got the Pi booting again – but the problem was still there. I reasoned that the files I had actually copied in, then, were still a different version of the kernel to the one I had originally. I tried apt-get upgrade thinking that this should get me back to the latest stable release, unlike rpi-update which is supposed to be somewhat more bleeding edge. I still had the problem.
At this point I put out an appeal on Twitter (thanks to @matthewbloch of @bytemark who gave me some advice) and also went back to the logs and realised that I could tell exactly which kernel I had from the boot logs – the output of uname -a is one of the first lines.
From this, I diagnosed that I was now running with
raspberrypi kernel: [ 0.000000] Linux version 3.6.11+ (dc4@dc4-arm-01) (gcc version 4.7.2 20120731 (prerelease) (crosstool-NG linaro-1.13.1+bzr2458 - Linaro GCC 2012.08) ) #538 PREEMPT Fri Aug 30 20:42:08 BST 2013
whereas originally I was running
raspberrypi kernel: [ 0.000000] Linux version 3.6.11+ (dc4@dc4-arm-01) (gcc version 4.7.2 20120731 (prerelease) (crosstool-NG linaro-1.13.1+bzr2458 - Linaro GCC 2012.08) ) #474 PREEMPT Thu Jun 13 17:14:42 BST 2013
I’d also been through #600 (3.10.19+), #557 (3.6.11+) and #604 (3.10.21+) on my way through.
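Since the version number alone (3.6.11+) did not tell the builds apart, the build number after the # is the thing to look at. Here is a rough Python sketch for pulling it out of a boot log line. The regular expression is my own guess at the banner layout, based only on the two lines above, so treat it as an assumption rather than a general parser.

```python
import re

# Assumed layout: "Linux version <version> ... #<build> [SMP|PREEMPT ...] <build date>"
BANNER = re.compile(
    r"Linux version (?P<version>\S+).*#(?P<build>\d+)\s+"
    r"(?:(?:SMP|PREEMPT)\s+)*(?P<built>.+)$"
)


def parse_banner(line):
    """Return version, build number and build date, or None if no banner is found."""
    match = BANNER.search(line.strip())
    if match is None:
        return None
    return {
        "version": match.group("version"),
        "build": int(match.group("build")),
        "built": match.group("built"),
    }


june = (
    "raspberrypi kernel: [ 0.000000] Linux version 3.6.11+ (dc4@dc4-arm-01) "
    "(gcc version 4.7.2 20120731 (prerelease) (crosstool-NG linaro-1.13.1+bzr2458 "
    "- Linaro GCC 2012.08) ) #474 PREEMPT Thu Jun 13 17:14:42 BST 2013"
)
info = parse_banner(june)
print(info["version"], info["build"], info["built"])
```

Comparing the build field between two boots would have shown me straight away that I was not on the kernel I started with.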
Using rpi-update, I found the commit hash for the 13th June build (#474 above) and got truly back to where I started with
sudo rpi-update 2681df6c284b3ca0283721af7cd14df5548377ae
I now have a very tight range of firmware changes (between 13th June and 30th August) to investigate for what caused the problem to become a ‘hard’ failure instead of intermittent and hopefully therefore what is causing it in the first place. And I have a working server again (although I think it must still be prone to the original cause of the intermittent issue).
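One way to work through that range would be a simple bisection: test the commit in the middle, then halve the range depending on the result. A minimal Python sketch follows, assuming the commits are listed oldest to newest, the first is known good, the last is known bad, and the fault stays once it appears. That last assumption only holds for the 'hard' failure, not the intermittent one. The commit names are placeholders, and in practice is_bad would mean running rpi-update with that hash, rebooting, starting the server and checking the log by hand.

```python
def first_bad(commits, is_bad):
    """Return the earliest commit for which is_bad(commit) is true.

    Assumes commits[0] is known good and commits[-1] is known bad.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # fault already present, look earlier
        else:
            lo = mid  # still fine, look later
    return commits[hi]


# Hypothetical example: eight placeholder commits, fault introduced at "c5".
commits = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
tested = []


def is_bad(commit):
    tested.append(commit)  # record which commits needed a test
    return commits.index(commit) >= 5


culprit = first_bad(commits, is_bad)
print(culprit, "found after testing", tested)
```

With eight candidates this needs three tests rather than six, which matters when every test involves a reboot.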
Having got it working again, I put nginx onto the Pi to serve my site, so that I could reverse proxy the different ports serving the video stream and the controls onto a clean web page with nice URLs. That makes it a bit of a confused stack, with bottle.py creating pages to be served by CherryPy on one port and ffserver serving the video stream on another port, all fronted by nginx. I’m not really a web server admin – so that’s my next learning experience. For now, though, it looks reasonably good and has been running for 24 hours so far. Next battle – to get it exposed to the world…