A Kernel development bug
22 Feb 2017
So, the thing is: sometime last year, I decided to write a simple kernel. Just for fun. It’s called xdillah, it’s on my GitHub profile, and it is ridiculously incomplete as of now. I was actively developing it last summer, and here’s a very late post-mortem about a bug I encountered last August (or September or something). Oh, also: while developing, I relied heavily on JamesM’s tutorials (the link changes every now and then) and osdev.org. Both are great resources.
So, when you’re doing kernel development, you generally need some sort of basic I/O for things like printf, puts, etc. So I implemented some basic I/O functionality, no problem. Then, one day, some of the stuff I was printing suddenly stopped being displayed correctly. At the time, I hadn’t managed to hook a debugger up to qemu, where I was testing my kernel, so the obvious solution was good old printf debugging. But here’s the problem: how do you printf-debug when the printf routine is your problem? The answer: you add another function that writes to the serial port (qemu can redirect serial port output to stdout), check that it works, and then carry on with serial port debugging!
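A serial debug routine along those lines can be very small. The following is only a sketch, not the code from xdillah: the names serial_puts, port_write_fn and capture_port_write are made up for illustration, and the port access is passed in as a function pointer so the same code can be tried on a normal host. In a real kernel that hook would wrap the x86 outb instruction, and the UART setup and the "transmitter ready" check are left out here.

```c
#include <stddef.h>
#include <stdint.h>

#define COM1_PORT 0x3F8u /* conventional I/O base address of the first PC serial port */

/* Hypothetical hook that writes one byte to an I/O port.
   In a kernel this would be a thin wrapper around `outb`. */
typedef void (*port_write_fn)(uint16_t port, uint8_t value);

/* Sends a NUL-terminated string to COM1, one byte at a time. */
static void serial_puts(port_write_fn write_port, const char *s) {
    for (; *s != '\0'; ++s) {
        write_port(COM1_PORT, (uint8_t)*s);
    }
}

/* Hosted stand-in for the port hook (an assumption for testing off-target):
   it records the bytes instead of touching hardware. */
static uint8_t captured[64];
static size_t captured_len = 0;

static void capture_port_write(uint16_t port, uint8_t value) {
    if (port == COM1_PORT && captured_len < sizeof captured) {
        captured[captured_len++] = value;
    }
}
```

On a host, serial_puts(capture_port_write, "ok\n") leaves the three bytes in captured. Under qemu, starting the emulator with -serial stdio is one way to get the guest's serial output on the terminal.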
So, while serial port debugging my printf routine, I was just sprinkling serial port put statements around the code. After a bit of playing around, I noticed one peculiar thing: whenever the output came out wrong, none of those statements wrote anything to the serial port! In other words, my printf wasn’t even being called. After a lot of googling, I found the culprit: when gcc optimizes, it turns printf calls with a single string-literal argument that ends in a newline, and printf("%s\n", asd) calls, into calls to the puts function. So it turns out my printf implementation was fine; my puts implementation was the actual problem. However, one problem remained: the puts implementation is so simple that it obviously has no weird behavior. So the defect remains unexplained to this day.
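One detail of that rewrite is easy to miss: gcc turns printf("boot\n") into puts("boot"), dropping the trailing newline from the string, because the C standard says puts writes the string and then a newline. A hand-written puts that skips the newline would therefore misbehave only on the rewritten calls. Whether that is what happened in xdillah is a guess, since its puts isn't shown here. The sketch below just contrasts the two behaviors; screen, screen_putc, kputs_standard and kputs_no_newline are hypothetical names, and the "screen" is a plain buffer so the code runs on a host.

```c
#include <stddef.h>

/* Hypothetical capture buffer standing in for the kernel's screen output. */
static char screen[64];
static size_t screen_len = 0;

static void screen_reset(void) {
    screen_len = 0;
    screen[0] = '\0';
}

static void screen_putc(char c) {
    if (screen_len + 1 < sizeof screen) {
        screen[screen_len++] = c;
        screen[screen_len] = '\0';
    }
}

/* Follows the standard contract for puts: the string, then a newline. */
static int kputs_standard(const char *s) {
    while (*s != '\0') {
        screen_putc(*s++);
    }
    screen_putc('\n');
    return 0;
}

/* A plausible slip in a hand-written puts: the newline is never added,
   so a call rewritten from printf("boot\n") loses its line break. */
static int kputs_no_newline(const char *s) {
    while (*s != '\0') {
        screen_putc(*s++);
    }
    return 0;
}
```

Calling each variant with "boot" shows the difference: the first leaves "boot\n" in the buffer, the second only "boot".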
In the end, I was able to work around this by replacing all the single-argument printf calls in my code with printf("%s \n", asd) calls, which kept gcc from applying that ‘feature’. However, the original bug in puts is still there and boggles my mind to this day. I thought it was a pointer issue, but the same strings print just fine when they go through printf. How can such a simple implementation have such a bug?
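The workaround does the job because the format string is no longer exactly "%s\n", so gcc has nothing to fold into puts. The price is that every line gains a trailing space. The sketch below only makes that visible: workaround_line is a hypothetical helper, and it uses the hosted snprintf so the resulting bytes can be inspected, which a kernel without a libc would not have.

```c
#include <stddef.h>
#include <stdio.h>

/* Builds the line exactly as the workaround call would print it.
   The space before '\n' is what keeps the format from matching "%s\n". */
static int workaround_line(char *out, size_t cap, const char *msg) {
    return snprintf(out, cap, "%s \n", msg);
}
```

For "boot" this produces "boot \n", six characters including the extra space. An alternative that leaves the strings alone is to compile with -fno-builtin-printf (or -fno-builtin, which -ffreestanding implies), so gcc stops treating printf as a builtin.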