This page contains day-to-day development ramblings of my SoC 2007 project. See the main page for more information.
Finally fixed writes on file systems where the block size is smaller than the page size. There were two problems. First, the putpages code assumed that it would always write entire pages. Naturally, if bsize < PAGE_SIZE, the file can end before the page boundary, so we now write only until EOF. Second, the putpages code assumed that the entire file had always been allocated when trying to write it. However, when truncating a file, ufs ballocs only the last file system block and allocates a page for it. Now, if the block size is smaller than the page size, a block will be allocated only at the end of the page (assuming a suitable alignment in the truncated length, of course), and mapping the blocks at the beginning of the page will fail. So we simply skip over blocks which can't be bmapped (the kernel does this too).
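A minimal sketch of the two putpages fixes, under stated assumptions: the page is identified by its file offset, the block map is a plain array where -1 marks a block that was never allocated (standing in for a failed bmap), and the actual I/O is replaced by a printf. `putpage_sketch()` and its parameters are hypothetical, not the real genfs code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096	/* assumed page size for this sketch */

/*
 * Write out one page of a file whose block size may be smaller than
 * PAGE_SIZE.  blkmap[lbn] holds the device block of logical block lbn,
 * or -1 if the block was never allocated (e.g. after a truncate).
 * Returns the number of bytes that would be written.
 */
static int64_t
putpage_sketch(int64_t page_off, int64_t filesize, int64_t bsize,
    const int64_t *blkmap, size_t nblks)
{
	int64_t end = page_off + PAGE_SIZE;
	int64_t written = 0;

	assert(page_off % PAGE_SIZE == 0 && PAGE_SIZE % bsize == 0);

	/* Fix 1: the file may end before the page does, so stop at EOF. */
	if (end > filesize)
		end = filesize;

	for (int64_t off = page_off; off < end; off += bsize) {
		int64_t lbn = off / bsize;
		int64_t len = end - off < bsize ? end - off : bsize;

		assert((size_t)lbn < nblks);
		/* Fix 2: skip blocks that cannot be bmapped. */
		if (blkmap[lbn] == -1)
			continue;
		printf("write blkno %lld: offset %lld, %lld bytes\n",
		    (long long)blkmap[lbn], (long long)off, (long long)len);
		written += len;
	}
	return written;
}
```

With a 3000-byte file on 1 KiB blocks where only the third block has been allocated, the sketch skips the first two blocks and writes the remaining 952 bytes from the third one.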
Made the file systems usable on non-NetBSD machines also.
I have continued development in the NetBSD tree. I have added support for more file systems (currently spanning cd9660fs, efs, ext2fs, ffs, hfs, lfs, msdosfs, ntfs and tmpfs) and I've just completed a full build of my kernel sources on ffs running in userspace.
Initial integration to NetBSD was done under /sys/rump (Runnable Userspace Meta Programs).
Managed to frob enough to get the build framework to build & work. I also put the first public version of the code out and proposed integration to NetBSD.
Technically there's still some stuff to do:
Ok, started writing the vnode pager. I opted to make it work mostly like the kernel pager and cram functionality into genfs_getpages(). However, I'm cutting corners quite a bit. One issue left is page management, as I'm leaving pages hanging off of a vnode's uvm object now and need to flush them at some point. Another issue is supporting VOP_PUTPAGES(), as it is needed for write to work.
Ok, got the put side written too. Did a weird, weird little algorithm where I just flush everything in VOP_PUTPAGES(). But to take a cue from the famous cooking phrase "flavour rules", I argue that "functionality rules". Okok, maybe it needs revisiting at some point in the future, although this is already better than the previous version. Other than that, I managed to re-introduce some offset calculation bugs I spent a hell of a time debugging away from the previous implementation. So "hooray" for me getting to do that again.
*n+1 hours later* Aaah.. how many bugs can you fit into offset frobbing? A billion... Here are some examples:
curoff = ap->a_offset;
Should have been:
curoff = ap->a_offset & ~PAGE_MASK;

xfersize = MIN(((lbn+1+run)<<bshift) - startoff, remain);
Should have been:
xfersize = MIN(((lbn+1+run)<<bshift) - (curoff+bufoff), remain);
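To make the two corrections concrete, here is a hedged sketch that isolates the arithmetic into two hypothetical helpers, `page_trunc()` and `xfer_size()`. It assumes 4 KiB pages and that `startoff` in the buggy line was the offset where the whole transfer began. Measuring from there instead of from the current position (`curoff + bufoff`) overstates the transfer size on every pass after the first.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12		/* assumed 4 KiB pages */
#define PAGE_SIZE	(1 << PAGE_SHIFT)
#define PAGE_MASK	(PAGE_SIZE - 1)

/* Page-align a request offset, as the corrected curoff line does. */
static int64_t
page_trunc(int64_t off)
{
	return off & ~(int64_t)PAGE_MASK;
}

/*
 * Bytes transferable in one I/O: from the current position
 * (curoff + bufoff) to the end of the contiguous run that starts at
 * logical block lbn and continues for 'run' more blocks, capped at
 * 'remain'.
 */
static int64_t
xfer_size(int64_t lbn, int64_t run, int bshift,
    int64_t curoff, int64_t bufoff, int64_t remain)
{
	int64_t runend = (lbn + 1 + run) << bshift;
	int64_t pos = curoff + bufoff;

	/* The current position must lie inside the run being transferred. */
	assert(pos >= (lbn << bshift) && pos < runend);
	return runend - pos < remain ? runend - pos : remain;
}
```

For example, with 1 KiB blocks, a run covering blocks 2 and 3, and the current position at 2560, the transfer is 1536 bytes (up to offset 4096) unless fewer bytes remain.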
Also, the putpages routine should not write past the end-of-file rounded up to 1<<DEV_BSHIFT, i.e. *NOT* rounded up to a full file system block. Otherwise things go nasty with fragments. But everything just about works now, so maybe tomorrow I can finish the build framework (which is also mostly in place already).
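The end-of-file rule boils down to rounding up to a power of two. A small sketch, assuming 512-byte device blocks (DEV_BSHIFT 9, as on NetBSD); `putpages_write_end()` is a hypothetical helper name. For a 3000-byte file on a file system with 8 KiB blocks, rounding to the device block gives 3072 bytes, whereas rounding to the file system block would give 8192 and spill past the fragments allocated for the tail of the file.

```c
#include <assert.h>
#include <stdint.h>

#define DEV_BSHIFT	9		/* 512-byte device blocks */
#define DEV_BSIZE	(1 << DEV_BSHIFT)

/* Round x up to a multiple of the power-of-two 'align'. */
static int64_t
roundup_pow2(int64_t x, int64_t align)
{
	assert(align > 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * End offset (exclusive) up to which putpages may write: EOF rounded up
 * to a device block, never to a full file system block, because the
 * tail of the file may live in a fragment smaller than bsize.
 */
static int64_t
putpages_write_end(int64_t filesize)
{
	return roundup_pow2(filesize, DEV_BSIZE);
}
```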
I restructured the read/write code to use memory object pagers instead of mucking directly with VOP_STRATEGY(). The good news is that after writing an anon memory pager, tmpfs now works. The bad news is that all other file systems are broken until I write a vnode pager.
I had the bright idea I should support tmpfs also. The namespace part already works, and file creation, deletion etc. is not a problem.
However, for managing file contents tmpfs is a fair bit different from other file systems. "Regular" file systems use a vnode backed vm object for storing data. But since tmpfs operates on memory instead of files, it uses anonymous memory vm objects for storing the data (which at first is a bit funny, since usually anonymous memory is considered to be memory not backed by a file. But that holds also here, since the memory backs the file - the file does not back the memory). Now, to read and write, tmpfs does ubc_alloc() just like any file system. It just does it on an anonymous object.
Second, tmpfs does not use VOP_STRATEGY(), nor does it use genfs_getpages(). Rather, it defines its own get/putpages routines where it accesses the anon object pager directly.
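The split between generic read/write code and per-object pagers can be sketched as a small function-pointer vector. This is loosely in the spirit of UVM's per-object pager operations, but every name here (`struct pagerops`, `memobj_rw()`, the anon pager) is hypothetical and greatly simplified: the anonymous pager just hands out zero-filled heap pages, and the generic routine copies data without knowing what backs the object. A vnode pager would plug in a different `getpage` that reads from the file system.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PG_SIZE 4096		/* assumed page size */

struct memobj;

/* Hypothetical pager operations vector: one per kind of backing store. */
struct pagerops {
	/* Return page pgno of the object, or NULL if it can't be provided. */
	char *(*getpage)(struct memobj *obj, size_t pgno);
};

struct memobj {
	const struct pagerops *ops;
	char **pages;		/* anon pager: lazily allocated pages */
	size_t npages;
};

/* Anonymous memory pager: pages are zero-filled heap memory. */
static char *
anon_getpage(struct memobj *obj, size_t pgno)
{
	if (pgno >= obj->npages)
		return NULL;
	if (obj->pages[pgno] == NULL)
		obj->pages[pgno] = calloc(1, PG_SIZE);
	return obj->pages[pgno];
}

static const struct pagerops anon_pagerops = { anon_getpage };

static struct memobj *
anon_create(size_t npages)
{
	struct memobj *obj = malloc(sizeof(*obj));

	if (obj == NULL)
		return NULL;
	obj->pages = calloc(npages, sizeof(char *));
	if (obj->pages == NULL) {
		free(obj);
		return NULL;
	}
	obj->ops = &anon_pagerops;
	obj->npages = npages;
	return obj;
}

/*
 * Generic read (dowrite == 0) or write (dowrite != 0) through the
 * object's pager, one page at a time.  It never looks at what backs
 * the object.  Returns 0 on success, -1 if a page is unavailable.
 */
static int
memobj_rw(struct memobj *obj, size_t off, void *buf, size_t len, int dowrite)
{
	char *p = buf;

	assert(obj != NULL && obj->ops != NULL);
	while (len > 0) {
		char *pg = obj->ops->getpage(obj, off / PG_SIZE);
		size_t pgoff = off % PG_SIZE;
		size_t n = PG_SIZE - pgoff;

		if (pg == NULL)
			return -1;
		if (n > len)
			n = len;
		if (dowrite)
			memcpy(pg + pgoff, p, n);
		else
			memcpy(p, pg + pgoff, n);
		p += n;
		off += n;
		len -= n;
	}
	return 0;
}
```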
All of these ruin some assumptions that I made when I cut corners. But getting tmpfs to work is a good exercise in getting some things done more properly. Let's see if I ever finish that exercise ....
Worked on the build framework. It's just about working now and should be ready for an import proposal in a few days.
Add a few symbols: cd9660fs works. At this point it should theoretically be trivial to support all kernel file systems in userspace. However, concentrating a bit more on the compilation architecture is probably in order instead of doing simple pax -rw [thisfs] [anotherfs].
Also, split virgin kernel source files into a separate library. They are compiled directly out of the kernel source tree. This brings the lines of code in the emulation library to below 2000.
Ok, the code is still quite messy. But I did a few things to make it a little more attractive and ready for import.
And now the ufs/ffs code is compiled out of /sys instead of a local modified copy (ok, I do have a special tree for this, but YKWIM)
Stubize some more symbols:
damn ... I was supposed to do cleanup ...
Did some toolwork as the first order of business today and ran the code in Valgrind. (Kudos to the people who made vg4nbsd happen. You rock!) It found a couple of places that leaked memory in my emulation library and caused it to eventually run past the process allocation limit. Once I (or someone else) add better support for a few syscalls, I plan to give the ffs code a better whipping in valgrind to see if there are any prehistoric bugs lurking around.
In other news, I figured out why VOP_RENAME() wasn't working. When I implemented the relookup() stub, I did it as "return 0" instead of "abort()". And so relookup() was just returning back the same input, which happened to be valid vnodes. But now it works.
So now the order of business is just cleanups and generalizing the code to be able to run non-ffs file systems. I think I'll have a go at making it work for efs next.
HOLY SHIT, IT WORKS! [This comment includes memes contributed by Jared D. McNeill]. *phew*, getting write to work really was a tough call. For some reason it took a few days of trying to realize how fragments work (might have something to do with my cranial capacity ... ):
pain-rustique:310:/puffs/tmp> touch a b
pain-rustique:311:/puffs/tmp> ls
a b
pain-rustique:312:/puffs/tmp> mv a b
pain-rustique:313:/puffs/tmp> ls
a
pain-rustique:314:/puffs/tmp> mv a b
pain-rustique:315:/puffs/tmp> ls
a
pain-rustique:316:/puffs/tmp> rm a
pain-rustique:317:/puffs/tmp> ls
pain-rustique:318:/puffs/tmp>

Time to start warming up gdb, it's gonna be a long one ....
I've been doing some other puffs/fs stuff recently, such as adding nicer mount info display, so you get this
/dev/vnd0a on /puffs type puffs|p2k|ffs (nosuid, nodev, mounted by pooka)
instead of this
puffs:p2k:ffs /puffs type puffs (nosuid, nodev, mounted by pooka)
(ok, and I was on vacation, let's not deny it)
But this still needs plenty of effort, so it was time to get hacking again.
There's still some data corruption occurring somewhere preventing an untar, but I think that may be related to small write sizes. This works nicely:
/dev/vnd0a on /puffs type puffs|p2k|ffs (nosuid, nodev, mounted by pooka)
pain-rustique:217:/puffs/tmp> cp /netbsd .
pain-rustique:218:/puffs/tmp> md5 netbsd
MD5 (netbsd) = af302207e0c3cbedf8f236faf2b276d1
pain-rustique:219:/puffs/tmp> md5 /netbsd
MD5 (/netbsd) = af302207e0c3cbedf8f236faf2b276d1
So, let's support all the other operations.
Now everything mostly works. Except writing goes horribly wrong, since I forgot all about fragments and treated everything as file system blocks. I need to fix this tomorrow or some other day when I'm less tired. I guess 375 wasn't enough ...
How many times can you do block address calculation wrong? Apparently quite a few times, approximately 375. So the few things you need to remember:
When I put it that way, it doesn't look too hard. But there's a helluva amount of maneuvering space for bugs ;)
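The block/fragment arithmetic can be sketched with a few helpers loosely modeled on the ffs `lblkno()`, `blkoff()`, `fragroundup()` and `blksize()` macros. The struct and function names are hypothetical and the geometry is reduced to two shifts. The key point is that only a direct block containing EOF can be shorter than a full block, and its size is rounded up to whole fragments, not to bsize.

```c
#include <assert.h>
#include <stdint.h>

#define NDADDR 12	/* direct blocks; only these may end in fragments */

/* Hypothetical, reduced file system geometry. */
struct fsgeom {
	int bshift;	/* log2 of the block size */
	int fshift;	/* log2 of the fragment size */
};

/* Logical block number containing byte offset off. */
static int64_t
lblkno(const struct fsgeom *g, int64_t off)
{
	return off >> g->bshift;
}

/* Offset of byte off within its block. */
static int64_t
blkoff(const struct fsgeom *g, int64_t off)
{
	return off & (((int64_t)1 << g->bshift) - 1);
}

/* Round sz up to a whole number of fragments. */
static int64_t
fragroundup(const struct fsgeom *g, int64_t sz)
{
	int64_t fmask = ((int64_t)1 << g->fshift) - 1;

	return (sz + fmask) & ~fmask;
}

/*
 * Allocated size of logical block lbn in a file of 'size' bytes.
 * Only meaningful for lbn <= lblkno(g, size).
 */
static int64_t
blksize(const struct fsgeom *g, int64_t size, int64_t lbn)
{
	int64_t bsize = (int64_t)1 << g->bshift;

	assert(g->fshift <= g->bshift);
	if (lbn >= NDADDR || size >= (lbn + 1) * bsize)
		return bsize;				/* full block */
	return fragroundup(g, blkoff(g, size));	/* tail: whole frags */
}
```

With 8 KiB blocks and 1 KiB fragments, a 10000-byte file has a full 8192-byte block 0, and block 1 holds the remaining 1808 bytes in a 2048-byte (two-fragment) allocation.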
Time to attack the kernel as promised. Now, initially I was going to implement a totally new kernel file system which attempts to preserve the exact vfs interface better than puffs (and later merge the two). However, due to my success with wacky "this is enough" emulations so far for ffs, I'll start by implementing a vfs emulation layer on top of standard-issue puffs interface. So the whole architecture will look a bit like an hourglass: ffs - puffs vfs emulation - libpuffs - puffs vfs - vfs - kernel; the middle puffs layer "squeezes" information and expands it on the other side.
It should be noted that this is only for investigating the situation and seeing how things work out. Eventually I'd like the userspace puffs interface and kernel vfs be as close to each other as humanly possible. But that's hard stuff ...
Wow, stuff works now. That was easy. And they pay me for this.... but I'm not complaining.
pain-rustique:85:/puffs> df .
Filesystem     1K-blocks   Used  Avail Capacity  Mounted on
puffs:p2k:ffs     254079 193889  47487    80%    /puffs
pain-rustique:86:/puffs> ls
MAKEDEV     ctl         img         libdata     mnt3        puffs2      stand
altroot     dev         img2        libexec     mtn         rescue      tmp
bin         etc         kern        lost+found  netbsd      root        u9fs
blah        flop        ktrace.out  mnt         proc        sbin        usr
boot        home        lib         mnt2        puffs       sshfs       var
pain-rustique:87:/puffs> cd etc
pain-rustique:88:/puffs/etc> head -3 passwd
root:*:0:0:Charlie &:/root:/bin/csh
toor:*:0:0:Bourne-again Superuser:/root:/bin/sh
daemon:*:1:1:The devil himself:/:/sbin/nologin
pain-rustique:89:/puffs/etc> ls -l rmt
lrwxr-xr-x 1 root wheel 13 Apr 16 2006 rmt -> /usr/sbin/rmt
Ok, seriously, there's still *a lot* to do, but this is a good initial start.
Try to run ufs_lookup() (*cower*) to completion just with simulated arguments. That's easier than trying to integrate the kernel into this now ;)
Lo and behold, ufs_lookup() for "etc/passwd" using the root vnode as dvp runs to completion (yes, that's done as two separate lookups).
I also sprinkled some abort()s in the emulation code, since it started being annoying trying to track down the failures from do-nothing-and-return-success stub implementations.
note-to-self: I really need to make a stab at properly implementing the vnode operation vectors soon. Otherwise I'll just end up tying myself into a knot with the constant subpar hacks. Luckily, like most people with half a brain, I never listen to me.
So let's try to get some more stuff working in userspace. After that I'll jump to kernel integration (mostly because I'm tired of typing fake calls in a userspace test program ;)
VOP_READ works: I can read files from a file system. Think I'll seriously start attacking kernel integration now.
Let's start running the code we've managed to compile. I'll split the code thusly:
I need to implement the operation vectors for vnodes so that VOP_FOO() calls done from the ffs code work. Instead of banging my head against the standard vop_desc stuff, I'll just add "JIT" hacks to vnode_if.c, i.e. hack support in each time the driver crashes with a jump to hyperspace. I'll fix it later.
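A sketch of the "fill it in as it crashes" operation vector, with the twist that a missing entry aborts with the operation's name instead of silently returning success like a do-nothing stub would. The enum, `vop_call()` and `demo_vec` are hypothetical stand-ins; the real vnode_if.c machinery built around vop_desc is considerably more involved.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical subset of vnode operations. */
enum vop { VOP_LOOKUP_OP, VOP_READ_OP, VOP_RENAME_OP, VOP_NOPS };

static const char *const vop_names[VOP_NOPS] = {
	[VOP_LOOKUP_OP] = "lookup",
	[VOP_READ_OP] = "read",
	[VOP_RENAME_OP] = "rename",
};

typedef int (*vop_fn)(void *args);

/*
 * Dispatch through an operation vector.  A NULL slot means "not hacked
 * in yet": report which operation was hit and abort, so the failure
 * shows up right here instead of somewhere far downstream.
 */
static int
vop_call(const vop_fn *vec, enum vop op, void *args)
{
	assert((unsigned)op < VOP_NOPS);
	if (vec[op] == NULL) {
		fprintf(stderr, "unimplemented vnode operation: %s\n",
		    vop_names[op]);
		abort();
	}
	return vec[op](args);
}

/* Example implementation: counts how many times it was called. */
static int
demo_lookup(void *args)
{
	int *calls = args;

	(*calls)++;
	return 0;
}

/* Only the operations added so far; the rest stay NULL. */
static const vop_fn demo_vec[VOP_NOPS] = {
	[VOP_LOOKUP_OP] = demo_lookup,
};
```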
The mount code uses DIOCGPART to fetch information about the partition. This is clearly a kernel-internal interface, since the struct partinfo used by it contains pointers to the relevant structures instead of the structures themselves. This clearly requires some special internal/external handling, but I won't implement that just yet. So I very brutally kludge this in the ioctl code by observing that the code doesn't *really* need the information (it has fallbacks) and just failing the request.
Next, the code uses bread() to fetch the superblock. Time to implement a userspace buffer cache (or "cache" ;). I won't implement double caching in userspace; instead I always issue a kernel read and let the kernel worry about caching.
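The "no double caching" approach can be sketched as a bread()-like call that always goes to the device with pread(2) and a brelse()-like call that just frees. `struct ubuf`, `ubread()` and `ubrelse()` are hypothetical names, and block numbers are assumed to be in DEV_BSIZE (512-byte) units, as with b_blkno.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define DEV_BSHIFT 9		/* 512-byte device blocks */

/* Minimal, hypothetical stand-in for struct buf. */
struct ubuf {
	int64_t blkno;		/* in DEV_BSIZE units */
	size_t size;
	void *data;
};

/*
 * bread()-like: no userspace cache at all.  Every call allocates a fresh
 * buffer and reads it from the device, leaving caching to the host
 * kernel.  Returns 0 or an errno value; a short read is reported as EIO.
 */
static int
ubread(int devfd, int64_t blkno, size_t size, struct ubuf **bpp)
{
	struct ubuf *bp;
	ssize_t n;
	int error;

	assert(bpp != NULL && size > 0);
	*bpp = NULL;
	if ((bp = malloc(sizeof(*bp))) == NULL)
		return ENOMEM;
	if ((bp->data = malloc(size)) == NULL) {
		free(bp);
		return ENOMEM;
	}
	bp->blkno = blkno;
	bp->size = size;

	n = pread(devfd, bp->data, size, (off_t)blkno << DEV_BSHIFT);
	if (n != (ssize_t)size) {
		error = (n == -1) ? errno : EIO;	/* save before free() */
		free(bp->data);
		free(bp);
		return error;
	}
	*bpp = bp;
	return 0;
}

/* brelse()-like: nothing is cached, so just free the buffer. */
static void
ubrelse(struct ubuf *bp)
{
	free(bp->data);
	free(bp);
}
```

The obvious cost is a system call per buffer access; the benefit is that there is no userspace cache to keep coherent with the kernel's.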
After this, ffs_mount() runs successfully to completion in userspace.
milestone one reached
It links. wh00t! Party time! (ok, not really). Filling in all the missing 200-something symbols was kinda an annoying and pedestrian task, but it would be difficult to automate this properly, since going through everything manually was required to properly group them. It also provided some opportunities to actually think about the problems. So if a tool-based solution is aimed for, it should definitely be one of those "programmer guided" ones where the tool creates the stubs and the programmer verifies and groups them.
Ok, let's try to attack some of the more difficult stuff today. And what I mean by "attack" is "add stubs" and isolate hard parts into smaller chunks.
I gotta start doing something about this some day, so might as well be today. As the old saying goes: you must first compile before you can run. So let's try to get the ffs/ufs code at least compiling. *writes some Makefiles*. Wow, that was easy. Now I only have to get it linking too, which might be a tad more difficult with 231 missing symbols without the rest of the kernel.
Since I have absolutely no idea what I'm doing, might as well start with a machete(tm) approach for now and nuke everything irrelevant from the code but still provide everything necessary.
There are still a few low-hanging fruit, but after that the real fun begins. There's at least the buffer cache, genfs, vnode subroutines and relevant uvm parts which are the really hard stuff.
This web page brought to you by vi and psshfs
Antti Kantee <firstname.lastname@example.org>