I got away from the kernel mini-conf for the early part of the afternoon to check out the distro mini-conf. HP’s own Bob Gobeille gave a talk on FOSSology that turned out to be quite entertaining.
The basic problem they’re trying to solve can be described as: what content, exactly, is in a file, and does that content resemble anything that we might care about?
OK, some examples might help. Let’s say both Fedora and Debian are shipping a library, zlib say, and you run ldconfig on both distros, and both times ldconfig says v1.2.3. Cool, they’re shipping exactly the same library, right?
Erm, maybe. Who knows what patch levels each distro might be shipping, etc. Source level inspection is pretty much the only way to figure this out.
Let’s say you’re some poor schlub teaching assistant at Uni because the brochure tricked you into thinking that you’d go to some prestigious institution to get a great education, but in reality, you have to teach data structures to 500 indifferent undergraduate halfwits. The half-clever bit of wit that all those undergrads have is to figure out who the nerd in the class is that actually understands what a pointer is, promise to get him some girl’s phone number, copy his program, and hand it in only after cleverly changing all the variable names. Genius! The same algorithm used to analyze zlib above can be applied to help detect exactly this sort of cheating, independent of variable names. Zing!
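I didn’t catch the details of how FOSSology actually does this, so take the following as my own rough sketch of the general idea rather than their algorithm. One common way to make a comparison blind to renaming is to replace every identifier with a placeholder before comparing, then measure how many short token sequences two files share. The names here (`normalize`, `shingles`, `similarity`), the tiny keyword list, and the window size of 4 are all made up for illustration.

```python
import re

# Illustrative, deliberately incomplete list of C keywords (an assumption).
KEYWORDS = {"int", "char", "void", "float", "double",
            "for", "while", "if", "else", "return"}

# Identifiers/keywords, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def normalize(source):
    """Tokenize source and replace every non-keyword identifier with 'ID'."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if (tok[0].isalpha() or tok[0] == "_") and tok not in KEYWORDS:
            tokens.append("ID")
        else:
            tokens.append(tok)
    return tokens

def shingles(tokens, k=4):
    """Return the set of all k-token windows (at least one window)."""
    count = max(len(tokens) - k + 1, 1)
    return {tuple(tokens[i:i + k]) for i in range(count)}

def similarity(a, b, k=4):
    """Jaccard similarity of the normalized token windows of two sources."""
    sa = shingles(normalize(a), k)
    sb = shingles(normalize(b), k)
    return len(sa & sb) / len(sa | sb)

nerd = "int sum(int n) { int total = 0; for (int i = 0; i < n; i++) total += i; return total; }"
copy = "int add_up(int limit) { int acc = 0; for (int k = 0; k < limit; k++) acc += k; return acc; }"

print(similarity(nerd, copy))  # 1.0: renaming alone changes nothing
```

A real tool would need a proper lexer, and would have to cope with reordered statements, comments, and string literals, but even this toy version shrugs off the change-all-the-variable-names trick.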
Oh, and you can use the algorithm to figure out such mundane things as what free software license any given source file might be distributed under. And it turns out that’s actually the reason HP wrote this tool, but the general underlying principle is extremely powerful because it’s so simple.
The grand vision behind FOSSology is to create a huge repo of source files, analyzing them for… whatever. Licenses, code reuse/originality, dependencies, etc. And HP gets it, which is really cool: we’re going to hand this repo over to the Linux Foundation and allow anyone to poke at it.
Good job guys, and good job bobg for the entertaining talk.