tuna & the oscilloscope

July 30th, 2008

Since I haven’t blogged about these toys, please take a look at Carsten’s article at OSADL and perhaps at my OLS 2008 paper too.

right to self-defense

June 12th, 2008

Cool, gitmo is located in the free world after all.

ait and tuna

February 22nd, 2008

Work is supposed to take most of your time, right? Survival, mouths to feed, all that stuff… But it can be fun, even if not _directly_ kernel related. Sure, there is life outside the kernel, hey, “kernel”… I keep listening to the head-honchos (heck, I had to use that term, it looks like spanish, the language of our capital, Buenos Aires!): “forget about the kernel, the action is somewhere else, in (l)userland!”

So here I am, in userland, playing with GUI stuff, drag and drop! Python! GTK! Wow! Sounds boring? No, I don’t think so. Discovery time is never boring. It’s fun to try for hours to grok some new semantic domain. Even if you, in another life, wrote a DOS GUI system just from reading a marketing insert in Unix Review in 1988 :-P

So what have I been doing in this strange land? Well, when you work you have to show the numbers, and explain them, and remember what happened when you switched that knob or applied that patch, no?

Damn, alchemy doesn’t require you to remember all that stuff, just taste the new stuff, if you think it’s not poisonous, that is.

But if you need to remember… There is something cooking for this year’s OLS, or for the first conference after it if OLS thinks I’m too bollocksish :-)

And to _show you the code_, full of python newbie mistakes… but hey, I even dared to write python bindings for such supposedly interesting stuff as ethtool and schedutils.

And finally to what uses it: AIT, because short, meaningless names are en vogue. Anyway, try tuna, a tuning application that has a cool name, one that I unfortunately didn’t come up with, but fortunately I was near the genius who thought of it, thanks!

And thanks to whoever did the right thing and brought Evgeniy to kernelplanet, he is da guy from Russia! Get healthy and in crazy coding frenzy mode again!

Dwarving

January 13th, 2008

Cool stuff on the dwarves front. They are growing. One new feature I’ve been working on is the ability to change the word-size.

One can now use the DWARF info produced for an architecture with word-size X and see what it looks like on another architecture with word-size Y.

More work is needed to cope with exceptions to a general set of alignment rules and then we’ll be able to ask pahole for the best possible member organization that will work across a series of architectures.
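To make the word-size idea concrete, here is a minimal sketch, not pahole code: it assumes plain natural alignment (exactly the "general set of alignment rules" that has exceptions), and the `layout` helper, the type table and the sample struct are all made up for illustration.

```python
def layout(members, word_size):
    """Return ([(name, offset, size)], struct size) assuming natural alignment."""
    # Assumption: only long and pointers follow the word-size.
    sizes = {"char": 1, "short": 2, "int": 4,
             "long": word_size, "ptr": word_size}
    offset, align, out = 0, 1, []
    for name, ctype in members:
        size = sizes[ctype]
        offset = (offset + size - 1) // size * size  # pad up to the alignment
        out.append((name, offset, size))
        offset += size
        align = max(align, size)
    # Trailing padding so that arrays of the struct stay aligned.
    return out, (offset + align - 1) // align * align

# Made-up struct, just for illustration.
members = [("count", "int"), ("pool", "ptr"), ("flag", "char")]
for word_size in (4, 8):
    fields, size = layout(members, word_size)
    print(word_size, fields, size)
```

With a 4 byte word there is no hole after count, with an 8 byte word there is a 4 byte one, which is the kind of difference one would want to see before picking a member order for several architectures.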

The mailing list is picking up steam, with different suggestions being made and improvement ensuing, with more patches trickling in.

One example of recent changes discussed on the mailing list was the inline improvements in codiff, the binary diff tool, discussed with Ilpo Järvinen, who right away used them in debloating the Linux kernel network infrastructure.

Soon I’ll be hard pressed to do some further improvements, with a database, probably sqlite, being used to cache the results, using the build-id, which is becoming the norm in the -debuginfo packages made by distros, as the key to previous results.

Database caching, once implemented, will pave the way for historical data collection, when we’ll be able to see how data structures evolve over time, like when you keep drinking and see your belly go round 8-)

At that point we can create a script that checks out tag after tag (changeset even, since we have plenty of unused CPU power and disk space) and builds a software project, creating database entries, this time tagged with the SCM changeset/tag.

Wouldn’t it be lovely to know exactly when that extra belly wrinkle fat was generated? ;-)
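A rough sketch of how such a cache could look, using Python's sqlite3 module. Nothing like this exists in the dwarves yet: the schema, the function names and the build-id strings in the usage note are all assumptions made up for illustration.

```python
import sqlite3

# Hypothetical schema: one row per (build-id, struct) pair, optionally
# tagged with the SCM tag/changeset the binary was built from.
SCHEMA = """
CREATE TABLE IF NOT EXISTS struct_stats (
    build_id TEXT NOT NULL,
    scm_tag  TEXT,
    name     TEXT NOT NULL,
    size     INTEGER NOT NULL,
    holes    INTEGER NOT NULL,
    PRIMARY KEY (build_id, name)
)
"""

def open_cache(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db

def store(db, build_id, scm_tag, name, size, holes):
    db.execute("INSERT OR REPLACE INTO struct_stats VALUES (?, ?, ?, ?, ?)",
               (build_id, scm_tag, name, size, holes))
    db.commit()

def lookup(db, build_id, name):
    """Return (size, holes) from the cache, or None on a miss."""
    return db.execute("SELECT size, holes FROM struct_stats "
                      "WHERE build_id = ? AND name = ?",
                      (build_id, name)).fetchone()

def history(db, name):
    """Size of one struct across tags, in insertion order."""
    return db.execute("SELECT scm_tag, size FROM struct_stats "
                      "WHERE name = ? ORDER BY rowid", (name,)).fetchall()
```

Storing one row per built tag, say `store(db, "aaaa", "v1.0", "cheese", 72, 2)` followed by the same struct for the next tag, lets `history(db, "cheese")` answer the belly question directly.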

dwarves mailing list

December 18th, 2007

I invite everybody interested in the continuing development of pahole and the dwarves to join us by subscribing to the dwarves mailing list: send a message to majordomo@vger.kernel.org with “subscribe dwarves” in the body of the message. There are people already working on packaging the dwarves for several Linux distributions and even working on man pages!

Quickie: dwarves 1.3 released, bitfields edition

December 8th, 2007

F8 RPMS for x86_64 here, will build for other arches soon. If you saw a bitfield related BRAIN FART ALERT! please try again with this release.

ctracer in this release already generates systemtap scripts, here is a callgraph generated with it. Never mind the timestamps, it’s using too expensive a routine.

pahole on debian!

July 8th, 2007

I was just thinking about (finally) learning how to write debian packages when I came across this. Now I can continue procrastinating… Thank you whoever you are!

vee-tables

July 8th, 2007

So now we have basic support for exposing the vtables information in the DWARF info for C++ objects, class__fprintf will just print something like this (from a struct in the CERN ATLAS project):

        /* vtable has 7 entries: {
           [5] = setProperty(_ZN9IProperty11setPropertyERK8Property),
           [6] = setProperty(_ZN9IProperty11setPropertyERKSs),
           [7] = setProperty(_ZN9IProperty11setPropertyERKSsS1_),
           [8] = getProperty(_ZNK9IProperty11getPropertyEP8Property),
           [9] = getProperty(_ZNK9IProperty11getPropertyERKSs),
           [10] = getProperty(_ZNK9IProperty11getPropertyERKSsRSs),
           [11] = getProperties(_ZNK9IProperty13getPropertiesEv),
        } */

We still have to support multiple vtables, but it’s a good start, and by looking at the linkage_name (C++ mangle-o-rama) we can get an idea where the vtable entries are from anyway.

Break on thru, to the other side

July 7th, 2007

I spent most of my early life as a contributor to free/open source software as a packager, coming from a life as a software developer in the dark dungeons. Now I’m just happy that a project I finally managed to take to its 1st version, the funny little people one, got into gentoo, mandriva and now into one of the distros I respected the most while a packager: PLD. Thank you guys, it’s good to be packaged. Too late for a cl package tho :-(

Back Home

July 7th, 2007

The dwarves presentation at OLS went pretty well, people even seem to have liked it. The paper is now available, use it as the documentation.

Implemented --expand_pointers, which unfolds the pointer types in the same way that --expand_types expands non-pointer types. Should be useful in getting a bigger picture of a project’s data structure maze of relationships.

It should be a good first step towards checking for ABI breakage: just use it on the old and new binary and use plain old diff to see what changed, perhaps something down three pointer levels. abichk will probably be a combination of this and what codiff does.

Here is an example:

Let’s look at struct request_list in the Linux kernel:

$ pahole -C request_list fs/super.o
struct request_list {
        int                        count[2];       /*     0     8 */
        int                        starved[2];     /*     8     8 */
        int                        elvpriv;        /*    16     4 */

        /* XXX 4 bytes hole, try to pack */

        mempool_t *                rq_pool;        /*    24     8 */
        wait_queue_head_t          wait[2];        /*    32    48 */
        /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */

        /* size: 80, cachelines: 2 */
        /* sum members: 76, holes: 1, sum holes: 4 */
        /* last cacheline: 16 bytes */
};

Now let’s expand its pointers, just one, mempool_t, and one that doesn’t have lots of pointers, to fit into this blog entry:

$ pahole --expand_pointers -C request_list fs/super.o
struct request_list {
        int                        count[2];        /*     0     8 */
        int                        starved[2];      /*     8     8 */
        int                        elvpriv;         /*    16     4 */

        /* XXX 4 bytes hole, try to pack */

        /* typedef mempool_t */ struct mempool_s {
                spinlock_t         lock;
                int                min_nr;
                int                curr_nr;
                void *             *elements;
                void *             pool_data;
                /* typedef mempool_alloc_t */ void * (*alloc)(gfp_t, void *);
                /* typedef mempool_free_t */ void (*free)(void *, void *);
                wait_queue_head_t  wait;
                /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
        } *rq_pool; /*    24     8 */
        wait_queue_head_t          wait[2];         /*    32    48 */
        /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */

        /* size: 80, cachelines: 2 */
        /* sum members: 76, holes: 1, sum holes: 4 */
        /* last cacheline: 16 bytes */
};

It even seems to naturally avoid expanding opaque types, i.e. struct forward declarations, etc, so if the header file with the structs in the ABI is well written it just doesn’t go down the non-ABI type rabbit hole, which seems the right thing to do.

Use it with --expand_types and the complete picture can be seen. And in the Linux kernel it is, humm, big:

$ pahole --quiet --expand_pointers -C inode fs/super.o | wc -l
1345
$ pahole --quiet --expand_pointers --expand_types -C inode fs/super.o | wc -l
121922

One explanation for this last example is needed tho: --expand_types expands a type possibly many times, once for each member of that type; --expand_pointers, on the other hand, only expands any given type once, for the first member that is a pointer to this type.

This is because --expand_types was implemented to help find which field sits at some offset in complex data structures with a deep hierarchy; --expand_pointers, on the other hand, uses most of the --expand_types code, but was implemented to help find ABI breakage deep inside the type.
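The difference between the two policies can be shown on a toy type graph. This is only a sketch of "expand every time" versus "expand once", not pahole's code: the `TYPES` table and the `expand` function are made up, the members are embedded structs rather than pointers to keep it short, and the graph has to be acyclic for the every-time mode to terminate.

```python
# Toy type graph: struct name -> list of (member name, member type).
# Types that are not keys here are treated as basic types.
TYPES = {
    "list_head": [("next", "ptr"), ("prev", "ptr")],
    "inode": [("i_list", "list_head"), ("i_sb_list", "list_head"),
              ("i_dentry", "list_head"), ("i_ino", "long")],
}

def expand(name, once, seen=None, depth=0):
    """Return the printed lines; with once=True each struct is unfolded a single time."""
    seen = set() if seen is None else seen
    pad = "\t" * (depth + 1)
    lines = []
    for member, mtype in TYPES[name]:
        if mtype not in TYPES:
            lines.append(f"{pad}{mtype} {member};")
        elif once and mtype in seen:
            # Already unfolded somewhere else, just name the type.
            lines.append(f"{pad}struct {mtype} {member};")
        else:
            seen.add(mtype)
            lines.append(f"{pad}struct {mtype} {{")
            lines += expand(mtype, once, seen, depth + 1)
            lines.append(f"{pad}}} {member};")
    return lines

print(len(expand("inode", once=False)), len(expand("inode", once=True)))
```

Even in this tiny graph unfolding list_head at every member nearly doubles the output, which is the same effect, scaled up, as in the wc -l numbers above.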

I’ll eventually implement a config option to tell that only the first type should be expanded in the --expand_types case, then make a shell script that takes two files as parameters, calls pahole on both, does a diff and shows where the ABI stopped being a virgin 8-)
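Until that script exists, the idea can be sketched in a few lines of Python instead of shell. The pahole command line is the one used in the examples above; the function names and the way the result is returned are assumptions made up for this sketch.

```python
import difflib
import subprocess

def pahole_expanded(class_name, binary):
    """Run pahole as in the examples above and return its output lines."""
    out = subprocess.run(
        ["pahole", "--expand_pointers", "-C", class_name, binary],
        check=True, capture_output=True, text=True).stdout
    return out.splitlines()

def abi_diff(old_lines, new_lines):
    """Plain old unified diff between two expanded layouts."""
    return list(difflib.unified_diff(old_lines, new_lines,
                                     "old", "new", lineterm=""))

# Usage, assuming old.o and new.o were both built with debug info:
#   for line in abi_diff(pahole_expanded("inode", "old.o"),
#                        pahole_expanded("inode", "new.o")):
#       print(line)
```

An empty result means the expanded layouts are identical; any `-`/`+` pair points at the member, possibly several pointer levels down, where the two binaries disagree.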

Inheriting namespaces

May 25th, 2007

I’d prefer real state space, but for now I’ll have to seattle for C++ namespaces as the only inheritance to tap into. The dwarves now are able to show the wonders of goat guts, oops, of modern C++ techniques hidden behind layers of type within types within templates within namespaces represented in DWARF tags. This time there will be no tool sample output, they wouldn’t be pretty. Just be warned, pull from my kernel.org repo and be sure you have already digested everything you ate in at least two hours. Remember: you’ve been warned.

P.S. yes, there is a typoish message to friends in the northern hemisphere, far from the brazilian winter :o)

No song for 1.0

May 22nd, 2007

As part of the review process to get dwarves into Fedora I released 1.0. Numbers are not that meaningful anyway. In a previous life, as a package pac-man (sorry about the pun, too lame…) in a jungle distro I always looked forward to one dot zero. Even for nfs-server! So, here it goes, one-dot-zero! In a Fedora mirror some day soon!

P.S.: rpms in the usual place.

Virtual doesn’t hurt as much as Real Life: !wii session

May 19th, 2007

Today went with friends to do some non-Wii bowling. What can I say? My fingers are hurting. Yeah, I suck in real life. Nah, back at home listening to some torrents, never mind that I’ve had the CDs for 20 years, it’s just that those Metallica songs are so buried in layers of dust that I found it easier to pound my ears with something freshly downloaded 8-) My 5 year old niece still knows nothing about torrents and stuff, but she beats me hands down at the not-so-heavy-as-real-life Wii bowling. She will learn, but let’s not hurry her…

Quickies

May 10th, 2007

- Paper for OLS about the dwarf debugging tools written, will be the badly needed documentation for these tools as soon as it is released. Will continue improving it even after presentation at OLS.
- Several new pahole options, for instance look at this
- libdwarves now uses libdwfl, so works well with RELA architectures such as x86-64

Kernel hacking mostly…

March 30th, 2007

The “mostly” allows me to celebrate my most recent non-kernel-hacking, at least personally, interesting news: I’m keeping up with jogging, now trying to get past the comfortable part of jogging, over-achiever, not really… only 17:44.06 on a 3.3 km circuit at Barigui, if only da beers would keep some distance from me…

libdwfl conversion

March 30th, 2007

Finally I found time to convert dwarves to use the more active libdwfl library found in elfutils, thanks to Ulrich Drepper and Roland McGrath we can forget about most of the low level details related to RELA architectures and get the dwarves to work with kernel modules on architectures such as x86_64, where we need to do relocation work on the .debug_info section, now just enjoy pahole & friends working on more targets.

There is still some work to do on two of the dwarves that handle more than one target, codiff and ctracer, where dwfl_default_argp is not enough as it only handles one side of the game needed for these two dwarves, we need to open another set to compare in the case of codiff and another set for resolving the kprobes functions in the case of ctracer.

Some hops thru the USA

February 3rd, 2007

So I’ll be flying two hours from now, with a little layover at Sao Paulo, trying to be cautious about the brazilian hellish flying russian roulette, that should be past news by now, but would you take any chances on a _real_ russian roulette? me neither.

So I’ll be on the east coast for some 4 days and then on another hop, close to the canadian border in the west coast for 4 days more, then back home, so expect no news (or just a few, to pester ya with news about you know what ;) ).

News from dwarves land

February 2nd, 2007

Pahole:

Now we can also move bitfields around to combine them with other fields, moving a list of members with the same offset (the definition of a bitfield) to after other members that have holes bigger than the size of the (possibly type demoted) list of members comprising a bitfield. There is also a new pahole command line option to show all the steps involved in reorganizing a struct, --show_reorg_steps, look here for what it looks like.

Also related to this pahole news is the fact that class__reorganize() & supporting code was moved to libdwarves.so, where it is made useful for the other dwarves, which leads us to the next dwarves news:

Ctracer:

The class tracer now is past its stone age, getting fast to an exciting iron age: printk was replaced by a mechanism not so carefully (as in SMP?!?! I only have a dual pentium 100 machine here, damn!) stolen from blktrace (I hope that the UTT effort is merged sooner rather than later so that I can use it), namely using debugfs + relay to, ho hum, relay the internal state of the class (a funny name for structs) being traced to userspace. Ah, since I’m talking about the internal state: the way that it is collected at the probe points is now much, much improved, looking like blktrace, except for the fact that the data being collected is the subset of the members that are “reducible” to a basic type (signed and unsigned int, long, char, long long), which, for now, is just the basic types, but will shortly be augmented by a set of “reducers” for things like spinlock_t, wait_queue_head_t, etc.

Ok, let’s break the paragraph a bit to get away from my problem with writing: very long paragraphs…

Back to how ctracer packs the internal state: have you seen how pahole can reorganize structs to fill holes and reduce the structure size? Yeah, we just look at all the struct members, looking for DW_TAG_basic_type members and leave just those, then call class__reorganize() and the struct gets in the best possible layout to save space at each probe point.
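The two steps, keep the basic-type members and then repack, can be sketched like this. It is a stand-in, not what ctracer does internally: instead of class__reorganize()'s hole filling it simply sorts members by decreasing size, and the size table, function names and sample members are all assumptions made up for illustration.

```python
# Assumed sizes; long is left out since it depends on the word-size.
BASIC_SIZES = {"char": 1, "short": 2, "int": 4, "long long": 8}

def reduce_members(members):
    """Keep only the members whose type is a basic type."""
    return [(name, t) for name, t in members if t in BASIC_SIZES]

def pack(members):
    """Lay members out by decreasing size; returns ([(name, offset, size)], total)."""
    ordered = sorted(members, key=lambda m: BASIC_SIZES[m[1]], reverse=True)
    offset, out = 0, []
    for name, t in ordered:
        size = BASIC_SIZES[t]
        offset = (offset + size - 1) // size * size  # natural alignment
        out.append((name, offset, size))
        offset += size
    align = max((s for _, _, s in out), default=1)
    return out, (offset + align - 1) // align * align

# Made-up traced struct: the spinlock_t member is not a basic type, so it is dropped.
members = [("flag", "char"), ("lock", "spinlock_t"), ("count", "int"),
           ("state", "char"), ("stamp", "long long")]
print(pack(reduce_members(members)))
```

Sorting by size is the lazy way to get a hole-free record; the real reorganization moves as few members as possible, which matters when the struct is source code people read, but not for a binary trace record.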

Ctracer also uses the same opportunity to generate a tool called ctracer2ostra.c, that is used to convert the binary traces to a colon separated list of formatted values easily parsable by ostra-cg, the callgraph tool that produces nifty buzzword soups (CSS, javascript, ajax in the future, cross-referencing with LXR, python matplotlib graphs of values that each traced class member got in the course of tracing, and plottings for the times each of the “class” “methods” took).

The new improved process for this is detailed in the README.ctracer file, but to give you a quick glance of what it involves:

rpm -ivh http://oops.ghostprotocols.net:81/acme/dwarves/rpm/libdwarves1-0-14.i386.rpm http://oops.ghostprotocols.net:81/acme/dwarves/rpm/dwarves-0-14.i386.rpm
mkdir foo
cd foo
ln -s /usr/lib/ctracer/* .
make CLASS=sock
insmod ctracer.ko
# do some networking kung foo fighting activity or change CLASS to your preferred struct (but beware, as we use do_gettimeofday (how lame) you can get some crashes if tracing something more fishy than struct sock methods)
cat /sys/kernel/debug/ctracer.o > /tmp/ctracer.log
rmmod ctracer.ko
make callgraphs
# Enjoy things like this, where I was bold enough to try this thing on my main machine while doing all sorts of stuff, believe me, it was just one or two dozen harmless oopses, kprobes trying to justify its paycheck!

And to top it all I’ve been getting help from Davi Arnaut on the global variables and DW_AT_location front, where he contributed basic support for where variables are in the computer memory pyramid (Registers? Stack?), he even wrote the first non-acme dwarf, pglobal, that shows all the global variables in your dear project binary files, now it’s just a matter of extending this to use libebl and get the register names for each arch to go to the next ctracer level, where we’ll stop using costly Jay-probes and get the elusive parameters directly from their hiding place!

Kill-a-hole: Reorganizing struct layouts

January 30th, 2007

Say a project has this struct:

struct cheese {
char name[52];
char a;
int b;
char c;
int d;
short e;
};

And we want to see what the layout looks like more precisely:

[acme@filo examples]$ pahole swiss_cheese cheese
/* <11b> /home/acme/git/pahole/examples/swiss_cheese.c:3 */
struct cheese {
char name[52]; /* 0 52 */
char a; /* 52 1 */

/* XXX 3 bytes hole, try to pack */

int b; /* 56 4 */
char c; /* 60 1 */

/* XXX 3 bytes hole, try to pack */

/* --- cacheline 1 boundary (64 bytes) --- */
int d; /* 64 4 */
short int e; /* 68 2 */
}; /* size: 72, cachelines: 2 */
/* sum members: 64, holes: 2, sum holes: 6 */
/* padding: 2 */
/* last cacheline: 8 bytes */
[acme@filo examples]$

Heck, what a swiss cheese! Surely we can do better, huh? Let’s ask pahole for a little help:

[acme@filo examples]$ pahole -kV swiss_cheese cheese
/* moving c(size=1) to after a(offset=52, size=1, hole=3) */
/* moving e(size=2) to after c(offset=53, size=1, hole=2) */

/* <11b> /home/acme/git/pahole/examples/swiss_cheese.c:3 */
struct cheese {
char name[52]; /* 0 52 */
char a; /* 52 1 */
char c; /* 53 1 */
short int e; /* 54 2 */
int b; /* 56 4 */
int d; /* 60 4 */
/* --- cacheline 1 boundary (64 bytes) --- */
}; /* size: 64, cachelines: 1 */
/* saved 8 bytes and 1 cacheline! */
[acme@filo examples]$

Much better now, no?
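If you don't have pahole at hand, the two layouts above can be double checked with Python's ctypes, which follows the native C ABI. A minimal sketch: the class and helper names are made up, and the numbers assume the usual 4 byte int and 2 byte short.

```python
import ctypes

class Cheese(ctypes.Structure):
    # Same member order as the original struct cheese.
    _fields_ = [("name", ctypes.c_char * 52), ("a", ctypes.c_char),
                ("b", ctypes.c_int), ("c", ctypes.c_char),
                ("d", ctypes.c_int), ("e", ctypes.c_short)]

class PackedCheese(ctypes.Structure):
    # Member order after the two moves reported by pahole.
    _fields_ = [("name", ctypes.c_char * 52), ("a", ctypes.c_char),
                ("c", ctypes.c_char), ("e", ctypes.c_short),
                ("b", ctypes.c_int), ("d", ctypes.c_int)]

def layout(struct):
    """List of (member, offset, size), in declaration order."""
    return [(name, getattr(struct, name).offset, getattr(struct, name).size)
            for name, _ in struct._fields_]

def holes(struct):
    """Sum of the gaps between consecutive members (trailing padding excluded)."""
    total, end = 0, 0
    for _, offset, size in layout(struct):
        total += offset - end
        end = offset + size
    return total

for s in (Cheese, PackedCheese):
    print(s.__name__, ctypes.sizeof(s), holes(s), layout(s))
```

The sizes and hole sums printed should match the pahole comments: 72 bytes with 6 bytes of holes before, 64 bytes and no holes after.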

Ok, let’s try something more interesting, like some Linux kernel structs, the output is big, so here are some links for some structs that spent some jiffies on pahole’s spa:

Type demotion of bigger than needed bitfields will help getting more saved :-)

Unfolding structs

January 29th, 2007

Check the git cset comment! New rpms are available at the usual place with this new feature, for the curious, here is struct task_struct unfolded.

ctracer made easy

January 27th, 2007

Make sure you have the kernel-debuginfo package installed!

rpm -ivh http://oops.ghostprotocols.net:81/acme/dwarves/rpm/libdwarves1-0-9.i386.rpm http://oops.ghostprotocols.net:81/acme/dwarves/rpm/dwarves-0-9.i386.rpm
mkdir foo
cd foo
ln -s /usr/lib/ctracer/Makefile .
ln -s /usr/lib/ctracer/ctracer_jprobe.c .
make CLASS=sock
insmod ctracer.ko
# do some networking activity or just move the mouse :-)
dmesg # or tail /var/log/messages
rmmod ctracer

Now to work on kahole, kill a hole, a tool that will reorganize structs to kill holes and that will provide the bits needed for struct “views”, i.e. to specify a list of fields in a struct that are of interest for collection at each probe point, and that will also be used to create a userspace utility that will read the relay channel (/sys/kernel/ctracer0) and generate what ostra-cg, the callgraph tool from the OSTRA days, needs to generate things such as this session collecting struct task_struct method calls in the Linux kernel. Ah, ostra-cg also produces these method statistics, and from this page you get plottings such as this one for sched_fork calls; the workload is forgotten right now, but as soon as I get my new test machine we’ll have more interesting graphics :-)

libdwarves (mostly) explained

January 20th, 2007

In response to a comment I’ve added comments to the ctracer tool source code in an attempt to help people interested in helping improve these tools, check it out!

How many functions receive as a parameter a pointer to a struct?

January 13th, 2007

Implemented a new command line option in pahole: --nr_members, look here for the result using a kernel built for qemu.

Support for more C++ tags

January 13th, 2007

Added support for DW_TAG_reference_type and DW_AT_specification, still have to work on DW_TAG_namespace and rework the list of tags so that classes within classes are properly shown in the tools; nowadays it’s just a flat namespace, which has not been a problem for the tools written so far, but for pdwtags, the dwarf I’m working on now, which will do a complete dump of all tags, it is a must.

What are the dwarves?

January 12th, 2007

Answering Avi Alkalay’s request for a description of what the dwarves are, I managed to write the rpm spec file description tag, which should help in clarifying the usefulness of these tools :-)

RPM packages for the dwarves

January 11th, 2007

Got cmake in shape for installing the tools, the resulting rpm packages are available here, please test and tell me your impressions!

The rpms were built on a Mandriva Cooker machine, but were already tested by friends on at least a Fedora Core 6 machine, working perfectly.

Lots of improvements done on the dwarves

January 10th, 2007

Mostly on class nesting, look at this commit to see the goodies, hint, look for anonymous unions and structs :-)

Fuck for all to see

January 9th, 2007

Want to get your paycheck boosted? Go to a beach, find some celebrity going down the drain, fuck her, make sure that it will get recorded and sent to some place where it’ll get into the screens of curious people, share the profits, yes, it is that easy.

It’s not easy to be a brazilian, judges could help our people feel more proud after all the messes we’ve been seeing from people supposed to be keeping law, order and decency, go, go, set the standards, but please, don’t make them as lame as they are right now.

Update: some friends even thought that my machine had been invaded, given such rude language, sorry, but to provide some context on the outrage that is motivating many such blog posts look no further than here.

RMS being the first to talk about the “Open Source” idea

December 27th, 2006

Nah, I never commented on anybody’s blog, but this was over the top. Richard, you should really try some online translation service, if you want to get your blood pressure over what is needed to lose some weight, and as I’m trying to do so, go, quick, read Avi Alkalay’s blog!

5th dwarf fighting for attention

December 27th, 2006

After all it will be eclipsed by the 6th, and then by the long cherished 7th, the one that will trigger the 1.0 pahole (btw, the 1st dwarf!) release. So, without too much ado, ctracer, the aforementioned current dwarf, is making great inroads into being something useful to more than one person, perhaps, but anyway, I’m getting short on time, my wife is almost ready for us to enjoy the current weather here in Curitiba, after our beloved cocker spaniel got his nail cut way too short, bleeding :-(, well, continue reading from here

The 5th dwarf

December 24th, 2006

Have to cook the turkey, so this one will be brief, have been working on a new tool, that ultimately will provide what ostra was capable of, ctracer, the new tool generates a kernel module that after built and loaded provides these preliminary results. Now to work on a not so lame relaying module for the trace information, such as the one in blktrace.

C++ dwarves

December 6th, 2006

Have been working on supporting C++ concepts expressed in DWARF tags. So far we support definitions of classes, enums, etc inside classes; namespacing is not yet supported, but as each class, whether inside another class or not, has a unique DWARF identifier, the so called cu (Compilation Unit) offset, not having namespaces is not much of a problem: only when printing a class we don’t see it properly nested, but this will be handled shortly.

Another thing to do is to add support for DW_TAG_inheritance, so that the dwarves gang don’t get confused with the “first” member having a non-zero offset, this also is something that sounds rather easy to implement.

All in all this experience has been so far very interesting, to finally match all the theory I have learned in another life, some 13 years ago, when I was a C++ programmer, matching the reality of a, by now, mainstream compiler, g++ :-)

Anybody that by chance reads these ramblings and uses another compiler that generates DWARF info (Intel C, for instance, comes to mind) and that could send me an object file + the preprocessed output for me to see if these tools work correctly with something non-gcc/g++ would be very welcome, I’d really appreciate it a lot.

Pahole fighting non network bloat

December 1st, 2006

Andrew Morton merged the two patches I’ve submitted for saving bytes in struct inode and struct mm_struct, great! I also did some more work on pahole based on ideas by Bernhard Fischer, who also provided a patch fixing the --help command line option.

On the DCCP front we’re going thru some growing pains, with CCID3 getting bugs fixed but in the process seemingly having its performance dropped, in the next few days we’ll surely get it to a better state, but it’s nice to have more people involved and actively working on fixing bugs, implementing missing functionality and discussing the patches that are continuously being submitted, thanks guys!

Fight bloat, save some bytes here and there

November 30th, 2006

Since I wrote pahole & friends I’ve just announced it, posting some results to lkml. Andrew Morton has made good use of it, uninlining some big functions, but I was sidetracked with all sorts of things, life, Real Work(tm) and whatnot. Not many people got interested tho, so I thought about doing a witch hunt myself, looking for structs with holes. DaveM has picked up all the ones I’ve submitted on net/, now I’ve been working on reducing netfilter struct holes (note that due to utter slowness in the www.kernel.org mirroring process not all of the csets were at this URL as of this blog entry time), and got out of my shyness to post at least one for non networking stuff, struct inode. In the next days I will go attack other things such as struct task_struct, which has some holes too, so let’s see how people react to such patches, which of course have to be taken with a grain of salt, as moving things around in such core structures is not a straightforward mission: locality of reference may well justify some of the holes, that is why I implemented --cacheline(_size) on pahole :-)

Resurrection

November 29th, 2006

OK, stopped procrastinating and got ghostprotocols.net back working, this time using www.zoneedit.com, cool service, everything working as expected so far, still have to configure the dyndns bits, using ddclient, thanks Aristeu for knocking my head on this one!

Now to get back at reviewing the excellent series of DCCP patches coded by Gerrit Renker, this guy is really putting a lot of effort in getting DCCP on Linux to the next level.

Promise, will try to keep this blog alive, next thing probably to report will be about pahole and the 7 dwarves 8-)

Saturating Bandwidth!

February 23rd, 2006

DCCP progressing nicely, merged a fair number of patches and have some more on the back burner.

Cleaned up the RSBAC debugging code, thousands of lines replaced by a pr_debug like macro, already merged, probably will work on some more cleanups before getting into some refactorings, but it’s something completely new to me, so let’s see how this thread goes.

Also got ostra simplified and on track for improvements to get it in shape for OLS, where I’ll do a presentation about it. One of these improvements is support for userland apps, with X.org Xlib as the first victim :-)

In LLC land: have to find time to help a friend working on MPI over LLC, but this codebase needs a revamp, perhaps I’ll dig my old net-experimental-2.6.1 tree where I played with converting LLC to use sk_prot, etc.

And to use the last bandwidth bits I’ll present a talk about the current status of Linux IPv6 at Sao Paulo’s LinuxWorld, in a few months.

Window growing…

February 4th, 2006

Finally managed to merge Andrea Bittau’s work on ack vector records and CCID2, after reworking some bits.

Also made ostra work with a userspace app for the first time, in fact an X.org library, next week will continue working with Gustavo Boiko on tracing not just the Xdisplay (IIRC) struct, but the Xevent one as well (that thing is in need of some of the treatment I gave struct sock in the linux kernel some years ago, class hierarchy with per Xevent derived class slab caches, etc).

Still on kernel hacking slow start

December 22nd, 2005

Failed on a promise to Andrea Bittau to review his CCID2 patches, but at least provided him with some feedback of my dislike for the “fastlane” stuff in his patch, as I said to him in private mail I’ll try tomorrow morning to finally do this task.

In the previous days I was also distracted by the kernel sources include hell, while introducing something I proposed and Dave said he was ok with it, sk_receive_skb, for the common sequence at the bottom of some layer 3 protocols, such as DCCP v4 and v6 and TCP v6 (TCPv4 has the preload stuff that has to be generalised first before we can use sk_receive_skb on it), and also inet_sk_receive_skb and inet6_sk_receive_skb, wrapping the xfrm routine just before the by then present sk_receive_skb call, in the process I fixed a fair number of “needed header not included but got for free from some header actually included” cases, putting some forward declarations for things like struct module (#include <linux/module.h> was unneeded in some cases).

If you want to see a glimpse of this hell look no further than the output of an old script I wrote that uses graphviz (see my www.kernel.org/ acme area for ghviz and hviz), the one for tcp.h is here.

DCCPv6 Submitted

December 11th, 2005

3 patches still remaining from a previous life, aka the two days I actually took to implement this code, before my wife’s health problems. Now I’m getting used to having a life back, so hacking has to cope with its rightful second place in my life, but I’ll get used to it, as it’s much better for all the people I love, including myself :-)

DCCPv6: Almost There

December 1st, 2005

Today I flushed a series of patches generalising some more networking code to pave the way to introduce IPv6 support for DCCP, the last changeset in a series of 6 has a “we’re close to the end” comment, but it is kinda misleading, as there are some more changesets in my queue that have to be submitted before we really get to introduce DCCPv6, oh well, tomorrow I’ll continue.

Trying to Get Back

November 30th, 2005

Life is getting back to normal, finally. I’m now trying to get back to my dccpv6 tree, which (guess what) generalises lots more stuff among TCP and DCCP v4 and v6 and will ultimately introduce IPv6 support for DCCP.

Tomorrow I’ll try to continue this work and post about the new family member, Micaela, a lovely female persian kitten :-)

If everything goes well I’ll even dare to start working on NFS over DCCP next month, that should provide a nice way for me to test the DCCP code without resorting to userland programming 8-)

Net 2.6 patch monkeying

October 21st, 2005

So David Miller is on vacation and left me as the net 2.6 maintainer for two weeks, after an initial delay it seems this is working as Linus merged my first batch for 2.6.14, whee!

Now to work this weekend on collecting lots of patches for 2.6.15, and one of the things I’ll definitely merge is… my work on generalising more stuff on IPv6 and of course DCCPv6.

Back to the Future

September 25th, 2005

After a week working on customer projects I got back at kernel hacking, initially going back in time some 20 years to work on merging an old LLC (802.2) series of patches I wrote in the 2.6.0-test1 days that were forward ported by Jochen Friedrich, that also fixed some bugs that were preventing him from using an HP printer that only talks LLC2 (connection oriented) over Token Ring, now his CUPS backend works just fine 8)

Then back to the future to implement DCCPv6, which was rather easy as most of the infrastructure was already in place for TCP. I just had to tweak the timewait and request sock hierarchies to work just like the inet_sock one, i.e. having a pointer to where in the per protocol slab objects the IPv6 specific bits are located. In fact I used an offset variable in struct inet_timewait_sock, not a pointer like struct inet_sock’s pinfo6, which I’ll probably convert to offsets too so as to save precious bytes on 64-bit architectures.
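
A rough userspace sketch of the offset idea follows. All structure and function names here (tw_sock, tcp6_tw_sock, tw_ipv6 and so on) are hypothetical stand-ins, not the kernel ones, and the layout is simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the IPv6 specific bits. */
struct ipv6_bits {
	unsigned char daddr[16];
	unsigned char rcv_saddr[16];
};

/* Plays the role of the generic timewait sock. */
struct tw_sock {
	int state;
	uint16_t ipv6_offset;	/* bytes from the start of the object, 0 = no IPv6 bits */
};

/* Plays the role of a per protocol slab object. */
struct tcp6_tw_sock {
	struct tw_sock tw;	/* has to be the first member */
	int proto_private;
	struct ipv6_bits tw6;
};

/* Generic code finds the IPv6 bits without knowing the protocol. */
static struct ipv6_bits *tw_ipv6(struct tw_sock *tw)
{
	assert(tw->ipv6_offset != 0);
	return (struct ipv6_bits *)((unsigned char *)tw + tw->ipv6_offset);
}

/* Each protocol records where its IPv6 bits live at init time. */
static void tcp6_tw_init(struct tcp6_tw_sock *t)
{
	t->tw.ipv6_offset = (uint16_t)offsetof(struct tcp6_tw_sock, tw6);
}
```

A 16 bit offset costs two bytes, while a pointer costs eight on a 64-bit machine, which is where the savings would come from.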

Ah, the LLC2 tests were made using openssh patched to work with PF_LLC sockets, 'ssh -5 00:04:76:3B:53:C1' and look ma, no TCP/IP 8)

DCCPv6 was tested using a supercharged, get{name,addr}info enabled ttcp tool, another one that looks like an animal at Jurassic Park 8)

Thinking out loud…

September 17th, 2005

The DCCP specs emphasise the 2 “half connection” concept, where most people see one connection, represented in Linux by a “struct sock”, that has a series of mechanisms (sk_lock, sk_backlog, timers) shared by both the TX and RX path, i.e. if one is doing a TCP sendmsg the lock has to be grabbed with lock_sock() (TX path) that in turn will make the RX path potentially put packets in the backlog to be processed only when the TX path queues or sends the packet (struct sk_buff) down to the next layer (tcp_transmit_skb for instance).
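
The shared lock plus backlog behaviour can be modelled in a few lines of userspace C. This is a toy under loud assumptions: toy_sock and its helpers are invented names, there is no real locking, and packets are just counted.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BACKLOG_MAX 8

/* Toy model of a socket: one owner flag and one backlog for both paths. */
struct toy_sock {
	int owned_by_user;		/* set while the TX path holds the socket */
	int backlog[TOY_BACKLOG_MAX];	/* packet ids parked by the RX path */
	size_t backlog_len;
	int processed;			/* packets handled by the protocol so far */
};

/* What the TX path does on entry to sendmsg. */
static void toy_lock_sock(struct toy_sock *sk)
{
	assert(!sk->owned_by_user);
	sk->owned_by_user = 1;
}

/* RX path: 1 = processed now, 0 = parked in the backlog, -1 = dropped. */
static int toy_receive(struct toy_sock *sk, int pkt)
{
	if (!sk->owned_by_user) {
		sk->processed++;
		return 1;
	}
	if (sk->backlog_len == TOY_BACKLOG_MAX)
		return -1;
	sk->backlog[sk->backlog_len++] = pkt;
	return 0;
}

/* TX path is done: drain whatever the RX path parked, then let go. */
static void toy_release_sock(struct toy_sock *sk)
{
	sk->processed += (int)sk->backlog_len;
	sk->backlog_len = 0;
	sk->owned_by_user = 0;
}
```

Anything that arrives between toy_lock_sock() and toy_release_sock() waits, no matter which direction of the connection it is meant for.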

Yes, there is underlying queueing on the hardware card, driver, qdiscs, etc., but following the DCCP recommendations I guess a packet on the RX queue that has information (data and ack) for both the TX and RX half connections should really not be delayed on its way to the TX half connection just because there is a (potentially) big queue for the RX half connection. And if the TX half connection is busy sending a packet (in the dccp_sendmsg -> ccid rate limiter -> dccp_transmit_skb path) it would have _another_ sk_backlog, and when the TX path finishes it would process its own backlog, _without_ waiting for any RX backlog processing, i.e. we would have a sk_tx_backlog/lock_tx_sock/release_tx_sock (TX half connection) triple and another for the RX half connection, increasing the parallelisation in the full connection (RX + TX half connections).
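
Just to think this through in code, a toy model of the split could look like the sketch below. Only the lock_tx_sock/release_tx_sock names come from the idea above; everything else (toy_half, toy_dccp_sock, toy_rcv) is made up, packets are only counted and there is no real locking.

```c
#include <assert.h>

enum toy_pkt { TOY_DATA, TOY_ACK, TOY_DATAACK };

/* One of these per half connection, each with its own backlog. */
struct toy_half {
	int owned;	/* this half is busy in process context */
	int backlog;	/* packets parked while it was busy */
	int processed;	/* packets this half has handled */
};

struct toy_dccp_sock {
	struct toy_half tx;
	struct toy_half rx;
};

static void toy_half_deliver(struct toy_half *h)
{
	if (h->owned)
		h->backlog++;
	else
		h->processed++;
}

/* Incoming packet: data goes to the RX half, ack info to the TX half. */
static void toy_rcv(struct toy_dccp_sock *sk, enum toy_pkt type)
{
	if (type != TOY_ACK)
		toy_half_deliver(&sk->rx);
	if (type != TOY_DATA)
		toy_half_deliver(&sk->tx);
}

static void lock_tx_sock(struct toy_dccp_sock *sk)
{
	assert(!sk->tx.owned);
	sk->tx.owned = 1;
}

/* Drains only the TX backlog, without touching the RX half at all. */
static void release_tx_sock(struct toy_dccp_sock *sk)
{
	sk->tx.processed += sk->tx.backlog;
	sk->tx.backlog = 0;
	sk->tx.owned = 0;
}
```

With the TX half locked, a DATAACK still gets its data part processed right away by the RX half, and only the ack part waits in the TX backlog.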

DATAACK packets, that are of interest to both half connections would perhaps be shared skbs, being in two queues at the same time, using part of skb->cb to have a next pointer for the second queue…

This gets more interesting in scenarios where the TX rate from A to B is roughly equal to the one from B to A, i.e. mostly no quiescent half connection, when as much as possible one half connection wouldn’t be stepping on the toes of the other.

A good chunk of struct sock/inet_sock would be shared, id lookup, others that exist today would be used for, say, the TX half connection, while others would be duplicated in dccp_sock, humm… time to look at the code and check if this is all nonsense…

Quickies

September 10th, 2005

As mentioned in Ian’s blog we did what may well be one of the first DCCP connections in the Internet, from .br to .nz, whee! Now I’m too bored to write more than these few lines, but tomorrow I’ll finish the timeoffset code, with sk_buff tstamp converted to it and DCCP using it, let’s see if we get better timing calcs out of this.

Real Ramblings

September 5th, 2005

So far I’ve been reporting what I had done, but I’ll now try to do what this blog title says: write ramblings about what I’m thinking about doing, let’s start…

About the TCP pluggable congestion control and my intention of using it in DCCP for the CCIDs, one of the things I’ve read in the DCCP drafts was that one of the differences of DCCP from TCP was that the congestion decisions and access to the congestion state (cwnd, ssthresh, etc) were not sprinkled all over the protocol definition, but clearly separated in the CCIDs, leaving the core to the things that are common to all CCIDs.

In the Linux implementation I basically took most of the TCP code not related to congestion control and made it generic, being used by DCCP (any other INET transport level can take advantage of this infrastructure, SCTP for instance), and got a CCID3 implementation (implements RFC 3448) from a different tree, that had its lineage originally from FreeBSD. The way the TCP like core code interacts with the CCIDs is modelled after this CCID3 implementation, and now I’m thinking about how to proceed.

So far, in the pursuit of having the DCCP code looking as much as the TCP equivalent code as possible, I got the sendmsg path all the way to the equivalent to tcp_write_xmit, that transmits as many skbs from sk_write_queue as the congestion control algorithm allows.

My current doubt is if the right thing is to model all the congestion control algorithms to use variables (congestion window, etc) accessed directly by the core TCP code or if it should always ask the congestion control module being used, by means of functions that aggregate the open-coded logic now sprinkled thru the TCP code.

In the 2.4.20 DCCP implementation (incomplete and for an old draft) by Juwen Lai he implemented the concept of ->sendamap(), i.e. “send as much as possible”, that is roughly the equivalent to TCP’s tcp_write_xmit(), but is a function pointer provided by the CCID (Congestion Control algorithm) being used for the TX half connection.

So in my private tree I’m experimenting with the concept of ccid->ccid_hc_tx_write_xmit(), that uses the concepts outlined in the last paragraph and fits well with the current CCID3 code, that hasn’t any cwnd variable, but instead calculates a send rate that is allowed at any given point in a connection lifetime, implementing almost directly the equations in the CCID3 draft and in RFC 3448.
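
In toy form the shape of this hook is something like the sketch below. Only the ccid_hc_tx_write_xmit name comes from my experiment; the structures, the tokens field standing in for "what the send rate allows right now" and the return value convention are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sock;

/* Per CCID operations: the core never looks at cwnd or rates itself. */
struct toy_ccid_ops {
	const char *name;
	/* Send as much as the CCID allows, return how many packets went out. */
	int (*ccid_hc_tx_write_xmit)(struct toy_sock *sk);
};

struct toy_sock {
	const struct toy_ccid_ops *ccid;
	int queued;	/* packets waiting in the write queue */
	int tokens;	/* hypothetical: packets the send rate allows right now */
};

/* A rate based CCID: no window, just whatever the rate allows. */
static int toy_rate_write_xmit(struct toy_sock *sk)
{
	int sent = 0;

	while (sk->queued > 0 && sk->tokens > 0) {
		sk->queued--;
		sk->tokens--;
		sent++;
	}
	return sent;
}

static const struct toy_ccid_ops toy_rate_ccid = {
	"toy-rate",
	toy_rate_write_xmit,
};

/* Core sendmsg path: queue the packets and ask the CCID to push them. */
static int toy_sendmsg(struct toy_sock *sk, int npackets)
{
	assert(sk->ccid != NULL && npackets >= 0);
	sk->queued += npackets;
	return sk->ccid->ccid_hc_tx_write_xmit(sk);
}
```

A window based CCID would plug in a different function behind the same pointer, and the core sendmsg path would not have to change.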

At some point I’ll try to investigate how this CCID architecture would be used by TCP, in the search for the nirvana Grand Unified Congestion Control Infrastructure, the GUCCI ;)

Better Test Environment

September 2nd, 2005

Revived my A500 PARISC64 machine, upgrading to latest Debian unstable and installing a tg3 gigabit ethernet card, now it is my noisy internal router, with the added bonus of being a big endian, 64 bit machine, where we can test DCCP, in the kernel and the apps, like tcpdump, that after some tweaking seems to be working. Too bad the openssl library isn’t optimized for parisc64-linux (according to Grant on the #parisc-linux channel), git is slow to a crawl due to that.

On the DCCP kernel front I’ve been working on CCID3, experimenting with different CCID infrastructure hooks, reading Juwen Lai’s old stack for 2.4.20.

Also found a bit of time to write ostra-mstats, that collects profiling data out of the data already probed. Now that I have a faster test environment the simplistic collector shows its deficiencies and gets in the way, so I have to move on to relayfs. When I get it working well with relayfs, steal some of the ccache ideas to make ostra become in fact the preprocessor that it really is, get a web page for it in place for Rusty, etc, I’ll finally announce this toy. :-)

DCCP Progresses

August 28th, 2005

A lot of things were implemented since my latest entry here, let me mention some:

  • setsockopt and getsockopt for the packet size, that is used in the CCID3 transmit rate calculations and tested it using ttcp patched to support DCCP
  • mucho ccid3 polishing (timeval, usecs divisions, etc)
  • moved the loss interval code from ccid3 to loss_interval.c
  • moved the TFRC equation used to determine the sending rate according to RFC 3448 (TCP Friendly Rate Control) to tfrc_equation.c
  • moved more packet history code to packet_history.c
  • created a new module, dccp_tfrc_lib.ko, that for now is used only by ccid3.c but will be used to implement other CCIDs, such as the CCID3 VoIP variant
  • implemented dccp_poll, and tested it with netcat patched to support DCCP
  • implemented dccp_get_info, now it’s possible to get information such as the rcv and send RTT, etc using iproute2
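
For reference, the throughput equation from RFC 3448 that the list above mentions can be written as a small userspace function. This is only an illustration in floating point, not the code in tfrc_equation.c (kernel code generally avoids floating point). The names tfrc_rate and toy_sqrt are made up, and b = 1 and t_RTO = 4*R are the simplifications the RFC recommends.

```c
#include <assert.h>

/* Newton's method square root, so that the sketch does not need libm. */
static double toy_sqrt(double a)
{
	double x = a > 1.0 ? a : 1.0;
	int i;

	assert(a >= 0.0);
	if (a == 0.0)
		return 0.0;
	for (i = 0; i < 60; i++)
		x = 0.5 * (x + a / x);
	return x;
}

/*
 * RFC 3448 section 3.1:
 *
 *   X = s / (R*sqrt(2*b*p/3) + t_RTO * (3*sqrt(3*b*p/8)) * p * (1 + 32*p^2))
 *
 * s: packet size in bytes, rtt: round trip time R in seconds,
 * p: loss event rate, 0 < p <= 1. Returns X in bytes per second.
 */
static double tfrc_rate(double s, double rtt, double p)
{
	const double b = 1.0;			/* packets acked per ack */
	const double t_rto = 4.0 * rtt;		/* RFC suggested simplification */
	double denom;

	assert(s > 0.0 && rtt > 0.0 && p > 0.0 && p <= 1.0);
	denom = rtt * toy_sqrt(2.0 * b * p / 3.0) +
		t_rto * (3.0 * toy_sqrt(3.0 * b * p / 8.0)) * p *
		(1.0 + 32.0 * p * p);
	return s / denom;
}
```

With 1000 byte packets, a 100ms RTT and a 1% loss event rate this gives a bit over 112 KB/s, and the allowed rate drops as the loss event rate goes up.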

I also set up a test network here at home with a router between the two machines where I run DCCP clients and servers, ttcp mostly so far. On the router I use the netem qdisc to induce packet loss and delay, I’m really liking netem so far, very handy.

The behaviour of this stack is very promising, with good results, mostly matching what is expected, interested readers can take a look at some OSTRA callgraphs and plottings, such as the one for the sending rate under high packet loss.

This entry somehow disappeared from my MySQL wordpress tables, so I restored it from google cache. Unfortunately there were some comments in this entry but I didn’t take the time to restore them from the cache, I’m sorry…

DCCP Milestone

August 23rd, 2005

Sent a message to the DCCP lists (at vger and IETF) reporting a milestone: the current implementation finally survives sending tons of data in a DCCP patched ttcp, the CLOSING timer is basically implemented and now I’m working on providing more info thru inet_diag, so it seems this thing is going in the right direction.

Sparse & DCCP

August 17th, 2005

Got most of net/ sparse clean, Dave merged all the submitted patches and the last cases (related to put_cmsg et al) I’m discussing with Al Viro.

Also fixed a nasty bug in DCCP related to the draft pseudocode step 6, where we’re not using the updated SWL when receiving SYNC or SYNCACK packets. Also actually answer SYNC packets with SYNCACK ones, there is still some remaining case where a received SYNC is not eliciting an immediate SYNCACK packet, will investigate tomorrow.