Archive for Programming

The worst signal safety bug

Suppose you want to change a QObject’s thread affinity with moveToThread().  Of course, you need to be mindful of what your object is doing, how it is parented, what it is interacting with, etc, to use moveToThread() properly; it is certainly an advanced function.  However, there’s a “gotcha” that might not be immediately apparent, which is that you have to be even more careful if you want to use moveToThread() on an object that isn’t signal safe (SS). Why?  Let’s do an example.

Here we have class Foo, which emits stuffDone() at some point. The Bar class is listening for this signal, and will move Foo into another thread once the signal is received.

void Foo::doStuff()
{
    [...]
    emit stuffDone();
    [... more code ...]
}

void Bar::foo_stuffDone()
{
    foo->moveToThread(otherThread);
}

The problem is that the code following the signal emit will be executed from the object’s original thread and not from the thread that the object has been moved to. Even if the object isn’t doing anything in the new thread, it may still be illegal to use certain calls from the original thread. For example, if Foo had child timer or socket objects, and these got moved into the new thread, then any calls to these objects from the original thread would be major fail.

Anyway, there isn’t anything new to teach here. If you don’t know the SS compliancy of an object, then you can’t do anything with it if one of its signals might be on the call stack. It’s that simple, and moveToThread() should not be treated any more or less specially than other member functions. What’s interesting about moveToThread() are the disastrous results you can get. ;)

Comments

Why I don’t document or unit test

I apologize, the headline was mostly an attention grabber.  I do write documentation and create unit tests, and in fact I think they are great things.  All software projects should be completely documented and have thorough automated testing.  However, more often than not, projects fall short in providing these things.  Maybe it’s because there is no glory in documenting and testing?  Tests are especially a bore, as they just prove what should have already been true.

Or maybe it’s because documentation and tests provide only marginal value compared to what else you could be doing?  For a moment, I wanted to say that volunteer open source projects are more likely to be resource-strapped, but even software companies can be guilty of not wanting to assign resources to these things.

With this in mind, I want to say that deferring documentation and testing is a natural result of trying to get the most bang for your development buck.  I won’t disagree that it’s easy to be lazy about these things, but, personally, even when I attempt to discipline myself, I’m quite cautious about investing time in documentation or tests that I will only need to repair in the future.  Consider how often many of us refuse to make fixes to our implementations if we are planning to “rewrite it all” later.  It is this very same reason that I resist documentation and testing, and I’m not just making excuses. :)

I don’t document, because I hate my design

So here I am writing a paragraph of doxygen text about a method call, of course with an example usage and with cross-referencing, etc.  Hey, I like good, high quality documentation like the best of them.  Then, later on during development, I realize that this method is all wrong and needs to go.  Heck, this whole class was written with a misunderstanding about the data it operates on.  Hrmph.

It happens in software that you have to rewrite stuff.  That’s just the way it goes.  It’s hard to get a design right, even when you’re sure it’s right.  However, if you know your design is wrong, then you really have no business documenting.  Even if you have no opinion as to whether the design is good or bad, because you wrote it one day and it’s done the job fine, you’re probably still better off not writing documentation until you do a design audit.  Oh, and you really don’t want to change your design but then not update your documentation.  As the saying goes, wrong documentation is worse than no documentation.

In general, if other users and programmers are figuring out how your software works and are not doing idiotic things all the time, then skip the documenting for now and put your energies elsewhere.

I don’t test, because I hate my implementation

And I almost always hate my implementation.  Design comes first, then implementation, and with so much design still left to get right it’s easy to see how implementation takes a back seat.  Often, I write implementations well enough to satisfy a given design, and then quickly move on to the next design issue.  What a day it will be when I only have to worry about implementations.

You may say: but a test need not depend on the implementation.  I agree, to a point.  There are basic properties you can test, for example you could verify that a return value received is the correct one based on the arguments you provided, or you could verify that some public value is within an acceptable range, etc.  However, to test all code paths for full coverage, you need to make tests that are intimately related to the implementations.  Anytime the object changes private state, this state needs to be confirmed by a test.  It is not enough to check merely the public state.

In general, we all write working code.  If Psi did not consist of a majority of working code, then nobody would use it.  Even without having any unit tests for many years, the software more-or-less still did the right thing.  There is not a strong reason to bend over backwards doing unit testing when the software naturally has such a high correctness rate.  Spend your energies on tasks with greater impact (like new features or refactoring, which will end up having a high probability of correctness) and return to unit testing once you’re happy with your implementation.

Exceptions

Yes, there are times when you should document an incomplete design, and that’s when you’re depending on other people understanding your work-in-progress.  The documentation needn’t necessarily have a glossy finish.  Sometimes even the smallest notes can go a long way in directing the reader.  I’m not just talking about programming documentation here.  Even a bare bones readme file describing how to compile your darn project can be immensely helpful and takes little effort.

I don’t know if it makes sense to unit test an incomplete implementation.  At least, doing mediocre unit testing seems like a waste of time compared to mediocre documentation which can be quite valuable.

Conclusion

Deferring documentation or unit tests is not always about laziness.  It’s about efficiency.  Note my usage of the word “defer”.  Documentation and tests should be done eventually.  I just feel that it is acceptable to prioritize tasks of higher impact first, and to only put serious effort into documentation and tests when the probability of having to rewrite them later is low.

Happy coding!

Comments (2)

Nested eventloops are pure evil

From the gst_bus_poll() documentation of GStreamer:

“This function will run a main loop from the default main context when polling.

You should never use this function, since it is pure evil. This is especially true for GUI applications based on Gtk+ or Qt, but also for any other non-trivial application that uses the GLib main loop. As this function runs a GLib main loop, any callback attached to the default GLib main context may be invoked. This could be timeouts, GUI events, I/O events etc.; even if gst_bus_poll() is called with a 0 timeout. Any of these callbacks may do things you do not expect, e.g. destroy the main application window or some other resource; change other application state; display a dialog and run another main loop until the user clicks it away. In short, using this function may add a lot of complexity to your code through unexpected re-entrancy and unexpected changes to your application’s state.”

See, I’m not the only crazy one around. Now, run along and get rid of those QMessageBoxes.

Comments (1)

Meeting expectations in QObjects (Part 1)

I’ve written a few articles on how I believe QObjects should behave, and I’ve even attempted to codify these behaviors in the Delta Object Rules.  However, I haven’t offered many solutions on how to go about making QObjects that follow these rules.  This is in part because I didn’t really have any good solutions.  For the past few years I’ve experimented with some different approaches though, and I’ll outline them here.

Basic Approaches

The most common approach you may have seen in my newer code is the use of member QTimer objects for single shots to defer execution.  This is the main way to solve the DOR DS problem.  QTimer::singleShot() and QMetaObject::invokeMethod() can do this as well, but the key is to use QTimer as a real object so the single shots can be canceled.  The ability to cancel allows solving SR.

Of course there’s also the old-school technique of emitting signals as the last part of a code path.  This is often a solution to SS.

Another old-school technique is for an object to monitor itself with QPointer, but I no longer consider this to be a real solution.  In my revised article on SS, I discuss how the problem goes beyond destructors.  You can use QPointer if the only state change you’re worried about is destruction, but I’d say that’s the least of your worries.

Therefore, QTimer and “signals last” are in, and QTimer::singleShot(), QMetaObject::invokeMethod(), and QPointer are out.  Let’s talk about QTimer and “signals last”.

“Signals Last”

The idea here is that you try to emit your signals at the end of your object’s code path.  So if you have X things to do while code runs in your object, and one of them is a signal, then you arrange your code such that the signal emit is the last thing that happens.

Why is this useful?  Well, whenever you emit a signal you’re leaving your object vulnerable.  The slot at the other end of the signal might run code that inspects or changes your object (maybe it runs some member methods on your object).  Is your object prepared for this?  Is it in the right state?  If you emit your signals last, then the idea is that your object has already performed all of its other operations and is now in a predictable state.  Additionally, there’s no worry of the object being in an unpredictable state upon return, because there’s no code to execute upon return!

The “signals last” approach becomes a challenge when you need to emit two signals, or when you need to add a new signal into the middle of some existing code.

QTimers

Use QTimers to defer a signal (or to defer the code that leads to a signal).  If a directly-invoked member method would want to emit a signal, then you can split the method into two parts: 1) a method called by the user which simply starts a QTimer, and 2) a deferred method, invoked by QTimer, which performs whatever the member method was originally supposed to perform.

In other words, this:

void MyClass::doStuff()
{
    something();
    somethingElse();
    emit stuffDone();
}

becomes:

void MyClass::doStuff()
{
    doStuffTimer.start(); // singleshot timeout calls doStuffDeferred
}

void MyClass::doStuffDeferred()
{
    something();
    somethingElse();
    emit stuffDone();
}

Alternatively you could perform all but the signal in the first part, and let the second part be just the signal emit. There is a difference when you do this, and I’ll discuss what that means later on.

Unfortunately, the QTimer approach gets unwieldy for complex objects.  For example, I consider the QCA::TLS implementation to be the state-of-the-art regarding the QTimer approach, and yet the resulting code is barely maintainable.

The Challenge Ahead

You pull up a clean, blank source file, and declare a nice QObject-based API.  You then set out to write an implementation for your API, and, upon completion, you discover that your object doesn’t behave the way a user would expect.  It seems downright silly, but the most straightforward implementations are almost entirely wrong.  Did you use “signals last” or QTimer?  No?  Then you’re probably in violation of at least DS and SS.

Before you attack the Delta Object Rules for being overbearing, keep in mind that they aren’t rules I pulled out of thin air.  DOR states the way everyone generally expects Qt objects to behave.  Yes, even you.  You start working with someone’s object, and you expect DOR-compliance.  All I did was give it a name.  The expectations have always been there.

So, straightforward implementations behave wrong.  A non-straightforward implementation is needed for proper behavior.  Can we improve this situation? Can we create convenience mechanisms so that we can write properly-behaving objects with straightforward implementations?  That’s the challenge ahead, and we’ll get into that in Part 2.

Comments

Why signal retraction is hard (Part 1)

While dusting off some Iris code the other day, I found a crash in ServiceBrowser.  Apparently, deep down in the internals, some unexpected event was being received.  Specifically, an asynchronous operation was started and then later canceled, but a completion event for that operation was delivered anyway and the application would naively process it.

There’s two solutions to a problem like this.  1) the application should ignore irrelevant/obsolete/invalid events, or 2) the subsystem should not deliver irrelevant/obsolete/invalid events.  Solution #1 is often the most straightforward, but it is also the least satisfying, and requires some way of identifying an event as irrelevant/obsolete/invalid.  Solution #2 is more difficult, but it leads to a system that works in a way the user (of your API) is more likely to expect.  I’m reminded of what I wrote in my Signal Rectraction article, about how association between an event and the action that caused it is almost always implied in Qt programming.  That is, we don’t go around passing context ids in our signals.  You just have to assume that all Qt signals are valid.  So, for someone like me, the natural solution is #2.

(Side note: I’m not positive as to what the catalyst to my particular problem was, but I think it had to do with my recent IPv6 adjustments, where LAN queries are sent on both the IPv4 and IPv6 interfaces simultaneously if you happen to have both.  Once a reply is received on one interface, the query on the other is canceled.  The unexpected event was coming from the other interface.  Most likely it was delivering an answer to a query that I had canceled and no longer cared about.)

In any case, I started out by going through the layers, from the bottom up, to see where any problem situations were.

First up: JDNS.  In JDNS, you call jdns_query() to start a query and jdns_cancel_query() to stop a query.  You run the engine by calling jdns_step() as appropriate (without going into too much detail, jdns_step() performs one iteration of the JDNS engine and then returns instructions about how and when you’re supposed to call jdns_step() again).  After the jdns_step() call completes, there might be some events waiting for you, which you can pick up by calling jdns_next_event().  JDNS only ever appends events to the event queue.  It never removes events.  It is your job to pop them off with jdns_next_event().  When you start a query, you are given an integer id as a “handle” to associate with it.  You can use this id to cancel the query, and incoming events for the query have this id as a property.

My first question about JDNS was: is it possible to receive events for canceled queries?  Yes, I could think of some ways.  At the very least, the user could run a query, accumulate an event for it, and then cancel the query.

// start a query
int id = jdns_query(session, "jabber.org", JDNS_RTYPE_A);
...
// run the engine, long enough such that the query succeeds
jdns_step(session);

// cancel the query
jdns_cancel_query(session, id);

// read an event
jdns_event_t *event = jdns_next_event(session);
// event->id == id

There will be an event in the queue, not associated with any active query, that the user could misinterpret.  Before considering this to be a flaw in JDNS, I wondered if this could simply be considered a misuse of the API (the misuser being me, heh).

Maybe the user could just filter out invalid events by comparing the id of a received event with the ids of any active queries.

// a container to hold active queries
QSet<int> queries;
...
// start a query
int id = jdns_query(session, "jabber.org", JDNS_RTYPE_A);
queries.insert(id);
...
// run the engine, long enough such that the query succeeds
jdns_step(session);

// cancel the query
jdns_cancel_query(session, id);
queries.remove(id);

// there may be an event, but we can easily see if it is for a query we care about
jdns_event_t *event = jdns_next_event(session);
// queries.contains(event->id) == false

This should be possible as long as events are only generated during the jdns_step() call.  I say this because, well, imagine code like this:

// a container to hold active queries
QSet<int> queries;
...
// start a query
int id1 = jdns_query(session, "msn.com", JDNS_RTYPE_A);
queries.insert(id1);
...
// cancel it
jdns_cancel_query(session, id1);
queries.remove(id1);
...
// start another query
int id2 = jdns_query(session, "jabber.org", JDNS_RTYPE_A);
queries.insert(id2);
...
// run the engine, long enough such that the query succeeds
jdns_step(session);

jdns_event_t *event = jdns_next_event(session);
// queries.contains(event->id) == true, and answer is for "jabber.org"

The idea here is that even if the second query were to use the same exact id value as the first query (id1 == id2), there would be no problem for the application to differentiate events between the two.  Why?  Because there wouldn’t be any events for the first query, since jdns_step() wasn’t called while the query was valid.

Finally, for any of this to work, the user wouldn’t be allowed to leave events lying around in the queue otherwise they might get mismatched with future queries.  It would basically rely on the user to not initiate any new queries between the call to jdns_step() and the last call to jdns_next_event().  That’s probably a fair restriction, and the app should always be able to know what events are legit.

That’s what I thought at first, anyway.  However I then discovered that JDNS caching doesn’t play by these rules. JDNS will cache answers for common queries, and there is at least one case where answer events might be queued up immediately at the call to jdns_query(). For example, if an answer for “msn.com” was available in the cache, then the jdns_query() call would immediately put that answer into the event queue.

// let's assume "msn.com" is cached
int id1 = jdns_query(session, "msn.com", JDNS_RTYPE_A);
// (there is now an event in the queue)
jdns_cancel_query(session, id1);
int id2 = jdns_query(session, "jabber.org", JDNS_RTYPE_A);
jdns_step(session);
jdns_event_t *event = jdns_next_event(session);
// event->id == id1, and answer is for "msn.com".
// this is a problem if id2 was allocated to the same value as id1

If the second call to jdns_query() happened to assign the same handle id as the first call, then the application could read the first event from the queue and associate it with the second query. Someone looking for jabber.org might end up at msn.com, and that would be a tragedy!

There are a couple of solutions to this problem.  One is that I could internally refrain from queuing the event until the next call to jdns_step(), even if the answer is available in the cache. In other words, jdns_query() shouldn’t be creating events. The other is that when jdns_cancel_query() is called, I could remove any related events from the event queue. Also, before you say it, I’m aware that I could just base the handle allocations on a strictly increasing integer (FWIW, this is actually what JDNS does anyway), but this is unsatisfying as a solution to the event mismatch problem. Theoretically, an event mismatch wouldn’t happen on a 32-bit counter unless you performed 4 billion queries without calling jdns_next_event(), which is incredibly impractical and would never happen, but… what can I say, I just don’t consider it a solution.

I went with the second option, where events are cleared from the queue when jdns_cancel_query() is called.

// let's assume "msn.com" is cached
int id1 = jdns_query(session, "msn.com", JDNS_RTYPE_A);
// (there is now an event in the queue)
jdns_cancel_query(session, id1);
// (event now removed!)
int id2 = jdns_query(session, "jabber.org", JDNS_RTYPE_A);
jdns_step(session);
jdns_event_t *event = jdns_next_event(session);
// event->id == id2, and answer is for "jabber.org".
// it doesn't matter if id2 was allocated to the same value as id1

This pretty much solves all problems.  It even lifts that restriction discussed earlier about how the user would have to read all events immediately following a jdns_step() call. You can start or cancel queries wherever you like. It just makes everything work as expected.

Despite JDNS being plain C and not Qt, this is still fundamentally a signal retraction problem (in fact, all of the Delta Object Rules are environment-neutral, despite my framing in a Qt/C++ context). The difficulty suggested by the title of this article is not necessarily about the implementation, as in this case it was rather easy (I just added a few lines to the jdns_cancel_query() function). No, here the difficulty was spotting the problem in the first place. Most signal retraction bugs are edge cases that are hard to replicate. How fortunate of me to get repeated crashes on something that was surely the result of a network race condition.

So, that fixes JDNS. In the next article of this series, I’ll show you how the next layer up, QJDns, managed to negate the fixes and reintroduce the problem again.

Comments

Introducing PsiMedia

PsiMedia Test screenshot

Voice (and video) chat is a feature we’ve wanted in Psi for a long time.  However, implementing voice/video chat is not straightforward, and this is partly due to all of the new concepts that have to be introduced into the application in order to make it happen.  Cameras, microphones, codecs, and RTP are all just very foreign to Psi.  The code necessary to handle a multimedia “stack” could easily exceed the amount of code in our own IM stack!  Fortunately, there are libraries out there to handle the task.

In 2004, we considered RealNetworks’ Helix framework.  For receiving content, we found this framework to be quite mature.  However, for transmitting content, it was clearly not designed for end-user desktop applications and was even GPL-incompatible in that scenario.  Quite some work went into the Psi+Helix effort, but ultimately it was abandoned.

In 2005, we considered Google’s libjingle.  We managed to get voice chat working with it, but the code never went beyond the experimental stage.  This was due to the limited platform support at the time (Linux audio only at first, though Remko managed to add in Mac audio support) and libjingle’s lack of maintenance.  Libjingle works as a black box, handling not only multimedia but also the Jingle protocol.  Unfortunately, this meant that as the Jingle protocol changed, libjingle fell out of spec.  We also felt it was a tad intrusive for libjingle to be handling XMPP stuff.

In 2006, we investigated GStreamer.  This framework has proved to be the most interesting thus far, for a number of reasons.  Unlike the limited libjingle black-box, GStreamer is a comprehensive and flexible multimedia framework, similar in nature to Helix.  It goes further than Helix though, by offering a better API for transmitting, by being GPL-compatible throughout, and by being easier to extend.  I feel confident we can accomplish everything we need with GStreamer.

Today there is Phonon, however it lacks input and transmission facilities at this time.  We will keep an eye on it for the future.  There is also Farsight, which integrates with GStreamer.  We may make use of Farsight, depending on our needs.

In any case, I’ve started a new “wrapper” project called PsiMedia.  The goal of PsiMedia is to offer an API designed for the purpose of adding voice and video chat to Psi or a Psi-like client.  All of the details the client does not care about will be hidden behind PsiMedia.  It solves only the multimedia aspects, and not Jingle/XMPP, as I consider these two problems to be orthogonal.  Currently PsiMedia wraps GStreamer, but the requirements are abstract enough that the client should not care what is actually wrapped.  PsiMedia can be considered the successor of the old “Media” module I started in 2004, to wrap Helix.

Below are the requirements of the system.

What PsiMedia does:

  • Tell you what audio and video devices are available.
  • Tell you what audio/video modes are possible (codecs, sample rates, video resolutions, etc).
  • Allow you to specify your desired modes, and the modes of the remote party, to arrive at a list if common modes.
  • Capture audio/video and encode as RTP into a series of QByteArrays.
  • Accept QByteArrays containing RTP, and playback any audio/video contained within.
  • Play back video in a QWidget.
  • Allow displaying video currently being captured (preview of yourself).
  • Volume controls.
  • Ability to separate the backend into a plugin, so that no new compile-time dependencies are introduced to Psi.

(RTP, by the way, is a standard packet format for transporting multimedia data in real-time.  It is used by SIP, Jingle, and, well, everybody.)

What PsiMedia does not do:

  • Use the network.
  • Implement Jingle or anything XMPP.
  • Expose anything more than very basic multimedia details.  There are no filters, no pipelines, etc.

In short, PsiMedia should make implementing voice/video chat in Psi straightforward.

Comments (16)

Signal Safety Revised

It has been almost four years since I first wrote about Signal Safety. The concept has held up well all this time, and I follow it (as well as the rest of the Delta Object Rules) in any new code I write. However, last year I discovered there is more to the problem, and given that the original article was written in the Qt 3 era I decided it was time for an update. The original article is even obsolete, now that Qt 4 restricts QObject::deleteLater() processing to the same event-loop depth. This article shall serve as a modern replacement for the original article.

The Problem

Most method calls in a program run in a self-contained or “black box” fashion. That is, a method is given a set of inputs, and, upon completion, that method provides some expected outputs. The state of the universe during the method call is most often undefined. This goes for plain C functions as well as for C++ class member functions.

Suppose you are calling QString::truncate(int), which shortens a QString to some length. Now, suppose you could somehow interrupt the truncate() function while it is in progress, and proceed to call QString::length() on the same object. What value would you expect length() to return? Would it be the length before truncation? After truncation? Some value in between? Maybe the program would simply crash, which is certainly within its rights. After all, QString makes no claim about the integrity of its internal data structures while the truncate() method is being executed. The object state is really none of your business until the method completes, at which point QString is ready to take on your next method call.

Of course, this should all go without saying. Unless the author of an object claims some degree of thread-safety or such, the assumption is that you can’t poke around at an object while it is busy. However, this becomes much more complicated with the introduction of callbacks. With a callback, an object or method makes a call “back” to you. Qt signals are a form of callbacks. You connect an object’s signal to one of your slot methods, and now that object will call your slot methods. With callbacks, the million dollar question is: What is the state of the object that is invoking the callback, during the callback? That is, are any claims made about the integrity of the object’s internal data structures? What methods are you allowed to call on this object during the callback?

In the world of Qt, the answer is most often: “Who knows?” Signals are emitted wherever the author finds most convenient to do so. No consideration is made about the state of the object during these emits. This means that most objects are in an undefined state during signal emission. Calling a method on an object during that same object’s signal emit is the same in concept as interrupting QString::truncate() and calling length() on that same string. In both cases, a method is randomly interrupted, and you are accessing the object in a state that was likely not considered by the author.

A real-world example I mentioned in one of my other articles is a bug/problem I found in QTcpSocket some months back (note: it could be corrected by now, but I have not checked). On failure, this object emits the error() signal. During this signal, calling QTcpSocket::read() will result in a crash. Note that calling read() on a failed QTcpSocket object is absolutely allowed. This is because QTcpSocket buffers incoming data, and you are allowed to read this data at any time, even after the underlying TCP connection is closed. However, QTcpSocket is in an inconsistent state when the error() signal is emitted. It is likely in a state of mid-cleanup, and not expecting any method calls. Once QTcpSocket has finished what it is doing and returned to the event loop, you can safely call read(). So who is at fault here? Is it me for assuming I can call read() when I’m really not supposed to? Or is it Qt, for having a bug that prevents read() from working in a situation that it clearly should have?

The core of the problem is that nothing is written in a stone tablet saying what we are supposed to do here. If I had to describe the current state of things, it would be that all methods are implied to be callable during a signal, minus the destructor. Yet, code is generally not written to match this implicit contract. Sure, the code works most of the time, but it is only sheer luck, and sooner or later we get a problem like the above QTcpSocket issue.

The Solution

In my old article, I proposed that you should be allowed to call an object’s destructor during its signals. In this article, I expand the definition to include all methods of the object.

Imagine an object that doesn’t use callbacks. You may call members of this object:

foo.init();
foo.addData(someData);
foo.sort();
foo.reset();

After each of these member calls, foo returns to a ‘neutral’ state whereby you are free to call any other method. This is within reason, of course. Maybe there’s one member that you aren’t allowed to call unless you make some other member call first. That’s all fine. It is still the neutral state.

Essentially, I propose that an object should be in the neutral state during a signal emit. Here’s a rather contrived sample class:

class TextHolder : public QObject
{
    Q_OBJECT
private:
    QString *curText;
    bool modified;

public:
    TextHolder()
    {
        curText = new QString("");
        modified = false;
    }

    ~TextHolder()
    {
        delete curText;
    }

    QString text() const
    {
        return *curText;
    }

    void setText(const QString &text)
    {
        delete curText; // delete the text we were currently holding
        emit textChanged(text); // announce the new text we received
        curText = new QString(text); // take this new text
        modified = true; // flag that this object was modified
    }

signals:
    void textChanged(const QString &newValue);
};

Alright, so in the object’s neutral state, you are allowed to call text(), because curText is always non-NULL and valid in that state. We can see here that during the textChanged callback, curText is invalid. Therefore, this code violates signal safety. Calling text() during the callback is dangerous. Not only does this object need to be in the neutral state before and after the call to setText, it needs to be in the neutral state during textChanged. Let’s fix it by ensuring curText is atomically updated:

void TextHolder::setText(const QString &text)
{
    emit textChanged(text);
    delete curText;
    curText = new QString(text);
    modified = true;
}

We could also do it before the signal:

void TextHolder::setText(const QString &text)
{
    delete curText;
    curText = new QString(text);
    emit textChanged(text);
    modified = true;
}

Whether curText is modified before or after the textChanged() signal depends on whatever you think makes more sense for the class. The point is that calling text() during the textChanged() signal is made legal, because it would have been legal to call it while not in the signal.

There is another issue to address though, which is to make the destructor callable. Right now, if the TextHolder object is deleting during the textChanged() signal, then the application will probably crash when the callback completes and there is an attempt to access the ‘modified’ bool which no longer exists. Let’s fix that:

void TextHolder::setText(const QString &text)
{
    delete curText;
    curText = new QString(text);
    modified = true;
    emit textChanged(text);
}

Now whenever you are able to call methods on TextHolder, you can be sure it is in the neutral state. There is one catch, though: you cannot call setText() from within the callback, else you get infinite recursion. Oh well, at least that one should be expected.

Objects on a Signaling Course

It is not uncommon for an object to need to emit multiple signals in one pass of the event loop. For example, suppose you have an object that receives lines of text over the network, and emits a signal for each line of text received:

void LineParser::receiveData(const char *data)
{
    QStringList lines;
    ...
    // [ parse data into lines ]
    ...
    foreach(const QString &line, lines)
        emit lineReceived(line);
}

Now, suppose there is a function called LineParser::reset(), which clears all buffers and initializes the LineParser object back to what it would have been at original construction. What happens when you do the following?

void MyApp::slot_lineReceived(const QString &line)
{
    if(!lineUnderstood(line))
    {
        lineParser->reset();
        ...
        return;
    }
    ...
}

Guess what? MyApp could still get fed a bunch of lines it doesn’t care about. We can solve this by adding a flag in LineParser that accounts for whether or not the object is in a reset state. Let’s say there’s a bool called isReset, that is set to true on construction and also after a call to reset(). We can then rework the receiveData() function:

void LineParser::receiveData(const char *data)
{
    QStringList lines;
    ...
    // [ parse data into lines ]
    ...
    foreach(const QString &line, lines)
    {
        emit lineReceived(line);
        if(isReset)
            break;
    }
}

Now LineParser can be reset in the middle of the line signaling, living up to what MyApp had expected. Of course, we still can’t delete the LineParser object in the lineReceived() signal. Let’s fix that too:

void LineParser::receiveData(const char *data)
{
    QStringList lines;
    ...
    // [ parse data into lines ]
    ...
    QPointer<QObject> self = this;
    foreach(const QString &line, lines)
    {
        emit lineReceived(line);
        if(!self)
            break;
        if(isReset)
            break;
    }
}

Yep, it’s that same old trick, just like the stuff with Qt 3’s QGuardedPtr in the old article.

The Case for Safe Deletion

At its core, QObject supports being deleted while signaling. In order to take advantage of it you just have to be sure your object does not try to access any of its own members after being deleted. However, most Qt objects are not written this way, and there is no requirement for them to be either. For this reason, it is recommended that you always call deleteLater() if you need to delete an object during one of its signals, because odds are that ‘delete’ will just crash the application.

Unfortunately, it is not always clear when you should use ‘delete’ vs deleteLater(). That is, it is not always obvious that a code path has a signal source. Often, you might have a block of code that uses ‘delete’ on some objects that is safe today, but at some point in the future this same block of code ends up getting invoked from a signal source and now suddenly your application is crashing. The only general solution to this problem is to use deleteLater() all the time, even if at a glance it seems unnecessary.

I commonly offer a reset() function in my objects. However, I have to be careful about the code path at the time reset() is called. For example:

class NetworkObject
{
    Q_OBJECT
private:
    QTcpSocket *sock;

public:
    NetworkObject() : sock(0) { }
    ~NetworkObject()
    {
        reset();
    }

    void reset()
    {
        delete sock;
        sock = 0;
    }

signals:
    void error();

private slots:
    void sock_error(QAbstractSocket::Error)
    {
        reset();
        emit error();
    }
};

With the above object, the reset() function might be called in three different places: 1) directly by the user when the object is in a neutral state, 2) when the object destructs, and 3) when the socket encounters an error. The above class is broken, of course, because it deletes QTcpSocket while inside the QTcpSocket::error() signal. How can we solve this problem? Well, we could rework sock_error() like this:

void NetworkObject::sock_error(QAbstractSocket::Error)
{
    // the disconnect statement is needed to prevent signals between now and actual delete
    sock->disconnect(this);
    sock->deleteLater();
    sock = 0;
    emit error();
}

However, this sort of violates the DRY (Don’t Repeat Yourself) principle. Right now our reset() function is small, but if it starts growing large then there will be a lot more to repeat. Let’s rework the reset() function instead:

void NetworkObject::reset()
{
    sock->disconnect(this);
    sock->deleteLater();
    sock = 0;
}

There, now we only reset in one place. Unfortunately, in two out of the three situations reset() could be called, only one of them actually needs the deleteLater(). This seems wasteful. What can we do? How about this:

class NetworkObject
{
    Q_OBJECT
private:
    QTcpSocket *sock;

public:
    NetworkObject() : sock(0) { }
    ~NetworkObject()
    {
        reset();
    }

    void reset(bool delLater = false)
    {
        if(delLater)
        {
            sock->disconnect(this);
            sock->deleteLater();
        }
        else
            delete sock;
        sock = 0;
    }
    ...
signals:
    void error();

private slots:
    void sock_error(QAbstractSocket::Error)
    {
        reset(true);
        emit error();
    }
};

There, now we get exactly what we want, although I wonder if this approach may become risky over time. Perhaps it is best to have reset() always do deleteLater(), just in case the object gets so complex that it is unclear what value you need to pass to reset. It also gets more complex if there are other QObjects to delete. If ‘delLater’ does deleteLater on every object, then that’s not optimized. It should only use deleteLater() on the one object that matters. Maybe reset could be rewritten like this:

void NetworkObject::reset(QObject *delLater)
{
    ...
    // [ delete all objects, except for whichever one is delLater ]
    delLater->disconnect(this);
    delLater->deleteLater();
    ...
}

As you can see, there’s a lot of hassle that can go into ensuring correct usage of deleteLater() vs ‘delete’.

A simpler approach is to just use deleteLater() all the time. Unfortunately, this is just not standard practice among Qt developers. They will still try to guess which one they should use based on the observable code path, thinking that deleteLater() is unoptimized and should only be used as a last resort. It is error-prone to guess here, because you can’t always predict how that code path will be invoked. My advice is to use deleteLater() every time, unless you know you are using an object that can be deleted in its signals (in which case you can ‘delete’ every time for that object).

You are doing a great favor by designing your object to be deleted while in a signal. If your object can handle a reset()-style call while it is “on a signal course”, then you’ve already done the heavy lifting. At that point, handling a destructor is not much extra effort.

Conclusion

We have a problem today, where we are mucking with objects while they are in an undefined state. At least with ‘delete’ we can choose to use deleteLater() instead. But what about the problem of calling QTcpSocket::read() while in the error() signal? Call readLater()? There is no such thing!

I suppose to be absolutely safe, we should avoid making any call to an object during one of its signals. In my opinion, this is simply not practical with the way all of us write Qt code today. A better approach is to fix the implementation of our objects, so that they behave the way our users (and ourselves!) had always expected them to behave.

Comments

QCA and Smartcards

In public key cryptography, it is critical to keep your private key, well, private. One of the major features of QCA 2.0 is support for smartcard devices, and these devices allow convenient and safe storage of private keys. Smartcards perform cryptographic operations in hardware, never revealing the keys to the requesting computer. Traditionally, smartcards have a “credit card” form-factor and require a reader device, but today there also exist smartcards which have the form-factor of a USB mass storage device (they are not “cards” at all). The term “smartcard” can apply to either kind of device, as can “smart token” or simply “token”.

Since QCA has a pluggable backend system, changing out software algorithms for hardware calls is easily possible by writing a proper plugin. However, there is much more to using smartcards than simply substituting a private key call. Smartcards may not always be present in the computer, and may contain many stored keys. QCA offers complete smartcard support, allowing you to discover available smartcards, track inserts and removals, enumerate keys on the devices and prompt for PINs, all using a friendly Qt-based API.

The QCA distribution contains an example application called CMS Signer. It exists mainly to demonstrate using keys from smartcards in a GUI application, although it also works with on-disk keys. CMS, or Cryptographic Message Syntax, is a standard format for storing various types of secure payloads. In the case of the CMS Signer application, the CMS format is used to store digital signature data created from X.509 certificates (fun fact: S/MIME email consists of CMS-formatted parts wrapped in MIME). It is possible to configure and use smartcards completely within the CMS Signer GUI, and I’ll demonstrate this below.

First, we need a smartcard. For this article, I’m using an ePass2000 USB device.

I don’t necessarily recommend it, as I’ve not managed to get it to work under non-Windows operating systems. However, it is cute and it works, and the Windows drivers are available online for free (I find this very convenient, as I’ve managed to misplace the drivers to some of my other smartcard devices…). Most important of all though, is that it supports the PKCS#11 standard. This means the device should be compatible with all other software also supporting the PKCS#11 standard, such as Firefox and QCA.

In CMS Signer, there is a place to perform PKCS#11 configuration.

In here, we must set the path to the PKCS#11 DLL file (on other platforms this may be a .so or .dylib file). For the ePass2000, this is C:\WINDOWS\system32\ep2pk11.dll. There are also some other low-level options you can set, but generally the defaults should be fine. The PKCS#11 provider plugin for QCA was written by Alon Bar-Lev, and it makes use of his pkcs11-helper library, which is where most of these options come from.

It is unfortunate that this configuration step is needed at all, but there is currently no known or standard way to locate a user’s desired PKCS#11 DLL files. You would have to perform a very similar configuration step in Firefox as well.

Now that the drivers are configured, it should be possible to browse for our smartcard.

We can then choose a key:

The key is then imported into a local keyring. It is important to keep in mind that only the public portion of the key is imported. The private portion is not extractable from the smartcard. We can select this key to be used for signing, and enter some data to sign:

… and hit the Sign button, which prompts for a PIN:

… and now the CMS-format signature is generated:

CMS Signer can also perform signature verification, as long as the Text and Signature fields are filled. Since they are filled at this time, we can immediately verify the signature after signing.

Looks good. :)

Both Psi 0.11 and KDE 4 use QCA. In the future, expect to see smartcard support in these projects.

Comments (2)

Beyond Signal Safety

I’d like to expand on the topic of Signal Safety (or SS, part of the Delta Object Rules).  SS states that an object should be safe to delete within a slot connected to one of its signals.  However, this definition should be expanded to include other object operations besides deleting.  For example, suppose an object has a reset() function which brings it back to a default state (I tend to write objects with such a function).  It should be safe to call reset() as the result of a signal.

I’ve known about this variant of SS for awhile, but I’ve never written about it until now.  It comes to my mind tonight because I’m seeing a couple of possible problem spots in QCA during my final code cleaning, and I recall just recently fixing a problem like this in Psi.  Revision 731 of the Psi SVN states: “We can’t call read() while in a QTcpSocket error slot.”  Now that’s an odd one, isn’t it?  Calling read() doesn’t seem so harmful.  It’s not like we’re deleting the object or doing something overly destructive.  Trolltech’s position is that you should not assume an object can be deleted during a signal, and that you should use deleteLater() to be safe.  However, what can their position be for this problem?  Call readLater instead of read?  There is no such thing as readLater.  It seems the only sane answers are that Trolltech should either make it possible to call read() during the error() signal, or they should document that you cannot make that call at that time.

Like the Signal Retraction (SR) problem, SS of non-delete methods borders on theoretical. I say that because I think most Qt-based code is absolutely infested with the problem, but we manage to get by anyway. Sure, it may crop up on rare occasions (the QTcpSocket one), but for the most part we simply get lucky and avoid it.  Like SR, it is also a royal pain to fix in some cases, such as when you emit two signals in a row:

emit sig1();
emit sig2();

Every so often I may use QPointer (or QGuardedPtr in Qt 3) to solve the problem like this:

QPointer<QObject*> self = this;
emit sig1();
if(!self)
    return; // return or otherwise get the heck out of the call stack
emit sig2();

Okay, classical SS solved now. But what happens if instead of deleting the object during sig1(), the user calls some other method? Depending on the state changes caused by that method, it may not be appropriate to emit sig2() anymore. According to my expanded definition of SS, we may still have a problem here. Especially if that method is a reset() function, this may introduce an SR problem (sig2() being emitted when the object is expected to be in an inactive state).

There’s always the queued approach:

QMetaObject::invokeMethod(this, "sig1", Qt::QueuedConnection);
QMetaObject::invokeMethod(this, "sig2", Qt::QueuedConnection);

Now SS is solved, but you’re going to have SR problems.

Lately I’m becoming more fond of queued signals, because they solve all SS and DS (Delayed Signals) problems. However, they introduce an SR mess. One solution to that is to bounce your signals off of an intermediate “relay” QObject. You queue your signals to the relay object, and the relay object emits direct. This way you get the queued effect, but you can destroy the relay object at any time to retract your signals.

Comments

Parallel builds with qmake

So you’d like to run make -j4 on your qmake project (or make -j3, make -j2, hi google). Yes, it is possible, but like many things qmake, it is not documented. As usual, the best way to learn about qmake is to see how Qt itself uses it. Actually, I recommend Qt’s own build files as a primary source for qmake information anyway, as if you know where something ought to be in a project then it is faster to find it than going through the formal manual.

In a SUBDIRS project, you may have something like this:

SUBDIRS += library unittest examples

Change it to this:

# depend on variables, not actual directories
SUBDIRS += sub_library sub_unittest sub_examples

# specify the directories and dependencies
sub_library.subdir = library
sub_unittest.subdir = unittest
sub_unittest.depends = sub_library
sub_examples.subdir = examples
sub_examples.depends = sub_library

Have fun.

Comments (1)

« Previous entries

Bad Behavior has blocked 224 access attempts in the last 7 days.