Event mul­ti­plex­er

Applications can be involved with many I/O through many routes, as defined by the various subsystems they interact with. A subsystem can create a handle for any I/O resource in use by an application, and give it to the application for future reference. The subsystem can then send an application event to the application to indicate that some work can be done with the resource, or that some work has been completed.

An application uses the event multiplexer subsystem to simultaneously await one or more events from one or more subsystems. For each resource, it asks its subsystem to specify an associated event handle, then gathers these up and passes them to the event multiplexer, which then blocks until at least one resource is available.

This API is designed with these goals in mind:

Application-event handles

The event-handle type is the common system data type used for building interoperable subsystem interfaces. It consists of two words, the first being the subsystem identifier, and the second being opaque outside the owning subsystem. The application must use both of these when talking to the event multiplexer, but in other contexts, only the opaque word is necessary, as the identity of the subsystem is implied.

typedef struct {
  unsigned subsysid;
  unsigned opaque;
} _evhandle;

Question: Why are we using event handles, instead of file descriptors?

Answer:

File descriptors might constrain us to certain patterns we could use on top of them, and such patterns might be less suitable for time-critical applications than others.

And it should still be possible to emulate any of select, pselect, poll or epoll on top anyway.


Question: Aren't other interfaces, like select, poll or MsgWaitOnMultipleObjectsEx, adequate?

Answer:

Firstly, I don't like the idea of the application having to look up its own data, when the kernel has already probably had to do some look-up, and has done it more efficiently. When you call select, you have to either go through all your structures and test to see if their corresponding descriptors are in the resultant set, or you have to do bit tricks with the fd_set to find the members, and then map the descriptor to your own structure.

Secondly, these calls require you to put all your data together each time you make the system call. Most do have the advantage that you only make one system call per cycle, and I've tried to preserve that here — but you only have to pass on changes in this interface.

The interface presented here should be able to support any of the others (i.e. they can be emulated above), while still being efficient and flexible.



Interactions with the application

An application can wait on multiple event handles. It does this by first asking the event subsystem to create an internal interest set of application-event handles, which the application subsequently refers to by a 1-word set reference. It can subsequently add/remove event handles to/from the interest set, and ask the event multiplexer to block until one of those events has occurred. These two operations are done atomically through a single SWI, Event​Manager_​Parallel​Wait. When events are offered back, they are all of the same priority level.

Different threads of an application can wait on different interest sets — the application can create as many sets as it needs. Each thread can can only wait directly on one set at a time. However, a set is itself a watchable resource, so it can be added to another set.

An application must associate its own opaque handle, the user reference, with each event handle that it adds to a interest set. It is the return of this opaque handle that indicates to the thread that the event has occurred.


Interactions with the subsystem

If the resource is released while still in the set, the subsystem must notify the event multiplexer. Most importantly, the subsystem must notify the event multiplexer when the event has actually occurred, and what priority to set it at, or to rescind the event before it is supplied to the application — both of these notifications are simply a change of priority applied to the event handle, with a special null priority meaning that the event is not to be offered to the application. The subsystem may offer an API to allow the applied priority to be set.

An event may be placed in an event group. It will not be offered to the application with events of other groups. Most events will operate in group 0. WIMP events will operate in group 1.

When an event handle becomes part of a thread's interest set, the event multiplexer notifies the owning subsystem. If the handle is dropped from the set, the subsystem is similarly notified. It is similarly notified if the handle is offered back through Event​Manager_​Parallel​Wait.


Support for the reactor and proactor patterns

Suppose that a networking subsystem defines a ?socket? resource, and then certain processes may invoke operations on certain sockets. One operation should be to obtain the ?read handle? of the socket (which exists for the lifetime of the socket).

int sock; // a socket descriptor
_evhandle h = _get_socket_handle(sock, SH_READ);
_set_prio(h, 4); // subsystem-independent

The process then has the right to wait on that handle.

// application-dependent
void user_action(void *);
struct ctxt user_data;

// The handle will be added to interest set on next yield,
// and remains there until the application cancels it or quits.
ref = _watch_handle(h, user_action, &user_data);

The example subsystem may allow the event handle and its associated resource to operate according to one or more patterns, such as ?reactor? and ?proactor?. If it supports more than one, it must provide operations to select the pattern or mode.

In the reactor pattern, when data arrives on the socket, the handle is triggered, and the wait call can return. The application's function therefore looks something like this:

void user_action(void *user_data)
{
  // The handle has been triggered.

  struct ctxt *c = user_data;
  int done = recv(c->sock, buf, max_len);

  // The handle is back in the waiting state.
}

In the proactor pattern, instead of using a separate ?read? call to transfer data from kernel space to user space, the subsystem receives a user-space structure identifying the buffer space to be used directly. The application and the kernel update this structure as data is added to or removed from the buffer.

struct _sockbuf_state state;
_set_socket_proactor_state(sock, SH_READ, &state, sizeof state);

(This operation could also serve as the switch between reactor and proactor modes.)

The application's handler function doesn't perform a read operation any more — that has already been done. It just examines the structure to see how much data the kernel has put in, processes it, and alters the structure to indicate how much was used:

void user_action(void *user_data)
{
  struct ctxt *c = user_data;
  // check c->state
  // process data
  // update c->state
}

Question: Should applications, or subsystems, manage handles?

Answer:

Short answer: subsystems should manage handles.

If applications manage their own handles, we have to consider what happens when a process forks (since the process owns the handle). And there must be a subsystem operation to bind the handle to a resource. Code would look something like this:

int sock = socket(...);
_evhandle h = _create_handle();
_set_prio(h, 4);
_set_socket_handle(sock, h, SH_READ);

Question: What should happen when a process forks?

Answer:

One option is to invalidate all handles in the child process, but this means the library managing them has to be explicitly told that a fork has taken place.

Another option is to share the handles. Both processes have to release a handle for it to be destroyed; indeed, it's probably better for the subsystems to manage handles in that case (but applications still set priorities). We'd have to allow a single handle to be blocked on by two threads, with only one of them arbitrarily being released.

A third option is to keep handles scoped within a process. The event manager will map them to global, internal handles. This might look identical to option 2, from the point of view of the applications.


If subsystems manage (i.e. create and destroy) handles for their own resources, they can ensure that a handle is destroyed exactly when it is not needed anymore, e.g. when a socket is closed. There still has to be a mechanism for the application to set priorities. Code would look something like this:

int sock = socket(...);
_evhandle h = _get_socket_handle(sock, SH_READ);
_set_prio(h, 4);

Library support for events

Suppose that the application retains its own information about handles being watched. This could include a pointer to associated application data, a function to act on that data if the handle is triggered, and the handle itself. This collection of information could be maintained and accessed through the following functions:

struct _watchrec *_create_watch(_evhandle h,
                                void (*func)(void *), void *ctxt);
_evhandle _destroy_watch(struct _watchrec *);
void _invoke_watch(struct _watchrec *); // call (*func)(ctxt)

A struct _watchrec * pointer would be used as the opaque reference submitted to the event subsystem, to be returned when an event is triggered.

The application also maintains an array of event handles to be newly registered for watching, or to be deregistered. Assuming no more than 20 need to be registered or deregistered at a time (i.e. between each waiting call):

struct _handle_assoc {
  _evhandle sys_h;
  void *usr_ref;
} ev_changes[20];
int num_ev_changes;

Two functions could then be provided to extend the set:

struct _watchrec *_watch_handle(_evhandle h,
                                void (*func)(void *), void *ctxt)
{
  // Choose an unused _handle_assoc structure.
  int idx = num_ev_changes++;

  // Add func and ctxt to a maintained structure.
  struct _watchrec *ref = _create_watch(h, func, ctxt);

  // ...and make usr_ref point to it.
  ev_changes[idx].sys_h = h;
  ev_changes[idx].usr_ref = ref;

  return ref;
}

void _ignore_handle(struct _watchrec *ref)
{
  // Obtain the handle, destroy the application structure,
  // and record the handle in the array.
  ev_changes[num_ev_changes++] = {
    .sys_h = _destroy_watch(ref),
    .usr_ref = 0
  };
}

The arrays are then ready to be submitted to Event​Manager_​Parallel​Wait call:

void _yield(void)
{
  // We will handle at most 40 handles being triggered at once.
  void *usr_ref[40];

  // Pass all ev_changes to kernel.
  int rc;
  _swix(Event​Manager_​Parallel​Wait, _INR(0,3)|_OUT(0),
        sizeof usr_ref / sizeof usr_ref[0], usr_ref,
        num_ev_changes, ev_changes,
        &rc);
  
  // Clear ev_changes array, since we've
  // now dealt with them.
  num_ev_changes = 0;

  // Go through usr_ref[0] to usr_ref[rc - 1], invoking the
  // associated user functions.
  for (int i = 0; i < rc; i++)
    _invoke_watch(usr_ref[i]);
}

Event-multiplexer Interface

The event multiplexer provides a SWI operation that accepts a set of event handles, and then blocks the caller until one or more events have occurred, or until after a specified time.

Question: Should the module hold state per thread, allowing the application to alter it through multiple calls? Or should the application keep track of what it is interested in, and then pass all data to the module in a single call?

Answer:

A single call would be better, since communication must be through SWIs, which are expensive.

Ah, here's the solution: Store state per thread in the module, but use a single SWI which can register a set of handles, deregister another set, and watch the current set all in one go. The support library within the application can do the collation of handles to be added or removed, then submit the latest changes to the SWI in one go.


A SWI is also provided for use by other subsystems to obtain several SVC subroutines to allocate and deallocate handles, adjust their priorities, wait on single events, and signal that an event has occurred.

#include <kernel/events.h>

typedef unsigned _evhandle;
struct _evpair {
  _evhandle sys_h;
  void *usr_ref;
};
Event-multiplexer routines
Type Routine
SWI operation Event​Manager_​Link
SWI operation Event​Manager_​Open​Set
SWI operation Event​Manager_​Close​Set
SWI operation Event​Manager_​Get​Handle
SWI operation Event​Manager_​Parallel​Wait
SWI operation Event​Manager_​Set​Priority
SVC subroutine Event​Manager_​Resource​Closed
SVC subroutine Event​Manager_​Set​Priority​SVC

SWI​ ​Event​Manager_​Open​Set​ ​​ ​

Create a new interest set

On exit

  • R0=set reference

Description

A new interest set is created. The returned reference is only valid within the process. If the process forks, the set is also duplicated. Use Event​Manager_​Parallel​Wait to add application-event handles to it, or remove them. Use Event​Manager_​Close​Set to discard the set. It will also be destroyed autmatically when the process terminates.

SWI​ ​Event​Manager_​Close​Set ​ ​​ ​

Destroy a interest set

On entry

  • R0=set reference

Description

Destroy an existing interest set.

SWI​ ​Event​Manager_​Get​Handle ​ ​​ ​

Get the application-event handle for a interest set

On entry

  • R0=set reference
  • R1=address to write handle

Description

This obtains the event handle of a interest set so that it can be watched itself in another interest set.

Question: What is this for?

Answer:

This is mainly to support the epoll interface, which allows an epoll descriptor to become a watched entry in another such descriptor.


SWI​ ​Event​Manager_​Parallel​Wait ​ ​​ ​

(De-)register events and wait for existing and newly registered events

On entry

  • R0=set reference
  • R1=maximum number of output events
  • R2=address of array of R1 uninitialized words
  • R3=number of event-handle changes
  • R4=address of array of R3 struct _evpair pointers

On exit

  • R1=number of events that have occurred
  • R2[0]…R2[R1-1]=the values of the user references for the events that have occurred

Description

The application's calling thread wishes to block while awaiting multiple events. The handles of the events to be monitored are kept in a set identified by R0 and held by the subsystem, and this set is changed by the array of R3 two-word entries starting at R4.

The first word of each entry in the R4 array is an event handle. The second word is an application-specific value (user reference) which the application could use to (say) point to the context associated with the event handle in the first word. If this is null, this and subsequent calls will no longer monitor the event; any other value adds the event handle to the monitored set. These changes occur just before the call begins monitoring.

[Edit:

It might be convenient to have a 64-bit user reference, in order to directly support the epoll interface. But it's not necessary — the user reference could simply point to the user data in the epoll wrapper. Ah, in fact, that's what we should do, since we attach significance to the user reference being null.

]

When the call returns, a selection of events will have occurred, and their corresponding application-specific values will appear in the R2 array, R1 now giving the number of entries actually filled. The application can then read through the array to trigger its own subroutines to handle those events. All returned events will have the same priority, and no other triggered events that haven't been listed will have a higher priority (but some at the same level may be missing, if the array has been filled). Handles for all returned events have been removed from the interest set.

Question: Do we have to provide a time here?

Answer:

No — we could have a timing subsystem, whose resources are specific times, and whose handles are triggered when those times are reached. It would also make this SWI independent of any specific timing format (absolute vs relative; struct timeval vs monotonic). It would also allow times to have specific priorities, if that makes any sense. (I think it might do!)


SVC subroutine​ ​Event​Manager_​Set​Priority​SVC ​ ​​ ​

Notify of change of priority of event handle

On entry

Description

This allows a subsystem to inform the event manager that a resource has become available, so a thread waiting on it can unblock.

This call should only be made between receiving an Event​User_​Switch​On and an Event​User_​Switch​Off or Event​User_​Offered for the specified priority reference.

The priority can be changed with a subsequent call, if it has not yet been offered to, or dropped by, the application. It may also be cancelled by specifying the null priority, -1, which is the default after a Event​User_​Switch​On.

The event group is primarily intended to allow the WIMP to stop its event from appearing at the same time as others (since it implies exclusive access to the screen) without having to allocate a unique priority to all WIMP events.

SVC subroutine​ ​Event​Manager_​Resource​Closed ​ ​​ ​

Notify of resource being closed

On entry

Description

This allows a subsystem to inform the event manager that the application has closed a resource.

This call should only be made between receiving a Event​User_​Switch​On and a Event​User_​Switch​Off or Event​User_​Offered for the specified priority reference.

In spite of the existance of this call, an application should inform the event manager before closing the resource, in case the resource handle gets re-used, and is mistaken for the old resource.


Event-user Interface

A subsystem that has blocking resources, and intends to allow applications to monitor them in parallel with resources belonging to other subsystems, must provide the following SVC routines to the event manager by a call to Event​Manager_​Link.

Event-user routines
Type Routine
SVC subroutine Event​User_​Switch​On
SVC subroutine Event​User_​Switch​Off
SVC subroutine Event​User_​Offered

SVC subroutine​ ​Event​User_​Switch​On​ ​​ ​

Report that a thread is waiting on a resource

On entry

  • R0=opaque part of subsystem handle
  • R1=priority reference
  • R2=zero

On exit

  • R2=non-zero if the resource does not exist

Description

This indicates that a thread has called Event​Manager_​Parallel​Wait on a interest set that has just included (perhaps indirectly, through other sets) the specified resource. Until the next call to Event​User_​Switch​Off, the subsystem may call Event​Manager_​Set​Priority​SVC to indicate whether the thread can unblock.

The priority reference should be retained by the subsystem and used when calling the event manager back.

By changing R2 before returning, the subsystem can report that the resource does not exist, e.g. if the application has closed it without informing the event manager directly.

SVC subroutine​ ​Event​User_​Switch​Off​ ​​ ​

Report that a thread is no longer waiting on a resource

On entry

  • R0=opaque part of subsystem handle

Description

This indicates that a thread has removed a resource from a interest set.

SVC subroutine​ ​Event​User_​Offered​ ​​ ​

Report that a thread has been given a resource

On entry

  • R0=opaque part of subsystem handle

Description

This indicates that a thread has returned from Event​Manager_​Parallel​Wait on a interest set that includes (perhaps indirectly, through other sets) the specified resource.