LINUX GAZETTE
...making Linux just a little more fun!
select() on Message Queue
By Hyouck "Hawk" Kim

Introduction

When you use a message queue together with sockets or other file-descriptor-based Unix facilities, the most inconvenient thing is that message queues do not support the select() system call. So Unix programmers usually solve the I/O multiplexing problem in a simple but ugly way, like this:

while(1)
{
    select on socket with timeout;
    ...
    wait on a message queue with IPC_NOWAIT
}
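As a rough illustration, one pass of that polling loop might look like the following in C. This is only a sketch: struct demo_msg, poll_once() and the GOT_* flags are names invented here, and the timeout is whatever latency you are willing to accept on the message queue side.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical message layout, for illustration only. */
struct demo_msg {
    long mtype;
    char buffer[64];
};

#define GOT_FD  0x1  /* the descriptor became readable */
#define GOT_MSG 0x2  /* a message was taken from the queue */

/*
 * One pass of the polling loop: wait up to timeout_ms for fd to become
 * readable, then try the queue without blocking. Returns a combination of
 * GOT_FD and GOT_MSG (possibly 0), or -1 on error.
 */
int poll_once(int fd, int msgq_id, int timeout_ms, struct demo_msg *msg)
{
    fd_set rset;
    struct timeval tv;
    int result = 0;
    int n;

    assert(msg != NULL);

    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(fd + 1, &rset, NULL, NULL, &tv);
    if (n < 0 && errno != EINTR)
        return -1;
    if (n > 0 && FD_ISSET(fd, &rset))
        result |= GOT_FD;

    /* IPC_NOWAIT makes msgrcv() fail with ENOMSG instead of blocking. */
    if (msgrcv(msgq_id, msg, sizeof(msg->buffer), 0, IPC_NOWAIT) >= 0)
        result |= GOT_MSG;
    else if (errno != ENOMSG)
        return -1;

    return result;
}
```

The cost is visible in the signature: a message that arrives just after msgrcv() returns has to wait for up to timeout_ms before anyone looks at the queue again.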

Certainly, the above implementation is ugly. I don't like it. Another solution might be to adopt multi-threading. But in this article I want to show you an unusual approach: implementing a new system call called msgqToFd(). I'm not trying to provide a full-fledged, bug-free kernel implementation; I just want to present my experiment. This article might be interesting to readers who like to play with the GNU/Linux kernel source.


msgqToFd() - A new non-standard system call

Here is its signature.

int msgqToFd(int msgq_id)

It returns a file descriptor corresponding to a message queue, which can be used with select().

If any error happens, it returns -1.

An application can use the call like this:

    ...
q_fd = msgqToFd(msgq_id);
while(1)
{
    FD_ZERO(&rset);
    FD_SET(0, &rset);
    FD_SET(q_fd, &rset);
    select(q_fd + 1, &rset, NULL, NULL, NULL);
    if(FD_ISSET(0, &rset))
    {
        ...
    }
    if(FD_ISSET(q_fd, &rset))
    {
        r = msgrcv(msgq_id, &msg, sizeof(msg.buffer), 0, 0);
        ...
    }
}


How select() works

A file descriptor is associated with a file structure. The file structure holds a set of operations supported by this file type, called file_operations. The file_operations structure has an entry named poll. What the generic select() call does is call this poll() function to get the status of a file (or socket, or whatever), as the name suggests.

In general, select() works like this:

while(1)
{
    for each file descriptor in the set
    {
        call file's poll() to get mask.
        if(mask & can_read or mask & can_write or mask & exception)
        {
            set bit for this fd that this file is readable/writable or there is an exception.
            retval++;
        }
    }
    if(retval != 0)
        break;
    schedule_timeout(__timeout);
}
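The scan in the middle of that loop can be modeled in user space. The sketch below is a toy, not kernel code: toy_file, toy_file_operations, toy_select_scan() and counter_poll() are all invented for illustration, and the sleeping done by schedule_timeout() is left out so that only the "call each file's poll() and collect the masks" step remains.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_READ   0x1
#define TOY_WRITE  0x2
#define TOY_EXCEPT 0x4

struct toy_file;

/* Stand-in for the kernel's file_operations: only poll is modeled. */
struct toy_file_operations {
    unsigned int (*poll)(struct toy_file *f);
};

struct toy_file {
    const struct toy_file_operations *f_op;
    void *private_data;
};

/*
 * One scan of the select() loop: ask every file for its mask, keep the
 * bits the caller asked about, and count the files that are ready.
 * A real select() would sleep and rescan when this returns 0.
 */
int toy_select_scan(struct toy_file **files, const unsigned int *wanted,
                    unsigned int *ready, size_t n)
{
    int retval = 0;
    size_t i;

    assert(files != NULL && wanted != NULL && ready != NULL);

    for (i = 0; i < n; i++) {
        unsigned int mask = files[i]->f_op->poll(files[i]);

        ready[i] = mask & wanted[i];
        if (ready[i])
            retval++;
    }
    return retval;
}

/* Example poll method: private_data points at a count of pending items. */
unsigned int counter_poll(struct toy_file *f)
{
    const int *pending = f->private_data;

    return *pending > 0 ? TOY_READ : TOY_WRITE;
}

const struct toy_file_operations counter_ops = { counter_poll };
```

The point of the indirection is that the scanning code never needs to know what kind of object is behind a descriptor; supporting a new object type only means supplying another poll method.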

For the detailed implementation of select(), please take a look at sys_select() and do_select() in fs/select.c of the standard kernel source code.

Another thing you need to understand is poll_wait(). It puts the current process onto a wait queue provided by a kernel facility such as a file, pipe, socket or, in our case, a message queue.

Please note that the current process may wait on several wait queues at once by calling select().


long sys_msgqToFd(long msqid)

The system call should return a file descriptor corresponding to a message queue. The file descriptor should point to a file structure which contains the file_operations for a message queue.

To do that, sys_msgqToFd() does the following:

  1. with msqid, locate the corresponding struct msg_queue

  2. allocate a new inode by calling get_msgq_inode()

  3. allocate a new file descriptor with get_unused_fd()

  4. allocate a new file structure with get_empty_filp()

  5. initialize inode, file structure

  6. set file's file_operations with msgq_file_ops

  7. set file's private_data with msq->q_perm.key

  8. install fd and file structure with fd_install()

  9. return the new fd

Please take a look at msg.c and the accompanying msg.h provided with this article. See also sys_i386.c

msgq_poll()

The implementation of msgq_poll() is pretty simple.

It does the following:

  1. With file->private_data, which is the key of a message queue, locate the corresponding message queue

  2. Put the current process onto the message queue's wait queue by calling poll_wait()

  3. If the message queue is empty (msq->q_qnum == 0), set the mask as writable (this may cause some arguments, but let's forget about that for now). If not, set the mask as readable

  4. Return the mask
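The decision in step 3 is small enough to model on its own. The sketch below runs in user space and is not the code from msg.c: toy_msg_queue and toy_msgq_mask() are invented names, and using POLLIN and POLLOUT for "readable" and "writable" is an assumption about which bits the real msgq_poll() returns.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Only the field the decision depends on; the real struct msg_queue has more. */
struct toy_msg_queue {
    unsigned long q_qnum;  /* number of messages currently in the queue */
};

/* Mask as described in step 3: empty queue means writable, otherwise readable. */
unsigned int toy_msgq_mask(const struct toy_msg_queue *msq)
{
    assert(msq != NULL);

    if (msq->q_qnum == 0)
        return POLLOUT;
    return POLLIN;
}
```

Note that with this rule a queue is never reported as readable and writable at the same time.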


Modification of existing message queue source code

To support poll() on a message queue, we need to modify the existing message queue source code.

The modifications include

  1. Adding a wait queue head to struct msg_queue, on which a process calling select() is queued. The wait queue head must also be initialized when a message queue is created. Please take a look at struct msg_queue and newque() in msg.c.

  2. Whenever a new message is inserted into a message queue, a process waiting on the message queue (by calling select()) should be woken up. Take a look at sys_msgsnd() in msg.c.

  3. When a message queue is removed or its properties are changed, all the processes waiting on the message queue (by calling select()) should be woken up. Take a look at sys_msgctl() and freeque() in msg.c.

  4. To allocate a new inode and file structure, we need some file-system-related setup for VFS to operate properly. For this purpose, we need additional initialization code to register a new file system and set it up. Take a look at msg_init() in msg.c.

All the changes are "ifdef"ed with MSGQ_POLL_SUPPORT, so it should be easy to identify them.


File System Related Stuff

To allocate a file structure, we need to set up the file's f_vfsmnt and f_dentry properly. Otherwise you'll see some OOPS messages printed out on your console. For VFS to work correctly with this new file structure, we need the additional setup briefly described above.

Since we support only poll() in the file_operations, we don't have to care about every detail of the file system setup code. All we need is a properly set up f_dentry and f_vfsmnt. Most of the related code is copied from pipe.c.


Adding a new system call

To add a new system call, two things need to be done.

The first step is to add the new system call at the kernel level, which we already did (sys_msgqToFd()).
In the GNU/Linux kernel, all System V IPC related calls are dispatched through sys_ipc() in arch/i386/kernel/sys_i386.c. sys_ipc() uses a call number to identify the specific system call requested. To dispatch the new system call properly, we have to define a new call number (which is 25) for sys_msgqToFd() and modify sys_ipc() to call sys_msgqToFd(). For reference, please take a look at arch/i386/kernel/entry.S in the standard kernel source and sys_ipc() in the sys_i386.c provided with this article.

The second step is to add a stub function for user-level applications. Normally, all the system call stub functions are provided by GLIBC, so to add a new system call you would have to modify GLIBC, build your own copy and install it. Oh hell, NO THANKS!!! I don't want to do that and I don't want you to do that either. To solve the problem, I did some copy and paste from GLIBC. If you look at user/syscall_stuff.c provided with this article, there is a function named msgqToFd(), which is the stub for the msgqToFd() system call.

What it does is simply

return INLINE_SYSCALL(ipc, 5, 25, key, 0, 0, NULL);

Here is a brief description for the macro.

ipc  :  system call number for sys_ipc(). ipc is expanded as __NR_ipc, which is 117.
5    :  number of arguments passed to the system call (25, key, 0, 0, NULL)
25   :  call number for sys_msgqToFd()
key  :  an argument to sys_msgqToFd()

INLINE_SYSCALL sets up the arguments properly and invokes interrupt 0x80 to switch to kernel mode and invoke the system call.
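For readers who would rather not copy GLIBC internals, a similar stub can probably be written with the generic syscall() wrapper instead. This is a sketch and not the code in user/syscall_stuff.c: msgqToFd_stub() and MSGQTOFD_CALL are names invented here, and SYS_ipc is only defined on architectures that multiplex System V IPC through a single system call, so the sketch falls back to ENOSYS elsewhere.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Call number the article assigns to sys_msgqToFd() inside sys_ipc(). */
#define MSGQTOFD_CALL 25

/*
 * Hypothetical replacement for the INLINE_SYSCALL-based stub.
 * Returns the new file descriptor, or -1 with errno set.
 */
int msgqToFd_stub(int key)
{
#ifdef SYS_ipc
    /* Same five arguments as the macro call: 25, key, 0, 0, NULL. */
    return (int)syscall(SYS_ipc, (long)MSGQTOFD_CALL, (long)key,
                        0L, 0L, (void *)NULL);
#else
    /* No ipc multiplexer on this architecture, so the call cannot exist. */
    (void)key;
    errno = ENOSYS;
    return -1;
#endif
}
```

On a kernel without the modification this should simply return -1, which makes it easy to check at run time whether the patched kernel is the one that booted.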

Conclusion

I'm not so sure about the practical usability of this modification. I just wanted to see whether this kind of modification was possible or not.

Besides that, I want to talk about a few issues that need to be addressed.

  1. If two or more threads or processes access a message queue, and one is waiting on it with msgrcv() while another is waiting with select(), then the one in msgrcv() will always receive the new message. Take a look at pipelined_send() in msg.c.

  2. For the writability test, msgq_poll() sets the mask as writable only if the message queue is empty. We could instead set the mask as writable whenever the message queue is not full, and there would be no big difference. But I chose this implementation for simplicity.

  3. Let's think about this scenario.

    1. A queue is created
    2. A file descriptor for the queue is created
    3. The queue is removed

    In this kind of case, what should we do? A correct solution would be to close the fd when the queue is removed. But this is impossible, since a message queue can be removed by any process which has the right to do so. This means the process removing the message queue may not have a file descriptor associated with it, even if the message queue has been mapped to a file descriptor by some other process.

    Additionally, if the same queue (with the same key) is created again, the mapping will still be maintained.

  4. Efficiency problem. All the processes waiting on the wait queue by calling select() will be woken up when there is a new message. Eventually only one process will receive the message and all the other processes will go back to sleep.

  5. No support for message types. Regardless of message type, if there is any message, select() will return.
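The alternative mentioned in issue 2, reporting the queue as writable whenever it is not full, could be modeled like this. It is a user-space toy rather than a patch: toy_msg_queue and toy_msgq_mask_not_full() are invented names, treating "not full" as q_cbytes < q_qbytes is an assumption, and so is the choice of POLLIN and POLLOUT as the mask bits.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* The three counters the decision depends on. */
struct toy_msg_queue {
    unsigned long q_cbytes;  /* bytes currently queued */
    unsigned long q_qnum;    /* messages currently queued */
    unsigned long q_qbytes;  /* capacity of the queue in bytes */
};

/* Readable if any message is queued; writable if there is room left. */
unsigned int toy_msgq_mask_not_full(const struct toy_msg_queue *msq)
{
    unsigned int mask = 0;

    assert(msq != NULL);

    if (msq->q_qnum > 0)
        mask |= POLLIN;
    if (msq->q_cbytes < msq->q_qbytes)
        mask |= POLLOUT;
    return mask;
}
```

Unlike the empty-queue rule, this variant can report a partly filled queue as both readable and writable.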

Bugs and Improvements

DIY :-)

Source Code

msg.c                   Modified message queue implementation
msg.h                   Header file for message queue
sys_i386.c            Modified for the new system call
user/Makefile           Makefile to build test program (rename from Makefile.txt to Makefile)
user/syscall_stuff.c    Stub function for msgqToFd()
user/msg_test.h        Header for msgqToFd()
user/msgq.c             Test program source
user/msgq2.c            Another test program 

I used GNU/Linux kernel 2.4.20 on x86 for this experiment.
To build a new kernel with this modification, I suggest you copy

msg.c            to    ipc/msg.c
msg.h            to    include/linux/msg.h
sys_i386.c     to    arch/i386/kernel/sys_i386.c

and build and install it!!!!

Before running the test programs, please be sure to make key files:

touch .msgq_key1
touch .msgq_key2
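Key files like these are typically needed because ftok() derives a System V IPC key from an existing path and fails if the file is missing. Assuming the test programs obtain their keys that way (an assumption, not something stated above), the sketch below shows the idea; key_from_file() and the project id passed to it are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create the key file if it is missing (roughly what touch does), then
 * derive a System V IPC key from it. Returns (key_t)-1 on failure.
 */
key_t key_from_file(const char *path, int proj_id)
{
    int fd;

    assert(path != NULL);

    fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return (key_t)-1;
    close(fd);

    /* ftok() combines the file's identity with the low bits of proj_id. */
    return ftok(path, proj_id);
}
```

The resulting key can then be passed to msgget(), and two programs that use the same path and project id will arrive at the same queue.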

 


Copyright © 2003, Hyouck "Hawk" Kim. Copying license http://www.linuxgazette.net/copying.html
Published in Issue 92 of Linux Gazette, July 2003
