Which function can detect that the io-net process has exited abnormally?

Post by QNX master » Sun May 17, 2009 4:09 pm

On my system io-net sometimes exits, and I want a daemon process to restart it.
QNX master
Senior Member
 
Posts: 852
Joined: Fri Sep 12, 2003 4:24 am

Post by xtang » Thu May 21, 2009 10:39 am

xtang
Moderator
 
Posts: 1815
Joined: Fri Sep 12, 2003 4:14 am
Location: China

Post by QNX master » Thu May 21, 2009 1:36 pm

xtang, I can't open this link.

Post by nakeyfish » Thu May 21, 2009 1:53 pm

Daddy, the network is down…
Posted May 12, 2008
Filed under: QNX
The gateway of my home network is not one of those “broadband routers”. Instead, it’s an old Pentium 200 MHz machine in my basement running, of course, QNX. Why am I doing this? I think it’s one of those “because I can” things. Since I compile my own TCP/IP stack, I can know every detail of the packets going in and out of my gateway. Another reason, of course, is that I like to “live on the edge”.

Yes, it’s really the “bleeding edge”. Running the stack from the HEAD branch brings a lot of benefit and fun, but one disadvantage is that, while the code is at an early stage, the stack “crashes”. The good thing is that I have a core dump to look at; the bad thing is that this is also when my kids start shouting at me.

Those of you who have managed a home network will understand how stressful this is. :) Fortunately, my kids soon found the “engineering way” to fix the problem: they go down to the basement, press the little reset button on the old Pentium, give it a couple of minutes, and voilà, everything comes back.

This worked well for a while, but one day, while I was home alone, the stack on the gateway died again. I had to get off my comfortable couch, go down to the basement, and reset it myself. I said to myself, “Why can’t I just write a program that restarts the network when it crashes?” After all, QNX is all about the microkernel and a modular system, isn’t it?

That’s where my “sockmon” program came from. Once started, it keeps monitoring whether the TCP/IP stack is still running. If the stack disappears, “sockmon” executes the shell script you gave it on the command line to restart the network. If it still cannot connect to the stack after several tries, it simply reboots the system.
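The restart-or-reboot behavior described above can be pulled out into a small policy function, which keeps the decision testable apart from the QNX calls. This is a minimal sketch with hypothetical names (`next_action()`, `struct retry_policy`, the `ACT_*` values); the limit of 6 failed connects mirrors the sockmon source in this thread, which sleeps 5 seconds between attempts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the monitor should do next (hypothetical names, not from sockmon). */
enum monitor_action {
        ACT_MONITOR,  /* connected: wait for the stack to disappear */
        ACT_RETRY,    /* connect failed: sleep, then try again */
        ACT_REBOOT    /* too many consecutive failures: reboot */
};

struct retry_policy {
        int max_failures;  /* sockmon's source gives up after 6 */
        int failures;      /* consecutive failed connect attempts so far */
};

/* Feed in the outcome of one connect attempt (e.g. socket() != -1)
 * and get back the next action. */
enum monitor_action next_action(struct retry_policy *p, bool connected)
{
        assert(p != NULL && p->max_failures > 0);

        if (connected) {
                p->failures = 0;  /* count afresh after the next crash */
                return ACT_MONITOR;
        }
        if (++p->failures >= p->max_failures)
                return ACT_REBOOT;
        return ACT_RETRY;
}
```

In a sockmon-style loop this would be called after every socket() attempt: `ACT_RETRY` maps to `sleep(5)`, `ACT_REBOOT` to the `sysmgr_reboot()` path, and `ACT_MONITOR` to the `MsgReceivePulse()` wait.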

You may wonder, “How do you know whether the TCP/IP stack is there or not?” Well, QNX resource managers have a built-in notification to all connected clients when their server disappears. So all you need to do is establish a connection to the TCP/IP stack (by calling socket()) and wait for the notification event.

I have included the source below. The “notification” mechanism above works for ALL resource managers (unless a manager is written in a way that turns this feature off), so you can easily extend my program to any resource manager: just give it a config file that says which resource manager (that is, which part of the pathname space you care about) to watch, and what to do (which script to execute) if the manager goes away.
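As a starting point for the config-file extension suggested above, here is a minimal sketch of the parsing side only. The file format (one `<path> <restart-script>` pair per line, `#` for comments) and the names `struct watch_entry`, `parse_watch_line()` and `load_watch_config()` are hypothetical. Each parsed path would then presumably be open()ed and watched for its COIDDEATH pulse the same way sockmon watches its socket() connection; that part is not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical config entry: one resource manager to watch. */
struct watch_entry {
        char path[256];    /* a pathname the manager owns */
        char script[256];  /* script to run when the manager goes away */
};

/* Parse one line of the hypothetical format "<path> <restart-script>".
 * Returns 1 if an entry was parsed, 0 for a blank or '#' comment line,
 * and -1 for a malformed line (wrong number of fields). */
int parse_watch_line(const char *line, struct watch_entry *e)
{
        char extra;
        int n;

        assert(line != NULL && e != NULL);

        while (*line == ' ' || *line == '\t')  /* skip leading blanks */
                line++;
        if (*line == '\0' || *line == '\n' || *line == '#')
                return 0;

        /* exactly two fields; a third one means the line is malformed */
        n = sscanf(line, "%255s %255s %c", e->path, e->script, &extra);
        return (n == 2) ? 1 : -1;
}

/* Read up to max entries from fp. Returns the number of entries read,
 * or -1 as soon as a malformed line is found. */
int load_watch_config(FILE *fp, struct watch_entry *out, int max)
{
        char line[600];
        int count = 0;

        while (count < max && fgets(line, sizeof(line), fp) != NULL) {
                int r = parse_watch_line(line, &out[count]);
                if (r < 0)
                        return -1;
                count += r;
        }
        return count;
}
```

For example, the line `/dev/socket /etc/restart-net.sh` would describe the TCP/IP stack and a restart script; `/etc/restart-net.sh` is only a placeholder name.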

I will leave this as an exercise for the reader, but if you do it, you will realize you have just got yourself a simple, basic HA (high availability) program.

-xtang


Code:
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <process.h>
#include <signal.h>
#include <string.h>
#include <syslog.h>
#include <sys/procmgr.h>
#include <sys/sysmgr.h>
#include <sys/neutrino.h>
#include <sys/socket.h>

int main(int argc, char **argv)
{
        char *script;
        int sd, chid, fcount;
        struct _pulse pulse;

        if (argc < 2) {
                fprintf(stderr, "sockmon <re-start script>\n");
                return -1;
        }

        script = argv[1];
        if (access(script, X_OK) != 0) {
                fprintf(stderr, "access('%s'): %s\n", script, strerror(errno));
                return -1;
        }

        /* create a channel that gets a _PULSE_CODE_COIDDEATH pulse whenever
         * the server behind one of our connections goes away
         */
        if ((chid = ChannelCreate(_NTO_CHF_COID_DISCONNECT)) == -1) {
                perror("ChannelCreate");
                return -1;
        }

        /* don't care about the children (e.g. the "slay" spawned below) */
        signal(SIGCHLD, SIG_IGN);

#ifdef NDEBUG
        /* NOCHDIR keeps a relative script path valid after daemonizing */
        if (procmgr_daemon(0, PROCMGR_DAEMON_NOCLOSE | PROCMGR_DAEMON_NOCHDIR) == -1) {
                perror("procmgr_daemon");
                return -1;
        }
#endif

        openlog("sockmon", LOG_PID, LOG_DAEMON);
        setlogmask(LOG_UPTO(LOG_INFO));

        for (;;)
        {
                fcount = 0;

                /* connect to tcpip for monitoring; retry every 5 seconds, and if
                 * we still can't connect after 6 tries (about 30 seconds),
                 * reboot the system
                 */
                while ((sd = socket(AF_INET, SOCK_DGRAM, 0)) == -1) {
                        syslog(LOG_INFO, "Connect to socket failed: %m");
                        if (++fcount >= 6) {
                                syslog(LOG_ERR, "Can't connect to socket after 6 tries, reboot...");
                                spawnl(P_NOWAIT, "/bin/slay", "slay", "-f", "syslogd", NULL);
                                sleep(1);
                                sysmgr_reboot();
                                return 0;
                        }
                        sleep(5);
                }

                syslog(LOG_INFO, "Connected to network, start monitoring...");

                /* block until the death pulse for *our* socket connection arrives */
                for (;;) {
                        if (MsgReceivePulse(chid, &pulse, sizeof(pulse), NULL) == -1) {
                                syslog(LOG_ERR, "MsgReceivePulse(): %m");
                                return -1;
                        }
                        if (pulse.code != _PULSE_CODE_COIDDEATH) {
                                syslog(LOG_ERR, "Unexpected pulse code %d", pulse.code);
                                continue;
                        }
                        if (pulse.value.sival_int != sd) {
                                syslog(LOG_ERR, "COIDDEATH pulse for %d", pulse.value.sival_int);
                                continue;
                        }
                        break;
                }

                /* the stack is gone; release our dead connection before reconnecting */
                close(sd);

                syslog(LOG_INFO, "Network gone, restarting...");
                spawnl(P_WAIT, "/bin/ksh", "/bin/ksh", script, NULL);
        }

        return 0;
}
nakeyfish
Senior Member
 
Posts: 375
Joined: Fri Aug 06, 2004 4:12 pm
Location: BJ

Post by QNX master » Wed May 27, 2009 11:41 am

xtang, what should argv[1] be for this program? When I pass io-net, I get a memory fault.

Post by xtang » Wed May 27, 2009 8:29 pm

QNX master wrote: xtang, what should argv[1] be for this program? When I pass io-net, I get a memory fault.


It's a shell script.

Once started, it keeps monitoring whether the TCP/IP stack is still running; if the stack disappears, “sockmon” executes the shell script you gave it on the command line to restart the network.

Post by QNX master » Sun May 31, 2009 11:26 am

What goes into this script?

Post by xtang » Mon Jun 01, 2009 9:34 pm

QNX master wrote: What goes into this script?


Whatever it takes to restart io-net, for example:

slay -f io-net inetd dhcp.client
io-net -d xxxx -p xxxx
waitfor /dev/socket/1
dhcp.client
inetd