Post subject: pidin hangs at a specific process
Posted: Feb 19, 2008 - 12:31 AM
New Member
Joined: Feb 18, 2008
Posts: 5
Hi!
I have a problem with a process (my_process): after it has been running for a few hours, the node starts to get slow.
My process is not consuming CPU (according to hogs), and the only symptom I can see is that when I run 'pidin' or 'pidin mem', the utility hangs exactly when it is about to show information about my_process.
My process has 3 threads. One of them (main) is a resource manager that implements only the io_read and io_write handlers.
The problem is progressive. After 4 hours pidin hangs for a moment and then continues. After one or two days pidin hangs there forever and the node is almost inoperable.
When I kill this process (it takes almost a minute to die), everything returns to normal.
Could it be a programming problem in my resource manager?
When pidin runs without arguments, does it send any message to my resource manager?
Thanks for your help!
Fabio.
[I'm using QNX 6.3.0 SP3 with CorePath 6.3.2a.]
Posted: Feb 19, 2008 - 03:22 PM
Senior Member
Joined: Mar 10, 2004
Posts: 512
Fabio,
If you are not using CPU then it sounds like you are leaking something else.
Maybe file descriptors, timers, memory, threads etc.
Can you get the 'sin' command to execute? That provides a quick way to check memory use, open files, etc. You can also get that from pidin, but the command-line arguments are a little more arcane.
Tim
Posted: Feb 20, 2008 - 12:08 AM
QNX Master
Joined: Jun 25, 2003
Posts: 974
This sounds very familiar. My guess is that you are not closing fds properly. Some system table fills up and keeps growing until accessing it takes a noticeably long time. You might want to take a close look at your close code and compare it to the documentation examples.
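One rough way to test the fd-leak guess from inside the process is to count its open descriptors at intervals and see whether the number keeps climbing. The sketch below is not taken from the thread: count_open_fds() is a hypothetical helper that uses only POSIX calls, and the cap of 65536 descriptors is an assumption made to keep the scan cheap (anything above that number is not counted).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Count the descriptors currently open in this process by probing each
 * possible descriptor number with fcntl(F_GETFD). Hypothetical helper. */
int count_open_fds(void)
{
    long max_fd = sysconf(_SC_OPEN_MAX);   /* per-process descriptor limit */
    int fd, count = 0;

    /* Assumed cap: some systems report a very large (or no) limit. */
    if (max_fd <= 0 || max_fd > 65536)
        max_fd = 65536;
    assert(max_fd > 0);

    for (fd = 0; fd < (int)max_fd; fd++) {
        /* F_GETFD fails with EBADF only when fd is not an open descriptor. */
        if (fcntl(fd, F_GETFD) != -1 || errno != EBADF)
            count++;
    }
    return count;
}
```

Logging this value every few minutes from the suspect process would show whether descriptors accumulate over the hours it takes for the slowdown to appear. A flat count would point away from fds and toward another resource.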
Posted: Feb 21, 2008 - 03:14 PM
New Member
Joined: Feb 18, 2008
Posts: 5
Thanks for the answers.
Tim: the node is in that state right now. I ran 'sin' and 'sin fds', and neither hung at my process.
maschoen: Thanks for the advice. I'm going to check my code for that (closing fds).
I wrote a few test programs today and found that the following call hangs. It is the same call that hangs the 'pidin' command:
devctl(fd, DCMD_PROC_TIDSTATUS, &status, sizeof status, 0)
with fd being
fd = open("/proc/pid_my_process/as", O_RDONLY)
The node is very slow by now.
Can you suggest any other test for this problem?
Thanks.
Fabio.
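To put numbers on "it hangs", a test program could time the slow call and log the result as the problem progresses. The following is only a portable sketch: time_call_ms() and sleep_ms() are hypothetical names, CLOCK_MONOTONIC is assumed to be available, and the QNX-specific devctl() call is deliberately left out so the code builds anywhere. On the affected node the callback would wrap the DCMD_PROC_TIDSTATUS query instead of sleeping.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Run fn(arg) once and return the elapsed wall time in milliseconds,
 * measured on the monotonic clock. The callback's return value is stored
 * in *result when result is not NULL. Returns -1.0 if the clock fails. */
double time_call_ms(int (*fn)(void *), void *arg, int *result)
{
    struct timespec t0, t1;
    int rc;

    assert(fn != NULL);
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1.0;
    rc = fn(arg);
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1.0;
    if (result != NULL)
        *result = rc;
    return (t1.tv_sec - t0.tv_sec) * 1000.0
         + (t1.tv_nsec - t0.tv_nsec) / 1e6;
}

/* Stand-in for the slow call: sleeps for the number of milliseconds that
 * arg points to. Replace with a wrapper around the real status query. */
int sleep_ms(void *arg)
{
    struct timespec ts;
    long ms = *(long *)arg;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    return nanosleep(&ts, NULL);
}
```

Timing the query separately for each thread ID would show whether only TID 3 is slow and how the delay grows from hour to hour.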
Posted: Feb 21, 2008 - 07:25 PM
QNX Master
Joined: Jul 11, 2002
Posts: 557
[quote="FabioG"]
devctl(fd, DCMD_PROC_TIDSTATUS, &status, sizeof status, 0)
with fd being
fd = open("/proc/pid_my_process/as", O_RDONLY)
The node is very slow by now.
Can you suggest any other test for this problem?
Thanks.
Fabio.[/quote]
Sounds like you have a thread creating other threads, and it is creating too many of them...
Posted: Feb 21, 2008 - 09:53 PM
New Member
Joined: Feb 18, 2008
Posts: 5
As the problem is progressive, right now the node is slow, but after a couple of minutes pidin does show its information (it hangs for a while at my process and then continues).
The process has only 3 threads, and pidin shows only those threads.
It has only 6 open file descriptors (one of them is a TCP/IP connection to another QNX node).
Information for TID 1 and TID 2 is shown relatively quickly; then pidin hangs waiting for the procnto resource manager to reply with the status of TID 3.
- Is there any way to check the system table that might be growing (mentioned by maschoen)?
- I have run other tests and concluded that the node is not slow all the time. For example, when I run several 'ls /tmp' commands in quick succession, they work fine. But when I run something else that talks to procnto, like 'pidin' or my test program with the devctl call, the node gets slow and 'ls /tmp' returns its output only after almost half a minute.
Do you think it might be a general scheduling problem caused by my_process?
Posted: Feb 21, 2008 - 10:15 PM
Senior Member
Joined: Mar 10, 2004
Posts: 512
FabioG wrote:
Do you think that it might be a general scheduling problem generated by my_process ?
You already mentioned that hogs shows your process consuming no CPU (I assume that means <2%) when the node gets slow. If that's the case, it's not a scheduling problem.
What is the 3rd thread, the one that causes pidin to hang, doing?
What I would suggest is to open one terminal and run hogs at a high priority (like 20, or anything higher than your process) with an update interval of one second.
In another terminal, run the ls /tmp command a few times and see what hogs shows. Then run the pidin command and watch what hogs reports. It will be interesting to see whether hogs reports a lot of CPU being used when the node is slow.
Also, I assume you have already checked that your process isn't consuming large amounts of RAM or disk space (not many open files, but one giant file) or creating lots and lots of temporary files.
One other thing to check in your code (I don't think this information is available via pidin): make sure you are not leaking channels (created via ChannelCreate()).
Tim
Posted: Feb 25, 2008 - 07:15 PM
New Member
Joined: Feb 18, 2008
Posts: 5
Thanks Tim.
The 3rd thread periodically writes some information to a MySQL database via ODBC (TCP/IP). I've checked that code and it looks OK.
I also ran hogs at a higher priority, and nothing is consuming CPU on that node when I run pidin or ls.
I have checked RAM, disk space, etc., and everything is fine.
Recently I ran the IDE System Analysis Tool (via qconn) on that node, and all the information looks fine except for this:
My process has a signal pending (signal #57), only on my 3rd thread.
No other process has signals pending.
I've checked other working nodes, and it looks like every process that has some kind of TCP/IP connection has this signal pending.
Do you know what it means?
Could it be a clue to solving my problem?
Thanks.
Fabio.
Posted: Feb 26, 2008 - 03:16 PM
Senior Member
Joined: Mar 10, 2004
Posts: 512
Fabio,
Looking in signal.h, it says signals from 57 upward belong to the kernel. So I suspect the signal you see comes from the pidin command or the Momentics IDE.
It would be interesting to comment out the actual ODBC code that goes over TCP (including opening/closing sockets) and see whether that gets rid of the slowness. I'm wondering if your 3rd thread is leaking sockets (which are file descriptors) on open/close, if you open and close a connection on every update (versus opening once and then writing periodically).
Tim
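The open-and-close-per-update pattern described above only stays leak-free if every exit path after a successful open also reaches the close. The sketch below illustrates that shape with a plain file standing in for the socket. write_update() is a hypothetical function, not the poster's code, and real ODBC code would release its connection and handles through the ODBC API at the same point.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Open path, append msg, and close again. Every exit path after a
 * successful open() goes through close(), so a failed write cannot leak
 * the descriptor. Returns 0 on success, -1 on any failure. */
int write_update(const char *path, const char *msg)
{
    size_t len, done = 0;
    int fd, rc = 0;

    assert(path != NULL && msg != NULL);
    fd = open(path, O_WRONLY | O_APPEND);
    if (fd == -1)
        return -1;              /* nothing was opened, nothing to close */

    len = strlen(msg);
    while (done < len) {
        ssize_t n = write(fd, msg + done, len - done);
        if (n <= 0) {           /* write failed: remember it, still close */
            rc = -1;
            break;
        }
        done += (size_t)n;
    }
    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```

A quick check for this kind of leak is to open a scratch descriptor before and after a few hundred updates. Because open() returns the lowest free descriptor number, getting the same number back both times means nothing was left open in between.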
Posted: Feb 26, 2008 - 03:52 PM
New Member
Joined: Feb 18, 2008
Posts: 5
I've now tested my process with the database server running on my local node.
It was a way to check whether the problem is network related.
There is no difference: the node gets slow and pidin hangs at my process again, then continues after a few seconds.
Thanks Tim. I'm going to comment out my actual ODBC code for testing.
The only thing I'm wondering is: if I'm leaking sockets (fds), shouldn't that show up in pidin fds, the IDE analysis, sin fds, or other related utilities?
Regards
Fabio.