OpenQNX :: The QNX Community Portal

May 13, 2008 - 12:14 PM
FabioG
Post subject: pidin hangs at a specific process  Posted: Feb 19, 2008 - 12:31 AM
New Member


Joined: Feb 18, 2008
Posts: 5

Hi !

I have a problem with a process (my_process): after it has been running for a few hours, the node starts to get slow.
My process is not consuming CPU (according to hogs), and the only thing I can see is that when I run 'pidin' or 'pidin mem', the utility hangs exactly when it is about to show information about my_process.

My process has 3 threads. One of them (main) is a resource manager with only io_read and io_write implemented.

The problem is progressive. After 4 hours pidin hangs for a moment but then continues. After one or two days pidin hangs there forever and the node is almost inoperable.
When I kill this process (it takes almost a minute to die), everything returns to normal.

Could it be a programming problem in my resource manager?
When pidin is executed without arguments, does it send some message to my resource manager?


Thanks for your help !
Fabio.

[I'm using QNX6.3.0 SP3 with CorePath 6.3.2a.]
 
Tim
Posted: Feb 19, 2008 - 03:22 PM
Senior Member


Joined: Mar 10, 2004
Posts: 512

Fabio,

If you are not using CPU then it sounds like you are leaking something else.

Maybe file descriptors, timers, memory, threads etc.

Can you get the 'sin' command to execute? That provides a quick way to check memory use, open files etc. You can also get that from pidin, but the command line args are a little more arcane.

Tim
 
maschoen
Posted: Feb 20, 2008 - 12:08 AM
QNX Master


Joined: Jun 25, 2003
Posts: 974

This sounds very familiar. My guess is that you are not closing fds properly. Some system table fills up and gets larger and larger until accessing it takes a noticeably long time. You might want to take a close look at your close code, and compare it to the documentation examples.
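
One way to test the fd-leak theory from inside the suspect process is to count its open descriptors periodically and watch whether the number keeps growing. The following is a minimal sketch using only POSIX calls. The function name count_open_fds and the 65536 probing cap are assumptions made for illustration; nothing like this appears in the thread.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Count the file descriptors currently open in the calling process.
 * Each candidate number is probed with fcntl(F_GETFD), which fails
 * with EBADF when the descriptor is not open. */
long count_open_fds(void)
{
    long max_fd = sysconf(_SC_OPEN_MAX);
    long count = 0;
    long fd;

    /* Assumption: cap the probe range so the loop stays cheap when the
     * limit is indeterminate or very large. Descriptors above the cap
     * would not be counted. */
    if (max_fd < 0 || max_fd > 65536) {
        max_fd = 65536;
    }
    for (fd = 0; fd < max_fd; fd++) {
        if (fcntl((int)fd, F_GETFD) != -1 || errno != EBADF) {
            count++;
        }
    }
    return count;
}
```

Logging the returned value once per work cycle (for example, after each database update) should show a flat line in a healthy process. A steadily climbing value would point at a missing close().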
 
FabioG
Posted: Feb 21, 2008 - 03:14 PM
New Member


Joined: Feb 18, 2008
Posts: 5

Thanks for the answer.

Tim: the node is in that state right now. I ran 'sin' and 'sin fds', and neither of them hung at my process.

maschoen: Thanks for the advice. I'm going to check my code looking for that (fds close).

I wrote a few test programs today and found that the following call hangs. It's the same call that hangs the 'pidin' command.

devctl(fd, DCMD_PROC_TIDSTATUS, &status, sizeof status, 0)

with fd being

fd = open("/proc/pid_my_process/as", O_RDONLY)

The node is very slow by now.

Can you suggest any other test for this problem?

Thanks.
Fabio.
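
The test Fabio describes can be sketched as a small program fragment. This is an illustration under assumptions, not his actual code: the helper names build_as_path and query_thread_status are hypothetical, the pid and tid are supplied by the caller, and the QNX-specific part is guarded so that the file still compiles on other systems (where it simply returns ENOSYS).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#ifdef __QNXNTO__
#include <devctl.h>
#include <fcntl.h>
#include <sys/procfs.h>
#include <unistd.h>
#endif

/* Build the path of a process's address-space file, e.g. "/proc/1234/as".
 * Returns 0 on success, -1 if the buffer is too small. */
int build_as_path(char *buf, size_t len, long pid)
{
    int n = snprintf(buf, len, "/proc/%ld/as", pid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Ask procnto for the status of one thread of a process.
 * Returns 0 (EOK) on success or an errno value on failure.
 * Only implemented on QNX Neutrino; elsewhere it returns ENOSYS. */
int query_thread_status(long pid, int tid)
{
#ifdef __QNXNTO__
    char path[64];
    procfs_status status;
    int fd, rc;

    if (build_as_path(path, sizeof path, pid) != 0) {
        return ENAMETOOLONG;
    }
    fd = open(path, O_RDONLY);
    if (fd == -1) {
        return errno;
    }

    memset(&status, 0, sizeof status);
    status.tid = tid;   /* thread to query */
    rc = devctl(fd, DCMD_PROC_TIDSTATUS, &status, sizeof status, NULL);
    if (rc == EOK) {
        printf("pid %ld tid %d state %d\n",
               pid, (int)status.tid, (int)status.state);
    }
    close(fd);          /* release the descriptor on every path */
    return rc;
#else
    (void)pid;
    (void)tid;
    return ENOSYS;
#endif
}
```

Timing the call for each thread ID in turn would show whether only one thread is slow to report, which is what the pidin behaviour suggests.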
 
rgallen
Posted: Feb 21, 2008 - 07:25 PM
QNX Master


Joined: Jul 11, 2002
Posts: 557

[quote="FabioG"]
devctl(fd, DCMD_PROC_TIDSTATUS, &status, sizeof status, 0)

with fd being

fd = open("/proc/pid_my_process/as", O_RDONLY)

The node is very slow by now.

Can you suggest any other test for this problem ??

Thanks.
Fabio.[/quote]

Sounds like you have a thread creating other threads, and it is creating too many of them...
 
FabioG
Posted: Feb 21, 2008 - 09:53 PM
New Member


Joined: Feb 18, 2008
Posts: 5

Because the problem is progressive, my node is slow right now, but pidin still shows its information after a couple of minutes (it hangs for a while at my process and then continues).

The process has only 3 threads, and pidin shows only those threads.
It has only 6 open file descriptors (one of them is a TCP/IP connection to another QNX node).

Information on TID 1 and TID 2 is shown relatively quickly; then pidin hangs waiting for the procnto resmgr to reply with the status information for TID 3.

- Is there any way to check the system table that might be growing (mentioned by maschoen)?

- I have run other tests and concluded that the node isn't actually slow all the time. For example, several quick 'ls /tmp' commands in a row work fine, but when I run something that talks to procnto, such as 'pidin' or my test process with the devctl() call, the node gets slow and 'ls /tmp' then takes almost half a minute to return its output.

Do you think that it might be a general scheduling problem generated by my_process ?
 
Tim
Posted: Feb 21, 2008 - 10:15 PM
Senior Member


Joined: Mar 10, 2004
Posts: 512

FabioG wrote:

Do you think that it might be a general scheduling problem generated by my_process ?


You already mentioned that Hogs shows your process consuming no CPU (I assume that means <2%) when the node gets slow. If that's the case it's not a scheduling problem.

What is the 3rd thread, the one that causes the pidin command to hang, actually doing?

What I would suggest you do is open 1 terminal and run Hogs at a high priority (like 20 or anything higher than your process) with an update rate of every second.

In the other terminal, you can run the ls /tmp command a few times and see what the result is in hogs. Then run the pidin command and watch what hogs reports. It will be interesting to see if hogs reports a lot of CPU being used when the node is slow.

Also, I assume you have already checked that your process isn't consuming large amounts of RAM or disk space (not open files, but instead 1 giant file) or creating lots and lots of temporary files.

One other thing to check in your code (I don't think this info is available via pidin): make sure you are not leaking channels (created via ChannelCreate()).

Tim
 
FabioG
Posted: Feb 25, 2008 - 07:15 PM
New Member


Joined: Feb 18, 2008
Posts: 5

Thanks Tim.

The 3rd thread periodically writes some information to a MySQL database via ODBC (TCP/IP). I've checked that code and it looks OK.

On the other hand, I ran hogs at a higher priority, and nothing is consuming CPU on that node when I run pidin or ls.
I have checked RAM, disk space, etc. and everything is fine.

Recently, I ran the IDE System Analysis Tool (via qconn) on that node and all the information looks fine, except for this:

My process has a signal pending (signal #57), only in my 3rd thread.
No other process has signals pending.
I've checked other working nodes, and it looks like every process that has some kind of TCP/IP connection has this signal pending.

Do you know what it means?
Might it be a clue to the solution of my problem?

Thanks.
Fabio.
 
Tim
Posted: Feb 26, 2008 - 03:16 PM
Senior Member


Joined: Mar 10, 2004
Posts: 512

Fabio,

According to signal.h, signals 57 and above belong to the kernel. So I suspect the signal you see comes from the pidin command/Momentics IDE.

It would be interesting to comment out the actual ODBC code that goes over tcp (including opening/closing sockets) and see if that makes any difference in terms of getting rid of the slowness. I'm wondering if your 3rd thread is leaking sockets (which are file descriptors) on open/close if you open/close each time you update (vs open once and then write periodically).

Tim
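
To illustrate the open/close-per-update pattern Tim is asking about, here is a minimal sketch with plain TCP sockets standing in for the ODBC layer. The thread does not show Fabio's code, so connect_once and send_update are hypothetical names and the IPv4-only addressing is an assumption. The point is only that every path, including the failure paths, closes the descriptor it opened.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a TCP connection to an IPv4 address. Returns the socket
 * descriptor, or -1 on failure. Every failure path closes the socket,
 * so a failed attempt never leaves a descriptor behind. */
int connect_once(const char *ipv4, unsigned short port)
{
    struct sockaddr_in addr;
    int fd;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1) {
        return -1;              /* bad address: nothing opened yet */
    }

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) {
        return -1;
    }
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        close(fd);              /* the cleanup that is easy to forget */
        return -1;
    }
    return fd;
}

/* Send one update over a fresh connection and always close it.
 * Returns 0 if the whole message was written, -1 otherwise. */
int send_update(const char *ipv4, unsigned short port, const char *msg)
{
    size_t len = strlen(msg);
    ssize_t n;
    int fd = connect_once(ipv4, port);

    if (fd == -1) {
        return -1;
    }
    n = write(fd, msg, len);
    close(fd);                  /* closed on success and on write error */
    return (n == (ssize_t)len) ? 0 : -1;
}
```

The alternative Tim mentions, opening once and writing periodically, would keep the descriptor from connect_once in a long-lived variable and reconnect only after a write fails.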
 
FabioG
Posted: Feb 26, 2008 - 03:52 PM
New Member


Joined: Feb 18, 2008
Posts: 5

Recently, I tested my process with the database server running on my local node.
It was a way to check whether the problem is network related.
There is no difference: the node still gets slow, and pidin hangs at my process again and then, after a few seconds, continues.

Thanks Tim. I'm going to comment out my actual ODBC code for testing.

The only thing I'm wondering is: if I'm leaking sockets (fds), shouldn't that show up in pidin fds, the IDE analysis, sin fds or other related utilities?

Regards
Fabio.
 
All times are GMT