The QNX Community Portal

Multicore Thread Performance Measurement with Timestamps

Postby strasserfj » Mon Aug 11, 2014 1:46 pm

Hello guys,

I want to benchmark the performance of two threads that communicate with each other, each pinned to its own core. Specifically, I want to measure both the work done by these threads and their interaction/communication delays, and I have decided to use PMU (performance monitoring unit) values such as the cycle counter register and some other event counters.

My problem now is that the two PMUs aren't in sync, so I can't compare the values recorded by each thread after the benchmark is done.

Is there a way to synchronize the PMUs?

What I found is the ClockCycles() function. The QNX documentation states that this function might use a free-running timer located directly on a core. In my case I think it uses the global timer of the system, but I am not sure.

How can I find out whether the source of ClockCycles() is the global timer? (Where should I look in the BSP?)

Or can anyone recommend a better approach?

My environment:
Freescale Sabre Lite Board http://boundarydevices.com/products/sab ... -imx6-sbc/
QNX 6.6.0
QNX 6.6.0 BSP for Sabre Lite http://community.qnx.com/sf/sfmain/do/v ... ojects.bsp

Thanks in advance!
strasserfj
New Member
 
Posts: 5
Joined: Tue Jul 22, 2014 12:18 pm

Re: Multicore Thread Performance Measurement with Timestamps

Postby maschoen » Mon Aug 11, 2014 8:01 pm

ClockCycles() returns a 64-bit processor cycle counter. There is a global system variable that can be used to convert intervals measured with this routine into seconds. I've only used it on x86, so I don't know whether it is implemented properly on your processor, but a simple experiment should make this clear.
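A minimal sketch of that conversion, assuming the "global system variable" is the system page field `SYSPAGE_ENTRY(qtime)->cycles_per_sec` from `<sys/syspage.h>`. The helper names `cycles_to_ns()` and `time_call_ns()` are hypothetical, and the QNX-specific part is only compiled when `__QNXNTO__` is defined. Both `ClockCycles()` reads should be taken on the same core, since the counter may be per core.

```c
#include <stdint.h>
#if defined(__QNXNTO__)
#include <sys/neutrino.h>   /* ClockCycles() */
#include <sys/syspage.h>    /* SYSPAGE_ENTRY() */
#endif

/* Convert an interval measured in cycles to nanoseconds.
 * Splitting into whole seconds and a remainder avoids overflowing
 * cycles * 1e9 for long intervals; rem * 1e9 stays below 2^64 as
 * long as cycles_per_sec is below roughly 18 GHz. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t cycles_per_sec)
{
    uint64_t secs = cycles / cycles_per_sec;
    uint64_t rem  = cycles % cycles_per_sec;
    return secs * 1000000000ULL + rem * 1000000000ULL / cycles_per_sec;
}

#if defined(__QNXNTO__)
/* Hypothetical helper: time one call of fn, in nanoseconds.
 * The caller should be pinned to one core for the whole measurement. */
static uint64_t time_call_ns(void (*fn)(void))
{
    uint64_t cps   = SYSPAGE_ENTRY(qtime)->cycles_per_sec;
    uint64_t start = ClockCycles();
    fn();
    uint64_t end   = ClockCycles();
    return cycles_to_ns(end - start, cps);
}
#endif
```

For example, 66 cycles of a 66 MHz counter convert to 1000 ns.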
maschoen
QNX Master
 
Posts: 2644
Joined: Wed Jun 25, 2003 5:18 pm

Re: Multicore Thread Performance Measurement with Timestamps

Postby denkelly » Mon Aug 18, 2014 11:54 am

>>>ClockCycles ... use(s) a free running timer located directly on a core
This is correct: you can't compare ClockCycles() return values taken on different cores. Typically, to do these measurements you set the "affinity" of all threads to a single core so they all use the same timer.
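On QNX Neutrino, the runmask (affinity) of the calling thread can be set with `ThreadCtl(_NTO_TCTL_RUNMASK, ...)`. The sketch below assumes CPUs are numbered from 0 and there are fewer than 32 of them; `runmask_for_cpu()` and `pin_current_thread()` are hypothetical helper names, and on non-QNX hosts `pin_current_thread()` deliberately does nothing and reports failure.

```c
#include <stdint.h>
#if defined(__QNXNTO__)
#include <sys/neutrino.h>   /* ThreadCtl(), _NTO_TCTL_RUNMASK */
#endif

/* Runmask with only the bit for the given CPU set (CPU 0 = bit 0).
 * Assumes cpu < 32. */
static unsigned runmask_for_cpu(unsigned cpu)
{
    return 1u << cpu;
}

/* Restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on failure (always -1 on non-QNX hosts). */
static int pin_current_thread(unsigned cpu)
{
#if defined(__QNXNTO__)
    void *mask = (void *)(uintptr_t)runmask_for_cpu(cpu);
    return ThreadCtl(_NTO_TCTL_RUNMASK, mask) == -1 ? -1 : 0;
#else
    (void)cpu;
    return -1;
#endif
}
```

Calling `pin_current_thread(0)` at the start of both threads puts them on the same core. Note that this serializes the two threads, so the measured communication delay then includes context switches rather than true core-to-core latency.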

Another option would be to attempt to determine the approximate "delta" between timers on various cores. The delta would be constant until reboot. (I don't have a scheme for doing that...)
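One common way to estimate such a delta is a ping-pong exchange, the same idea as Cristian's algorithm or NTP's offset estimate: a thread on core A reads its counter (t0) and signals a thread on core B, which reads its own counter (tb) and replies at once; A reads its counter again (t1) when the reply arrives. If both one-way delays are equal, B's offset relative to A is tb - (t0 + t1)/2, with an error of at most half the round trip. The sketch below only does the arithmetic over samples that have already been collected (gathering them, for example by spinning on a shared flag, is not shown). The struct and function names are hypothetical, and it assumes both counters tick at the same rate.

```c
#include <stddef.h>
#include <stdint.h>

/* One ping-pong exchange between core A and core B (hypothetical layout):
 * t0: A's counter just before A signals B
 * tb: B's counter when B sees the signal (B replies immediately)
 * t1: A's counter when A sees B's reply (assumes t1 >= t0, no wrap) */
struct pingpong_sample {
    uint64_t t0, tb, t1;
};

/* Estimate (B's counter - A's counter) at the same instant, assuming equal
 * A->B and B->A delays. Uses the sample with the shortest round trip, since
 * its error bound (rtt / 2) is the tightest.
 * Returns 1 on success, 0 (leaving the outputs untouched) if n == 0. */
static int estimate_offset(const struct pingpong_sample *s, size_t n,
                           int64_t *offset, uint64_t *rtt_out)
{
    size_t best = 0;
    if (n == 0)
        return 0;
    for (size_t i = 1; i < n; i++)
        if (s[i].t1 - s[i].t0 < s[best].t1 - s[best].t0)
            best = i;

    uint64_t rtt = s[best].t1 - s[best].t0;
    uint64_t mid = s[best].t0 + rtt / 2;   /* A's time at the round-trip midpoint */
    /* Unsigned difference reinterpreted as signed: valid while |offset| < 2^63 */
    *offset = (int64_t)(s[best].tb - mid);
    if (rtt_out != NULL)
        *rtt_out = rtt;
    return 1;
}
```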
denkelly
Senior Member
 
Posts: 161
Joined: Sat Aug 02, 2008 3:27 pm

Re: Multicore Thread Performance Measurement with Timestamps

Postby mario » Tue Aug 19, 2014 6:29 pm

denkelly wrote:>>>ClockCycles ... use(s) a free running timer located directly on a core
This is correct - you can't use the ClockCycles() return value on different cores.


It depends on the CPU model, so check the documentation. On the latest x86 CPUs the time-stamp counters read by RDTSC are synchronized across all cores and even take into account things such as cores powering down or slowing down to conserve power.

RDTSC has two modes of operation: one counts REAL core clock cycles, and the other keeps time independently of core behavior or configuration. I believe QNX uses the latter.
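On x86, the time-keeping mode described above roughly corresponds to what Intel and AMD document as an "invariant TSC", reported in CPUID leaf 0x80000007, EDX bit 8. A sketch of checking for it, assuming GCC or Clang (`<cpuid.h>` and `__get_cpuid()` are compiler-provided); `has_invariant_tsc()` is a hypothetical name. An invariant TSC guarantees a constant rate; whether the counters of all cores are also synchronized is a separate property, and none of this applies to the ARM-based i.MX6.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* __get_cpuid() (GCC/Clang) */
#endif

/* Returns 1 if the CPU reports an invariant TSC, 0 if it does not
 * (or does not support the leaf), and -1 if the check is unavailable
 * on this architecture. */
static int has_invariant_tsc(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid() returns 0 if leaf 0x80000007 is not supported */
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 8) & 1u);   /* EDX bit 8: invariant TSC */
#else
    return -1;
#endif
}
```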
mario
QNX Master
 
Posts: 4132
Joined: Sun Sep 01, 2002 1:04 am

Re: Multicore Thread Performance Measurement with Timestamps

Postby strasserfj » Wed Aug 20, 2014 12:42 pm

First, thank you for your replies and suggestions.

Following maschoen's recommendation, I ran some experiments to see whether QNX uses the global timer of my CPU. I memory-mapped the global timer and read its low counter register several times. As denkelly expected, the value stayed at zero the whole time, which leads me to conclude that QNX does not use the global timer of my CPU at all.

After some more research I configured the global timer to be free running and wrote some code to read the lower 32 bits of the timer. Because the global timer is a single counter shared by all cores, values read on different cores can be compared directly. This is enough for my work, so I consider my question answered.

An additional benefit of using this timer is that it runs at a much higher frequency than ClockCycles(), which gives more precise measurements: the global timer runs at CLK/2, ClockCycles() only at CLK/12.
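Reading only the lower 32 bits means the value wraps around. A minimal sketch of a wrap-safe interval and of the resolution/wrap arithmetic, assuming a hypothetical 1 GHz core clock (substitute the board's real CLK) and taking the CLK/2 and CLK/12 rates at face value; `delta32()`, `ticks_to_ns()` and `wrap_period_ms()` are hypothetical helper names.

```c
#include <stdint.h>

/* Ticks elapsed between two reads of a free-running 32-bit counter.
 * Unsigned subtraction is modulo 2^32, so this stays correct across one
 * wraparound, but not if the interval spans more than one full wrap. */
static uint32_t delta32(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert ticks of a counter clocked at cpu_hz / divider to nanoseconds.
 * Split into seconds and remainder to avoid overflowing 64 bits. */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t cpu_hz, uint64_t divider)
{
    uint64_t cycles = ticks * divider;                  /* CPU clock cycles */
    return cycles / cpu_hz * 1000000000ULL              /* whole seconds */
         + cycles % cpu_hz * 1000000000ULL / cpu_hz;    /* remainder */
}

/* Time until a 32-bit counter clocked at cpu_hz / divider wraps, in ms. */
static uint64_t wrap_period_ms(uint64_t cpu_hz, uint64_t divider)
{
    return (1ULL << 32) * divider * 1000ULL / cpu_hz;
}
```

Under these assumptions the global timer ticks every 2 ns and its low 32 bits wrap after about 8.6 s (`wrap_period_ms(1000000000, 2)` gives 8589), while a CLK/12 counter ticks every 12 ns. Intervals longer than one wrap period need the upper 32-bit counter register as well.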

Maybe it helps someone; here is my code:
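#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>      /* mmap_device_memory(), PROT_NOCACHE */
#include <sys/neutrino.h>  /* ThreadCtl() */
#include <hw/inout.h>      /* in32(), out32() */

/* i.MX6: Cortex-A9 MPCore private peripherals; global timer at offset 0x200 */
#define PERIPH_BASE_ADDR        0x00A00000
#define GLOBAL_TIMER_BASE_ADDR  (PERIPH_BASE_ADDR + 0x0200)
#define GLOBAL_TIMER_SIZE       0x02FF

/* Global timer register offsets */
#define GT_COUNTER_LO   0x00
#define GT_COUNTER_HI   0x04
#define GT_CONTROL      0x08

static uintptr_t addr;  /* virtual address of the mapped global timer */

/* Read the lower 32 bits of the global timer counter */
static inline uint32_t getGlobalTime32(void){
    return in32(addr + GT_COUNTER_LO);
}

/* The counter registers must only be written while the timer is disabled */
static inline void resetGlobalTimer(void){
    out32(addr + GT_CONTROL, 0);     /* disable */
    out32(addr + GT_COUNTER_LO, 0);
    out32(addr + GT_COUNTER_HI, 0);
    out32(addr + GT_CONTROL, 1);     /* enable: free running, no IRQ, no compare value */
}

void configureGlobalTimer(void){
    void *gtimer_addr;

    /* request I/O privileges for hardware access */
    if (ThreadCtl(_NTO_TCTL_IO, 0) == -1) {
        perror("ThreadCtl(_NTO_TCTL_IO) failed");
        exit(EXIT_FAILURE);
    }
    gtimer_addr = mmap_device_memory(0, GLOBAL_TIMER_SIZE, PROT_READ | PROT_WRITE | PROT_NOCACHE, 0, GLOBAL_TIMER_BASE_ADDR);
    if (gtimer_addr == MAP_FAILED) {
        perror("mmap_device_memory for physical address failed");
        exit(EXIT_FAILURE);
    }
    addr = (uintptr_t)gtimer_addr;

    /* set timer value to 0 and start it free running */
    resetGlobalTimer();
}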

strasserfj
New Member
 
Posts: 5
Joined: Tue Jul 22, 2014 12:18 pm

Re: Multicore Thread Performance Measurement with Timestamps

Postby denkelly » Thu Aug 21, 2014 12:13 am

This thread is about the i.MX6, so x86 is irrelevant :)
denkelly
Senior Member
 
Posts: 161
Joined: Sat Aug 02, 2008 3:27 pm

