[Not an issue] PasMP just helped FPC get the top Binary Trees score on BenchmarkGames! #11

Closed
opened 2019-03-09 02:09:06 +00:00 by Akira13641 · 10 comments
Akira13641 commented 2019-03-09 02:09:06 +00:00 (Migrated from github.com)
[Just thought I'd link you to this.](https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/binarytrees.html)
BeRo1985 commented 2019-03-09 02:51:08 +00:00 (Migrated from github.com)

Hm, with TData(Data^)[FromIndex] do begin should be for Index := FromIndex To ToIndex do with TData(Data^)[Index] do begin :-)

Hm, `with TData(Data^)[FromIndex] do begin` should be `for Index := FromIndex To ToIndex do with TData(Data^)[Index] do begin` :-)
Akira13641 commented 2019-03-09 15:12:32 +00:00 (Migrated from github.com)

That was one thing that has confused me actually... when I've used WriteLn to examine the values of FromIndex and ToIndex before, it seems like they are always the exact same number, meaning the for loop would not do anything.

How does that work, exactly?

That was one thing that has confused me actually... when I've used `WriteLn` to examine the values of `FromIndex` and `ToIndex` before, it seems like they are _always_ the exact same number, meaning the for loop would not do anything. How does that work, exactly?
BeRo1985 commented 2019-03-09 15:18:03 +00:00 (Migrated from github.com)

ParallelFor spilts the whole range in multiple FromIndex..ToIndex sub-ranges parallel-recursively, so you must do for CurrentWorkIndex:=FromIndex to ToIndex do begin ... end; in any case, even when FromIndex is (most) always the same value like ToIndex on your own system, but it can different on other systems (for example for higher CPU core counts) or even on different system loads on your current system. Otherwise the benchmark implementation is not valid.

ParallelFor spilts the whole range in multiple FromIndex..ToIndex sub-ranges parallel-recursively, so you must do `for CurrentWorkIndex:=FromIndex to ToIndex do begin ... end;` in any case, even when FromIndex is (most) always the same value like ToIndex on your own system, but it can different on other systems (for example for higher CPU core counts) or even on different system loads on your current system. Otherwise the benchmark implementation is not valid.
Akira13641 commented 2019-03-09 15:34:07 +00:00 (Migrated from github.com)

Would this be the correct way to write it, then:

class procedure TNode.DoTrees(const Job: PPasMPJob;
                              const ThreadIndex: TPasMPInt32;
                              const Data: Pointer;
                              const FromIndex, ToIndex: TPasMPNativeInt);
var
  Index: TPasMPNativeInt;
  Inner: TPasMPInt32;
  IPool: TMemPool;
begin
  for Index := FromIndex to ToIndex do with TData(Data^)[Index] do begin
    Check := 0;
    Depth := MinDepth + FromIndex * 2;
    Iterations := 1 shl (MaxDepth - FromIndex * 2);
    IPool := TMemPool.Create(SizeOf(TNode));
    for Inner := 1 to Iterations do begin
      Check += CheckNode(MakeTree(Depth, IPool));
      IPool.Clear();
    end;
    IPool.Free()
  end;
end;
Would this be the correct way to write it, then: class procedure TNode.DoTrees(const Job: PPasMPJob; const ThreadIndex: TPasMPInt32; const Data: Pointer; const FromIndex, ToIndex: TPasMPNativeInt); var Index: TPasMPNativeInt; Inner: TPasMPInt32; IPool: TMemPool; begin for Index := FromIndex to ToIndex do with TData(Data^)[Index] do begin Check := 0; Depth := MinDepth + FromIndex * 2; Iterations := 1 shl (MaxDepth - FromIndex * 2); IPool := TMemPool.Create(SizeOf(TNode)); for Inner := 1 to Iterations do begin Check += CheckNode(MakeTree(Depth, IPool)); IPool.Clear(); end; IPool.Free() end; end;
BeRo1985 commented 2019-03-09 15:37:02 +00:00 (Migrated from github.com)

Rather

class procedure TNode.DoTrees(const Job: PPasMPJob;
                              const ThreadIndex: TPasMPInt32;
                              const Data: Pointer;
                              const FromIndex, ToIndex: TPasMPNativeInt);
var
  Index: TPasMPNativeInt;
  Inner: TPasMPInt32;
  IPool: TMemPool;
begin
  for Index := FromIndex to ToIndex do with TData(Data^)[Index] do begin
    Check := 0;
    Depth := MinDepth + Index * 2;
    Iterations := 1 shl (MaxDepth - Index * 2);
    IPool := TMemPool.Create(SizeOf(TNode));
    for Inner := 1 to Iterations do begin
      Check += CheckNode(MakeTree(Depth, IPool));
      IPool.Clear();
    end;
  end;
  IPool.Free();
end;

where all FromIndex are replaced with Index, and not just the first appearance of FromIndex.

Rather ``` class procedure TNode.DoTrees(const Job: PPasMPJob; const ThreadIndex: TPasMPInt32; const Data: Pointer; const FromIndex, ToIndex: TPasMPNativeInt); var Index: TPasMPNativeInt; Inner: TPasMPInt32; IPool: TMemPool; begin for Index := FromIndex to ToIndex do with TData(Data^)[Index] do begin Check := 0; Depth := MinDepth + Index * 2; Iterations := 1 shl (MaxDepth - Index * 2); IPool := TMemPool.Create(SizeOf(TNode)); for Inner := 1 to Iterations do begin Check += CheckNode(MakeTree(Depth, IPool)); IPool.Clear(); end; end; IPool.Free(); end; ``` where all `FromIndex` are replaced with `Index`, and not just the first appearance of `FromIndex`.
BeRo1985 commented 2019-03-09 15:40:53 +00:00 (Migrated from github.com)

and play with the Granularity parameter of ParallelFor, this is the smallest minimum value, how small a subdivided subrange can be as a minimum size.

and play with the `Granularity` parameter of `ParallelFor`, this is the smallest minimum value, how small a subdivided subrange can be as a minimum size.
Akira13641 commented 2019-03-09 15:42:42 +00:00 (Migrated from github.com)

Ok, I'll try a version with that. The benchmark did seem to give inconsistent results, so maybe that's why.

One other question: do you think it would be faster to do:

with TPasMP.Create() do begin
  Invoke(ParallelFor(@Data, 0, High(Data), @TNode.DoTrees));
  Free();
end;  

than it is to create the global instance?

Ok, I'll try a version with that. The benchmark did seem to give inconsistent results, so maybe that's why. One other question: do you think it would be faster to do: with TPasMP.Create() do begin Invoke(ParallelFor(@Data, 0, High(Data), @TNode.DoTrees)); Free(); end; than it is to create the global instance?
BeRo1985 commented 2019-03-09 15:46:27 +00:00 (Migrated from github.com)

Just use TPasMP.GetGlobalInstance in the most normal cases. Individal TPasMP instances are only for special cases necessary, for example when the root thread of a TPasMP instance is/should not the main thread of the whole process instance, and so on.

Just use TPasMP.GetGlobalInstance in the most normal cases. Individal TPasMP instances are only for special cases necessary, for example when the root thread of a TPasMP instance is/should not the main thread of the whole process instance, and so on.
BeRo1985 commented 2019-03-09 16:28:59 +00:00 (Migrated from github.com)

And please notify me, when it is also updated on https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/binarytrees.html

And please notify me, when it is also updated on `https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/binarytrees.html`
Akira13641 commented 2019-03-09 17:13:36 +00:00 (Migrated from github.com)

It just went back up to the top again (both of the existing "incorrect" versions I've submitted so far.)

I will let you know when the new "correct" version has gone up though, certainly. Thanks again for the great library!

It just went back up to the top again (both of the existing "incorrect" versions I've submitted so far.) I will let you know when the new "correct" version has gone up though, certainly. Thanks again for the great library!
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
BeRo1985/pasmp#11
No description provided.