Threading Grasshopper

Daniel Davis – 7 May 2011

Update 12 December 2013: This post is now over two years old. Just recently McNeel have released an update to the Grasshopper Python component that supports threading. I’m leaving this post up for posterity, but you would be much better off using the Grasshopper Python component than following the advice below.

An alternative title might be Why CAD software hasn’t gotten any faster in the last three years. Simply, most CAD software is written to take advantage of only one processor, leaving the other three processors in your shiny new Quad-core i7 idle. The solution to the multi-processor problem is threading. Programming threads can seem daunting, and I put them off until I had to create one for Yeti to, of all things, check for updates in the background without hanging up the interface. To my surprise I found threading in modern programming languages almost elegant. Naturally I was curious about whether threads can be used in Grasshopper, and whether they give any performance increase. It turns out threads can be used in Grasshopper, and that they significantly increase performance – on my shiny new Quad-core i7 I get roughly a 2X speed-up. This post is the third part of a series on Optimising Grasshopper (following on from part one and part two). In this post I will explain a fairly ninja technique for breaking C# nodes down into multiple threads to be processed in parallel by all the processors on your computer.

What are threads?

Roughly speaking, a thread is a list of tasks for the processor in your computer to execute. All programs and scripts get compiled into a thread of tasks, which are then fed through the processor in linear order to produce the desired outcome. Conceptually this could be thought of as a chain of Grasshopper nodes being evaluated one at a time by the processor.

Thread passing to one person
One processor works on the graph while the other three sit idle

Threads get interesting when you have more than one processor, like on the Quad-core Intel processor, which has four processors sitting in parallel. With a multi-processor computer, you can have each processor working on a separate thread. So the Quad-core processor can work through four threads simultaneously (it can actually work through more due to some processor magic but that is getting a little technical). The difficulty with a multi-processor computer is that the processors cannot ‘talk’ to each other, so (as shown above) a thread can only go through one processor at a time. This essentially means that if one of the processors on the Quad-core processor gets a really long thread of tasks, it cannot ask the neighboring processors to help out with some of the tasks, even if they are sitting idle. This is why sometimes you will be updating a really complex model in Grasshopper, and see the processors only working at 25% – there is only one thread going through the processors so only one of the four processors is actually working.

Multi-threading

Since the processor cannot break a single thread into multiple strands, we have to break the thread up for the processor. Essentially, rather than giving the processor one long thread of tasks, we can break the tasks up into little bundles of work, and generate a thread of tasks for each bundle. That way we can send a single thread of tasks as multiple threads and get each processor to work on a little part by itself.

Threads passing between two people
By splitting the thread up, it can be sent to more than one processor
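
The idea of splitting one long list of tasks into bundles can be sketched in plain C#, outside of Grasshopper. This is only an illustration under stated assumptions: the class ChunkedWork, the method Run and the stand-in calculation Expensive are hypothetical names that do not appear in the node discussed in this post. The sketch gives each thread one contiguous bundle of the input and waits for every thread with Join.

```csharp
using System;
using System.Threading;

public static class ChunkedWork
{
    // Hypothetical stand-in for an expensive per-item calculation.
    public static double Expensive(double x)
    {
        return Math.Sqrt(x) * Math.Sin(x);
    }

    // Splits the input into one contiguous bundle per thread
    // and processes the bundles in parallel.
    public static double[] Run(double[] input, int threadCount)
    {
        if (threadCount < 1) throw new ArgumentOutOfRangeException("threadCount");

        double[] output = new double[input.Length];
        Thread[] threads = new Thread[threadCount];

        // Ceiling division, so the last bundle picks up any remainder.
        int bundle = (input.Length + threadCount - 1) / threadCount;

        for (int t = 0; t < threadCount; t++)
        {
            // Per-iteration copies, so each closure sees its own range.
            int start = t * bundle;
            int end = Math.Min(start + bundle, input.Length);

            threads[t] = new Thread(() =>
            {
                // Each thread writes only to its own slice of the output.
                for (int i = start; i < end; i++)
                    output[i] = Expensive(input[i]);
            });
            threads[t].Start();
        }

        // Block until every bundle has been processed.
        foreach (Thread th in threads) th.Join();
        return output;
    }
}
```

Calling ChunkedWork.Run(values, 4) on a quad-core machine would hand a quarter of the values to each of four threads. Because every thread writes to a separate range of the output array, no locking is needed in this particular sketch.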

Multi-threading in Grasshopper

To test threading I decided to recreate the Project-point-to-surface node in Grasshopper. The full node can be downloaded from Parametric Model. The critical line of code in the node comes within the for-loop (on lines 114-123 of the full node):

  private void RunScript(List<System.Object> p, Surface s, ref object P, ref object uvP, ref object D)
  {
    int total = p.Count;

    //setup arrays to catch data as it is produced.
    outP = new GH_Point[total];
    outUVP = new GH_Point[total];
    outD = new GH_Number[total];

    //data needed to create the objects.
    done = total;
    mySurface = s;

    //loop through each task, sending it to the threadpool
    for(int i = 0; i < total; i++)
    {
      if(p[i] is Point3d)
      {
        int tempI = i;
        Point3d tempP = (Point3d) p[i];
        taskInfo task = new taskInfo(tempI, tempP);
        System.Threading.ThreadPool.QueueUserWorkItem(pntToSrf, task);
      }
      else
      {
        //not a point: nothing to queue, so count it as done.
        //worker threads may already be running, so take the lock.
        lock(locker) { done--; }
      }
    }

    //wait until all the threads are done.
    //TODO: Need a better wait method.
    while(done > 0)
    {
      System.Threading.Thread.Sleep(2);
    }

    //push out the new data.
    P = outP;
    uvP = outUVP;
    D = outD;
  }

  private static Surface mySurface = null;

  private static object locker = new object();
  private static GH_Point[] outP = null;
  private static GH_Point[] outUVP = null;
  private static GH_Number[] outD = null;
  //volatile so the waiting loop always reads the latest value.
  private static volatile int done = 0;

  //Adds the data to the output arrays. Have to lock this function to prevent concurrent access.
  private static void addToList(GH_Point myPoint, GH_Point uvPoint, GH_Number myDis, int index)
  {
    lock(locker)
    {
      outP[index] = myPoint;
      outUVP[index] = uvPoint;
      outD[index] = myDis;
      done--;
    }
  }

  //This is the function we feed into threads.
  static void pntToSrf(object data)
  {
    //check data is as we expect it.
    if(data is taskInfo)
    {
      taskInfo task = (taskInfo) data;

      //find the closest point and distance to it.
      double u = 0;
      double v = 0;
      mySurface.ClosestPoint(task.startPoint, out u, out v);
      Point3d closestPoint = mySurface.PointAt(u, v);
      double dis = closestPoint.DistanceTo(task.startPoint);

      //cast data into Grasshopper friendly terms.
      GH_Point uvPoint = new GH_Point(new Point3d(u, v, 0));
      GH_Point newPoint = new GH_Point(closestPoint);
      GH_Number newDis = new GH_Number(dis);

      //add the new data to the output lists.
      addToList(newPoint, uvPoint, newDis, task.index);
    }
    else
    {
      //unexpected data: still count the task as done, under the same lock.
      lock(locker) { done--; }
    }
  }

  public class taskInfo
  {
    public int index;
    public Point3d startPoint;

    public taskInfo(int _index, Point3d _startPoint)
    {
      index = _index;
      startPoint = _startPoint;
    }
  }

Essentially this loop works its way through every point passed to the node. It then adds the point and the point’s index to a taskInfo object – a custom object created to store the information about the point being projected to the surface. The point is then projected onto the surface by calling the pntToSrf function and passing the taskInfo object in this line:

  System.Threading.ThreadPool.QueueUserWorkItem(pntToSrf, task);

Basically this line of code calls the pntToSrf function, but rather than calling it on the main thread (as you would do in a single-threaded application), it sends the function to the ThreadPool where a manager runs it on the next available thread. The ThreadPool has many different threads running through each processor in parallel, so the manager will send the pntToSrf function to whatever processor is not doing any work, which utilises the power of each processor.

A couple of tricky things happen inside the thread. For example, the arrays are locked whenever data is written to them, to prevent the threads from sending data into the arrays simultaneously:

  lock(locker)
  {
    outP[index] = myPoint;
    outUVP[index] = uvPoint;
    outD[index] = myDis;
    done--;
  }
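
To see the lock in isolation, here is a minimal sketch in plain C# that runs without Rhino or Grasshopper. The names LockedCounter, AddValue and Sum are hypothetical and are not part of the node. Work items queued on the ThreadPool add their value to a shared total and count themselves off, and both updates happen inside the lock, because a read-modify-write such as total += value or remaining-- can lose updates when two threads do it at the same time.

```csharp
using System;
using System.Threading;

public static class LockedCounter
{
    private static readonly object locker = new object();
    private static int remaining;
    private static long total;

    // Work item run on a pool thread; the state argument carries the value to add.
    private static void AddValue(object state)
    {
        int value = (int) state;
        lock (locker)
        {
            total += value;   // read-modify-write on shared data
            remaining--;      // counting the task off is one as well
        }
    }

    // Sums 1..count on the thread pool. Uses static state, so it is
    // not meant to be called from several threads at once.
    public static long Sum(int count)
    {
        lock (locker) { total = 0; remaining = count; }

        for (int i = 1; i <= count; i++)
            ThreadPool.QueueUserWorkItem(AddValue, i);

        // Polling wait in the spirit of the node's loop,
        // but reading the counter under the same lock.
        while (true)
        {
            lock (locker) { if (remaining == 0) break; }
            Thread.Sleep(2);
        }

        lock (locker) { return total; }
    }
}
```

LockedCounter.Sum(100) adds the numbers 1 to 100 across the pool threads. Without the lock around the two updates, the result could come out short on a multi-core machine, or the counter might never reach zero.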

Finally, since the threads are executing concurrently with the main thread, we have to pause the main thread until the other threads have finished processing before returning the data from the node. For simplicity’s sake I have used a really naive method of waiting, which loops until the right number of tasks are done.

  while(done > 0)
  {
    System.Threading.Thread.Sleep(2);
  }
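
One possible answer to the "Need a better wait method" TODO is to let the last task wake the main thread instead of polling. The sketch below is an assumption about how that could look, not what the node does. SignalledWait and SquareAll are hypothetical names, and squaring integers stands in for the surface projection. It relies only on Interlocked.Decrement and ManualResetEvent, both of which exist in .NET versions before 4.0.

```csharp
using System;
using System.Threading;

public static class SignalledWait
{
    public static int[] SquareAll(int[] input)
    {
        int[] output = new int[input.Length];
        if (input.Length == 0) return output; // nothing to wait for

        int remaining = input.Length;
        using (ManualResetEvent allDone = new ManualResetEvent(false))
        {
            for (int i = 0; i < input.Length; i++)
            {
                int index = i; // per-iteration copy for the closure
                ThreadPool.QueueUserWorkItem(delegate(object unused)
                {
                    output[index] = input[index] * input[index];

                    // Atomically count down; the last task to finish
                    // signals the waiting main thread.
                    if (Interlocked.Decrement(ref remaining) == 0)
                        allDone.Set();
                });
            }

            // Blocks without burning time in a Sleep loop.
            allDone.WaitOne();
        }
        return output;
    }
}
```

A caveat with this pattern: if a task throws before it decrements the counter, the event is never set and the main thread waits forever. Real code would wrap the work in a try/finally so the decrement always runs.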

And that is it.

In terms of performance, on my quad-core computer, the normal Grasshopper srfCP node can project 1,000,000 points onto a sphere in 10.0 seconds. Compared to both Digital Project and GC, this is remarkably fast, although it is nothing like the performance of Open Cascade. The same task performed with my threaded srfCP node takes 4.5 seconds, which is roughly a 2X improvement over the non-threaded version. You will notice that using four cores does not result in a straight-up 4X improvement. This is partly because some processor power is used to manage the threads, and partly because aspects of the code still happen in serial.
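
The gap between 2X and 4X can be put into rough numbers with Amdahl's law. This is an added back-of-envelope estimate rather than part of the original measurements, and it lumps the thread-management overhead in with the serial portion. Here S is the measured speed-up, n the number of cores, and p the fraction of the run time that behaves as if it were perfectly parallel.

```latex
\begin{align*}
S &= \frac{T_1}{T_4} = \frac{10.0\,\text{s}}{4.5\,\text{s}} \approx 2.22 \\
S(n) &= \frac{1}{(1 - p) + \dfrac{p}{n}} \\
\frac{1}{S(4)} &= 1 - \frac{3p}{4}
  \quad\Longrightarrow\quad
  p = \frac{4}{3}\left(1 - \frac{4.5}{10.0}\right) = \frac{4}{3}(0.55) \approx 0.73 \\
\lim_{n \to \infty} S(n) &= \frac{1}{1 - p} \approx 3.75
\end{align*}
```

Under those assumptions, roughly 73% of the work runs in parallel. Even with unlimited cores this node would top out at a little under 4X unless the serial part shrinks.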


Native types in Grasshopper

  outP = new GH_Point[total];
  outUVP = new GH_Point[total];
  outD = new GH_Number[total];

Some keen eyes will have picked up that the C# node returns an array of GH_Points rather than an array of Point3ds. When a C# node returns a Point3d, Grasshopper automatically converts it into a GH_Point (allowing it to be baked etc.). This conversion is computationally expensive, presumably because Grasshopper is checking that the data is valid. To avoid this performance hit, we can sidestep the conversion by giving Grasshopper the data back in its native format, a GH_Point. The other native Grasshopper formats can be found under Grasshopper.Kernel.Types. Similarly, if the inputs to a node are of a specific type, Grasshopper does an automatic conversion. By asking for the inputs as a System.Object this conversion can be avoided.

A word to the eager

While multi-threading offers significant speed improvements, it is not without downsides. It can be challenging and frustrating to debug threaded code, since interactions between threads can throw unforeseen errors from bugs that are invisible in the Grasshopper IDE. These errors will often crash Rhino: save frequently. Furthermore, not all functions in Rhino 4 are thread safe. So if you use an unsafe function, such as plane-brep intersection, in different threads, you will either crash Rhino or end up with strange results. Rhino 5 addresses some of the thread safety issues and once it moves to .NET 4.0 there will be access to even easier threading functions. For now take this post as an explanation for why CAD software hasn’t gotten any faster in the last three years, but expect to see significant performance increases in the near future once CAD finally catches up to processor developments.
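
As an illustration of the easier threading functions in .NET 4.0, the sketch below uses Parallel.For from System.Threading.Tasks, which splits a loop across the available cores and returns only once every iteration has finished. It is plain C# with no Rhino calls. ParallelForSketch and DistancesToOrigin are hypothetical names, and a distance calculation stands in for the surface projection. Whether a given Rhino function can safely be called inside such a loop still depends on that function being thread safe.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelForSketch
{
    // Computes the distance from the origin for each (x, y, z) triple in parallel.
    public static double[] DistancesToOrigin(double[] xs, double[] ys, double[] zs)
    {
        if (xs.Length != ys.Length || xs.Length != zs.Length)
            throw new ArgumentException("Coordinate arrays must have equal length.");

        double[] distances = new double[xs.Length];

        // Parallel.For handles bundling, scheduling and waiting;
        // each iteration writes only to its own array element.
        Parallel.For(0, xs.Length, i =>
        {
            distances[i] = Math.Sqrt(xs[i] * xs[i] + ys[i] * ys[i] + zs[i] * zs[i]);
        });

        return distances;
    }
}
```

Compared with the ThreadPool version, there is no counter, lock or waiting loop to maintain, because the call blocks until the whole range has been processed.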

For more information about multi-threading I recommend Joseph Albahari’s excellent tutorial on C# threading.