Multithreading error


Mark Bolstad
 

Hoping for some insight. I’ve bootstrapped OSL into a custom renderer, and for the most part my simple scene works on a single thread (the shader is three layers: marble -> checkerboard (Cb) -> diffuse (base_color)).

As soon as I enable a 2nd thread, I get an assertion violation in oslexec_pvt.h:1319, “reinterpret_cast<uintptr_t>(ptr) % alignment == 0” failed.

AFAIK I’m creating the thread info and shading context per thread correctly, but obviously I’m doing something wrong.

This is with OSL 1.11.15 and 1.11.13. Any ideas as to where I should poke to find the error?

Mark


Larry Gritz
 

With a debug build, can you get a stack trace at the point that the assertion is hit?

That looks like an assertion in SimplePool::alloc(). But I'm not sure what could go wrong. Maybe something that contains a SimplePool that should be per thread is being shared?
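
For readers unfamiliar with this kind of allocator, here is a minimal sketch of a bump-style pool with the same alignment check. It is an assumption-laden illustration, not OSL's actual SimplePool: the class name BumpPool, its members, and the single fixed block are all hypothetical. The point is that the offset is rounded up in one step and read back in a later step, with no locking, so a second thread bumping the same pool between those steps could leave the pointer misaligned and trip the assertion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a per-context bump allocator (NOT OSL's code).
// Deliberately unsynchronized: it is only safe when one thread owns the pool.
template <std::size_t BlockSize>
class BumpPool {
public:
    // Returns `size` bytes aligned to `alignment` (a power of two, at most 64),
    // or nullptr if the block has no room left.
    void* alloc(std::size_t size, std::size_t alignment) {
        assert((alignment & (alignment - 1)) == 0);
        // Step 1: round the shared offset up to the requested alignment.
        m_offset = (m_offset + alignment - 1) & ~(alignment - 1);
        if (m_offset > BlockSize || size > BlockSize - m_offset)
            return nullptr;
        // Step 2: read the offset back. Another thread sharing this pool
        // could have changed it since step 1.
        unsigned char* ptr = m_block + m_offset;
        // Step 3: reserve the bytes.
        m_offset += size;
        assert(reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0);
        return ptr;
    }

    // Forget all allocations so the block can be reused.
    void clear() { m_offset = 0; }

    // Number of bytes consumed so far, including alignment padding.
    std::size_t used() const { return m_offset; }

private:
    alignas(64) unsigned char m_block[BlockSize];
    std::size_t m_offset = 0;
};
```

Used from a single thread, two requests such as alloc(56, 16) followed by alloc(40, 16) come back 64 bytes apart, both 16-byte aligned. Nothing in the sketch protects those three steps if a second thread calls alloc() on the same object at the same time.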
 

(In reply to Mark Bolstad's message of Sep 23, 2021, 9:38 AM.)





Mark Bolstad
 

  * frame #0: 0x00007fff7270733a libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff727c3e60 libsystem_pthread.dylib`pthread_kill + 430
    frame #2: 0x00007fff7268e808 libsystem_c.dylib`abort + 120
    frame #3: 0x0000000109225f81 liboslexec.1.11.dylib`OSL_v1_11::pvt::SimplePool<20480>::alloc(this=0x000000010f604410, size=56, alignment=16) at oslexec_pvt.h:1319:9
    frame #4: 0x0000000109225ad1 liboslexec.1.11.dylib`OSL_v1_11::ShadingContext::closure_component_allot(this=0x000000010f6042c0, id=0, prim_size=40, w=0x00007ffeefbeee40) at oslexec_pvt.h:1614:70
    frame #5: 0x0000000109225a87 liboslexec.1.11.dylib`::osl_allocate_closure_component(sg=0x00007ffeefbef138, id=0, size=40) at opclosure.cpp:54:25
    frame #6: 0x000000010f1e505e
    frame #7: 0x00000001090eeeb2 liboslexec.1.11.dylib`OSL_v1_11::ShadingContext::execute(this=0x000000010f6042c0, sgroup=0x000000010f70b0e0, ssg=0x00007ffeefbef138, run=true) at context.cpp:174:13
    frame #8: 0x00000001090628ae liboslexec.1.11.dylib`OSL_v1_11::pvt::ShadingSystemImpl::execute(this=0x000000011280da00, ctx=0x000000010f6042c0, group=0x000000010f70b0e0, ssg=0x00007ffeefbef138, run=true) at shadingsys.cpp:2783:24
    frame #9: 0x0000000109062824 liboslexec.1.11.dylib`OSL_v1_11::ShadingSystem::execute(this=0x000000010f510748, ctx=0x000000010f6042c0, group=0x000000010f70b0e0, globals=0x00007ffeefbef138, run=true) at shadingsys.cpp:262:20
    frame #10: 0x00000001002e1c36 libgraphics.dylib`papillon::execute_surface_shader(context=0x000000010f6042c0, shader=0x000000010f70b0e0, ctx=0x00007ffeefbef2f0) at renderer_services.cpp:50:25
...


Larry Gritz
 

It sure smells like you have two threads simultaneously messing with the same closure pool, which should never happen.

Are you EXTRA SURE that the threads are each using their own distinct ShadingContext? Like, I would print the addresses of the SC before you kick off an execute call to be sure the two threads are using different ones.
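
One way to make that check mechanical rather than eyeballing printed addresses is a small debug-only registry that remembers which thread first used each context pointer. This is only a sketch: ContextOwnerCheck and its check() method are hypothetical names, not part of OSL, and the pointer passed in would be whatever ShadingContext pointer you are about to hand to execute.

```cpp
#include <cstdio>
#include <map>
#include <mutex>
#include <thread>

// Hypothetical debug helper (not an OSL API): records the first thread that
// used each context pointer and flags any later use from a different thread.
class ContextOwnerCheck {
public:
    // Returns true if `ctx` has only ever been used by the calling thread.
    bool check(const void* ctx) {
        std::lock_guard<std::mutex> lock(m_mutex);
        const std::thread::id me = std::this_thread::get_id();
        auto result = m_owner.emplace(ctx, me);  // no-op if already recorded
        const bool ok = result.second || result.first->second == me;
        if (!ok)
            std::fprintf(stderr, "context %p used by a second thread\n", ctx);
        return ok;
    }

private:
    std::mutex m_mutex;
    std::map<const void*, std::thread::id> m_owner;
};
```

Calling check(ctx) right before each execute call, and asserting on the result, would point at the first cross-thread use. Note that the sketch never forgets an owner, so a context that is legitimately released and later handed to another thread would be reported as a false positive.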


(In reply to Mark Bolstad's message of Sep 23, 2021, 10:17 AM.)





Mark Bolstad
 

And that smell would be weird. I have N distinct contexts, but the way I was using them, an SC created for one thread could end up being used in a different thread, leading to the corruption.

Thanks,
Mark
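
For anyone hitting the same assertion, here is a sketch of one ownership pattern that rules this out by construction: each worker creates its context inside its own thread function and never passes it anywhere else. It is not the actual fix from this thread, whose details were not posted. WorkerContext and render() are hypothetical stand-ins; in a real OSL integration the context would come from ShadingSystem::create_thread_info() and get_context() at the top of the worker, and be returned with release_context() and destroy_thread_info() at the bottom.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-thread shading context (not an OSL type).
struct WorkerContext {
    std::thread::id owner = std::this_thread::get_id();  // creating thread
    int shaded = 0;

    // Pretend to shade one sample; asserts we are still on the owning thread.
    void shade_one() {
        assert(owner == std::this_thread::get_id());
        ++shaded;
    }
};

// Shades `num_items` work items on `num_threads` workers and returns how many
// items were shaded in total.
int render(int num_threads, int num_items) {
    std::atomic<int> next{0};   // shared work counter
    std::atomic<int> total{0};  // shared result
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&next, &total, num_items] {
            WorkerContext ctx;  // lives and dies inside this thread
            while (next.fetch_add(1) < num_items)
                ctx.shade_one();
            total += ctx.shaded;
        });
    }
    for (auto& w : workers)
        w.join();
    return total.load();
}
```

Only the work counter and the result are shared between threads, and both are atomics. The context itself is a local variable of the worker, so no other thread can obtain a pointer to it.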


Larry Gritz
 

Are you saying you've fully figured it out?


(In reply to Mark Bolstad's message of Sep 23, 2021, 11:46 AM.)





Mark Bolstad
 

Yup. Took some rearchitecting, but it works with 12 threads.

Thanks,

Mark