Added safe wrappers called copy_from_async_sync and copy_to_async_sync in crates/cust/src/memory/device/device_slice.rs #140
Adesoji1 wants to merge 1 commit into Rust-GPU:main
Thank you for the PR! What can these new wrappers do that the CopyDestination trait methods, which are sync and safe, cannot? Does the addition of an explicit stream parameter help with some problem?
Thank you, @juntyr. As I understand it, the synchronous CopyDestination methods block until the copy is complete, using the default or an internal stream, whereas the new wrappers let the caller supply a specific stream. I believe this matters when you want to integrate the copy into a larger asynchronous workflow or coordinate it with other operations running on that stream, though I stand to be corrected. I also believe that having a safe asynchronous copy means that, if needed, one could modify the usage (or write additional wrappers) to take advantage of overlapping computation and data transfer.
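To make the stream argument concrete, here is a toy model of stream ordering. It is only a sketch: `ToyStream`, `enqueue`, and `copy_then_double` are hypothetical names, not part of cust, and a real CUDA stream runs its work asynchronously on the device, whereas this model merely defers the work until `synchronize` is called. It shows only that operations queued on one stream run in submission order, so a "kernel" queued after a "copy" sees the copied data.

```rust
use std::collections::VecDeque;

/// Toy stand-in for a CUDA stream (NOT the cust `Stream` type):
/// operations are queued in order and only run inside `synchronize`.
struct ToyStream {
    pending: VecDeque<Box<dyn FnOnce(&mut Vec<u32>)>>,
}

impl ToyStream {
    fn new() -> Self {
        Self { pending: VecDeque::new() }
    }

    /// Queue an operation on the pretend device memory; returns immediately.
    fn enqueue(&mut self, op: impl FnOnce(&mut Vec<u32>) + 'static) {
        self.pending.push_back(Box::new(op));
    }

    /// Run every queued operation in submission order.
    fn synchronize(&mut self, device: &mut Vec<u32>) {
        while let Some(op) = self.pending.pop_front() {
            op(&mut *device);
        }
    }
}

/// Queue a "copy" and a "kernel" on the same stream, then synchronize once.
fn copy_then_double(host: Vec<u32>) -> Vec<u32> {
    let mut device = vec![0; host.len()];
    let mut stream = ToyStream::new();

    // The "copy" is queued, not executed yet.
    stream.enqueue(move |dev| dev.copy_from_slice(&host));
    // The "kernel" is queued behind the copy on the same stream.
    stream.enqueue(|dev| dev.iter_mut().for_each(|x| *x *= 2));

    // The caller could do unrelated host work here before waiting.
    stream.synchronize(&mut device);
    device
}
```

In this model the caller waits once, after both operations are queued, instead of blocking after the copy alone.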
Is this a theoretical improvement or do you have code or intend to write code that needs this? I'd love to see an actual example.
For code that needs this, kindly view … I await your response on this. As for the CI/CD test of the code, it failed because of the version in the GitHub Actions workflow, but after updating it you can run the code and it will give no error. Thank you.
Safe wrappers introduced:
As I understand it, the two new methods (named copy_from_async_sync and copy_to_async_sync) simply wrap the existing unsafe methods defined by the AsyncCopyDestination trait (https://bheisler.github.io/RustaCUDA/rustacuda/memory/trait.CopyDestination.html). They perform the asynchronous copy and then immediately call stream.synchronize(), thereby ensuring that the copy is complete before returning.
Furthermore, with these additions, users who do not need overlapping computation can avoid unsafe blocks and explicit synchronization. These methods return a CudaResult<()> (https://bheisler.github.io/RustaCUDA/rustacuda/error/enum.CudaError.html) so that any error from either the asynchronous copy or the stream synchronization is propagated.
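The wrap-then-synchronize pattern described above can be sketched without a GPU. Everything below is a stand-in for illustration: `MockStream`, `MockSlice`, `MockError`, and `MockResult` are hypothetical types, not the cust API, and the "async" copy here actually runs eagerly. The point is only the shape of the safe wrapper: call the unsafe method, propagate its error with `?`, then return the result of `synchronize`.

```rust
use std::cell::Cell;

// Stand-ins for illustration only; these are NOT the real cust types.
#[derive(Debug, PartialEq)]
struct MockError(&'static str);
type MockResult<T> = Result<T, MockError>;

struct MockStream {
    fail_sync: bool,    // lets a caller simulate a failing synchronize
    synced: Cell<bool>, // records whether synchronize succeeded
}

impl MockStream {
    fn new(fail_sync: bool) -> Self {
        Self { fail_sync, synced: Cell::new(false) }
    }

    fn synchronize(&self) -> MockResult<()> {
        if self.fail_sync {
            return Err(MockError("sync failed"));
        }
        self.synced.set(true);
        Ok(())
    }
}

struct MockSlice {
    data: Vec<u8>,
}

impl MockSlice {
    /// Pretend async copy. Marked `unsafe` to mirror the contract that the
    /// caller must keep both buffers alive and untouched until the stream
    /// has been synchronized. (This mock copies eagerly.)
    unsafe fn async_copy_from(&mut self, src: &[u8], _stream: &MockStream) -> MockResult<()> {
        if src.len() != self.data.len() {
            return Err(MockError("length mismatch"));
        }
        self.data.copy_from_slice(src);
        Ok(())
    }

    /// Safe wrapper: copy, then synchronize before returning.
    fn copy_from_async_sync(&mut self, src: &[u8], stream: &MockStream) -> MockResult<()> {
        // SAFETY: `self` and `src` stay borrowed until `synchronize` returns.
        unsafe { self.async_copy_from(src, stream)? };
        stream.synchronize()
    }
}
```

Because the wrapper waits before returning, it gives up any overlap between the copy and other work; it trades that for a call site with no `unsafe` block.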
Availability on DeviceBuffer:
Since DeviceBuffer implements Deref<Target = DeviceSlice>, these new methods are also available on DeviceBuffer.
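The Deref point can be illustrated with a small, GPU-free sketch. `ToySlice` and `ToyBuffer` are hypothetical stand-ins for DeviceSlice and DeviceBuffer, and `overwrite` stands in for one of the new methods. The sketch also implements `DerefMut`, which a `&mut self` method needs in order to be reachable through the wrapper; that the real types behave the same way is an assumption here.

```rust
use std::ops::{Deref, DerefMut};

// Hypothetical stand-in for a slice-like type that owns the methods.
struct ToySlice {
    data: Vec<u8>,
}

impl ToySlice {
    /// A `&mut self` method defined only on the slice type.
    fn overwrite(&mut self, src: &[u8]) -> Result<(), &'static str> {
        if src.len() != self.data.len() {
            return Err("length mismatch");
        }
        self.data.copy_from_slice(src);
        Ok(())
    }

    /// A `&self` method defined only on the slice type.
    fn as_bytes(&self) -> &[u8] {
        &self.data
    }
}

// Hypothetical stand-in for an owning buffer that derefs to the slice type.
struct ToyBuffer {
    slice: ToySlice,
}

impl ToyBuffer {
    fn zeroed(len: usize) -> Self {
        Self { slice: ToySlice { data: vec![0; len] } }
    }
}

impl Deref for ToyBuffer {
    type Target = ToySlice;
    fn deref(&self) -> &ToySlice {
        &self.slice
    }
}

impl DerefMut for ToyBuffer {
    fn deref_mut(&mut self) -> &mut ToySlice {
        &mut self.slice
    }
}
```

With these impls, `buf.overwrite(..)` and `buf.as_bytes()` compile on a `ToyBuffer` even though both methods are defined only on `ToySlice`, because method resolution auto-dereferences the receiver.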