Improve dpnp.partition implementation#2766

Open
antonwolfy wants to merge 10 commits into master from resolve-gh-2762

@antonwolfy
Contributor

@antonwolfy antonwolfy commented Feb 11, 2026

This PR proposes to improve the dpnp.partition implementation by calling dpnp.sort when:

  • the input array has more than one dimension
  • the input array has an integer dtype that was not supported previously
  • the axis keyword is passed (previously not supported)
  • a sequence of kth values is passed (previously not supported)

For ndim > 1, the previous code used the implementation from the legacy backend, which is significantly slower (see the performance comparison below). It copied the input data into shared USM memory and ran part of the computation on the host.
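
The conditions listed above can be sketched as a small predicate. This is only an illustration: the function name, its parameters, and the dtype set are hypothetical and are not taken from the dpnp source. The set holds just the two integer dtypes that the 1D benchmark table in this description marks as previously unsupported; the real list may be longer.

```python
# Hypothetical sketch: these names are illustrative, not dpnp internals.

# Integer dtypes shown as "not supported" for the old code in the 1D table
# of this PR (assumption: the actual set in dpnp may contain more entries).
SORT_ONLY_INT_DTYPES = {"uint32", "uint64"}


def uses_sort_path(ndim, dtype_name, axis_passed=False, kth_is_sequence=False):
    """Return True when one of the listed conditions routes to a sort."""
    return (
        ndim > 1
        or dtype_name in SORT_ONLY_INT_DTYPES
        or axis_passed
        or kth_is_sequence
    )


print(uses_sort_path(1, "float64"))                        # False
print(uses_sort_path(2, "float64"))                        # True
print(uses_sort_path(1, "uint32"))                         # True
print(uses_sort_path(1, "float64", kth_is_sequence=True))  # True
```

Only a 1D array with a previously supported dtype, no axis argument, and a scalar kth stays on the legacy path in this sketch.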

This PR proposes to reuse dpnp.sort for all of the above cases.
The legacy implementation is kept where it is stable and fast (a 1D input array), because it relies on std::nth_element from oneDPL.
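
Reusing a sort is valid because a fully sorted array satisfies the partition contract for every kth at once, which also covers a sequence of kth values. The following is a minimal pure-Python illustration for 1D sequences; the helper names are hypothetical and it does not use dpnp.

```python
def partition_via_sort(values):
    """Partition by fully sorting: valid for any kth, at the cost of extra work."""
    return sorted(values)


def is_partitioned(values, kth):
    """Check the partition contract at index kth for a 1D sequence."""
    pivot = values[kth]
    return (
        all(v <= pivot for v in values[:kth])
        and all(v >= pivot for v in values[kth:])
    )


data = [7, 2, 9, 4, 1, 8, 3]
result = partition_via_sort(data)

# One sorted result satisfies a whole sequence of kth values.
print(all(is_partitioned(result, k) for k in (1, 3, 5)))  # True
# The unsorted input does not satisfy the contract at kth=3.
print(is_partitioned(data, 3))  # False
```

The trade-off is that a sort does more work than a selection algorithm such as std::nth_element, which is consistent with keeping the legacy path for the 1D case.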

The benchmark results were collected on PVC with the code below:

import dpnp, numpy as np
from dpnp.tests.helper import generate_random_numpy_array

a = generate_random_numpy_array(10**7, dtype=np.float64, seed_value=117)
ia = dpnp.array(a)
%timeit x = dpnp.partition(ia, 513); x.sycl_queue.wait()
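
The %timeit line above is an IPython magic. Outside IPython, a similar measurement could be taken with the standard timeit module, as in the sketch below. The best_time helper is hypothetical; the important detail carried over from the snippet above is that the queue is waited on inside the timed region, so asynchronous device work is included.

```python
import timeit


def best_time(fn, sync=None, repeat=5, number=10):
    """Return the best per-call wall time in seconds for fn()."""

    def run():
        result = fn()
        if sync is not None:
            sync(result)  # block until any asynchronous work has finished

    return min(timeit.repeat(run, repeat=repeat, number=number)) / number


# With dpnp installed, the 1D measurement might be written as:
#   best_time(lambda: dpnp.partition(ia, 513),
#             sync=lambda x: x.sycl_queue.wait())
# A stand-in workload keeps this sketch runnable without a device.
elapsed = best_time(lambda: sorted(range(1000, 0, -1)))
print(f"{elapsed * 1e6:.1f} us per call")
```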

The table below shows the results for a 1D input array (shape=(10**7,)), where the implementation path is unchanged, except that the previously missing integer dtypes are now supported through a fallback to the sort function:

| Implementation | int32 | uint32 | int64 | uint64 | float32 | float64 | complex64 | complex128 |
|---|---|---|---|---|---|---|---|---|
| old (legacy backend) | 7.46 ms | not supported | 9.46 ms | not supported | 7.39 ms | 8.92 ms | 10.9 ms | 21.2 ms |
| new (backend + sort) | 7.34 ms | 10.8 ms | 9.48 ms | 12.5 ms | 7.37 ms | 8.89 ms | 11 ms | 21.2 ms |

The following code was used for a 2D input array with shape=(10**4, 10**4):

import dpnp, numpy as np
from dpnp.tests.helper import generate_random_numpy_array

a = generate_random_numpy_array((10**4, 10**4), dtype=np.float64, seed_value=117)
ia = dpnp.array(a)
%timeit x = dpnp.partition(ia, 1513); x.sycl_queue.wait()

In that case the new implementation is fully based on the sort call:

| Implementation | int32 | int64 | float32 | float64 | complex64 | complex128 |
|---|---|---|---|---|---|---|
| old (legacy backend) | 6.4 s | 6.89 s | 7.36 s | 7.66 s | 8.61 s | 10 s |
| new (sort) | 57.4 ms | 64.7 ms | 62.2 ms | 68 ms | 77 ms | 151 ms |
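
To put the 2D numbers in perspective, the speedup per dtype can be computed directly from the table above. The snippet only does arithmetic on the reported timings (converted to milliseconds) and makes no new measurements.

```python
# Timings copied from the 2D table above, converted to milliseconds.
old_ms = {"int32": 6400, "int64": 6890, "float32": 7360,
          "float64": 7660, "complex64": 8610, "complex128": 10000}
new_ms = {"int32": 57.4, "int64": 64.7, "float32": 62.2,
          "float64": 68, "complex64": 77, "complex128": 151}

# Ratio of old to new time for each dtype.
speedup = {dtype: old_ms[dtype] / new_ms[dtype] for dtype in old_ms}

for dtype, ratio in speedup.items():
    print(f"{dtype:>10}: {ratio:6.1f}x")
```

By these figures the sort-based path is roughly 66x faster for complex128 and between about 106x and 118x faster for the other dtypes.
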
  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?

@antonwolfy antonwolfy added this to the 0.20.0 release milestone Feb 11, 2026
@antonwolfy antonwolfy self-assigned this Feb 11, 2026
@github-actions
Contributor

View rendered docs @ https://intelpython.github.io/dpnp/pull/2766/index.html

@github-actions
Contributor

github-actions bot commented Feb 11, 2026

Array API standard conformance tests for dpnp=0.20.0dev2=py313h509198e_28 ran successfully.
Passed: 1354
Failed: 3
Skipped: 7

@coveralls
Collaborator

coveralls commented Feb 11, 2026

Coverage Status

coverage: 81.074% (-0.04%) from 81.117%
when pulling 5f41788 on resolve-gh-2762
into af6205f on master.

@antonwolfy antonwolfy linked an issue Feb 11, 2026 that may be closed by this pull request
@antonwolfy antonwolfy marked this pull request as ready for review February 12, 2026 13:05

Development

Successfully merging this pull request may close these issues.

dpnp.partition fails on 2D array of complex dtype

2 participants