from numba import cuda
from numba.cuda.cudadrv.driver import driver
import math
from numba.np import numpy_support as nps


def transpose(a, b=None):
    """Compute the transpose of 'a' and store it into 'b', if given,
    and return it. If 'b' is not given, allocate a new array
    and return that.

    This implements the algorithm documented in
    http://devblogs.nvidia.com/parallelforall/efficient-matrix-transpose-cuda-cc/

    :param a: an `np.ndarray` or a `DeviceNDArrayBase` subclass. If it is
        already on the device, its stream will be used to perform the transpose
        (and to copy `b` to the device if necessary).
    """

    # Reuse `a`'s stream when it has one; otherwise use the default stream (0).
    stream = getattr(a, 'stream', 0)

    # Test against None explicitly: `not b` raises for a multi-element ndarray.
    if b is None:
        # Allocate a C-contiguous (rows, cols) result on the device.
        cols, rows = a.shape
        strides = a.dtype.itemsize * cols, a.dtype.itemsize
        b = cuda.cudadrv.devicearray.DeviceNDArray(
            (rows, cols),
            strides,
            dtype=a.dtype,
            stream=stream)

    dt = nps.from_dtype(a.dtype)

    # Factor the device's maximum threads per block into a 2-D tile of
    # roughly sqrt(tpb) x sqrt(tpb) threads.
    tpb = driver.get_device().MAX_THREADS_PER_BLOCK
    tile_width = int(math.pow(2, math.log(tpb, 2) / 2))
    tile_height = int(tpb / tile_width)

    # Pad the shared-memory tile by one column so that reading it back
    # column-wise does not cause shared-memory bank conflicts.
    tile_shape = (tile_height, tile_width + 1)

    @cuda.jit
    def kernel(input, output):

        tile = cuda.shared.array(shape=tile_shape, dtype=dt)

        tx = cuda.threadIdx.x
        ty = cuda.threadIdx.y
        bx = cuda.blockIdx.x * cuda.blockDim.x
        by = cuda.blockIdx.y * cuda.blockDim.y
        x = by + tx
        y = bx + ty

        # Stage one tile of the input in shared memory (coalesced reads).
        if by + ty < input.shape[0] and bx + tx < input.shape[1]:
            tile[ty, tx] = input[by + ty, bx + tx]
        cuda.syncthreads()
        # Write the tile back out transposed (coalesced writes).
        if y < output.shape[0] and x < output.shape[1]:
            output[y, x] = tile[tx, ty]

    # One block per tile, plus one to cover any remainder.
    blocks = int(b.shape[0] / tile_height + 1), int(b.shape[1] / tile_width + 1)
    # One thread per tile element.
    threads = tile_height, tile_width
    kernel[blocks, threads, stream](a, b)

    return b
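

# Illustrative CPU sketch of the tiled transpose performed by `kernel` above.
# `_transpose_tiled_reference` is a hypothetical helper, not part of numba's
# API. It assumes a 2-D NumPy input and square tiles (tile_width ==
# tile_height), and replays, block by block and thread by thread, the same
# index arithmetic as the CUDA kernel: each block first stages one tile of the
# input in a scratch buffer padded by one column (on the GPU that padding
# avoids shared-memory bank conflicts), then writes the tile back out with the
# roles of x and y swapped. Running the two phases one after the other stands
# in for the `cuda.syncthreads()` barrier between them.
import numpy as np


def _transpose_tiled_reference(a, tile=4):
    a = np.asarray(a)
    cols, rows = a.shape
    out = np.empty((rows, cols), dtype=a.dtype)
    # Same grid formula as `transpose`: one block per tile, plus one.
    grid_x = int(out.shape[0] / tile + 1)
    grid_y = int(out.shape[1] / tile + 1)
    for block_x in range(grid_x):
        for block_y in range(grid_y):
            # Scratch tile padded by one column, mirroring `tile_shape`.
            buf = np.zeros((tile, tile + 1), dtype=a.dtype)
            bx = block_x * tile  # cuda.blockIdx.x * cuda.blockDim.x
            by = block_y * tile  # cuda.blockIdx.y * cuda.blockDim.y
            # Phase 1: each simulated thread (tx, ty) reads one input element.
            for tx in range(tile):
                for ty in range(tile):
                    if by + ty < a.shape[0] and bx + tx < a.shape[1]:
                        buf[ty, tx] = a[by + ty, bx + tx]
            # Phase 2: each simulated thread writes one output element.
            for tx in range(tile):
                for ty in range(tile):
                    x, y = by + tx, bx + ty
                    if y < out.shape[0] and x < out.shape[1]:
                        out[y, x] = buf[tx, ty]
    return out


# Example: _transpose_tiled_reference(np.arange(6).reshape(2, 3), tile=2)
# holds the same values as np.arange(6).reshape(2, 3).T, including the
# partial tiles at the right and bottom edges.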