Algorithm customization

py4dgeo does not only provide a high performance implementation of the M3C2 base algorithm and some of it variants. It also allows you to rapidly prototype new algorithms in Python. We will demonstrate the necessary concepts by implementing some dummy algorithms without geographic relevance. First, we do the necessary setup, please consult the M3C2 notebook for details.

[1]:

import numpy as np
import py4dgeo

In order to test our dummy algorithms in this notebook, we load some point clouds:

[2]:

epoch1, epoch2 = py4dgeo.read_from_xyz(
    "plane_horizontal_t1.xyz", "plane_horizontal_t2.xyz"
)
corepoints = epoch1.cloud[::100]

[2024-05-14 13:00:40][INFO] Reading point cloud from file '/home/docs/.cache/py4dgeo/./plane_horizontal_t1.xyz'
[2024-05-14 13:00:41][INFO] Reading point cloud from file '/home/docs/.cache/py4dgeo/./plane_horizontal_t2.xyz'

Inherting from the algorithm class

Each algorithm is represented by a class that inherits from M3C2LikeAlgorithm. It does not need to inherit directly from that class, but can e.g. also inherit from a more specialized class like M3C2. Our first algorithm will behave exactly like M3C2 only that it reports a different name:

[3]:

class RenamedAlgorithm(py4dgeo.M3C2):
    @property
    def name(self):
        return "My super-duper M3C2 algorithm"

In the following, we will go over possible customization points for derived algorithm classes.

Changing search directions

Next, we switch to another method of determining the search direction, namely the constant direction (0, 0, 1):

[4]:

class DirectionAlgorithm(RenamedAlgorithm):
    def directions(self):
        return np.array([0, 0, 1])

[5]:

DirectionAlgorithm(
    epochs=(epoch1, epoch2), corepoints=corepoints, cyl_radii=(5.0,)
).run()

[2024-05-14 13:00:41][INFO] Building KDTree structure with leaf parameter 10
[2024-05-14 13:00:41][INFO] Building KDTree structure with leaf parameter 10

[5]:

(array([-0.10068844, -0.10017085, -0.10033093, -0.09955525, -0.09876984]),
 array([(0.00187895, 0.00280876, 26, 0.00384368, 24),
        (0.00091541, 0.0024623 , 79, 0.00325635, 75),
        (0.00088682, 0.0027008 , 81, 0.00293255, 75),
        (0.00098606, 0.00300521, 81, 0.00325891, 75),
        (0.00153745, 0.003502  , 37, 0.00306054, 33)],
       dtype=[('lodetection', '<f8'), ('spread1', '<f8'), ('num_samples1', '<i8'), ('spread2', '<f8'), ('num_samples2', '<i8')]))

In the above, we chose a constant vector across all corepoints by providing an array of shape (1x3). Alternatively we may provide an array of the same shape as the corepoints array to implement a normal direction that varies for each core point.

Adding Python callbacks to the C++ implementation

py4dgeo implements the M3C2 algorithm in performance-oriented C++. The implementation is substructured as follows: The main algorithm for distance calculation gets passed several callback functions that it calls during distance calculation. For each of these callback functions, there are two implementations:

An efficient C++ function that is exposed through Python bindings
A pure Python function that serves as a fallback/reference implementation.

In order to customize the algorithm behaviour, you can also provide your own implementation (either in Python or in C++). See the following example where we like this:

[6]:

def my_custom_workingset_finder(*args):
    print("I was called and returned a single point")
    return np.zeros((1, 3))

[7]:

class CallbackAlgorithm(RenamedAlgorithm):
    def callback_workingset_finder(self):
        return my_custom_workingset_finder

[8]:

CallbackAlgorithm(
    epochs=(epoch1, epoch2),
    corepoints=corepoints,
    cyl_radii=(2.0,),
    normal_radii=(1.0, 2.0),
).run()

I was called and returned a single pointI was called and returned a single point
I was called and returned a single point

I was called and returned a single point
I was called and returned a single point
I was called and returned a single point
I was called and returned a single point
I was called and returned a single point
I was called and returned a single point
I was called and returned a single point

[8]:

(array([0., 0., 0., 0., 0.]),
 array([(nan, nan, 1, nan, 1), (nan, nan, 1, nan, 1),
        (nan, nan, 1, nan, 1), (nan, nan, 1, nan, 1),
        (nan, nan, 1, nan, 1)],
       dtype=[('lodetection', '<f8'), ('spread1', '<f8'), ('num_samples1', '<i8'), ('spread2', '<f8'), ('num_samples2', '<i8')]))

In order to learn about what possible callbacks there are and what arguments they are expecting, please have a look at the Python fallback implementations in fallback.py Note: There will be better documentation about this in the future! For educational, testing and debugging purposes, there is an implementation of the M3C2 base algorithm that exclusively uses Python fallbacks:

[9]:

from py4dgeo.fallback import PythonFallbackM3C2

[10]:

PythonFallbackM3C2(
    epochs=(epoch1, epoch2),
    corepoints=corepoints,
    cyl_radii=(2.0,),
    normal_radii=(1.0, 2.0),
).run()

[10]:

(array([-0.10269298, -0.10179986, -0.10091512, -0.09819643, -0.09911222]),
 array([(0.00565384, 0.00417673,  5, 0.00491526,  5),
        (0.00255157, 0.00203436, 11, 0.00380835, 11),
        (0.00281475, 0.00259167, 12, 0.0040656 , 11),
        (0.00295429, 0.0032822 , 11, 0.00377072, 11),
        (0.00360769, 0.00356988, 10, 0.00436149,  9)],
       dtype=[('lodetection', '<f8'), ('spread1', '<f8'), ('num_samples1', '<i8'), ('spread2', '<f8'), ('num_samples2', '<i8')]))

Important: Using Python callbacks does considerably slow down your algorithm. While this is true for sequential runs, the effects are even more substantial when applying multi-threading. In the worst case (where you spend all your runtime in Python callbacks), your parallel performance will degrade to sequential. Use this feature for prototyping, but provide a C++ implementation of your callback for production runs.

Other customization

If your algorithm requires a different customization point, please open an issue on the py4dgeo issue tracker.