Correspondence-driven plane-based M3C2 (PBM3C2) with known segmentation

In this notebook, we extend the PB-M3C2 implementation to work with segmentation information that is already present in the input data. This is useful if you embed the calculation into a larger workflow where a segmentation has already been produced.

[1]:
import py4dgeo
[2]:
py4dgeo.set_interactive_backend("vtk")

We read the two input epochs from XYZ files that contain a total of four columns: the X, Y and Z coordinates, as well as a segment ID mapping each point to a segment. The read_from_xyz functionality allows us to read additional data columns through its additional_dimensions parameter, which expects a dictionary mapping column indices to column names.

[3]:
epoch0, epoch1 = py4dgeo.read_from_xyz(
    "plane_horizontal_t1_segmented.xyz",
    "plane_horizontal_t2_segmented.xyz",
    additional_dimensions={3: "segment_id"},
    delimiter=",",
)
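To illustrate the convention behind the additional_dimensions parameter (this is a minimal sketch of the column mapping, not py4dgeo's internal parsing code), consider how the dictionary `{3: "segment_id"}` applies to a single comma-delimited line of the input file:

```python
# Illustration of the column-index-to-name convention used by
# additional_dimensions. The first three columns are always the
# X, Y, Z coordinates; mapped indices become named extra columns.
line = "1.0,2.0,3.0,7"
values = line.split(",")

additional_dimensions = {3: "segment_id"}  # column index -> column name

xyz = [float(v) for v in values[:3]]
extra = {name: float(values[idx]) for idx, name in additional_dimensions.items()}

print(xyz)    # [1.0, 2.0, 3.0]
print(extra)  # {'segment_id': 7.0}
```

Additional columns beyond the fourth would be mapped the same way, e.g. `{3: "segment_id", 4: "intensity"}`.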
[2024-05-14 13:04:24][INFO] Reading point cloud from file '/home/docs/.cache/py4dgeo/./plane_horizontal_t1_segmented.xyz'
[2024-05-14 13:04:24][INFO] Reading point cloud from file '/home/docs/.cache/py4dgeo/./plane_horizontal_t2_segmented.xyz'

Again, we instantiate the algorithm. Due to fundamental differences in the algorithm workflow, we use a separate algorithm class for this use case:

[4]:
alg = py4dgeo.PBM3C2WithSegments()

Next, we read the segmented point cloud, which is part of the input epochs, and reconstruct the required segments from it. As a result, we get the same information that we got from the export_segments_for_labelling method in the base PB-M3C2 implementation. Again, we need to provide labelling, which we can do either interactively or with external tools. In contrast to export_segments_for_labelling, reconstruct_post_segmentation_output writes only one file: the full segmentation information file (which defaults to extracted_segments.seg):

[5]:
xyz_epoch0, xyz_epoch1, segments = alg.reconstruct_post_segmentation_output(
    epoch0=epoch0,
    epoch1=epoch1,
)
[2024-05-14 13:04:24][INFO] No pipeline parameter is overwritten
[2024-05-14 13:04:24][INFO] Reconstruct post segmentation output using [x, y, z, segment_id] columns from epoch0
[2024-05-14 13:04:24][INFO] Reconstruct post segmentation output using [x, y, z, segment_id] columns from epoch1
[2024-05-14 13:04:24][INFO] Transformer Fit
[2024-05-14 13:04:24][INFO] Transformer Transform
[2024-05-14 13:04:24][INFO] Transformer Fit
[2024-05-14 13:04:24][INFO] Transformer Transform
[2024-05-14 13:04:24][INFO] Transformer Transform
[2024-05-14 13:04:24][INFO] 'Segments' saved in file: extracted_segments.seg
[2024-05-14 13:04:24][INFO] ----
 The pipeline parameters after restoration are:
{'_Transform_ExtractSegments': ExtractSegments(),
 '_Transform_ExtractSegments__columns': <class 'py4dgeo.pbm3c2.SEGMENT_COLUMNS'>,
 '_Transform_ExtractSegments__output_file_name': None,
 '_Transform_ExtractSegments__skip': False,
 '_Transform_Post Segmentation': PostPointCloudSegmentation(),
 '_Transform_Post Segmentation__columns': <class 'py4dgeo.pbm3c2.SEGMENTED_POINT_CLOUD_COLUMNS'>,
 '_Transform_Post Segmentation__compute_normal': True,
 '_Transform_Post Segmentation__output_file_name': None,
 '_Transform_Post Segmentation__skip': False,
 'memory': None,
 'steps': [('_Transform_Post Segmentation', PostPointCloudSegmentation()),
           ('_Transform_ExtractSegments', ExtractSegments())],
 'verbose': False}
----

Having completed the labelling process, we read it back in and start the training procedure:

[6]:
alg.training(
    extracted_segments_file_name="extracted_segments.seg",
    extended_y_file_name="testdata-labelling2.csv",
)
[2024-05-14 13:04:24][INFO] Reading segments from file '/home/docs/checkouts/readthedocs.org/user_builds/py4dgeo/checkouts/latest/doc/extracted_segments.seg'
[2024-05-14 13:04:24][INFO] Reading tuples of (segment epoch0, segment epoch1, label) from file '/home/docs/.cache/py4dgeo/./testdata-labelling2.csv'
[2024-05-14 13:04:24][INFO] Fit ClassifierWrapper
[7]:
distances, uncertainties = alg.compute_distances(epoch0, epoch1)
[2024-05-14 13:04:24][INFO] PBM3C2WithSegments.compute_distances(...)
[2024-05-14 13:04:24][INFO] PBM3C2._compute_distances(...)
[2024-05-14 13:04:24][INFO] No pipeline parameter is overwritten
[2024-05-14 13:04:24][INFO] Reconstruct post segmentation output using [x, y, z, segment_id] columns from epoch0
[2024-05-14 13:04:24][INFO] Reconstruct post segmentation output using [x, y, z, segment_id] columns from epoch1
[2024-05-14 13:04:24][INFO] Transformer Transform
[2024-05-14 13:04:25][INFO] Transformer Transform
[2024-05-14 13:04:25][INFO] Building KDTree structure with leaf parameter 10
[2024-05-14 13:04:26][INFO] ----
 The pipeline parameters after restoration are:
{'Classifier': ClassifierWrapper(),
 'Classifier__classifier': RandomForestClassifier(),
 'Classifier__classifier__bootstrap': True,
 'Classifier__classifier__ccp_alpha': 0.0,
 'Classifier__classifier__class_weight': None,
 'Classifier__classifier__criterion': 'gini',
 'Classifier__classifier__max_depth': None,
 'Classifier__classifier__max_features': 'sqrt',
 'Classifier__classifier__max_leaf_nodes': None,
 'Classifier__classifier__max_samples': None,
 'Classifier__classifier__min_impurity_decrease': 0.0,
 'Classifier__classifier__min_samples_leaf': 1,
 'Classifier__classifier__min_samples_split': 2,
 'Classifier__classifier__min_weight_fraction_leaf': 0.0,
 'Classifier__classifier__monotonic_cst': None,
 'Classifier__classifier__n_estimators': 100,
 'Classifier__classifier__n_jobs': None,
 'Classifier__classifier__oob_score': False,
 'Classifier__classifier__random_state': None,
 'Classifier__classifier__verbose': 0,
 'Classifier__classifier__warm_start': False,
 'Classifier__columns': <class 'py4dgeo.pbm3c2.SEGMENT_COLUMNS'>,
 'Classifier__diff_between_most_similar_2': 0.1,
 'Classifier__neighborhood_search_radius': 3,
 'Classifier__threshold_probability_most_similar': 0.8,
 'Transform_ExtractSegments': ExtractSegments(),
 'Transform_ExtractSegments__columns': <class 'py4dgeo.pbm3c2.SEGMENT_COLUMNS'>,
 'Transform_ExtractSegments__output_file_name': None,
 'Transform_ExtractSegments__skip': False,
 'Transform_Post_Segmentation': PostPointCloudSegmentation(),
 'Transform_Post_Segmentation__columns': <class 'py4dgeo.pbm3c2.SEGMENTED_POINT_CLOUD_COLUMNS'>,
 'Transform_Post_Segmentation__compute_normal': True,
 'Transform_Post_Segmentation__output_file_name': None,
 'Transform_Post_Segmentation__skip': False,
 'memory': None,
 'steps': [('Transform_Post_Segmentation', PostPointCloudSegmentation()),
           ('Transform_ExtractSegments', ExtractSegments()),
           ('Classifier', ClassifierWrapper())],
 'verbose': False}
----

Note: When comparing distance results between this notebook and the base algorithm notebook, you might notice that the results do not necessarily agree, even if the given segmentation information is exactly the same as the one computed in the base algorithm. This is because the reconstruction process in this algorithm is forced to select the segment position (exported as the core point) from the segment points, instead of reconstructing the exact position used by the base algorithm.
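When inspecting such differences, it can help to summarize the returned distances numerically. The sketch below uses a hypothetical stand-in array; in the notebook, `distances` comes from `alg.compute_distances(epoch0, epoch1)` and, as a NumPy array, may contain NaN entries for segments without a correspondence:

```python
import numpy as np

# Hypothetical stand-in for the distances array returned by
# compute_distances(); NaN marks a segment without a correspondence.
distances = np.array([0.01, 0.02, np.nan, -0.015])

# Summarize only the valid (non-NaN) distances.
valid = distances[~np.isnan(distances)]
mean_change = float(np.mean(valid))

print(f"{len(valid)} of {len(distances)} segments matched")
print(f"mean change: {mean_change:.4f}")  # mean change: 0.0050
```

Comparing such summaries between the two notebooks makes it easy to see how large the effect of the core point selection actually is.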
