Command-Line Tool

What does this command-line tool do?

This command-line tool creates compact overviews of tens, hundreds or even thousands of simulation runs. It does so by applying an unsupervised learning method to the displacement field of the simulations. The result is a report in the form of a 3D plot in which every point is one simulation result. If two points are very close, their simulation results are expected to be very similar. You can confirm this by using images of every simulation (there is an option to specify the path to a folder with those images).

Example Visualization Report

This is a plot for dimensionality reduction of a rail getting crushed. This plot has many detail levels.

Check out the following facts:

There are two groups.
- The small group stands for samples buckling at the bottom.
- The big group represents buckling in the top area.
There are two outliers.
- Sample 143 is folding in the top and bottom.
- Sample 99 is folding in the middle.
Similarity Axis
- The similarity axis X describes whether a sample buckles at the top or at the bottom.
- The similarity axis Y describes how much a sample gets compressed.
- The samples lie basically in a plane. The out-of-plane axis represents buckling that is not strictly at the top but also slightly at the lower end (zoom in).

Quick Start

You can call the command-line tool as follows:

$ python -m lasso.dimred.run --help

usage: run.py [-h] --reference-run REFERENCE_RUN
              [--exclude-runs [EXCLUDE_RUNS [EXCLUDE_RUNS ...]]]
              [--start-stage [START_STAGE]] [--end-stage [END_STAGE]]
              --project-dir PROJECT_DIR [--embedding-images EMBEDDING_IMAGES]
              [--logfile-filepath [LOGFILE_FILEPATH]]
              [--n-processes [N_PROCESSES]]
              [--part-ids [PART_IDS [PART_IDS ...]]]
              [--timestep TIMESTEP]
              [--cluster-args [CLUSTER_ARGS [CLUSTER_ARGS ...]]]
              [--outlier-args [OUTLIER_ARGS [OUTLIER_ARGS ...]]]
              [simulation_runs [simulation_runs ...]]
Python utility script for dimensionality reduction.
positional arguments:
  simulation_runs       Simulation runs or patterns used to search for
                        simulation runs.
optional arguments:
  -h, --help            show this help message and exit
  --reference-run REFERENCE_RUN
                        Set the reference run instead of using the first entry in simulation runs.
  --exclude-runs [EXCLUDE_RUNS [EXCLUDE_RUNS ...]]
                        Runs to exclude from the analysis.
  --start-stage [START_STAGE]
                        At which specific stage to start the analysis
                        (REFERENCE_RUN, IMPORT_RUNS, REDUCTION, CLUSTERING,
                        EXPORT_PLOT).
  --end-stage [END_STAGE]
                        At which specific stage to stop the analysis
                        (REFERENCE_RUN, IMPORT_RUNS, REDUCTION, CLUSTERING,
                        EXPORT_PLOT).
  --project-dir PROJECT_DIR
                        Project dir for temporary files. Must be specified to
                        allow restart at specific steps
  --embedding-images EMBEDDING_IMAGES
                        Path to folder containing images of runs. Sample names
                        must be numbers
  --logfile-filepath [LOGFILE_FILEPATH]
                        Path for the logfile. A file will be created
                        automatically if a project dir is specified.
  --n-processes [N_PROCESSES]
                        Number of processes to use (default: n_cpu-1).
  --part-ids [PART_IDS [PART_IDS ...]]
                        Part ids to process. By default all are taken.
  --timestep TIMESTEP   Sets timestep to analyse. Uses last timestep if not
                        set.
  --cluster-args [CLUSTER_ARGS [CLUSTER_ARGS ...]]
                        Arguments for clustering algorithms. If not set,
                        clustering will be skipped.
  --outlier-args [OUTLIER_ARGS [OUTLIER_ARGS ...]]
                        Arguments for outlier detection before clustering.

The following arguments are required for the analysis:

simulation-runs
--project-dir

The simulation-runs can either be listed individually or matched with placeholders for entire directories (e.g. '*.fz') and subdirectories (e.g. './**/*.fz').
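To preview which files a placeholder would pick up before starting a long analysis, a short Python sketch can help. It assumes that the tool's pattern matching behaves like Python's standard glob module with recursive matching enabled, which this page does not confirm. The helper name expand_run_patterns is hypothetical and not part of the tool.

```python
import glob
import os
import tempfile


def expand_run_patterns(patterns):
    """Expand file patterns into a sorted list of unique paths.

    Hypothetical helper: assumes glob-style semantics, where '**'
    matches any number of nested directories (including none).
    """
    matches = set()
    for pattern in patterns:
        matches.update(glob.glob(pattern, recursive=True))
    return sorted(matches)


# Build a small throwaway directory tree to demonstrate both patterns.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "run_a"))
    for rel in ("plot0.fz", "plot1.fz", os.path.join("run_a", "plot.fz")):
        open(os.path.join(root, rel), "w").close()

    # '*.fz' only sees files directly inside the folder
    flat = expand_run_patterns([os.path.join(root, "*.fz")])
    # '**/*.fz' also descends into subdirectories
    nested = expand_run_patterns([os.path.join(root, "**", "*.fz")])

    flat_names = [os.path.basename(path) for path in flat]
    nested_count = len(nested)

print(flat_names, nested_count)
```

Note that most shells expand unquoted placeholders themselves before the tool sees them, so quoting the pattern may change which side performs the matching.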

Warning

Every run clears respective steps of the generated .hdf5 file of previous runs, as well as the logfile.

Tutorial

In this tutorial, we will introduce the application of the command-line utility and explain some of the arguments. Make sure to have some similar D3plots (20 should already be fine).

First, we will only pass the required arguments:

$ python -m lasso.dimred.run \
      user/tutorial/plots/plot*.fz \
      --project-dir user/tutorial/projectfolder

The first required argument is a file pattern that matches the simulation runs. In our case, all plots are contained in the same folder. If each plot is saved in its own folder, the command would look like this:

$ python -m lasso.dimred.run \
      user/tutorial/plots*/plot.fz \
      --project-dir user/tutorial/projectfolder

The --project-dir argument specifies the folder where the project data is saved. This will be a project_buffer.hdf5 file, a logfile, and the 3d_beta_plot[hh:mm:ss].html output. Note that for each run, corresponding entries in the project_buffer.hdf5 and the logfile will be overwritten. How to access the results in the project_buffer.hdf5 file will be discussed later.

Your output should look similar to this:

==== OPEN - LASSO ====

┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ n-processes      ┃ 5                                                        ┃
│ reference-run    │ user/tutorial/plots/plot0.fz                             │
│ project-dir      │ user/tutorial/projectfolder                              │
│ # simul.-files   │ 35                                                       │
│ # excluded files │ 0                                                        │
└──────────────────┴──────────────────────────────────────────────────────────┘

   ---- Running Routines ----   
[14:25:56] Reference Subsample
 ...
[14:26:04] Loadtime Reference subample: 8.020
[14:26:04] Total time for Reference subample: 8.080
[14:26:04] Reference subsample completed
[14:26:04] Subsampling
Subsampling plots  ... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 35 of 35;
[14:29:03] Average Time per Subsampling Process: 24.76
[14:29:03] Average Loadtime per sample: 24.55
[14:29:03] Subsampling completed
[14:29:03] Dimension Reduction
 ...
[14:29:04] Dimension Reduction completed
[14:29:04] No arguments provided for clustering, clustering aborted
[14:29:04] Creating .html viz
[14:29:05] Finished creating viz

The output reports the different stages of the dimred tool, along with some additional information, which allows you to keep track of the tool's progress. The following 5 stages are run through:

REFERENCE_RUN: The reference run is subsampled and saved. The reference run is either the first entry of the simulation runs, or can be set manually with --reference-run.
IMPORT_RUNS: The provided simulation runs are loaded and subsampled using the reference run. This process applies a nearest neighbor algorithm to match nodes between different meshes, which compensates for differences in meshing.
REDUCTION: The dimensionality reduction is performed.
CLUSTERING: If clustering and outlier arguments are provided, the results will be further categorized.
EXPORT_PLOT: The results are exported and saved as an .html file.
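The node matching in the IMPORT_RUNS stage can be illustrated with a small sketch. This is not the tool's implementation: it is a brute-force nearest neighbor lookup on made-up coordinates, and the function name match_to_reference as well as all arrays are hypothetical.

```python
import numpy as np


def match_to_reference(ref_nodes, run_nodes, run_displacement):
    """Map a run's nodal displacement onto the reference nodes.

    For every reference node, the closest node of the run (Euclidean
    distance in the undeformed mesh) is looked up and its displacement
    is taken over. Brute force, so only suitable for small subsamples.
    """
    # pairwise squared distances, shape (n_ref, n_run)
    diff = ref_nodes[:, None, :] - run_nodes[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    nearest = np.argmin(dist2, axis=1)
    return run_displacement[nearest], nearest


# Reference subsample: three nodes along the x axis.
ref_nodes = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])

# A run with a slightly different mesh and a different node order.
run_nodes = np.array(
    [[2.1, 0.0, 0.0], [-0.1, 0.0, 0.0], [0.9, 0.1, 0.0], [1.5, 1.0, 0.0]]
)
run_displacement = np.array(
    [[0.0, 0.0, 3.0], [0.0, 0.0, 1.0], [0.0, 0.0, 2.0], [0.0, 0.0, 9.0]]
)

matched, nearest = match_to_reference(ref_nodes, run_nodes, run_displacement)
print(nearest)        # index of the matched run node per reference node
print(matched[:, 2])  # z displacement mapped onto the reference nodes
```

After this step every run is described on the same set of nodes, which is what makes the displacement fields of differently meshed runs comparable.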

Note

Your computer may slow down while using the tool. If you want to use other programs while waiting for the tool to finish, reduce the number of processes with --n-processes.

Start and End stage

Next, we will take a look at the --start-stage and --end-stage arguments. These allow you to start and stop the command-line utility at certain points in the process. This is useful if you want to save time by not repeating certain stages, or if you want to end the process early, e.g. because you don't need the .html output.

To set the desired start and end stage, use the following keywords:

REFERENCE_RUN
IMPORT_RUNS
REDUCTION
CLUSTERING
EXPORT_PLOT

Example:

$ python -m lasso.dimred.run \
      user/tutorial/plots \
      --project-dir user/tutorial/projectfolder \
      --start-stage REDUCTION \
      --end-stage CLUSTERING

==== OPEN - LASSO ====

┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ n-processes      ┃ 5                                                        ┃
│ reference-run    │ user/tutorial/plots/plot0.fz                             │
│ project-dir      │ user/tutorial/projectfolder                              │
│ # simul.-files   │ 35                                                       │
│ # excluded files │ 0                                                        │
└──────────────────┴──────────────────────────────────────────────────────────┘

   ---- Running Routines ----   
[14:45:28] Skipped import stage
[14:45:28] Dimension Reduction
 ...
[14:45:29] Dimension Reduction completed
[14:45:29] No arguments provided for clustering, clustering aborted

This process is much quicker, as the samples have already been loaded and subsampled. Note that this only works if the stages before the selected start stage (here: REFERENCE_RUN and IMPORT_RUNS) have already been processed in an earlier run.
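The effect of the two arguments can be modelled with a few lines of Python. This is only a sketch of the behaviour visible in the log above, where both the start and the end stage are executed. The function stages_to_run is hypothetical and not part of the tool.

```python
# Stage keywords in the order in which the tool runs through them.
STAGES = ["REFERENCE_RUN", "IMPORT_RUNS", "REDUCTION", "CLUSTERING", "EXPORT_PLOT"]


def stages_to_run(start_stage=STAGES[0], end_stage=STAGES[-1]):
    """Return all stages from start to end stage, both inclusive."""
    start = STAGES.index(start_stage)
    end = STAGES.index(end_stage)
    if start > end:
        raise ValueError("start stage must not come after end stage")
    return STAGES[start:end + 1]


selected = stages_to_run("REDUCTION", "CLUSTERING")
skipped = [stage for stage in STAGES if stage not in selected]

print("run: ", selected)
print("skip:", skipped)
```

The skipped stages before the start stage are exactly the ones whose results must already be present in the project directory.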

Clustering and outlier detection

The dimred tool has integrated functionality to cluster your results and detect outliers. The following clustering algorithms from the scikit-learn (sklearn) library have been implemented:

kmeans
SpectralClustering
DBScan
Optics

Additionally, you have access to the following outlier detection algorithms from the same library:

IsolationForest
LocalOutlierFactor
OneClassSVM

These classes have additional optional arguments, specified in their respective documentation. Here is an example of how to find two clusters with kmeans and detect outliers with LocalOutlierFactor:

$ python -m lasso.dimred.run \
      user/tutorial/plots \
      --project-dir user/tutorial/projectfolder \
      --reference-run user/tutorial/referenceplot \
      --start-stage REDUCTION \
      --cluster-args kmeans n_clusters int 2 \
      --outlier-args LocalOutlierFactor

Note

The argument n_clusters is followed by its type and then its value. The type of a keyword can be looked up in the corresponding sklearn documentation. Not all types are supported: make sure each value is a float, int or str.

We skip the import stage again and add the --cluster-args and --outlier-args arguments, followed by the desired algorithm and optional additional arguments and their values.
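To make the argument layout concrete, the following sketch turns such a list into an algorithm name and a keyword dictionary. It is a hypothetical parser written for illustration, not the tool's own code. It assumes that the first token is the algorithm name and that every optional argument is given as a triplet of keyword, type and value.

```python
def parse_algorithm_args(args):
    """Split ['kmeans', 'n_clusters', 'int', '2'] into a name and kwargs.

    Hypothetical parser: the first token is the algorithm name, followed
    by any number of (keyword, type, value) triplets.
    """
    converters = {"int": int, "float": float, "str": str}
    name, rest = args[0], args[1:]
    if len(rest) % 3 != 0:
        raise ValueError("expected triplets of keyword, type and value")

    kwargs = {}
    for key, type_name, value in zip(rest[0::3], rest[1::3], rest[2::3]):
        if type_name not in converters:
            raise ValueError(f"unsupported type: {type_name}")
        kwargs[key] = converters[type_name](value)
    return name, kwargs


cluster_name, cluster_kwargs = parse_algorithm_args(
    ["kmeans", "n_clusters", "int", "2"]
)
outlier_name, outlier_kwargs = parse_algorithm_args(["LocalOutlierFactor"])

print(cluster_name, cluster_kwargs)
print(outlier_name, outlier_kwargs)
```

Stating the type explicitly is what allows the value "2" from the command line to reach the algorithm as the integer 2 rather than as a string.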

Accessing the results

All results, as well as the subsamples and the reference subsample, are saved in the project_buffer.hdf5 file. You can access these in your own Python scripts:

import numpy as np
import h5py

# the buffer file is located in the project directory
h5py_path = "user/tutorial/projectfolder/project_buffer.hdf5"
h5file = h5py.File(h5py_path, "r")

# access the reference subsample:
ref_sample = h5file["subsample"][:]

# create a numpy array containing the displacement of all subsamples
# this returns an array of shape (samples, timesteps, nodes, dims)
subsamples = np.stack([
    h5file["subsampled_runs"][entry][:]
    for entry in h5file["subsampled_runs"].keys()
])

# create a numpy array containing the right reduced order basis for projection:
v_rob = h5file["v_rob"][:]

# the subsampled runs are projected into the right reduced order basis and called betas:
betas = np.stack([h5file["betas"][entry][:] for entry in h5file["betas"].keys()])

These betas are used for the visualization and, if specified, for clustering and outlier detection. In the visualization, only the first 3 coefficients (betas) and only the last timestep are taken into account.
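The relation between subsamples, basis and betas can be sketched on synthetic data. The sketch assumes that the basis comes from a truncated singular value decomposition and that the betas have the shape (samples, timesteps, coefficients). Both are assumptions made for illustration and are not confirmed by this page, and all array sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the subsampled displacements:
# 6 runs, 4 timesteps, 10 nodes, 3 spatial dimensions.
n_runs, n_timesteps, n_nodes, n_dims = 6, 4, 10, 3
subsamples = rng.normal(size=(n_runs, n_timesteps, n_nodes, n_dims))

# Flatten each snapshot into a row: (runs * timesteps, nodes * dims).
snapshots = subsamples.reshape(n_runs * n_timesteps, n_nodes * n_dims)

# Assumption: the basis consists of the leading right singular vectors.
_, _, vt = np.linalg.svd(snapshots, full_matrices=False)
n_basis = 5
v_rob = vt[:n_basis].T  # shape (nodes * dims, n_basis)

# Project every snapshot into the basis to obtain its betas.
betas = (snapshots @ v_rob).reshape(n_runs, n_timesteps, n_basis)

# Coordinates of the 3D plot: first 3 betas at the last timestep.
plot_points = betas[:, -1, :3]
print(plot_points.shape)
```

With real data, the arrays v_rob and betas read from project_buffer.hdf5 would take the place of the synthetic ones, and only the final slicing step would be needed.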

If you have provided clustering and outlier detection arguments, you can also access the different clusters:

cluster_index = np.stack([
    h5file["betas"][entry].attrs["cluster"] for entry in h5file["betas"].keys()
])

beta_clusters = []

for cluster in range(h5file["betas"].attrs["nr_clusters"]):
    beta_clusters.append(betas[np.where(cluster_index == cluster)[0]])

The beta_clusters list contains one array of betas per cluster. If outlier arguments have been provided, the first entry contains all detected outliers.
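Once the cluster indices are available, a quick summary per cluster is often a useful first check. The sketch below works on made-up values instead of a real project_buffer.hdf5 file, and it assumes that index 0 holds the outliers, as described above for runs with outlier arguments.

```python
import numpy as np

# Synthetic stand-ins: 6 samples with 3 betas each (last timestep only)
# and the cluster index of every sample. Index 0 is assumed to be outliers.
betas_last = np.array([
    [0.0, 0.0, 0.0],
    [0.2, 0.0, 0.0],
    [5.0, 5.0, 0.0],
    [5.2, 5.0, 0.0],
    [5.4, 5.0, 0.0],
    [9.0, -9.0, 1.0],
])
cluster_index = np.array([1, 1, 2, 2, 2, 0])

# Collect size and centroid of every cluster.
summary = {}
for cluster in np.unique(cluster_index):
    members = betas_last[cluster_index == cluster]
    summary[int(cluster)] = {
        "size": len(members),
        "centroid": members.mean(axis=0),
    }

for cluster, info in summary.items():
    label = "outliers" if cluster == 0 else f"cluster {cluster}"
    print(label, info["size"], np.round(info["centroid"], 2))
```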

FAQ

How to specify a path to the displayed images?

In the final HTML there is a menu on the left side. In it you can specify a path to the image folder, as well as the file extension. Be aware: the image names must be numbers, starting at 0.
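If your images carry other names, they can be copied into a numbered folder with a few lines of Python. This is a sketch with hypothetical file names. It assumes that image number i belongs to the i-th simulation run, so the images must be passed in the same order as the runs. The function number_images is not part of the tool.

```python
import os
import shutil
import tempfile


def number_images(image_paths, target_dir, extension=".png"):
    """Copy images into target_dir as 0.png, 1.png, ... in the given order."""
    os.makedirs(target_dir, exist_ok=True)
    new_paths = []
    for index, path in enumerate(image_paths):
        new_path = os.path.join(target_dir, f"{index}{extension}")
        shutil.copyfile(path, new_path)
        new_paths.append(new_path)
    return new_paths


# Demonstration in a throwaway folder with three empty placeholder images.
with tempfile.TemporaryDirectory() as root:
    sources = []
    for name in ("run_b.png", "run_a.png", "run_c.png"):
        path = os.path.join(root, name)
        open(path, "wb").close()
        sources.append(path)

    image_dir = os.path.join(root, "images")
    # Here the order is alphabetical; use the order of your runs instead.
    number_images(sorted(sources), image_dir)
    numbered_names = sorted(os.listdir(image_dir))

print(numbered_names)
```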