Ex2 - Labels Datacube¶
This notebook gives a quick demo of how to create a datacube of training labels to complement a datacube of imagery. A labels datacube is critical for supervised machine learning: ingesting labels in datacube format lets ML engineers train models without worrying about the underlying remote sensing image formats, which can be daunting for non-experts.
import os
from pathlib import Path
import glob
import numpy as np
import pandas as pd
import rasterio
from icecube.bin.labels_cube.create_json_labels import CreateLabels
import icecube
As of version 1.0, icecube can ingest labels in both vector and raster formats. These two formats should cover almost all use cases for supervised training of machine learning models. Vector labels (e.g. bounding boxes, polygons) are useful for training object detectors, while raster labels are often (but not always) used for segmentation workflows.
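To make the distinction concrete, the following sketch expresses the same object both ways: as a bounding box (vector) and as a binary mask (raster). It is plain Python and does not use icecube; the helper name `bbox_to_mask` and the choice to treat `xmax`/`ymax` as inclusive pixel indices are assumptions made for illustration only.

```python
def bbox_to_mask(bbox, n_rows, n_cols):
    """Rasterise a bounding box into a binary mask (a list of rows).

    Hypothetical helper: xmax/ymax are treated as inclusive pixel indices,
    which is an assumption for this sketch, not an icecube convention.
    """
    mask = [[0] * n_cols for _ in range(n_rows)]
    for row in range(bbox["ymin"], bbox["ymax"] + 1):
        for col in range(bbox["xmin"], bbox["xmax"] + 1):
            mask[row][col] = 1
    return mask


# Vector label: four numbers plus a class name.
vector_label = {"class": "rand-a", "bbox": {"xmin": 1, "ymin": 0, "xmax": 2, "ymax": 1}}

# Raster label: one value per pixel of a 3x4 image.
raster_label = bbox_to_mask(vector_label["bbox"], n_rows=3, n_cols=4)
for row in raster_label:
    print(row)
```

The vector form stays compact no matter how large the image is, whereas the raster form grows with the image but can describe arbitrary shapes pixel by pixel.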
# set paths here.
icecube_abspath = str(Path(icecube.__file__).parent.parent)
resource_dir = os.path.join(icecube_abspath, "tests/resources")
grd_dir = os.path.join(resource_dir, "grd_stack/")
vector_labels_save_fpath = os.path.join(icecube_abspath, "icecube/dataset/temp/dummy_vector_labels.json")
raster_labels_save_fpath = os.path.join(icecube_abspath, "icecube/dataset/temp/dummy_raster_labels.json")
cube_save_path = os.path.join(icecube_abspath, "icecube/dataset/temp/my_awesome_labels_cube.nc")
# Make sure the output directory exists before anything is written to it.
Path(vector_labels_save_fpath).parent.mkdir(parents=True, exist_ok=True)
grd_fpaths = glob.glob(grd_dir+"*")
Creating JSON Labels¶
To populate labels in icecube, they must first be converted to a specific JSON structure. This section walks through the script used to create such labels.
1. Example with vector labels¶
Let's first walk through an example with vector labels, using the assets in tests/resources.
The example below shows how to ingest bounding boxes into a datacube as labels.
Specifically, we will use the assets inside tests/resources/grd_stack/*.tif
# Let's create some random bounding boxes for a dummy training sample.
random_classes = ["rand-a", "rand-b", "rand-c"]
def create_random_bboxes(N, I):
    """Return N random [xmin, ymin, xmax, ymax] boxes that fit inside image I."""
    random_bboxes = []
    # I.shape is (rows, cols): x runs over columns, y over rows.
    n_rows, n_cols = I.shape
    for _ in range(N):
        xmin, ymin = np.random.randint(0, n_cols), np.random.randint(0, n_rows)
        xmax, ymax = np.random.randint(xmin, n_cols), np.random.randint(ymin, n_rows)
        random_bboxes.append([xmin, ymin, xmax, ymax])
    return random_bboxes
# For demo purposes, we will generate some random samples for bounding boxes.
labels_collection = []
for grd_fpath in grd_fpaths:
    # Open each raster once and close it as soon as we have what we need.
    with rasterio.open(grd_fpath) as src:
        grd_values = src.read(1)
        grd_product = src.tags()["PRODUCT_FILE"]
    bboxes_seq = create_random_bboxes(np.random.randint(30, 45), grd_values)
    for each_bbox in bboxes_seq:
        labels_collection.append([grd_product,
                                  each_bbox[0],
                                  each_bbox[1],
                                  each_bbox[2],
                                  each_bbox[3],
                                  random_classes[np.random.randint(0, 3)]])
labels_df = pd.DataFrame(labels_collection, columns=["file_name", "xmin", "ymin", "xmax", "ymax", "class"])
labels_df.head(5)
| | file_name | xmin | ymin | xmax | ymax | class |
|---|---|---|---|---|---|---|
| 0 | ICEYE_GRD_54549_20210427T215124_hollow_10x10pi... | 5 | 8 | 9 | 9 | rand-c |
| 1 | ICEYE_GRD_54549_20210427T215124_hollow_10x10pi... | 5 | 3 | 6 | 6 | rand-c |
| 2 | ICEYE_GRD_54549_20210427T215124_hollow_10x10pi... | 8 | 2 | 9 | 9 | rand-b |
| 3 | ICEYE_GRD_54549_20210427T215124_hollow_10x10pi... | 7 | 2 | 9 | 4 | rand-a |
| 4 | ICEYE_GRD_54549_20210427T215124_hollow_10x10pi... | 3 | 5 | 4 | 9 | rand-c |
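Because the boxes are drawn at random, the table above will differ between runs. If reproducible dummy labels are needed, one option is to draw from a seeded generator. The sketch below uses only the standard library; `make_seeded_bboxes` and its parameters are hypothetical names, and the image size is passed as explicit row and column counts rather than read from a raster.

```python
import random


def make_seeded_bboxes(n_boxes, n_rows, n_cols, seed=0):
    """Return n_boxes [xmin, ymin, xmax, ymax] lists inside an n_rows x n_cols image."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    boxes = []
    for _ in range(n_boxes):
        xmin = rng.randrange(n_cols)        # x indexes columns
        ymin = rng.randrange(n_rows)        # y indexes rows
        xmax = rng.randrange(xmin, n_cols)  # xmin <= xmax < n_cols
        ymax = rng.randrange(ymin, n_rows)  # ymin <= ymax < n_rows
        boxes.append([xmin, ymin, xmax, ymax])
    return boxes


first = make_seeded_bboxes(5, n_rows=10, n_cols=10, seed=42)
second = make_seeded_bboxes(5, n_rows=10, n_cols=10, seed=42)
print(first == second)  # same seed, same boxes
```

Using a dedicated `random.Random` instance keeps the demo deterministic without affecting any other code that relies on the global random state.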
# Now we can easily convert these bounding boxes to an icecube-friendly JSON structure using the `CreateLabels` class
create_labels = CreateLabels("vector")
# `DataFrame.iteritems` was removed in pandas 2.0; `iterrows` yields the same (index, row) pairs.
for i, df_row in labels_df.iterrows():
    product_labels_seq = []
    product_name = df_row.iloc[0]
    # instance_label holds the bounding box of a single label.
    instance_label = {"xmin": df_row.iloc[1],
                      "ymin": df_row.iloc[2],
                      "xmax": df_row.iloc[3],
                      "ymax": df_row.iloc[4],
                      }
    class_name = df_row.iloc[5]
    # product_labels_seq contains the sequence of WKT geom vectors
    product_labels_seq.append(
        create_labels.create_instance_bbox(class_name, instance_label)
    )
    create_labels.populate_labels(str(product_name), product_labels_seq)
create_labels.write_labels_to_json(vector_labels_save_fpath, ensure_ascii=True)
# Here is a glimpse of what the labels look like for a single image in the stack:
create_labels.labels_collection[0]
{'product_file': 'ICEYE_GRD_54549_20210427T215124_hollow_10x10pixels_fake_2.tif', 'labels': {'objects': [{'class': 'rand-c', 'bbox': {'xmin': 5, 'ymin': 8, 'xmax': 9, 'ymax': 9}}, [{'class': 'rand-c', 'bbox': {'xmin': 5, 'ymin': 3, 'xmax': 6, 'ymax': 6}}], [{'class': 'rand-b', 'bbox': {'xmin': 8, 'ymin': 2, 'xmax': 9, 'ymax': 9}}], [{'class': 'rand-a', 'bbox': {'xmin': 7, 'ymin': 2, 'xmax': 9, 'ymax': 4}}], [{'class': 'rand-c', 'bbox': {'xmin': 3, 'ymin': 5, 'xmax': 4, 'ymax': 9}}], [{'class': 'rand-c', 'bbox': {'xmin': 5, 'ymin': 8, 'xmax': 7, 'ymax': 9}}], [{'class': 'rand-c', 'bbox': {'xmin': 3, 'ymin': 5, 'xmax': 5, 'ymax': 6}}], [{'class': 'rand-b', 'bbox': {'xmin': 1, 'ymin': 1, 'xmax': 2, 'ymax': 4}}], [{'class': 'rand-b', 'bbox': {'xmin': 6, 'ymin': 3, 'xmax': 6, 'ymax': 8}}], [{'class': 'rand-a', 'bbox': {'xmin': 0, 'ymin': 3, 'xmax': 3, 'ymax': 9}}], [{'class': 'rand-b', 'bbox': {'xmin': 4, 'ymin': 7, 'xmax': 8, 'ymax': 8}}], [{'class': 'rand-c', 'bbox': {'xmin': 9, 'ymin': 1, 'xmax': 9, 'ymax': 1}}], [{'class': 'rand-a', 'bbox': {'xmin': 0, 'ymin': 9, 'xmax': 0, 'ymax': 9}}], [{'class': 'rand-a', 'bbox': {'xmin': 4, 'ymin': 2, 'xmax': 5, 'ymax': 4}}], [{'class': 'rand-b', 'bbox': {'xmin': 9, 'ymin': 8, 'xmax': 9, 'ymax': 8}}], [{'class': 'rand-c', 'bbox': {'xmin': 6, 'ymin': 0, 'xmax': 6, 'ymax': 6}}], [{'class': 'rand-b', 'bbox': {'xmin': 7, 'ymin': 7, 'xmax': 8, 'ymax': 8}}], [{'class': 'rand-a', 'bbox': {'xmin': 6, 'ymin': 7, 'xmax': 9, 'ymax': 7}}], [{'class': 'rand-a', 'bbox': {'xmin': 8, 'ymin': 6, 'xmax': 9, 'ymax': 7}}], [{'class': 'rand-b', 'bbox': {'xmin': 8, 'ymin': 2, 'xmax': 9, 'ymax': 4}}], [{'class': 'rand-b', 'bbox': {'xmin': 4, 'ymin': 2, 'xmax': 7, 'ymax': 2}}], [{'class': 'rand-b', 'bbox': {'xmin': 6, 'ymin': 8, 'xmax': 8, 'ymax': 9}}], [{'class': 'rand-c', 'bbox': {'xmin': 5, 'ymin': 5, 'xmax': 7, 'ymax': 6}}], [{'class': 'rand-c', 'bbox': {'xmin': 5, 'ymin': 2, 'xmax': 9, 'ymax': 4}}], [{'class': 'rand-a', 'bbox': {'xmin': 9, 'ymin': 
2, 'xmax': 9, 'ymax': 6}}], [{'class': 'rand-a', 'bbox': {'xmin': 2, 'ymin': 1, 'xmax': 2, 'ymax': 7}}], [{'class': 'rand-a', 'bbox': {'xmin': 4, 'ymin': 6, 'xmax': 8, 'ymax': 8}}], [{'class': 'rand-c', 'bbox': {'xmin': 1, 'ymin': 8, 'xmax': 7, 'ymax': 8}}], [{'class': 'rand-b', 'bbox': {'xmin': 6, 'ymin': 2, 'xmax': 8, 'ymax': 5}}], [{'class': 'rand-b', 'bbox': {'xmin': 4, 'ymin': 9, 'xmax': 5, 'ymax': 9}}], [{'class': 'rand-c', 'bbox': {'xmin': 3, 'ymin': 1, 'xmax': 7, 'ymax': 4}}], [{'class': 'rand-a', 'bbox': {'xmin': 7, 'ymin': 0, 'xmax': 7, 'ymax': 3}}], [{'class': 'rand-a', 'bbox': {'xmin': 1, 'ymin': 0, 'xmax': 3, 'ymax': 1}}], [{'class': 'rand-c', 'bbox': {'xmin': 5, 'ymin': 8, 'xmax': 5, 'ymax': 9}}]]}}
Note that the same structure can be followed for other WKT geometries, such as polygons and points, to create JSON label files. Sample vector labels can be found under tests/resources/labels/dummy_vector_labels.json for reference.
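For readers who want to see the target layout without running icecube, here is a sketch that assembles the same kind of record by hand. The field names (`product_file`, `labels`, `objects`, `class`, `bbox`) are copied from the outputs printed in this notebook; the helper `make_bbox_record` and the file name are hypothetical, and the assumption that a labels file is a list of such records is inferred from those outputs. In practice `CreateLabels` should be used to write the file.

```python
import json


def make_bbox_record(product_file, boxes):
    """Build one per-product record; boxes is a list of (class, xmin, ymin, xmax, ymax)."""
    objects = [
        {"class": cls, "bbox": {"xmin": x0, "ymin": y0, "xmax": x1, "ymax": y1}}
        for cls, x0, y0, x1, y1 in boxes
    ]
    return {"product_file": product_file, "labels": {"objects": objects}}


record = make_bbox_record(
    "example_product.tif",  # hypothetical file name
    [("rand-a", 7, 2, 9, 4), ("rand-c", 3, 5, 4, 9)],
)
# Assumed layout: a labels file holds a list of such records, one per product.
text = json.dumps([record], indent=2)
print(text)

# Round-trip to confirm the structure survives serialisation.
restored = json.loads(text)
```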
2. Example with Raster Labels¶
Similar to the vector example above, we can also ingest rasters as segmentation labels. This example briefly highlights the workflow for creating an icecube-friendly raster JSON structure.
Creating the JSON structure for raster labels is relatively straightforward: we maintain a dictionary in which each key is a product file (the image) and each value is the raster used as its label.
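Such a dictionary can be typed out by hand or derived from the file names. Assuming each mask shares its file stem with its raster, the mapping could be built as sketched below. `pair_by_stem` is a hypothetical helper and the file names are made up; nothing here is part of icecube.

```python
from pathlib import Path


def pair_by_stem(raster_names, mask_paths):
    """Map each raster file name to the mask path that has the same stem."""
    masks_by_stem = {Path(p).stem: p for p in mask_paths}
    pairs = {}
    for name in raster_names:
        stem = Path(name).stem
        if stem not in masks_by_stem:
            raise KeyError(f"no mask found for raster {name!r}")
        pairs[name] = masks_by_stem[stem]
    return pairs


rasters = ["scene_0.tif", "scene_1.tif"]  # made-up names
masks = ["/data/masks/scene_1.png", "/data/masks/scene_0.png"]  # order does not matter
raster_mask_pairs = pair_by_stem(rasters, masks)
print(raster_mask_pairs)
```

Failing loudly on a missing mask is a deliberate choice here: a silently skipped product would otherwise end up in the cube without a label.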
For this example, we will use the sample masks inside tests/resources
masks_dir = os.path.join(resource_dir, "masks/")
raster_dir = os.path.join(resource_dir, "grd_stack/")
raster_names = [
"ICEYE_GRD_SLED_54549_20210427T215124_hollow_10x10pixels_fake_0.tif",
"ICEYE_GRD_SLED_54549_20210427T215124_hollow_10x10pixels_fake_1.tif",
"ICEYE_GRD_SLED_54549_20210427T215124_hollow_10x10pixels_fake_2.tif",
]
masks_names = [
"ICEYE_GRD_SLED_54549_20210427T215124_hollow_10x10pixels_fake_0.png",
"ICEYE_GRD_SLED_54549_20210427T215124_hollow_10x10pixels_fake_1.png",
"ICEYE_GRD_SLED_54549_20210427T215124_hollow_10x10pixels_fake_2.png",
]
masks_fpaths = [os.path.join(masks_dir, fpath) for fpath in masks_names]
# Create a dictionary where each key:value pair represents raster:mask
raster_mask_dict = dict(zip(raster_names, masks_fpaths))
create_labels = CreateLabels("raster")
for product_name, mask_fpath in raster_mask_dict.items():
    seg_mask = create_labels.create_instance_segmentation(mask_fpath)
    create_labels.populate_labels(product_name, seg_mask)
create_labels.write_labels_to_json(raster_labels_save_fpath)
This is what our JSON file looks like for raster labels. The /home/user/runner/ prefix simply reflects the local file path of the rasters.
[
{
"product_file": "ICEYE_GRD_SLED_54549_20210427T215124_hollow_10x10pixels_fake_0.tif",
"labels": {
"segmentation": "/home/user/runner/icecube/tests/resources/masks/ICEYE_GRD_SLED_54549_20210427T215124_hollow_10x10pixels_fake_0.png"
}
},
{
"product_file": "ICEYE_GRD_SLED_54549_20210427T215124_hollow_10x10pixels_fake_1.tif",
"labels": {
"segmentation": "/home/user/runner/icecube/tests/resources/masks/ICEYE_GRD_SLED_54549_20210427T215124_hollow_10x10pixels_fake_1.png"
}
},
{
"product_file": "ICEYE_GRD_SLED_54549_20210427T215124_hollow_10x10pixels_fake_2.tif",
"labels": {
"segmentation": "/home/user/runner/icecube/tests/resources/masks/ICEYE_GRD_SLED_54549_20210427T215124_hollow_10x10pixels_fake_2.png"
}
}
]
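Because the mask paths in this file are absolute and machine-specific, it can be worth checking that every referenced mask actually exists before building a cube. A minimal sketch using only the standard library; `find_missing_masks` is a hypothetical helper, and the demo writes a throwaway labels file into a temporary directory rather than touching the real resources.

```python
import json
import tempfile
from pathlib import Path


def find_missing_masks(labels_json_path):
    """Return product files whose segmentation mask path does not exist on disk."""
    with open(labels_json_path) as f:
        records = json.load(f)
    return [
        rec["product_file"]
        for rec in records
        if not Path(rec["labels"]["segmentation"]).is_file()
    ]


with tempfile.TemporaryDirectory() as tmp:
    present_mask = Path(tmp) / "a.png"
    present_mask.write_bytes(b"")  # empty placeholder, not a real image
    records = [
        {"product_file": "a.tif", "labels": {"segmentation": str(present_mask)}},
        {"product_file": "b.tif", "labels": {"segmentation": str(Path(tmp) / "b.png")}},
    ]
    labels_path = Path(tmp) / "raster_labels.json"
    labels_path.write_text(json.dumps(records))
    missing = find_missing_masks(labels_path)

print(missing)  # ['b.tif']
```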
Populating Datacubes with Labels¶
Once we have created the icecube-formatted JSON structure, either for vector geometries or for raster labels, it is fairly straightforward to convert it to an xr.Dataset or append it to an existing xr.Dataset.
# First things first: some imports
from icecube.bin.labels_cube.labels_cube_generator import LabelsDatacubeGenerator
from icecube.bin.config import CubeConfig
from icecube.bin.datacube import Datacube
# Let's create a Datacube from our labels.json file. For demo purposes, we will use only vector labels.
config_dir = os.path.join(resource_dir, "json_config/")
default_config_fpath = os.path.join(config_dir, "config_use_case_default.json")
raster_dir = os.path.join(resource_dir, "grd_stack")
dummy_vector_labels_fpath = os.path.join(
resource_dir, "labels/dummy_vector_labels.json"
)
cc = CubeConfig()
product_type = "GRD"
cc.load_config(default_config_fpath)
labels_datacube = LabelsDatacubeGenerator.build(
cc, product_type, dummy_vector_labels_fpath, raster_dir
)
labels_datacube.to_file(cube_save_path)
09/07/2021 05:28:38 PM - sar_datacube_metadata.py - [INFO] - Building the metadata from the folder /mnt/xor/ICEYE_PACKAGES/icecube/tests/resources/grd_stack using GRD
processing rasters for labels cube: 100%|██████████| 3/3 [00:00<00:00, 3634.58it/s]
09/07/2021 05:28:38 PM - common_utils.py - [INFO] - create running time is 0.0163 seconds
And that's it: we have generated a labels datacube! 🎉
To inspect the elements of the labels datacube, it is recommended to pass the associated xr.Dataset to the Datacube core class, which provides ready-made methods for processing datacubes. More details on the Datacube core class can be found in the demo notebook Ex4_Datacube.
# We can see that returned object is an instance of class VectorLabels
type(labels_datacube)
icecube.bin.labels_cube.vector_labels.VectorLabels
# We can pass `labels_datacube.xrdataset` to the Datacube core class to easily access useful operations on the cube
dc = Datacube().set_xrdataset(labels_datacube.xrdataset)
print(dc.get_data_variables())
['Labels']
# Finally, we can inspect what is inside our datacube for one of our products
labels_xrarray = dc.get_xrarray("Labels")
POI = dc.get_all_products(labels_xrarray)[0]
print("Associated labels with product-file: {} are \n".format(POI))
dc.get_product_values(POI, labels_xrarray)
Associated labels with product-file: ICEYE_GRD_54549_20210427T215124_hollow_10x10pixels_fake_1.tif are
{'objects': [{'class': 'rand-b', 'bbox': {'xmin': 1, 'ymin': 0, 'xmax': 5, 'ymax': 9}}, {'class': 'rand-b', 'bbox': {'xmin': 8, 'ymin': 9, 'xmax': 8, 'ymax': 9}}, {'class': 'rand-b', 'bbox': {'xmin': 6, 'ymin': 9, 'xmax': 8, 'ymax': 9}}, {'class': 'rand-b', 'bbox': {'xmin': 9, 'ymin': 0, 'xmax': 9, 'ymax': 3}}, {'class': 'rand-b', 'bbox': {'xmin': 5, 'ymin': 7, 'xmax': 9, 'ymax': 8}}, {'class': 'rand-c', 'bbox': {'xmin': 9, 'ymin': 7, 'xmax': 9, 'ymax': 8}}, {'class': 'rand-a', 'bbox': {'xmin': 6, 'ymin': 5, 'xmax': 6, 'ymax': 5}}, {'class': 'rand-c', 'bbox': {'xmin': 2, 'ymin': 9, 'xmax': 2, 'ymax': 9}}, {'class': 'rand-b', 'bbox': {'xmin': 2, 'ymin': 7, 'xmax': 6, 'ymax': 7}}, {'class': 'rand-a', 'bbox': {'xmin': 5, 'ymin': 5, 'xmax': 8, 'ymax': 8}}, {'class': 'rand-c', 'bbox': {'xmin': 5, 'ymin': 3, 'xmax': 5, 'ymax': 4}}, {'class': 'rand-a', 'bbox': {'xmin': 9, 'ymin': 5, 'xmax': 9, 'ymax': 5}}, {'class': 'rand-a', 'bbox': {'xmin': 0, 'ymin': 5, 'xmax': 7, 'ymax': 8}}, {'class': 'rand-c', 'bbox': {'xmin': 8, 'ymin': 4, 'xmax': 8, 'ymax': 8}}, {'class': 'rand-c', 'bbox': {'xmin': 3, 'ymin': 2, 'xmax': 5, 'ymax': 9}}, {'class': 'rand-b', 'bbox': {'xmin': 7, 'ymin': 2, 'xmax': 7, 'ymax': 6}}, {'class': 'rand-b', 'bbox': {'xmin': 9, 'ymin': 2, 'xmax': 9, 'ymax': 9}}, {'class': 'rand-a', 'bbox': {'xmin': 2, 'ymin': 1, 'xmax': 6, 'ymax': 2}}, {'class': 'rand-b', 'bbox': {'xmin': 7, 'ymin': 5, 'xmax': 8, 'ymax': 9}}, {'class': 'rand-a', 'bbox': {'xmin': 9, 'ymin': 2, 'xmax': 9, 'ymax': 9}}, {'class': 'rand-b', 'bbox': {'xmin': 6, 'ymin': 6, 'xmax': 9, 'ymax': 7}}, {'class': 'rand-c', 'bbox': {'xmin': 6, 'ymin': 3, 'xmax': 6, 'ymax': 9}}, {'class': 'rand-b', 'bbox': {'xmin': 8, 'ymin': 3, 'xmax': 8, 'ymax': 3}}, {'class': 'rand-a', 'bbox': {'xmin': 8, 'ymin': 3, 'xmax': 8, 'ymax': 8}}, {'class': 'rand-a', 'bbox': {'xmin': 4, 'ymin': 7, 'xmax': 7, 'ymax': 8}}, {'class': 'rand-a', 'bbox': {'xmin': 2, 'ymin': 1, 'xmax': 2, 'ymax': 7}}, {'class': 'rand-b', 'bbox': {'xmin': 
3, 'ymin': 7, 'xmax': 6, 'ymax': 7}}, {'class': 'rand-b', 'bbox': {'xmin': 1, 'ymin': 4, 'xmax': 3, 'ymax': 4}}, {'class': 'rand-c', 'bbox': {'xmin': 6, 'ymin': 1, 'xmax': 7, 'ymax': 6}}, {'class': 'rand-b', 'bbox': {'xmin': 8, 'ymin': 4, 'xmax': 9, 'ymax': 9}}, {'class': 'rand-a', 'bbox': {'xmin': 4, 'ymin': 3, 'xmax': 7, 'ymax': 9}}, {'class': 'rand-a', 'bbox': {'xmin': 9, 'ymin': 3, 'xmax': 9, 'ymax': 6}}, {'class': 'rand-a', 'bbox': {'xmin': 4, 'ymin': 4, 'xmax': 5, 'ymax': 4}}, {'class': 'rand-a', 'bbox': {'xmin': 6, 'ymin': 7, 'xmax': 6, 'ymax': 7}}, {'class': 'rand-b', 'bbox': {'xmin': 4, 'ymin': 3, 'xmax': 5, 'ymax': 8}}, {'class': 'rand-c', 'bbox': {'xmin': 5, 'ymin': 7, 'xmax': 9, 'ymax': 8}}, {'class': 'rand-c', 'bbox': {'xmin': 7, 'ymin': 9, 'xmax': 9, 'ymax': 9}}, {'class': 'rand-b', 'bbox': {'xmin': 9, 'ymin': 7, 'xmax': 9, 'ymax': 9}}, {'class': 'rand-a', 'bbox': {'xmin': 2, 'ymin': 2, 'xmax': 8, 'ymax': 7}}, {'class': 'rand-b', 'bbox': {'xmin': 7, 'ymin': 8, 'xmax': 7, 'ymax': 8}}]}
Great, our vector geometries are preserved inside the datacube. The same can be shown for raster labels; we leave that as an exercise for you to get your hands dirty with the code.
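As a starting point for working with the returned labels, the dictionary can be summarised with plain Python. The sketch below counts boxes per class for a small sample shaped like the output above (its four entries are copied from that output); `count_classes` is a hypothetical helper, not part of icecube.

```python
from collections import Counter


def count_classes(product_labels):
    """Count bounding boxes per class in a {'objects': [...]} labels dictionary."""
    return Counter(obj["class"] for obj in product_labels["objects"])


sample_labels = {
    "objects": [
        {"class": "rand-b", "bbox": {"xmin": 1, "ymin": 0, "xmax": 5, "ymax": 9}},
        {"class": "rand-c", "bbox": {"xmin": 9, "ymin": 7, "xmax": 9, "ymax": 8}},
        {"class": "rand-a", "bbox": {"xmin": 6, "ymin": 5, "xmax": 6, "ymax": 5}},
        {"class": "rand-b", "bbox": {"xmin": 2, "ymin": 7, "xmax": 6, "ymax": 7}},
    ]
}
class_counts = count_classes(sample_labels)
print(class_counts.most_common())
```

In the notebook itself, the same function could be applied to the value returned by `dc.get_product_values(POI, labels_xrarray)`, assuming it has the `objects` layout printed above.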
Happy Coding :)