Post

Opening and writing raster dataset with GDAL Rust

Opening and writing raster dataset with GDAL Rust

I’ve been trying to learn Rust lately. For some reason there is something that I find very appealing in this language. From all I could read, it’s very efficient in resources and memory, it’s fast, and it doesn’t let you mess around with memory. Regarding performances, it places itself next to C and C++, and when looking into the world of language performances, I learned that Python was kind of slow, around 100x slower according to some benchmarks. Obviously this value in itself doesn’t mean much, and many more aspects of a language must be taken account when choosing one for the task, but this information kind of planted its seed. It might be because I have a problem with inefficient and overly complicated process, but I find the idea to work with a system language very elegant.

But we’re not here to talk about the Rust language in itself. We’ll see how to read and write raster data using the Rust binding for GDAL. Also, this is about how I tried to compare the Rust and Python bindings for GDAL and how I learned that it doesn’t change a thing in terms of performance.

Wait what?

Well I started by thinking “Oh I can’t wait to see how faster GDAL in Rust is compared to Python”. So I started by reading a raster file and print some basic information. I use a band from a Sentinel-2 image.

How to open a raster file?

Let’s do it. We’ll be working with the GDAL crate from crates.io (cargo add gdal) and the GDAL package prom PyPi (pip install GDAL).

In Rust:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
use gdal::Dataset;

// Open dataset
let ds = Dataset::open("IMG.jp2").unwrap();
println!("This raster has {} bands.", ds.raster_count());

// Read band
let band = ds.rasterband(1).unwrap();

// Print statistics
if let Some(stats) = band.get_statistics(true, true).unwrap() {
    println!(
        "[ STATS ] =  Minimum={}, Maximum={}, Mean={}, StdDev={}",
        stats.min, stats.max, stats.mean, stats.std_dev
    );
};

I’m using a lot of .unwrap(), which in very short will cause the code to panic if a problem occurs, this is for a quick and dirty test. Within a “production” environment you would handle errors and stuff.

And in Python:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
from osgeo import gdal
gdal.UseExceptions()

# Open dataset
dataset = gdal.Open('IMG.jp2')
nband = dataset.RasterCount
print(f'This raster contains {nband} bands.') 

# Read band
srcband = dataset.GetRasterBand(1)

# Print statistics
stats = srcband.GetStatistics( True, True )
print(f'[ STATS ] =  Minimum=%.3f, Maximum=%.3f, Mean=%.3f, StdDev=%.3f' 
    % ( stats[0], stats[1], stats[2], stats[3] ))

File opened and information read. We can compare their runtime.

Comparing performances

Building the Rust program, then running and timing these two codes multiple times, we get results like follows:

RustPython
real 0m0.072sreal 0m0.156s
user 0m0.154suser 0m0.752s
sys 0m0.275ssys 0m1.689s

This is a naive approach on testing and comparing performances, not a full-fledged benchmark.

Twice as fast without any processing or multi-threading, at that point I though “Oh it’s gonna be good”.

Let’s add some writing then

It’s only the start, we’ll now read the data in our raster dataset and, since my source is in .jp2, write it in a new raster dataset in the GeoTiff format.

How to write a raster dataset with GDAL bindings?

We will be using the Create() approach rather than the CreateCopy() (see the nuance in the doc). Geospatialized rasters come with series of metadata and overviews. The CreateCopy() automatically takes those from the source dataset and copy them into the destination. While this obviously simplifies the work if we don’t need to bring any change to these information, using Create() helps to better understand the structure of a raster. Also I have a quite important performance difference between the two and I’ve found Create() more efficient for my use case.

So the code now becomes as follow.

In Rust:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
use gdal::{Dataset, DriverManager};

// Open dataset
let ds = Dataset::open("IMG.jp2").unwrap();
println!("This raster has {} bands.", ds.raster_count());

// Get source dataset metadata
let projection = ds.projection();
let spatial_ref = ds.spatial_ref().unwrap();
let geo_transform = ds.geo_transform().unwrap();
let (x, y) = ds.raster_size();

// Read band
let band = ds.rasterband(1).unwrap();

// Get GeoTiff driver  
let d = DriverManager::get_driver_by_name("GTiff").unwrap();

// Create the destination dataset as mutable with 1 band
let mut dst_ds = d.create_with_band_type::<u16, _>("IMG_as_tiff.gtiff", 
    x as isize, y as isize, 1).unwrap();

// Set metadata from source dataset
dst_ds.set_geo_transform(&geo_transform).unwrap();
dst_ds.set_projection(&projection).unwrap();
dst_ds.set_spatial_ref(&spatial_ref).unwrap();

// Get the destination band as mutable
let mut dst_band = dst_ds.rasterband(1).unwrap();

// Read the source data as Buffer<u16>
let buf = band.read_as::<u16>((0,0), (x, y), (x, y), None).unwrap();

// Write the data in the destination band
dst_band.write((0,0), (x, y), &buf).unwrap();

// Close dataset for good conscience
ds.close().unwrap();
dst_ds.close().unwrap();

And in Python:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
# Open dataset
ds = gdal.Open('IMG.jp2')
nband = ds.RasterCount
print(f'This raster has {nband} bands.')

# Get dataset metadata
srcprojection = ds.GetProjection()
srcspatialref = ds.GetSpatialRef()
srcgeotransform = ds.GetGeoTransform()
(xsize, ysize) = (ds.RasterXSize, ds.RasterYSize)

# Get GeoTiff driver
fileformat = "GTiff"
driver = gdal.GetDriverByName(fileformat)

# Create the destination dataset with 1 band
dst_ds = driver.Create('IMG_as_tiff.gtiff', xsize=xsize, ysize=ysize,
                    bands=nband, eType=gdal.GDT_UInt16)

# Set metadata from source dataset
dst_ds.SetProjection(srcprojection)
dst_ds.SetGeoTransform(srcgeotransform)
dst_ds.SetSpatialRef(srcspatialref)

# Get the destination band
srcband = ds.GetRasterBand(1)

# Read the source data as Array
srcdata = srcband.ReadAsArray()

# Write the data in the destination band
dst_ds.GetRasterBand(1).WriteArray(srcdata)

# Close dataset for good conscience
ds = None
dst_ds = None

Both codes should create a new raster file that resembles our source, but in GeoTiff. And that’s it, you can convert a raster file into any format. We’re just scratching the surface, you can subset data, reproject, transpose it if you want etc.

But let’s compare performances again

Now as “la cerise sur le gâteau”, let’s see how powerful Rust can be.

RustPython
real 0m1.024sreal 0m1.133s
user 0m17.150suser 0m17.335s
sys 0m1.629ssys 0m2.089s

Wait let me rerun.

RustPython
real 0m1.053sreal 0m1.141s
user 0m17.076suser 0m17.324s
sys 0m1.608ssys 0m2.134s

Hmm.

Wait what? (bis)

Well that’s not what I expected.

In the process I forgot something important, we are working with bindings. Asking Wikipedia what a binding is:

In programming and software design, binding is an application programming interface (API) that provides glue code specifically made to allow a programming language to use a foreign library or operating system service (one that is not native to that language).

So from my understanding, working with Rust or Python or any other language bindings is just working with the API that let us call the underlying code, for GDAL written in C and C++. Thus the overhead that we generate is only the interaction with the library, not during the process itself, since it’s running C/C++ code. The difference that we observe is probably the same that we were observing when we just opened a file.

Conclusion

GDAL raster processing is very similar in Rust and Python. When working with its bindings, it doesn’t change much which language you use to perform the GDAL operations in themself, since in any case you’re running C/C++ code from your program. But everything that is around need to be considered when choosing the right tool. If your program needs multi-threading then you’l probably chose Rust, if you need quick development without much optimization then Python might be a best fit. But it’s comfortable to see that the GDAL processing time won’t change significantly.

I’m not at all trying to say that Rust is not worth it, I’m excited to go further into it and work on projects, but it remains a tool and there should be no injunction to use it if it doesn’t bring much to the table for your use case.

Also it’s totally possible that I totally misunderstood something and that those comparisons are meaningless, happy to learn more about it.

This post is licensed under CC BY 4.0 by the author.