A Primer on GDAL - Geospatial Data Abstraction Library
What is GDAL?
GDAL, the Geospatial Data Abstraction Library, is a translator library for raster and vector geospatial data formats. It is open source, released under an MIT-style license by the Open Source Geospatial Foundation (OSGeo). Vector operations are handled by a separate module, OGR, whose name historically stood for OpenGIS Simple Features Reference Implementation.
GDAL is the base for many proprietary and open source GIS software packages, such as ArcGIS by Esri, QGIS, ERDAS and many more.
What can you do with it?
This library lets you translate and transform complex geospatial formats and extract useful insights from them. There are many scenarios where people do not have access to basic GIS tools and cannot figure out how to work with geospatial data formats, particularly analysts and data scientists with no prior experience of handling such formats.
GDAL has API support for C/C++, Python and Java. Besides this, there are various command line utilities that can be used directly on geospatial formats. I'm going to discuss the most commonly used command line utilities for handling raster and vector data.
GDAL Command Line Utilities
1. gdalinfo & ogrinfo
It is always good to know about the data before loading it into memory and planning our analysis.
These utilities help us get metadata about spatial formats: gdalinfo is used for raster formats and ogrinfo for vector data.
gdalinfo reports details such as the spatial reference, geographic bounds, measurement units and more, for example for a GeoTIFF image.
On the other hand, ogrinfo has various parameters for accessing information from vector data, such as the -so parameter, which prints only a summary instead of listing every feature in the vector data on the command prompt.
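As a rough sketch of the two commands described above, the snippet below spells out the argument lists from Python and prints them as shell commands. The file names dem.tif and roads.shp are placeholders, and the -json and -al flags are additions not mentioned in the text above (-json asks gdalinfo for machine-readable output, -al applies the ogrinfo summary to all layers).

```python
import shlex


def gdalinfo_cmd(raster_path):
    """Build: gdalinfo -json <raster> (metadata report as JSON)."""
    return ["gdalinfo", "-json", raster_path]


def ogrinfo_summary_cmd(vector_path):
    """Build: ogrinfo -so -al <vector> (summary of every layer)."""
    # -so prints the summary only; -al applies it to all layers.
    return ["ogrinfo", "-so", "-al", vector_path]


if __name__ == "__main__":
    # "dem.tif" and "roads.shp" are placeholder file names.
    for cmd in (gdalinfo_cmd("dem.tif"), ogrinfo_summary_cmd("roads.shp")):
        # Print the command exactly as it would be typed in a shell.
        print(shlex.join(cmd))
        # To execute for real (requires GDAL on PATH):
        #   subprocess.run(cmd, check=True)
```

The printed lines can be pasted straight into a terminal; building them as lists keeps paths with spaces safe if you later hand them to subprocess.run.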
2. gdalwarp
If you are looking to quickly transform raster data, you can use gdalwarp. It helps with mosaicking, re-projecting and resampling raster formats.
Some of the most common parameters to look out for in gdalwarp are:
- -r for the resampling method.
- -crop_to_cutline for clipping a raster file (used together with -cutline).
- -tr for the output file resolution.
- -dstnodata to mask values out.
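To see how the parameters listed above fit together, here is a minimal sketch that assembles one gdalwarp call from Python. The helper name gdalwarp_cmd, the file names and the chosen values are all hypothetical; -t_srs (target projection) and -cutline (the clipping geometry) are not in the list above but are the flags gdalwarp uses for re-projection and for supplying the cutline.

```python
import shlex


def gdalwarp_cmd(src, dst, t_srs=None, resampling="bilinear",
                 resolution=None, cutline=None, dst_nodata=None):
    """Assemble a gdalwarp argument list from optional settings."""
    cmd = ["gdalwarp", "-r", resampling]          # resampling method
    if t_srs is not None:
        cmd += ["-t_srs", t_srs]                  # target spatial reference
    if resolution is not None:
        xres, yres = resolution
        cmd += ["-tr", str(xres), str(yres)]      # output pixel size
    if cutline is not None:
        # Clip to the cutline geometry and shrink the extent to fit it.
        cmd += ["-cutline", cutline, "-crop_to_cutline"]
    if dst_nodata is not None:
        cmd += ["-dstnodata", str(dst_nodata)]    # nodata value for output
    return cmd + [src, dst]                       # source first, then target


if __name__ == "__main__":
    # All file names and values below are placeholders.
    cmd = gdalwarp_cmd("input.tif", "clipped.tif", t_srs="EPSG:4326",
                       resolution=(0.001, 0.001), cutline="aoi.shp",
                       dst_nodata=-9999)
    print(shlex.join(cmd))
```

Note that the -tr values are expressed in the units of the target projection, so degrees in this example.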
3. gdalbuildvrt
To work efficiently with huge raster datasets stored locally or on a local network, gdalbuildvrt helps us build a virtual mosaic of the raster datasets.
It creates a file with a .vrt extension that contains XML-style tagged references to all the datasets provided for the mosaic.
This avoids creating a huge physical mosaic from multiple datasets, which saves disk space.
We can use these virtual datasets directly in our analysis without loading the whole mosaic, and concentrate only on our area of interest.
- Also, you should check out Cloud Optimized GeoTIFF, which is based on a similar idea but targets cloud-native geospatial processing.
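The virtual mosaic workflow described above can be sketched in two steps: build the .vrt from a set of tiles, then cut out only the area of interest. The helper names, file names and coordinates are hypothetical, and using gdal_translate with -projwin for the second step is one possible choice rather than the only way to read a subset.

```python
import shlex


def buildvrt_cmd(vrt_path, tiles):
    """Build: gdalbuildvrt <out.vrt> <tile> [<tile> ...]."""
    # The output .vrt comes first, followed by every input raster.
    return ["gdalbuildvrt", vrt_path, *tiles]


def subset_cmd(vrt_path, out_path, ulx, uly, lrx, lry):
    """Build: gdal_translate -projwin ulx uly lrx lry <vrt> <out>."""
    # -projwin takes the window as upper-left x/y then lower-right x/y,
    # in the georeferenced coordinates of the source.
    window = [str(v) for v in (ulx, uly, lrx, lry)]
    return ["gdal_translate", "-projwin", *window, vrt_path, out_path]


if __name__ == "__main__":
    # Tile names and window coordinates are placeholders.
    tiles = ["tile_1.tif", "tile_2.tif", "tile_3.tif"]
    print(shlex.join(buildvrt_cmd("mosaic.vrt", tiles)))
    print(shlex.join(subset_cmd("mosaic.vrt", "aoi.tif", 10, 50, 11, 49)))
```

Only the second command writes pixel data, and only for the requested window; the .vrt itself stays a small text file pointing at the original tiles.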
4. Other utilities
Some of the other utilities worth looking at are:
- gdal_translate : Translate a raster from one format to another.
- ogr2ogr : Convert between vector formats; it provides parameters for selective conversion or for creating a completely new feature set.
- gdal_merge : Merge multiple raster files into one file.
- gdal_calc : Perform raster calculations on pixel values.
- gdal_polygonize : Convert a raster to a vector format.
You can check the complete list here: GDAL Utility
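For orientation, the sketch below shows one typical invocation of each utility listed above, again assembled from Python. File names, the output formats and the calculation expression are placeholders. It also assumes the Python-based tools are installed under their script names with a .py suffix (gdal_merge.py, gdal_calc.py, gdal_polygonize.py); depending on the installation they may be available without the suffix.

```python
import shlex


def translate_cmd(src, dst, fmt="GTiff"):
    # gdal_translate: raster format conversion, source then destination.
    return ["gdal_translate", "-of", fmt, src, dst]


def ogr2ogr_cmd(src, dst, fmt="GeoJSON", where=None):
    # ogr2ogr takes the DESTINATION before the source.
    cmd = ["ogr2ogr", "-f", fmt]
    if where is not None:
        cmd += ["-where", where]      # selective conversion by attribute
    return cmd + [dst, src]


def merge_cmd(dst, sources):
    # gdal_merge.py: -o names the merged output raster.
    return ["gdal_merge.py", "-o", dst, *sources]


def calc_cmd(src, dst, expression):
    # gdal_calc.py: -A binds the input to the letter used in --calc.
    return ["gdal_calc.py", "-A", src, f"--outfile={dst}",
            f"--calc={expression}"]


def polygonize_cmd(raster, dst, fmt="GPKG"):
    # gdal_polygonize.py: raster in, vector polygons out.
    return ["gdal_polygonize.py", raster, "-f", fmt, dst]


if __name__ == "__main__":
    # Every file name and expression here is a placeholder.
    examples = [
        translate_cmd("scene.img", "scene.tif"),
        ogr2ogr_cmd("roads.shp", "major.geojson", where="lanes > 2"),
        merge_cmd("merged.tif", ["north.tif", "south.tif"]),
        calc_cmd("dem.tif", "dem_feet.tif", "A*3.28084"),
        polygonize_cmd("landcover.tif", "landcover.gpkg"),
    ]
    for cmd in examples:
        print(shlex.join(cmd))
```

The argument order of ogr2ogr is the detail most worth remembering, since it is the reverse of gdal_translate.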
- GDAL command line utilities always come in handy: they quickly help you get to know your data and transform it without starting or downloading GIS software, or even before installing or importing a Python module.
- Many high-level libraries are built on GDAL and help us dive deeper into geospatial analysis. If you are experienced in Python, do check out the Python API cookbook for GDAL; it is amazing.
If you are going to use GDAL for the first time, I would recommend using a Docker image to explore the API first. Here is the list of Docker images available.
Installing GDAL on Linux is quite easy, but let me warn you: if you are using Windows, it is going to take a while to figure out how to set up a proper environment that does not mess up your other Python environments.
Since this is my first tech blog, I will also be writing more on how to work with geospatial data in Python, high-level Python libraries for different geospatial formats, and visualisation of geospatial data. Do let me know if there is anything specific you are looking for in the geospatial domain; I can surely help or write about it.