Overview of an AWS pipeline for processing TIF images for downstream analytics.

AWS Pipeline for Processing TIF Images (w/ Vegetation Masking)

Ingest
- Amazon S3: Store raw TIF images in a designated input bucket.
- AWS Lambda: Trigger the processing pipeline when a new TIF is uploaded to the input bucket
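
As a rough illustration of the trigger step, the handler below pulls the bucket and key out of an S3 event notification and keeps only TIF objects. The function names are hypothetical, and the hand-off to preprocessing is reduced to a log line; this is a sketch, not the pipeline's actual handler.

```python
from urllib.parse import unquote_plus


def extract_tif_objects(event):
    """Return (bucket, key) pairs for the TIF objects named in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        # S3 URL-encodes object keys in notifications (spaces arrive as "+").
        key = unquote_plus(s3.get("object", {}).get("key", ""))
        if bucket and key.lower().endswith((".tif", ".tiff")):
            objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    """Hypothetical entry point: log each new TIF and report how many were seen."""
    tifs = extract_tif_objects(event)
    for bucket, key in tifs:
        print(f"Queued s3://{bucket}/{key} for preprocessing")
    return {"queued": len(tifs)}
```

Filtering on the file extension inside the handler is a safety net; the same filter can also be set as a suffix rule on the bucket notification itself.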
Preprocess
- AWS Lambda: Read the TIF images using libraries like Rasterio
- Data Normalization: Apply the necessary corrections (atmospheric correction, band normalization)
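
A minimal sketch of the read-and-normalize step might look like the following. It assumes Rasterio is packaged with the Lambda (for example as a layer or container image) and that all bands share one scale factor; the default of 10000 is an assumption that depends on the sensor. Atmospheric correction is not shown. A shared scale factor is used here instead of per-band min-max scaling because it preserves the ratios between bands that the later index calculation relies on.

```python
import numpy as np


def read_bands(path):
    """Read every band of a TIF into a float32 array shaped (bands, rows, cols)."""
    import rasterio  # imported lazily so the rest of the module works without it

    with rasterio.open(path) as src:
        return src.read().astype("float32")


def normalize_bands(bands, scale=10000.0):
    """Divide all bands by one shared scale factor and clip to [0, 1].

    `scale` is an assumed, sensor-dependent value; adjust it for your imagery.
    """
    bands = np.asarray(bands, dtype="float32")
    return np.clip(bands / scale, 0.0, 1.0)
```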
NDVI/NDWI Calculation
- Use the Near-Infrared (NIR) and Red bands to compute NDVI = (NIR - Red) / (NIR + Red)
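
The index calculation can be sketched with NumPy as below. The helper name `normalized_difference` is hypothetical. NDVI is the normalized difference of NIR and Red; the outline does not say which bands its NDWI uses, so the note after the block is an assumption based on one common formulation.

```python
import numpy as np


def normalized_difference(a, b):
    """Compute (a - b) / (a + b) per pixel; pixels where a + b == 0 become NaN."""
    a = np.asarray(a, dtype="float32")
    b = np.asarray(b, dtype="float32")
    total = a + b
    result = np.full(total.shape, np.nan, dtype="float32")
    # Divide only where the denominator is non-zero; other pixels stay NaN.
    np.divide(a - b, total, out=result, where=total != 0)
    return result


def compute_ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)
```

If NDWI follows the McFeeters formulation, (Green - NIR) / (Green + NIR), it could reuse the same helper as `normalized_difference(green, nir)`.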
Vegetation Masking
- The Lambda function fetches the current threshold dynamically (via API Gateway) and applies the mask
- The threshold varies from site to site
  - Pixels with NDVI above the threshold are classified as vegetation
  - Generate a binary mask to overlay on the original image
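
The masking step could be sketched as follows. The endpoint URL, the `site` query parameter, the `threshold` field in the JSON response, and the fallback default of 0.3 are all assumptions for illustration; the outline does not specify the API's shape.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

import numpy as np


def fetch_threshold(url, site_id, default=0.3):
    """Fetch a site's NDVI threshold from a hypothetical JSON endpoint.

    Falls back to `default` if the endpoint is unreachable or the reply is malformed.
    """
    try:
        with urlopen(f"{url}?{urlencode({'site': site_id})}", timeout=5) as resp:
            return float(json.load(resp)["threshold"])
    except (OSError, KeyError, TypeError, ValueError):
        return default


def vegetation_mask(ndvi, threshold):
    """Binary mask: 1 where NDVI is above the threshold, else 0 (NaN counts as 0)."""
    ndvi = np.asarray(ndvi, dtype="float32")
    return (ndvi > threshold).astype("uint8")
```

Because the mask has the same shape as the source raster, it can be overlaid on the original image or written out as an extra band.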
Data Transformation
- Convert the masked TIF data into Parquet format using PyArrow, with one record per pixel (row, col)
- Also record the (x, y) coordinates of each pixel, computed from its indices in the TIF file’s CRS
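
One way to sketch the transformation: flatten the raster to one record per pixel, map each (row, col) to the (x, y) of the pixel centre with the raster's affine transform, and write the columns with PyArrow. The column names are hypothetical, and the transform is passed as six plain coefficients so the example does not depend on Rasterio; with Rasterio they could be taken from `tuple(src.transform)[:6]`.

```python
import numpy as np


def pixel_table(ndvi, mask, transform):
    """Flatten a raster into per-pixel columns keyed by (row, col).

    `transform` holds the affine coefficients (a, b, c, d, e, f), where
    x = a*col + b*row + c and y = d*col + e*row + f in the raster's CRS.
    """
    ndvi = np.asarray(ndvi)
    a, b, c, d, e, f = transform
    rows, cols = np.indices(ndvi.shape)
    rows, cols = rows.ravel(), cols.ravel()
    # Adding 0.5 moves from the pixel's upper-left corner to its centre.
    x = a * (cols + 0.5) + b * (rows + 0.5) + c
    y = d * (cols + 0.5) + e * (rows + 0.5) + f
    return {
        "row": rows,
        "col": cols,
        "x": x,
        "y": y,
        "ndvi": ndvi.ravel(),
        "is_vegetation": np.asarray(mask).ravel(),
    }


def write_parquet(columns, path):
    """Write the per-pixel columns to a Parquet file with PyArrow."""
    import pyarrow as pa  # imported lazily; must be packaged with the function
    import pyarrow.parquet as pq

    pq.write_table(pa.table(columns), path)
```

One record per pixel grows quickly for large rasters, so splitting the output by tile or by site may be worth considering.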
AWS Glue Crawler
- Scans the processed Parquet files and updates the Glue Data Catalog
- The schema can change over time (e.g., when new features are added)
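
Since the schema is expected to evolve, the crawler's schema change policy matters. The sketch below builds arguments for boto3's `glue.create_crawler` with a policy that updates the catalog when columns change. The helper names are hypothetical, and the crawler name, role, database, and path are placeholders supplied by the caller.

```python
def crawler_kwargs(name, role_arn, database, s3_path):
    """Build keyword arguments for boto3's glue.create_crawler."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "SchemaChangePolicy": {
            # Apply new or changed columns to the existing catalog table.
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            # Only log objects that disappear instead of dropping their tables.
            "DeleteBehavior": "LOG",
        },
    }


def create_and_start_crawler(name, role_arn, database, s3_path):
    """Create the crawler and run it once (requires boto3 and AWS credentials)."""
    import boto3

    glue = boto3.client("glue")
    glue.create_crawler(**crawler_kwargs(name, role_arn, database, s3_path))
    glue.start_crawler(Name=name)
```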
AWS Glue ETL Jobs (Further transformation)
- Merge elevation and texture data from the associated TIF file
Storage (Load)
- Load the results back to S3 in a “processed” subfolder
Analyst Interaction
- Amazon API Gateway + Lambda: Provide an API where analysts/modelers can update the NDVI threshold via a POST request
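
A Lambda behind the POST route might look roughly like this. It assumes the API Gateway proxy integration (the request body arrives as a JSON string in `event["body"]`) and a payload with `site_id` and `threshold` fields, both of which are hypothetical. The in-memory dictionary only stands in for a persistent store, which the outline does not specify.

```python
import json

THRESHOLDS = {}  # in-memory stand-in; a real deployment needs a persistent store


def _response(status, payload):
    """Shape a reply the way the API Gateway proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def update_threshold_handler(event, context, store=THRESHOLDS):
    """Validate a posted NDVI threshold and record it for the given site."""
    try:
        body = json.loads(event.get("body") or "{}")
        site_id = str(body["site_id"])
        threshold = float(body["threshold"])
    except (KeyError, TypeError, ValueError):
        return _response(400, {"error": "expected JSON with 'site_id' and 'threshold'"})
    # NDVI is bounded to [-1, 1]; this check also rejects NaN.
    if not -1.0 <= threshold <= 1.0:
        return _response(400, {"error": "threshold must be between -1 and 1"})
    store[site_id] = threshold
    return _response(200, {"site_id": site_id, "threshold": threshold})
```

Rejecting out-of-range values at the API keeps a bad threshold from silently masking every pixel (or none) on the next pipeline run.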
Monitoring
- Amazon CloudWatch: Monitor the Lambda functions through their logs and metrics
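
As one possible starting point for monitoring, the sketch below builds arguments for boto3's `cloudwatch.put_metric_alarm` so that an alarm fires when a function reports any error within a five-minute window. The alarm naming scheme, the period, and the threshold are assumptions to tune; the helper names are hypothetical.

```python
def lambda_error_alarm_kwargs(function_name, period_seconds=300):
    """Build keyword arguments for boto3's cloudwatch.put_metric_alarm."""
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Periods with no invocations should not trip the alarm.
        "TreatMissingData": "notBreaching",
    }


def create_error_alarm(function_name):
    """Create the alarm (requires boto3 and AWS credentials)."""
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**lambda_error_alarm_kwargs(function_name))
```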

This post is licensed under CC BY 4.0 by the author.