Overview of an AWS pipeline for processing TIF images for downstream analytics.
AWS Pipeline for Processing TIF Images (w/ Vegetation Masking)
- Ingest
- Amazon S3: Store raw TIF images in a designated input bucket.
- AWS Lambda: Trigger the processing pipeline when a new TIF is uploaded to the S3 bucket
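The ingest trigger could be sketched as below. This is a minimal illustration, not the pipeline's actual handler: the function name `extract_tif_objects`, the suffix filter, and the returned payload are assumptions. It relies only on the standard shape of an S3 event notification, in which object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus

# Assumption: only these suffixes count as TIF uploads.
TIF_SUFFIXES = (".tif", ".tiff")


def extract_tif_objects(event):
    """Return (bucket, key) pairs for TIF objects named in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes keys in event notifications (a space arrives as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith(TIF_SUFFIXES):
            objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    targets = extract_tif_objects(event)
    for bucket, key in targets:
        # Placeholder: a real handler would start preprocessing here.
        print(f"queued s3://{bucket}/{key}")
    return {"queued": len(targets)}
```

Filtering on the suffix inside the handler is a safeguard; the same filter can also be set on the bucket notification itself so that non-TIF uploads never invoke the function.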
- Preprocess
- AWS Lambda: Read the TIF images using libraries like Rasterio
- Data Normalization: Apply necessary corrections (atmospheric correction, band normalization)
- NDVI/NDWI Calculation
- Use the Near-Infrared (NIR) and Red bands to compute NDVI
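NDVI is defined as (NIR - Red) / (NIR + Red) and ranges from -1 to 1. A minimal NumPy sketch follows, assuming the two bands have already been read into arrays of the same shape; the function name `compute_ndvi` and the choice to mark zero-denominator pixels as NaN are illustrative assumptions.

```python
import numpy as np


def compute_ndvi(nir, red):
    """Compute NDVI = (NIR - Red) / (NIR + Red) element-wise."""
    # Cast to float so integer-typed bands do not overflow or truncate.
    nir = np.asarray(nir, dtype="float64")
    red = np.asarray(red, dtype="float64")
    denom = nir + red
    # Pixels where NIR + Red == 0 have no defined NDVI; leave them as NaN.
    ndvi = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=ndvi, where=denom != 0)
    return ndvi
```

Casting first matters for rasters stored as unsigned integers, where `nir - red` would otherwise wrap around for pixels with Red greater than NIR.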
- Vegetation Masking
- The Lambda function fetches the current threshold dynamically via API Gateway and applies the mask
- The threshold used will vary from site to site
- Pixels above the threshold are classified as vegetation
- Generate a binary mask to overlay on the original image
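The masking step might look like the sketch below. The endpoint URL, the `site_id` query parameter, and the `{"threshold": ...}` response shape in `fetch_threshold` are hypothetical; the notes do not specify the API contract. Only `vegetation_mask` reflects the rule stated above: pixels strictly above the threshold are vegetation.

```python
import json
import urllib.parse
import urllib.request

import numpy as np


def fetch_threshold(base_url, site_id):
    """Fetch the current NDVI threshold for a site (hypothetical API contract)."""
    url = f"{base_url}?{urllib.parse.urlencode({'site_id': site_id})}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        # Assumed response body: {"threshold": 0.3}
        return float(json.load(resp)["threshold"])


def vegetation_mask(ndvi, threshold):
    """Return a uint8 mask: 1 where NDVI > threshold, else 0."""
    ndvi = np.asarray(ndvi, dtype="float64")
    # NaN pixels (no valid NDVI) compare as False and so are not vegetation.
    with np.errstate(invalid="ignore"):
        return (ndvi > threshold).astype("uint8")
```

A `uint8` mask with values 0 and 1 can be written as a single-band raster or multiplied into the original bands to produce the overlay.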
- Data Transformation
- Convert the masked TIF data into Parquet format using PyArrow, with one record per (row, col) pixel
- Also record each pixel’s (x, y) coordinates, derived from its indices and the TIF file’s georeferencing (affine transform in the file’s CRS)
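One way to flatten the mask into per-pixel records is sketched below. It assumes a six-coefficient affine transform in the order (a, b, c, d, e, f), which matches the ordering of the `Affine` object that Rasterio exposes as `dataset.transform`. The column names and the choice of pixel-center coordinates are assumptions.

```python
import numpy as np


def pixel_columns(mask, transform):
    """Flatten a 2-D mask into per-pixel columns with map coordinates.

    transform holds the affine coefficients (a, b, c, d, e, f), where
    x = a*col + b*row + c and y = d*col + e*row + f.
    """
    a, b, c, d, e, f = transform
    rows, cols = np.indices(mask.shape)
    rows, cols = rows.ravel(), cols.ravel()
    # Offset by 0.5 so coordinates refer to pixel centers, not corners.
    x = a * (cols + 0.5) + b * (rows + 0.5) + c
    y = d * (cols + 0.5) + e * (rows + 0.5) + f
    return {
        "row": rows,
        "col": cols,
        "x": x,
        "y": y,
        "is_vegetation": mask.ravel().astype(bool),
    }


def write_parquet(columns, path):
    """Write the column dict to a Parquet file with PyArrow."""
    # Imported here so the coordinate logic above runs without PyArrow.
    import pyarrow as pa
    import pyarrow.parquet as pq

    pq.write_table(pa.table(columns), path)
```

With a Rasterio dataset `src`, the transform tuple would be `tuple(src.transform)[:6]`. One row per pixel grows quickly for large rasters, so writing per tile may be worth considering.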
- AWS Glue Crawler
- Scans the processed Parquet files and updates the Glue Data Catalog
- The schema can change, e.g., when new features are added
- AWS Glue ETL Jobs (Further transformation)
- Merge elevation and texture features from the associated TIF file
- Storage (Load)
- Load the results back to S3 in a “processed” subfolder
- Analyst Interaction
- AWS API Gateway + Lambda: Provide an API where analysts/modelers can update the NDVI threshold via a POST request
- Monitoring
- AWS CloudWatch: Monitor Lambda functions