Python project for analyzing atmospheric CO₂ and surface temperature anomalies and predicting future temperature statistics. It reads raw and processed datasets, builds combined dataframes, generates visualizations for climate research and exploration, and trains a Gradient Boosting regression model to predict future temperature anomalies.
OCO2 GES DISC, NASA L2: Column-averaged CO₂ (XCO₂) measurements with temporal and geospatial metadata. The data was transformed from HDF5 format into a processable dataframe for analysis.
MODIS Land Cover Type (MCD12Q1): Global land cover types at yearly intervals with geospatial metadata, produced by supervised classification of MODIS Terra and Aqua reflectance data.
GISTEMP, NASA: Surface temperature anomalies with temporal and geospatial metadata. Values are expressed in K.
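The three sources above are merged on shared spatio-temporal keys into a single modeling dataframe. A minimal sketch of that join with pandas, using illustrative column names (the project's actual schema may differ):

```python
import pandas as pd

# Hypothetical per-dataset frames; column names are illustrative,
# not the project's actual schema.
xco2 = pd.DataFrame({
    "lat": [10.0, 20.0], "lon": [30.0, 40.0],
    "year": [2024, 2024], "xco2_ppm": [421.3, 426.1],
})
anomalies = pd.DataFrame({
    "lat": [10.0, 20.0], "lon": [30.0, 40.0],
    "year": [2024, 2024], "anomaly_k": [0.8, 1.1],
})
land_cover = pd.DataFrame({
    "lat": [10.0, 20.0], "lon": [30.0, 40.0],
    "year": [2024, 2024], "land_cover_type": [3, 7],
})

# Join on the shared spatio-temporal keys to form one modeling table.
combined = (
    xco2
    .merge(anomalies, on=["lat", "lon", "year"])
    .merge(land_cover, on=["lat", "lon", "year"])
)
print(combined.columns.tolist())
```

In practice the satellite grids would need alignment (e.g. binning to a common resolution) before an exact-key merge like this works.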
A Gradient Boosting regression model was trained to predict future surface temperature anomalies using historical temperature anomalies, column‑averaged CO₂ (XCO₂) levels, and MODIS land cover types as input features. The analysis and model-building are implemented in the notebook ml_data_analysis.ipynb and use scikit‑learn for modeling and evaluation.
Model experiments, parameters, metrics, and artifacts are tracked with MLflow and can be inspected via the MLflow UI.
- Notebook: `notebooks/ml_data_analysis.ipynb`
- Inputs: historical anomalies, XCO₂, land cover features
- Model: `GradientBoostingRegressor` (scikit-learn) with hyperparameter search and validation
- Tracking: MLflow (default store: `./notebooks/mlruns`)
- To view results: run `mlflow ui --backend-store-uri ./notebooks/mlruns --port 5000` and open http://localhost:5000
Trained model artifacts and exported model files are saved with the experiment artifacts (see MLflow UI for locations and detailed run metadata).
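The modeling step can be sketched as follows with scikit-learn, using synthetic stand-in features in place of the real merged dataset (the notebook additionally performs hyperparameter search and MLflow tracking; the feature construction and parameter values here are assumptions for illustration):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 500
# Stand-in features: past anomaly (K), XCO2 (ppm), land cover class id.
past_anomaly = rng.normal(0.5, 0.3, n)
xco2 = rng.normal(420.0, 3.0, n)
land_cover = rng.integers(0, 17, n).astype(float)
X = np.column_stack([past_anomaly, xco2, land_cover])
# Synthetic target loosely tied to the features, plus noise.
y = past_anomaly + 0.05 * (xco2 - 420.0) + rng.normal(0, 0.05, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = GradientBoostingRegressor(n_estimators=200, max_depth=3, random_state=42)
model.fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"test MAE: {mae:.3f}")
```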
Data exploration notebooks in land_type_exploration/ download and process MODIS HDF files (2024-2025 satellite readings). Processed data is optionally exported to data/processed/land_cover_types.parquet after cell execution.
| Land Cover Types | Land Cover Types Interactive Map |
|---|---|
| ![]() | ![]() |
| Static land cover projection with matplotlib, cartopy, geopandas | Detailed exploration of land cover types using lonboard, geopandas |
Several visualizations are generated from OCO2 CO₂ measurements:
| Areas with Highest CO₂ Concentrations (>425 ppm) | Areas with Lowest CO₂ Concentrations (<417 ppm) |
|---|---|
| ![]() | ![]() |
XCO₂ represents the column-averaged CO₂ concentration from ground to upper atmosphere (~60km), measured in parts per million (ppm).
Data exploration notebooks in co2_data_exploration/ download and process NASA L2 nc4 files (2024-2025 satellite readings) with configurable data volume limits.
Processed data is optionally exported to data/processed/co2.parquet after cell execution.
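The highest/lowest concentration maps above come down to simple threshold filters on the XCO₂ column. A small pandas sketch, with assumed column names rather than the exact schema of data/processed/co2.parquet:

```python
import pandas as pd

# Illustrative XCO2 measurements; column names are assumptions.
co2 = pd.DataFrame({
    "lat": [34.1, -12.5, 51.2, 3.7],
    "lon": [-118.2, 130.9, -0.1, 102.3],
    "xco2_ppm": [426.4, 415.9, 423.0, 428.8],
})

highest = co2[co2["xco2_ppm"] > 425]  # areas plotted in the "highest" map
lowest = co2[co2["xco2_ppm"] < 417]   # areas plotted in the "lowest" map
print(len(highest), len(lowest))
```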
Temperature anomaly data is clustered by geographic proximity using the K-Means algorithm. Locations within a user-specified latitude/longitude range are grouped into a configurable number of clusters (default: 5). This allows for exploration of regional anomaly patterns over time.
The figure legend indicates the approximate geographic centroid of each cluster.
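The clustering step can be sketched with scikit-learn's `KMeans` on raw latitude/longitude pairs (a simplification that ignores spherical geometry; the synthetic coordinates here stand in for the real anomaly locations):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic locations inside a user-specified lat/lon window.
lats = rng.uniform(30.0, 50.0, 200)
lons = rng.uniform(-125.0, -95.0, 200)
coords = np.column_stack([lats, lons])

# Group locations into 5 geographic clusters (the project default).
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(coords)

# Cluster centroids approximate each region's center, as shown in the legend.
for lat_c, lon_c in km.cluster_centers_:
    print(f"cluster centroid ~ ({lat_c:.1f}, {lon_c:.1f})")
```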
Data exploration notebooks in tempanomalies_exploration/ download and process GISTEMP nc files (2024-2025 readings). Processed data is optionally exported to data/processed/tempanomalies.parquet after cell execution.
Prerequisites:
- Conda package manager
- Create environment from file: `conda env create -f environment.yml`
- Activate environment: `conda activate climate_analysis`
- From project root: `fastapi run src/api.py` - the server should start at http://127.0.0.1:8000
This project supports either local storage or AWS infrastructure with an S3 bucket provisioned through Terraform. The AWS option allows much greater performance and makes it possible to run the model more efficiently.
You should have your AWS CLI credentials available locally (an AWS profile or a key/secret pair as environment variables; see the AWS docs for more details). Terraform and Docker should be installed as well.
Alternatively, you could fork the repository and simply provide a different `role-to-assume: arn:aws:iam::<AWS_ACCOUNT_ID>:role/<ROLE_NAME>` in terraform.yml; that way GitHub Actions will create the infrastructure for you.
- From the `infra` directory:
  - `terraform init`
  - `terraform plan` - ensure all planned services are acceptable to you
  - `terraform apply` - creates the Docker image in AWS ECR, the S3 bucket, the Lambda function, and the required permissions
Typical usage:
- From project root: `python src/main.py`
- Example with common options: `python src/main.py --year-range 5 --lon -122.4194 --lat 37.7749 --loc-range 10`
Notes:
- Run `python src/main.py --help` to see all supported arguments
- Key arguments:
  - `--year-range`: Number of years to analyze (default: 1)
  - `--lon`: Longitude coordinate to center analysis (optional)
  - `--lat`: Latitude coordinate to center analysis (optional)
  - `--loc-range`: Range in degrees around location coordinates (default: 10)
- Output figures are written to `outputs/plots/` by default
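As a rough sketch of how `--lat`/`--lon`/`--loc-range` could be interpreted (a rectangular window in degrees; the function name and dataframe columns here are hypothetical, not the project's actual API):

```python
import pandas as pd

def filter_by_location(df, lat, lon, loc_range=10):
    """Keep rows within +/- loc_range degrees of (lat, lon).

    A guess at the semantics of --lat/--lon/--loc-range: a simple
    rectangular window that ignores longitude wrap-around.
    """
    mask = (
        df["lat"].between(lat - loc_range, lat + loc_range)
        & df["lon"].between(lon - loc_range, lon + loc_range)
    )
    return df[mask]

data = pd.DataFrame({
    "lat": [37.8, 50.0, -10.0],
    "lon": [-122.4, -120.0, 30.0],
    "anomaly": [0.9, 1.2, 0.4],
})
# Only the first row falls inside the 10-degree window around San Francisco.
subset = filter_by_location(data, lat=37.7749, lon=-122.4194, loc_range=10)
print(len(subset))
```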
Prerequisites:
- JupyterLab is included in environment.yml
- Ensure the conda environment is activated: `conda activate climate_analysis`
Start JupyterLab with the project config:
- From project root: `jupyter lab --config=.jupyter/jupyter_lab_config.py`
- From project root: `mlflow ui --backend-store-uri ./notebooks/mlruns --port 5000`