sntutils is an R package developed by AHADI to support the Subnational
Tailoring (SNT) of malaria interventions. It provides utility functions
that standardize and streamline data preparation, cleaning, management,
visualization, and analysis, facilitating evidence-based decision-making
at district level or below. This is an overview of the available
functions in this version of sntutils:
| Category | Function | Description |
|---|---|---|
| Data Import/Export | read() |
Reads data from various file formats (CSV, Excel, Stata, RDS, shp) |
write() |
Exports data to various file formats | |
| Download Chirps Data | download_chirps() |
Downloads monthly CHIRPS rainfall rasters for a given region and date range |
| Project Structure | create_data_structure() |
Creates AHADI-style hierarchical data folders under 01_data/ |
initialize_project_structure() |
Sets up full project folder structure with data, scripts, outputs, and reports | |
| Date Handling | autoparse_dates() |
Automatically detects and standardizes various date formats |
available_date_formats |
List of supported date formats for parsing | |
| Geolocation Name Cleaning | prep_geonames() |
Standardizes administrative names across different levels |
| Data Extraction | process_raster_collection() |
Extract values from multiple rasters against ashapefile |
| Reporting Rate Checks | calculate_reporting_metrics() |
Aggregates facility reporting/missing rates over time and space |
reporting_rate_plot() |
Visualizes reporting/missing rates by two variables | |
| Outlier Detection | detect_outliers() |
Flags outliers in a numeric column using mean ± 3 SD, Median, and Tukey’s IQR methods |
outlier_plot() |
Generates time‐series plots of flagged outliers (faceted by admin area, colored by method) | |
| Consistency Checks | consistency_check() |
Validates logical coherence in malaria care cascade (inputs >= outputs) |
| Translation | translate_text() |
Translates text with persistent file cache |
translate_text_vec() |
Vectorized version of translate_text function |
|
translate_yearmon() |
Converts date to yearmon format with month names in multiple langs | |
| Image Processing | compress_png() |
Reduces PNG file size while maintaining quality |
| Numeric Utilities | big_mark() |
Formats numbers with thousand separators |
sum2() |
Sum with automatic NA removal | |
mean2() |
Mean with automatic NA removal | |
median2() |
Median with automatic NA removal | |
| Hashing Utilities | vdigest() |
Vectorized version of digest::digest function |
The package can be installed using pak in R. The steps are as follows:
# 1) Install pak if you haven't already
install.packages("pak")
# 2) Install the sntutils package from GitHub
pak::pkg_install("ahadi-analytics/sntutils")The read() and write() functions provide a simplified interface for
importing and exporting data in various formats, inspired by the rio
package.
# Load the sntutils package
library(sntutils)
# Import data in various formats
df_csv <- read("path/to/file.csv", sep = ",")
df_excel <- read("path/to/file.xlsx", sheet = 1)
df_excel2 <- read("path/to/file.xls", sheet = 1)
df_stata <- read("path/to/file.dta")
df_spss <- read("path/to/file.sav")
df_rds <- read("path/to/file.rds")
# Import spatial data
sf_geojson <- read("path/to/file.geojson")
sf_shapefile <- read("path/to/file.shp")
# Export data to different formats
write(df, "path/to/export.csv")
write(df, "path/to/export.xlsx")
write(df, "path/to/export.xls")
write(df, "path/to/export.dta")
write(df, "path/to/export.rds")
# Export spatial data
write(sf_data, "path/to/export.shp")
write(sf_data, "path/to/export.geojson")
# Export multiple datasets as sheets in Excel
write(
list(data1 = df1, data2 = df2, data3 = df3),
"path/to/multi_sheet.xlsx"
)The download_chirps() function allows you to fetch CHIRPS monthly
rainfall raster data for any supported region and time period. It pulls
data directly from the UCSB Climate Hazards
Group FTP archive and supports
automatic unzipping. Only .tif.gz monthly rasters are supported, and
the function avoids re-downloading existing files. To view all supported
CHIRPS datasets, use chirps_options(). To check the available years
and months for a specific CHIRPS dataset (e.g., africa_monthly), use the
check_chirps_available() function.
# View available CHIRPS datasets
chirps_options()
#># A tibble: 4 × 4
#> dataset frequency label subdir
#> <chr> <chr> <chr> <chr>
#>1 global_monthly monthly Global (Monthly) global_monthly/tifs
#>2 africa_monthly monthly Africa (Monthly) africa_monthly/tifs
#>3 camer-carib_monthly monthly Caribbean & Central America (Monthly) camer-carib_monthly/tifs
#>4 EAC_monthly monthly East African Community (Monthly) EAC_monthly/tifs
# check available years and months for the africa_monthly
check_chirps_available(dataset_code = "africa_monthly")
#> ✔ africa_monthly: Data available from Jan 1981 to Mar 2025.
#># A tibble: 531 × 4
#> file_name year month dataset
#> <chr> <chr> <chr> <chr>
#> 1 chirps-v2.0.2025.01.tif.gz 2025 01 africa_monthly
#> 2 chirps-v2.0.2025.02.tif.gz 2025 02 africa_monthly
#> 3 chirps-v2.0.2025.03.tif.gz 2025 03 africa_monthly
#> 4 chirps-v2.0.2024.01.tif.gz 2024 01 africa_monthly
#> 5 chirps-v2.0.2024.02.tif.gz 2024 02 africa_monthly
#> 6 chirps-v2.0.2024.03.tif.gz 2024 03 africa_monthly
#> 7 chirps-v2.0.2024.04.tif.gz 2024 04 africa_monthly
#> 8 chirps-v2.0.2024.05.tif.gz 2024 05 africa_monthly
#> 9 chirps-v2.0.2024.06.tif.gz 2024 06 africa_monthly
#>10 chirps-v2.0.2024.07.tif.gz 2024 07 africa_monthly
# Download Africa monthly rainfall for Jan to Mar 2022
download_chirps(
dataset = "africa_monthly",
start = "2022-01",
end = "2022-03",
out_dir = "data/chirps"
)This will download the following files to the data/chirps/ folder (and unzip them if requested):
-
chirps-v2.0.2022.01.tif -
chirps-v2.0.2022.02.tif -
chirps-v2.0.2022.03.tif
Two functions are provided to help set up a consistent, hierarchical folder structure for SNT projects following AHADI’s recommended layout. Each key data domain in 01_data/ includes two subfolders: raw/ for storing the original, untouched data as received, and processed/ for storing cleaned or transformed versions ready for analysis. This structure is applied consistently across all domains.
create_data_structure()
# Create only the data structure under 01_data/
create_data_structure(base_path = ".")01_data/
├── 1.1_foundational/
│ ├── 1.1a_admin_boundaries/
│ ├── 1.1b_health_facilities/
│ └── 1.1c_population/
│ ├── 1.1ci_national/
│ └── 1.1cii_worldpop_rasters/
├── 1.2_epidemiology/
│ ├── 1.2a_routine_surveillance/
│ ├── 1.2b_pfpr_estimates/
│ └── 1.2c_mortality_estimates/
├── 1.3_interventions/
├── 1.4_drug_efficacy_resistance/
├── 1.5_environment/
│ ├── 1.5a_climate/
│ ├── 1.5b_accessibility/
│ └── 1.5c_land_use/
├── 1.6_health_systems/
│ └── 1.6a_dhs/
├── 1.7_entomology/
├── 1.8_commodities/
02_scripts/
03_outputs/
│ └── plots/
04_reports/
metadata_docs/
initialize_project_structure()
Sets up the full AHADI project structure, including organized folders for data, scripts, outputs, reports, and metadata. This structure is purposefully designed to support the full analytical workflow by ensuring that every project component has a clear, dedicated place. This organization makes it straightforward to locate files, reduces confusion, and ensures the project remains traceable, reproducible, and easy to maintain from start to finish.
# Initialize full project structure at specified path
initialize_project_structure(base_path = "my_snt_project")my_snt_project/
├── 01_data/
│ └── [Hierarchical data folders as above]
├── 02_scripts/
├── 03_outputs/
│ └── plots/
├── 04_reports/
└── metadata_docs/
The autoparse_dates() function parses and standardizes date columns in
a data frame, ensuring consistency in date formats. This is particularly
useful when working with datasets containing multiple date formats or
ambiguous date entries.
# Example with mixed date formats
df <- data.frame(
mixed_dates = c("2023-10-03", "11.09.2022", "25-12-21 23:59", "2020-08-15T00:00:00Z"),
iso8601_dates = c("2021-03-20T00:01:00.513+01:00", "2022-11-05T23:15:59.123Z")
)
# Parse dates to standard format
parsed_df <- autoparse_dates(
data = df,
date_cols = c("mixed_dates", "iso8601_dates"),
output_format = "%Y-%m-%d"
)
parsed_df$mixed_dates
#> [1] "2023-10-03" "2022-09-11" "2021-12-25" "2020-08-15"
parsed_df$iso8601_dates
#> [1] "2021-03-20" "2022-11-05"
# With custom format output
parsed_df <- autoparse_dates(
data = df,
date_cols = c("mixed_dates", "iso8601_dates"),
output_format = "%d/%m/%Y"
)
parsed_df$mixed_dates
#> [1] "03/10/2023" "11/09/2022" "25/12/2021" "15/08/2020"The prep_geonames() function combines algorithmic matching with user
interactivity to clean and standardize administrative names. It uses
string distance calculations for initial matching and allows users to
make final corrections interactively, with all decisions saved for
future use. The function supports a user-provided lookup dataset as a
reference or defaults to internal WHO geonames data if no lookup is
provided. Additionally, it supports hierarchical stratification across
up to six administrative levels. It also caches user decisions to
improve consistency and efficiency in subsequent sessions. For users who
prefer to run the code without interactivity, the function can be
executed with interactive = FALSE.
# Example data with inconsistent admin names
dhis2_dummy <- data.frame(
country = c("ANGOLA", "UGA", "ZAMBIA", "KEN"),
province = c("CABONDA", "TESO", "LUSAKA", "NAIROBY"),
district = c("BALIZE", "BOKEDEA", "RAFUNSA", "KIBRA")
)
# custom lookup data
my_lookup <- data.frame(
country = c("Angola", "Uganda", "Zambia", "Kenya"),
province = c("Cabinda", "Teso", "Lusaka", "Nairobi"),
district = c("Belize", "Bukedea", "Rufunsa", "Kibera")
)
# Harmonize admin names (interactive mode)
cleaned_df <- prep_geonames(
target_df = dhis2_dummy,
lookup_df = my_lookup,
level0 = "country",
level1 = "province",
level2 = "district",
interactive = TRUE
)Here is a short video to demonstrate the full interactivity of
prep_geonames:
prep_geoname_demo.mov
The process_raster_collection() function automates zonal statistics
extraction across multiple raster files (e.g., CHIRPS monthly rainfall
.tif files). It detects dates from filenames, aligns CRS between
rasters and shapefiles, and computes statistics like mean, sum, or
median using exact geometry-aware extraction. It returns a tidy data
frame indexed by administrative unit and time.
This is especially useful for climate data workflows that require aggregating high-resolution rasters to subnational geographies over time.
# Dummy example — replace with your shapefile and actual raster directory
adm3_shp <- sf::st_read(system.file("extdata", "sle_adm3_example.geojson",
package = "sntutils"))
# Raster files must be named with detectable dates
# (e.g. africa_monthly_chirps-v2.0.2023.05.tif)
# For test purposes, use sample files stored in test fixtures:
raster_dir <- system.file("extdata", "chirps_test_rasters",
package = "sntutils")
# Process rasters and extract mean rainfall
rainfall_df <- sntutils::process_raster_collection(
directory = raster_dir,
shapefile = adm3_shp,
id_cols = c("adm0", "adm1", "adm2", "adm3"),
aggregations = c("mean"),
pattern = "\.tif$"
)
rainfall_df
#> file_name adm1 adm2 adm3 year month mean
#>1 africa_monthly_chirps-v2.0.2020.01.tif EASTERN KAILAHUN DEA 2020 1 12.871692
#>2 africa_monthly_chirps-v2.0.2020.01.tif EASTERN KAILAHUN JAHN 2020 1 9.820749
#>3 africa_monthly_chirps-v2.0.2020.01.tif EASTERN KAILAHUN JAWIE 2020 1 12.042542
#>4 africa_monthly_chirps-v2.0.2020.01.tif EASTERN KAILAHUN KISSI KAMA 2020 1 8.951293
#>5 africa_monthly_chirps-v2.0.2020.01.tif EASTERN KAILAHUN KISSI TENG 2020 1 8.494733
#>6 africa_monthly_chirps-v2.0.2020.01.tif EASTERN KAILAHUN KISSI TONGI 2020 1 9.013873
#>7 africa_monthly_chirps-v2.0.2020.01.tif EASTERN KAILAHUN KPEJE BONGRE 2020 1 10.587007
#>8 africa_monthly_chirps-v2.0.2020.01.tif EASTERN KAILAHUN KPEJE WEST 2020 1 9.824939
#>9 africa_monthly_chirps-v2.0.2020.01.tif EASTERN KAILAHUN LUAWA 2020 1 11.760721
#>10 africa_monthly_chirps-v2.0.2020.01.tif EASTERN KAILAHUN MALEMA 2020 1 13.839043The sntutils::calculate_reporting_metrics() function computes the
completeness of routine health data reporting by assessing whether
health facilities have submitted valid data for a specified set of
indicators (key_indicators) over time. It then calculates reporting
rates for one or more target indicators (vars_of_interest) based on
this information.
Scenario 1: Facility-Level Reporting/Missing Rate
This scenario calculates the proportion of active facilities that reported any data (across a specified set of variables) for each time-unit and geographic group (e.g. year-month by district).
A facility is counted as reporting (vars_of_interest is non-missing. It is included in the
denominator (key_indicators at or before that year-month. This
ensures facilities are only expected to report after becoming active.
Formula
Let:
-
$a$ = administrative unit (e.g. district, LGA, region) -
$t$ = time period (e.g. year-month, such as “2022-03”) -
$f$ = a health facility in administrative unit$a$ -
key_indicators= a set of variables used to determine whether a facility is active e.g. “test”, “treat”, “conf”, “pres”, “allout” -
vars_of_interest= variables of interest that the reporting rate will be calculated for e.g. “conf”, “pres”
For each administrative unit
Where:
-
$o_{a,t}$ (observed) = number of facilities in$a$ that reported any value invars_of_interestduring$t$ -
$e_{a,t}$ (expected) = number of facilities in$a$ whose first-ever report on anykey_indicatorsoccurred on or before$t$ (i.e. expected to report)
This filtering avoids overestimating non-reporting rates by excluding newly opened or late-starting facilities from the denominator in earlier periods.
Worked example
Suppose we are calculating the reporting rate for district
Let:
-
$K$ be the set of key indicators (key_indicators) -
$v$ be the variable of interest (vars_of_interest) (e.g., “conf”)
Observed data:
- 6 facilities in total
- 6 facilities have reported at least once on any
$k \in K$ on or before March - 4 of those 6 reported on variable
$v = \text{conf}$ in March
Now to implement this in code:
# Example data with inconsistent admin names
sl_dhis2 <- readRDS("inst/extdata/sl_exmaple_dhis2.rds") |>
dplyr::rename(year_mon = date) |>
dplyr::filter(year_mon >= "2020.01") |>
dplyr::mutate(
# Generate consistent HF IDs
hf_uid = sntutils::vdigest(
paste0(adm1, adm2, hf),
algo = "xxhash32"
),
# Generate consistent HF-date IDs
record_id = sntutils::vdigest(
paste(hf_uid, year_mon),
algo = "xxhash32"
)
)
# Calculate monthly reporting rates by district
calculate_reporting_metrics(
data = sl_dhis2,
vars_of_interest = c("conf", "pres"), # Variables to check if a facility reported
x_var = "year_mon", # Temporal unit: year-month
y_var = "adm2", # Spatial unit: district
hf_col = "hf_uid", # Health facility ID column
key_indicators = c( # Used to determine denominator
"allout", "test", "treat",
"conf", "pres")
)Attaching package: 'sntutils'
The following object is masked from 'package:base':
write
# A tibble: 6 × 6
year_mon adm2 rep exp reprate missrate
<chr> <chr> <int> <int> <dbl> <dbl>
1 2023-12 Moyamba District Council 106 108 0.981 0.0185
2 2023-12 Port Loko City Council 2 2 1 0
3 2023-12 Port Loko District Council 99 103 0.961 0.0388
4 2023-12 Pujehun District Council 96 104 0.923 0.0769
5 2023-12 Tonkolili District Council 109 115 0.948 0.0522
6 2023-12 Western Area Rural District Council 62 64 0.969 0.0312
Scenario 2: Reporting/Missing Rate by Two Dimensions
This scenario calculates the frequency of valid (non-missing, non-zero) reports across two grouping variables (e.g., time period and location) for specified variables of interest:
# Calculate reporting rates by date and district
calculate_reporting_metrics(
data = sl_dhis2,
vars_of_interest = c("conf", "pres"), # Confirmed and presumptive malaria cases
x_var = "year_mon", # Time dimension
y_var = "adm2" # Geographic dimension
)# A tibble: 6 × 7
year_mon adm2 variable exp rep reprate missrate
<chr> <chr> <chr> <int> <int> <dbl> <dbl>
1 2021-01 Bo City Council conf 39 28 0.718 0.282
2 2021-01 Bo City Council pres 39 28 0.718 0.282
3 2021-01 Bo District Council conf 129 113 0.876 0.124
4 2021-01 Bo District Council pres 129 113 0.876 0.124
5 2021-01 Bombali District Council conf 81 73 0.901 0.0988
6 2021-01 Bombali District Council pres 81 73 0.901 0.0988
Scenario 3: Reporting/Missing Rates Over Time
This scenario calculates reporting data rates along one key dimension—typically time, making it useful for identifying when different variables are reported and spotting gaps over time.
# Evaluate reporting completeness over time
calculate_reporting_metrics(
data = sl_dhis2,
vars_of_interest = c("conf", "pres", "test"), # Key indicators
x_var = "year_mon" # Time dimension only
)# A tibble: 6 × 6
year_mon variable exp rep reprate missrate
<chr> <chr> <int> <int> <dbl> <dbl>
1 2021-01 conf 1702 1354 0.796 0.204
2 2021-01 pres 1702 1356 0.797 0.203
3 2021-01 test 1702 1355 0.796 0.204
4 2021-02 conf 1702 1351 0.794 0.206
5 2021-02 pres 1702 1356 0.797 0.203
6 2021-02 test 1702 1351 0.794 0.206
The reporting_rate_plot() function plots the reporting rates for
health facility data, making it easy to identify patterns, gaps, and
trends in data completeness across time and geographic areas. This
visualization function works with health facility data, using
calculate_reporting_metrics() internally to support all three
reporting scenarios discussed above.
Scenario 1: Facility-Level Reporting/Missing Rate
reporting_rate_plot(
data = sl_dhis2,
vars_of_interest = "conf", # Variables to check if a facility reported
x_var = "year_mon", # Temporal unit: year-month
y_var = "adm2", # Spatial unit: adm3
hf_col = "hf_uid", # Health facility ID column
key_indicators = c( # Used to determine denominator
"allout", "test", "treat",
"conf", "pres")
)Scenario 2: Reporting/Missing Rate by Two Dimensions
reporting_rate_plot(
data = sl_dhis2,
vars_of_interest = "conf", # Variables to check if a facility reported
x_var = "year_mon", # Temporal unit: year-month
y_var = "adm2", # Spatial unit: adm3
target_language = "fr" # This time we translate it to French
)Scenario 3: Reporting/Missing Rates Over Time
# get the variables of interest
vars <- c(
"test", "test_u5", "test_5_14", "test_ov15",
"susp", "susp_u5", "susp_5_14", "susp_ov15",
"pres", "pres_u5", "pres_5_14", "pres_ov15",
"conf", "conf_u5", "conf_5_14", "conf_ov15",
"maltreat", "maltreat_u5", "maltreat_5_14", "maltreat_ov15"
)
reporting_rate_plot(
sl_dhis2,
full_range = FALSE, # use actual data range
vars_of_interest = vars, # Variables to check if reported overtime
x_var = "year_mon", # Temporal unit: year-month
target_language = "fr" # This time we translate it to French
)The consistency_check() function identifies and visualizes logical
inconsistencies in malaria surveillance data by validating that
upstream events (inputs) are greater than or equal to downstream
events (outputs) in the care cascade. This is useful for data quality
assessment.
Common validation checks include:
- All outpatients >= suspected malaria
- Malaria tests >= confirmed cases
- Confirmed cases >= cases treated
- All admissions >= malaria admissions
- Malaria admissions >= malaria deaths
# Check consistency between tests (inputs) and confirmed cases
# (outputs)
consistency_check(
sl_dhis2,
inputs = c("test"),
outputs = c("conf")
)
# Save the plot
consistency_check(
sl_dhis2,
inputs = c("test"),
outputs = c("conf"),
save_plot = TRUE,
plot_path = "plots/consistency_check_plots"
)
# With translated labels (French)
consistency_check(
sl_dhis2,
inputs = c("test"),
outputs = c("conf"),
target_language = "fr"
)
# Multiple cascade validations at once
consistency_check(
sl_dhis2,
inputs = c("test", "conf", "alladm"),
outputs = c("conf", "maltreat", "maladm"),
target_language = "fr"
)The detect_outliers() function helps identify unusual values in
numeric variables using three complementary statistical methods:
- Mean ± 3 SD (parametric approach)
- Median Identifier (median ± 15 × MAD, robust to extreme values)
- Tukey’s Fences (based on IQR, with adjustable sensitivity)
Outliers are assessed within groups defined by administrative area (adm1, adm2), health facility, and year. This grouping ensures context-sensitive detection, especially for health data varying by region and time.
The function returns a data frame with the record ID, the variable of interest, and whether each method flags the value as an outlier. It also includes bounds used by each method for transparency.
outlier_results <- detect_outliers(
data = sl_dhis2,
column = "conf", # The var to check for outliers
yearmon = "year_mon", # Column showing the time like year and month
record_id = "record_id", # Unique row ID
adm1 = "adm1", # First-level admin area (e.g. region)
adm2 = "adm2", # Second-level admin area (e.g. district)
iqr_multiplier = 2 # Controls how strict the outlier check is for IQR test
)The detect_outliers() function returns a table with outlier results
for each row in your dataset. The key columns of interest are record_id,
the value being checked, and the method-specific flags: outliers_iqr,
outliers_median, and outliers_mean. Each method marks the value as
either “outlier” or “normal value”. You can join this output back to
your original data using record_id to flag values for review or action.
outlier_results |>
dplyr::select(
record_id, value,
outliers_iqr,
outliers_median,
outliers_mean) |>
tail()# A tibble: 6 × 5
record_id value outliers_iqr outliers_median outliers_mean
<chr> <dbl> <chr> <chr> <chr>
1 e8947016 321 normal value normal value normal value
2 28b6ea90 353 normal value normal value normal value
3 8aa281d9 246 normal value normal value normal value
4 8b337b53 305 normal value normal value normal value
5 7358f600 284 normal value normal value normal value
6 499f0390 309 normal value normal value normal value
The outlier_plot() function builds on detect_outliers() to generate
time series plots that help visualize where and when outliers occur in
your data. Each method returns a separate ggplot object, with points
colored by whether they were flagged as “outlier” or “normal value”. The
plots are faceted by district (adm2), and facet labels summarize the
percentage of outliers in each group.
# Generate the outlier plots
plots <- sntutils::outlier_plot(
data = sl_dhis2,
column = "conf",
record_id = "record_id",
adm1 = "adm1",
adm2 = "adm2",
yearmon = "year_mon",
methods = c("iqr", "median", "mean")
)IQR method
plots$iqrMedian method
plots$medianMean method
plots$meanIn cases where output file size—such as for PDFs or Word
documents—becomes a concern, compressing images can significantly reduce
size without noticeably affecting quality. The compress_png() function
helps with this by reducing PNG file sizes while preserving visual
fidelity.
Both reporting_rate_plot() and consistency_check() include built-in
support for image compression during saving. Additionally, users can
manually compress individual PNGs or entire folders using
compress_png():
# Compress a single PNG file
compress_png(
"path/to/large_image.png",
output_path = "path/to/consistency_plot.png"
)
#> ── Compression Summary ──
#>
#> ✔ Successfully compressed: consistency_plot.png
#> ℹ Total compression: 200.21 KB (71.54% saved)
#> ℹ Excellent compression!
#>
#> ── File Size
#> Before compression: 279.87 KB
#> After compression: 79.66 KB
# Compress all PNGs in a directory
compress_png(
"path/to/image_folder/",
output_path = "path/to/compressed_folder/"
verbose = TRUE
)The translate_text() function uses Google Translate API through the
gtranslate package and implements a sophisticated caching system to
improve efficiency and consistency for future usage:
# Translate a single text from English to French
translate_text("Reporting rate by district",
target_language = "fr",
source_language = "en")
#> "Taux de rapportage par district"
# Translate with custom cache location
translate_text("Malaria cases",
target_language = "pt", # Portuguese
cache_path = "~/translation_cache")
#> "Casos de malária"For bulk translation of multiple strings, the vectorized version
translate_text_vec() offers better performance and works easily with
data frames when used in a piped workflow:
library(dplyr)
df <- tibble::tibble(
label = c("Confirmed cases", "Presumed cases", "Tests performed")
)
df |>
dplyr::mutate(label_es = translate_text_vec(label, target_language = "es"))# A tibble: 3 × 2
label label_es
<chr> <chr>
1 Confirmed cases Casos confirmados
2 Presumed cases Casos presuntos
3 Tests performed Pruebas realizadas
When working with time series data, properly formatting dates in the
local language improves report readability. The translated_yearmon()
function supports this by using locale-aware month-year formatting:
# Convert dates to localized month-year format
dates <- seq(as.Date("2022-01-01"), as.Date("2022-03-01"), by = "month")
# French localized dates
translated_yearmon(dates, language = "fr")
#> [1] "janv. 2022" "févr. 2022" "mars 2022"
# Full month names in Spanish
translated_yearmon(dates, language = "es", format = "%B %Y")
#> [1] "enero 2022" "febrero 2022" "marzo 2022"These translation functions are integrated throughout the package,
allowing functions like reporting_rate_plot() and
consistency_check() to generate outputs in the users preferred
language through their target_language parameter.
Several helper functions make working with numeric data easier:
# Format numbers with thousands separator
big_mark(1234567.89)
#> [1] "1,234,567.89"
big_mark(c(1234.56, 7890123.45), decimals = 1, big_mark = " ")
#> [1] "1 234.6" "7 890 123.5"
# NA-safe numeric functions
sum2(c(1, 2, NA, 4)) # Sum with automatic NA removal
#> [1] 7
mean2(c(1, 2, NA, 4)) # Mean with automatic NA removal
#> [1] 2.333333
median2(c(1, 2, NA, 4, 5)) # Median with automatic NA removal
#> [1] 3The vdigest() function provides a vectorized implementation of the
digest::digest() function, making it efficient to generate hash values
for entire columns or vectors in a data frame. This is particularly
useful for creating unique identifiers, tracking data changes, or
anonymizing sensitive information.
sl_dhis2 |>
dplyr::distinct(adm3) |>
dplyr::mutate(
# Hash personal identifiers
adm3_hash = vdigest(adm3)
) |> head()# A tibble: 6 × 2
adm3 adm3_hash
<chr> <chr>
1 Bo City c810b59ec12efb2ac8b5cc84f46857ce
2 Kakua Chiefdom 27fd84f751fac150c2f8a8f42b71c3da
3 Baoma Chiefdom 462ef3c87dc9b40b2ec2e0e0a54dd63e
4 Valunia Chiefdom df394518e6987ed686d76e83a409f090
5 Bagbwe Chiefdom 3aa7a61247e34ab397ff813fe520c8b7
6 Wonde Chiefdom 196dc9792e2038b41411ec2afae37e61
Contributions to sntutils are welcome! Please feel free to submit
issues or pull requests on our GitHub
repository.
This package is licensed under CC BY 4.0.






