Background

MambaETL, also known simply as Mamba, serves as an implementation of the OpenMRS (Open Electronic Medical Records system) for Extracting, Loading, and Transforming (ETL) data into a more de-normalized format, facilitating faster data retrieval and analysis.

In the OpenMRS structure, patient observational data is stored longitudinally, resulting in multiple rows being recorded in the OpenMRS Obs table for each encounter type a patient undergoes. In some cases, a single encounter may generate as many as 50 or more rows in the Obs table.

As a consequence, the Obs table rapidly accumulates millions of records, particularly in moderately sized facilities, leading to a substantial slowdown in reporting and data analysis. Currently, there is no widely adopted community-driven solution within OpenMRS to tackle this prevalent challenge. Existing 'solutions' are often specific to particular implementations and represent in-house attempts to address a problem affecting the entire community. Unfortunately, this has resulted in significant duplication, and these solutions are frequently not robust or generic enough to effectively address the broader challenges faced by the community.

What is MambaETL?

MambaETL encompasses a set of familiar OpenMRS tools, including SQL scripts (functions & stored procedures), an automation engine with bash/shell scripts, Maven configurations, and Java programs. These components are bundled into a cohesive OpenMRS module named: openmrs-module-ohrimaba-core

This module provides ready-to-use functionalities for flattening/transposing databases and abstracting repetitive reporting tasks. This allows implementers, analysts, data scientists, or report-building teams to concentrate on creating reports without being concerned about performance bottlenecks or the intricate details of extracting data from the primary source to the reporting destination.

The Mamba core module can be deployed seamlessly, similar to any other OpenMRS module. It can function independently as a standalone module or be integrated into other (reporting) modules within a specific implementation.

Getting Started with MambaETL: A deep-dive Technical approach

Mamba is a Java library (also known as an omod) for OpenMRS, encapsulating various artifacts within a module named "openmrs-module-ohrimamba-core." It serves as an abstraction layer for the fundamental functions of MambaETL.

This library offers essential building blocks for constructing an extended ETL system for OpenMRS. It provides foundational features, and with minimal configuration, it facilitates table transposing out of the box.

Module structure

Untitled

Figure 1: Shows the project structure of the ‘openmrs-module-ohrimamba-core’ core module

The mamba core module typically contains the api and omod submodules.

Under api is a connection pool manager that will provide a way to connect to the custom analysis database.

Untitled

Figure 2: Shows the connection pool manager for connection to analysis database

The analysis database is the destination database where MambaETL engine deploys all the ETL stored procedures, views and functions. It also contains the ETL tables and views that are created when the ETL runs.

Untitled

Figure 3: shows logic for reading the analysis database configuration parameters

When setting up MambaETL in your implementation (reporting module), you need to set the global properties in the figure above for your analysis database configurations. i.e.

mambaetl.analysis.db.url

mambaetl.analysis.db.username

mambaetl.analysis.db.password

mambaetl.analysis.db.driver

api sub-module:

The api sub-module contains the Mamba generic service/API layer logic. Mamba offers a service layer through which Mamba reports can be retrieved once generated. It offers report definition and configuration capabilities to allow for report filters and report customization. More about this service layer later in this document.

Untitled

Figure 4: Shows the service layer implementation logic

It also contains the ETL scheduler. MambaETL allows for scheduling of when the ETL should run. This can be done via the Legacy UI if the Legacy module is installed. Implementers have control over when the ETL should run and perform the transposing of tables and preparation of the ETL reports.

Untitled.png

Figure 5: shows the MambaETL task in OpenMRS Legacy UI. It allows for configurations of when to run and how often

resources folder under api sub-module:

The resources folder under the api sub-module contains the most critical components of Mamba.

Untitled

Figure 6: shows an exploded folder structure of the resources folder under the api submodule

The resources folder contains the _core folder which contains 2 main folders:

compiler and database folders.

The compiler folder contains the bash/shell scripts that compile the rest of the SQL scripts for the target system either Linux or Windows.

resources/_core/database

Under the database folder each of the contained folders represents a supported database engine with all the compliant SQL scripts and is a replication of the other but specifically targeting the given database.

Untitled

Figure 7: shows contents of the database folder under resources

Contents of the database folder contain the different database engines representing logic specific to the engine.

We will dive deep into the mysql folder (engine):

Untitled

Figure 8: shows the contents of mysql - one of the supported databases

mysql/dimensions

We leverage the star schema approach, so we have defined a folder called dimensions that mainly contains attribute data or descriptive data about the Facts or Encounters - we will look at Facts later. Dimension data examples include but not limited to; concepts, persons etc.

If you are familiar with the OpenMRS data model you will quickly be able to identify most of the dimensional data in the diagram below.

Untitled

Figure 9: shows the contents of the dimension folder

Each of the dimension attribute data folder is prefixed with dim_ as a convention rather than a requirement.

We will look now in depth at one of the dimension data attributes in this case concept under the dim_concept folder.

Untitled

Figure 10: shows contents of the dim_concept folder

Under each of these dim_* folders are usually up-to 4 SQL compliant scripts, namely;

sp_mamba_dim_*.sql

sp_mamba_dim_*_create.sql

sp_mamba_dim_*_insert.sql

sp_mamba_dim_*_update.sql

NB: Replace the * with the dimension data table name, in our example concept

sp_mamba_dim_concept.sql

This is the main script that calls all the other scripts under the dim_concept folder. Its main purpose is to invoke in cascading order of execution the given scripts.

Untitled

Figure 11: shows the contents of the sp_mamba_dim_concept.sql script

This script just like all the other scripts under the dim_* folders begins with a -- $BEGIN tag and ends with -- $END tag.

These tags are specific to the Mamba ETL engine and are interpreted at build time by the Mamba ETL compiler. Their main purpose is to to define the points at which the compiler inserts boiler plate code which automatically converts each script into an SQL compliant Stored Procedure that can be invoked.

See below an example conversion of the above script:

Untitled

Figure 12: shows the new version of the script in fig viii once processed

As you can see from the figure above this is a fully compliant MYSQL stored procedure script. The compiler at build time has been able to insert some SQL code before and after the tags to make the previous script a syntactically correct MYSQL Stored Procedure. Later in this document, we will be looking in-depth at the build process and the build script to see how this works.

sp_mamba_dim_concept_create.sql This script is supposed to re-create the dimension table in question. It is a replication of the source/transactional table but with only the relevant fields. The table is created in the target/destination data source.

In the example below, the sp_mamba_dim_concept_create.sql contains logic for creating a table named mamba_dim_concept and adds some indexes.

Untitled

Figure 13: shows SQL logic for creating a mamba_dim_concept table

sp_mamba_dim_concept_insert.sql After the table above has been created, this script illustrated below inserts data from the source data-source table equivalent into this table when invoked.

Untitled

Figure 14: shows SQL logic for inserting into a mamba_dim_concept table

NB: concept is the source table.

sp_mamba_dim_concept_update.sql

Sometimes you may need to do some modifications or updates on a dim table. This script is designed for that specific purpose.

In the example above, the mamba_dim_concept table datatype is updated with the datatype_name from another table.

Untitled