DBgen


Documentation: https://dbgen.modelyst.com

Github: https://github.com/modelyst/dbgen


❗ Please note that this project is undergoing major rewrites, and installations are subject to breaking changes.


DBgen (Database Generator) is an open-source Python library for connecting raw data, scientific theories, and relational databases. The package was designed with developer experience at its core. DBgen was initially developed by Modelyst.

What is DBgen?

DBgen was designed to support scientific data analysis with the following characteristics:

  1. Transparent

    • Because scientific efforts ought to be shareable and mutually understandable.
  2. Flexible

    • Because scientific theories are in continuous flux.
  3. Maintainable

    • Because the underlying scientific models one works with are complicated enough on their own, we can't afford to introduce any more complexity via our framework.

DBgen is an opinionated ETL tool. While many other ETL tools exist, they rarely provide what a scientific workflow needs. DBgen helps populate a single PostgreSQL database using a transparent, flexible, and maintainable data pipeline.

Alternative tools

Many tools exist to orchestrate Python workflows. However, these tools are often too general to help the average scientist wrangle their data, or so specific to storing a given type of computational workflow that they lack the flexibility needed to address the specifics of a scientist's data problems. Many other tools (such as Airflow and Prefect) also come packaged with powerful scheduling systems that can be complex to set up and can make initial development very difficult for scientists without extensive DevOps experience.

General Orchestration Tools

  1. Airflow
  2. Prefect
  3. Luigi

Computational Science Workflow Tools

  1. Fireworks
  2. AiiDA
  3. Atomate

What isn't DBgen?

  1. An ORM tool (see Hibernate for Java or SQLAlchemy for Python)

    • DBgen uses the popular SQLAlchemy ORM to operate at an even higher level of abstraction, allowing users to build pipelines and schemas without actively thinking about the database tables or the SQL insert and select statements required to populate the database.
  2. A database manager (see MySQLWorkbench, DBeaver, TablePlus, etc.)

  3. A tool that can only be used with specific schemas.
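
To make the ORM point concrete, the sketch below shows the kind of hand-written table definition and insert/select bookkeeping that DBgen is meant to keep out of the user's way. It does not use DBgen or PostgreSQL: it relies only on Python's standard `sqlite3` module so that it runs anywhere, and the `measurement` table, its columns, and the `load_measurements` helper are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical raw measurements; names and values are illustrative only.
RAW_ROWS = [("sample_a", 1.5), ("sample_b", 2.5), ("sample_a", 3.0)]


def load_measurements(rows):
    """Create a table by hand, insert rows, and return per-sample averages."""
    conn = sqlite3.connect(":memory:")
    try:
        # The schema has to be spelled out manually as SQL.
        conn.execute(
            "CREATE TABLE measurement ("
            "id INTEGER PRIMARY KEY, sample TEXT NOT NULL, value REAL NOT NULL)"
        )
        # The insert statement is written and parameterized by hand.
        conn.executemany(
            "INSERT INTO measurement (sample, value) VALUES (?, ?)", rows
        )
        conn.commit()
        # The analysis step is another hand-written select statement.
        cursor = conn.execute(
            "SELECT sample, AVG(value) FROM measurement "
            "GROUP BY sample ORDER BY sample"
        )
        return cursor.fetchall()
    finally:
        conn.close()


averages = load_measurements(RAW_ROWS)
print(averages)
```

Every schema change in a script like this means editing the SQL strings in several places, which is the bookkeeping a higher-level pipeline description aims to remove.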