Skip to content

Defining the Schema

Introduction

In this tutorial, we walk through a series of lessons that demonstrate the usage of DBgen's core functionality.

In this tutorial, Alice and Bob run a lab in which researchers make temperature measurements. First, we'll add the researchers' names into the database from a CSV. Next, we'll show how to use custom parsers to import temperature data stored in a directory of text files. Finally, we'll show a very simple analysis of the data: converting the temperature from F to C.

Full Files

The code snippets in this section are taken from schema.py. The full file is shown below:

alice_bob_model/schema.py
from typing import Optional

from dbgen import Entity, IDType


class Person(Entity, table=True):
    __tablename__ = "person"
    __identifying__ = {"first_name", "last_name"}
    first_name: str
    last_name: str
    age: Optional[int]


class TemperatureMeasurement(Entity, table=True):
    __tablename__ = "temperature_measurement"
    __identifying__ = {"person_id", "ordering"}
    temperature_F: Optional[float]
    temperature_C: Optional[float]
    ordering: Optional[int]
    person_id: IDType = Person.foreign_key()

Entities

Using DBgen, entities are defined by subclassing dbgen.core.entity.Entity with table=True.

Let's define a table called Person to store information about Alice and Bob's researchers. In the example below, we define the Person table with just three attributes: first_name, last_name, and age.

DBgen Entity is a subclass of Pydantic's SQLModel. Therefore, the syntax for adding attributes to DBgen entities is the same as adding properties to Pydantic SQLModels (attribute_name: data_type).

Additionally, we can give the table a name that will be used when creating the database by setting the __tablename__ attribute. This way, the name of your python class and your database table don't need to be the same.

We can make columns optional (null values allowed) by wrapping the data type with Optional[], where Optional is imported from python's typing built-in library.

from typing import Optional


class Person(Entity, table=True):
    __tablename__ = "person"
    __identifying__ = {"first_name", "last_name"}
    first_name: str
    last_name: str
    age: Optional[int]

Identifying Attributes

Finally, you'll notice that we have set an attribute called __identifying__.

The distinction between identifying and non-identifying attributes is important in DBgen. The identifying attributes answer the question, "what defines one row in this table?" The identifying information is the minimum set of information that is guaranteed to return one row. In more technical terms, the primary key for each row in a table created using DBgen is a hash of that row's identifying attributes. In this case, we have specified that a row in this table fully identified by the person's first and last name. With this design decision, it is impossible for us to insert two people with the same first and last name into the database.

Adding foreign keys

Foreign keys specify the relationships between the entities in the schema. For example, in this example, the same person may make many temperature measurements. Let's define a table to capture temperature measurements, which order they were captured in, and who recorded them.

The foreign key can be named anything, but it is very common to name the foreign key using the following pattern TABLENAME_id where TABLENAME is the name of the table the foreign key is referencing. The column type of a foreign key is the same as id on the referenced table, which is an Optional[UUID]. This type can be directly imported from dbgen as shown below. Finally, the value for the foreign key column can be automatically generated using the Entity.foreign_key() method.

from dbgen import Entity, IDType
...
class TemperatureMeasurement(Entity, table=True):
    __tablename__ = "temperature_measurement"
    __identifying__ = {"person_id", "ordering"}
    temperature_F: Optional[float]
    temperature_C: Optional[float]
    ordering: Optional[int]
    person_id: IDType = Person.foreign_key()