
Database Configuration

LAPIS and SILO need a database_config.yaml. Its main purpose is to define the database schema for the sequence metadata. See the tutorial for an example, or use our config generator to create your own config. More examples can be found in our tests.

The database config is considered static configuration that doesn’t change with data updates. This page contains the technical specification of the database config.

The Schema Object

The database_config.yaml must contain a schema object at the top level. It permits the following fields:

Key            Type    Required  Description
instanceName   string  true      The name assigned to the instance. Only used for display purposes.
metadata       array   true      A list of metadata objects describing the fields available on the underlying sequence data.
opennessLevel  enum    true      Possible values: OPEN. To be extended in the future.
primaryKey     string  true      The field that serves as the primary key in SILO for the data.
dateToSortBy   string  false     The field used to sort the data by date. Queries on this column will be faster.
partitionBy    string  false     The field used to partition the data. Used by SILO for overall query optimization.
features       array   false     A list of feature objects.
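
To make the table concrete, here is a minimal sketch of a database_config.yaml that uses the schema fields listed above. The instance name and the metadata field names (accession, collectionDate, lineage) are hypothetical. It also assumes that primaryKey, dateToSortBy and partitionBy each refer to a field declared under metadata.

```yaml
# Hypothetical example; the metadata field names are made up for illustration.
schema:
  instanceName: Example instance   # only used for display purposes
  opennessLevel: OPEN              # currently the only possible value
  metadata:
    - name: accession
      type: string
    - name: collectionDate
      type: date
    - name: lineage
      type: pango_lineage
  primaryKey: accession            # required
  dateToSortBy: collectionDate     # optional; queries on this column will be faster
  partitionBy: lineage             # optional; used by SILO for query optimization
```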

The Metadata Object

The metadata object permits the following fields:

Key                     Type     Required  Description
name                    string   true      The name of the metadata field.
type                    enum     true      The type of the metadata.
generateIndex           boolean  false     See Generating an Index below.
lapisAllowsRegexSearch  boolean  false     If true, LAPIS will autogenerate a filter ${name}.regex. See String search.

Metadata Types

SILO currently supports the following metadata types:
  • string
  • int
  • float
  • pango_lineage: A systematic lineage classification with an inheritance structure, which can be computed for some pathogens. Also see here.
  • date: Values must be valid dates in the form YYYY-MM-DD.
  • insertion: A comma-separated list of nucleotide insertions. Each insertion has the form <segment>:<position>:<symbols>. Example value: segment1:123:CCG,segment2:501:AAAGGG. If there is only one segment, the segment name can be omitted: 123:CCG,501:AAAGGG.
  • aaInsertion: A comma-separated list of amino acid insertions. Each insertion has the form <gene>:<position>:<symbols>. Example value: S:123:CCG,ORF1A:501:AAAGGG.
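
As an illustration of how these types might appear in a config, the following sketch shows a metadata list with one field per type. All field names are hypothetical. Placing lapisAllowsRegexSearch on a string field is an assumption based on its link to the string search documentation.

```yaml
# Hypothetical metadata section; field names are illustrative only.
metadata:
  - name: country
    type: string
    lapisAllowsRegexSearch: true   # LAPIS would autogenerate a country.regex filter
  - name: age
    type: int
  - name: qcScore
    type: float
  - name: lineage
    type: pango_lineage
  - name: samplingDate
    type: date                     # values in the form YYYY-MM-DD
  - name: nucleotideInsertions
    type: insertion                # values such as 123:CCG,501:AAAGGG
  - name: aminoAcidInsertions
    type: aaInsertion              # values such as S:123:CCG,ORF1A:501:AAAGGG
```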

Generating an Index

Columns of type string support generating an index, which is controlled by the generateIndex field of the metadata object. For columns of type pango_lineage, an index is always generated. SILO internally stores precomputed bitmaps for those columns so that a query on that column becomes a trivial lookup.
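
A short sketch of how this could look in the metadata list, assuming that generateIndex: true is what enables the index for a string column. The field names region and lineage are hypothetical.

```yaml
# Hypothetical metadata entries illustrating index generation.
metadata:
  - name: region
    type: string
    generateIndex: true   # SILO precomputes bitmaps for this column
  - name: lineage
    type: pango_lineage   # always indexed, so no flag is set here
```

An index is most likely to pay off on string columns that are filtered on frequently, though the actual benefit depends on the data.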

Features

The feature object permits the following fields:

Key   Type    Required  Description
name  string  true      The name of the feature.

Currently, there is only one available feature: sarsCoV2VariantQuery. This enables a specialized query language for SARS-CoV-2 instances.

See variant queries.
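
For illustration, enabling this feature might look like the following fragment. The other schema fields are omitted here and would still be required in a real config.

```yaml
schema:
  # instanceName, metadata, opennessLevel and primaryKey omitted for brevity
  features:
    - name: sarsCoV2VariantQuery   # the only feature currently available
```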