Sourcify Database
Sourcify Database is the main storage backend for Sourcify. It is a PostgreSQL database that uses the Verifier Alliance Schema as its base, with a few modifications.
At a high level, these modifications are:
- Sourcify DB accepts contracts without deployment details such as block_number and transaction_hash, as well as contracts without an onchain creation bytecode (contracts.creation_code_hash).
- Stores the Solidity metadata separately in the sourcify_matches table.
- Introduces tables for other purposes.
You can follow the services/database/migrations folder for the initial schema and the changes made to it. These migrations are not necessarily the differences between Sourcify DB and the Verifier Alliance Schema; they record every change made to the schema over time.
Schema
You can access the live schema of the database here or in the embedded frame below.
In short:
- Every verified contract is a coupling between a deployed contract (contract_deployments) and a compilation (compiled_contracts).
- "Transformations" are applied to the bytecode from a compilation to reach the final matching onchain bytecode.
- Bytecodes and sources are deduplicated. The bytecode and the sources of a popular contract like ERC20.sol will only be stored once, in code and sources respectively.
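To illustrate the deduplication idea, here is a minimal in-memory sketch of content-addressed storage. It assumes rows are keyed by a SHA-256 hash of their content; the `ContentStore` class and its `put` method are hypothetical names used only for this illustration and are not part of Sourcify.

```python
import hashlib


class ContentStore:
    """Hypothetical stand-in for a table keyed by content hash (e.g. code or sources)."""

    def __init__(self) -> None:
        self.rows: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        # Assumption: the key is the SHA-256 hash of the raw content.
        key = hashlib.sha256(content).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self.rows.setdefault(key, content)
        return key


store = ContentStore()
first = store.put(b"contract ERC20 { }")
second = store.put(b"contract ERC20 { }")  # same bytes submitted again
assert first == second
assert len(store.rows) == 1
```

Any number of verified contracts can then reference the same row by its hash instead of carrying their own copy.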
If the contract has "unlinked libraries", the placeholder strings like __$53ae...a537$__ in bytecodes will be normalized to 0000...0000s. This is required since the code column is a bytea type in the DB.
Therefore, the bytecode string from the DB will not be identical to the output of the compilation. You can "de-normalize" these fields by looking at the library transformations and filling the placeholders with the library identifier.
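The de-normalization described above can be sketched as follows. This is only an illustration under stated assumptions: it assumes each library transformation is a JSON object with a `reason` of `"library"`, a byte `offset`, and an `id` holding the 40-character placeholder string. Check the Verifier Alliance repo for the authoritative field names. The function name `denormalize_libraries` and the sample placeholder are hypothetical.

```python
PLACEHOLDER_LEN = 40  # a 20-byte address slot is 40 hex characters


def denormalize_libraries(code: bytes, transformations: list[dict]) -> str:
    """Return the bytecode as a hex string with library placeholders restored.

    Assumed transformation shape (not confirmed here):
    {"type": "replace", "reason": "library", "offset": <byte offset>, "id": "__$...$__"}
    """
    hex_code = code.hex()
    for t in transformations:
        if t.get("reason") != "library":
            continue  # only library transformations carry placeholders
        placeholder = t["id"]
        if len(placeholder) != PLACEHOLDER_LEN:
            raise ValueError(f"unexpected placeholder length: {placeholder!r}")
        start = t["offset"] * 2  # byte offset -> hex-character offset
        end = start + PLACEHOLDER_LEN
        if hex_code[start:end] != "0" * PLACEHOLDER_LEN:
            raise ValueError(f"no zeroed library slot at byte offset {t['offset']}")
        hex_code = hex_code[:start] + placeholder + hex_code[end:]
    return hex_code


# Hypothetical example: a 20-byte zeroed slot starting at byte offset 2.
sample_id = "__$" + "ab" * 17 + "$__"
normalized = bytes.fromhex("6080" + "00" * 20 + "5050")
restored = denormalize_libraries(
    normalized, [{"type": "replace", "reason": "library", "offset": 2, "id": sample_id}]
)
assert restored == "6080" + sample_id + "5050"
```

The result is a string rather than bytes because the placeholder characters are not valid hex, which is the same reason the DB stores the normalized form.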
For more information about the schemas of the JSON fields below, check the Verifier Alliance repo.
JSON fields of verified_contracts table:
- creation_values
- creation_transformations
- runtime_values
- runtime_transformations
The transformations and values are the operations done on a bytecode from a compilation to reach the final matching onchain bytecode.
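As a rough sketch of how a transformations array and a values object could be combined, the snippet below applies "replace" and "insert" operations to a compiled bytecode. The field names (`type`, `reason`, `offset`, `id`) and the mapping from a transformation's reason to its key in the values object are assumptions made for this example; the Verifier Alliance repo is the authoritative reference. `apply_transformations` and `VALUE_KEYS` are hypothetical names.

```python
# Assumed mapping from a transformation's "reason" to its key in the values object.
VALUE_KEYS = {
    "library": "libraries",
    "immutable": "immutables",
    "cborAuxdata": "cborAuxdata",
    "callProtection": "callProtection",
    "constructorArguments": "constructorArguments",
}


def strip_0x(value: str) -> str:
    return value[2:] if value.startswith("0x") else value


def apply_transformations(code_hex: str, transformations: list[dict], values: dict) -> str:
    """Apply each transformation in order to a hex string (no 0x prefix)."""
    out = code_hex
    for t in transformations:
        value = values[VALUE_KEYS[t["reason"]]]
        if "id" in t:
            value = value[t["id"]]  # values keyed by id, e.g. one per auxdata section
        value = strip_0x(value)
        pos = t["offset"] * 2  # byte offset -> hex-character offset
        if t["type"] == "replace":
            out = out[:pos] + value + out[pos + len(value):]
        elif t["type"] == "insert":
            out = out[:pos] + value + out[pos:]
        else:
            raise ValueError(f"unknown transformation type: {t['type']!r}")
    return out


# Hypothetical data: replace 4 bytes at offset 2, then append 1 byte at offset 7.
compiled = "6080" + "00" * 4 + "aa"
onchain = apply_transformations(
    compiled,
    [
        {"type": "replace", "reason": "cborAuxdata", "offset": 2, "id": "1"},
        {"type": "insert", "reason": "constructorArguments", "offset": 7},
    ],
    {"cborAuxdata": {"1": "0xdeadbeef"}, "constructorArguments": "0x01"},
)
assert onchain == "6080deadbeefaa01"
```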
JSON fields of compiled_contracts table:
- sources: Source code files of a contract
- compiler_settings
- compilation_artifacts: Fields from the compilation output JSON. Fields: abi, userdoc, devdoc, sources (AST identifiers), storageLayout
- creation_code_artifacts: Fields under the evm.bytecode field. Fields: sourceMap, linkReferences, cborAuxdata
- runtime_code_artifacts: Fields under the evm.deployedBytecode field. Fields: sourceMap, linkReferences, cborAuxdata, immutableReferences
Notes on the data
For the issues on the data we are aware of and plan to fix, see this issue: https://github.com/argotorg/sourcify/issues/2276
Other known inconsistencies in the data, which we do not plan to fix, are documented below:
- Compiler versions: Keep in mind that the Vyper version build strings are not consistent (details here):
  - version <= 0.3.0: The commit hash has 7 characters, e.g. 0.3.0+commit.8a23feb
  - version 0.3.1: No commit hash, e.g. 0.3.1
  - version >= 0.3.2: The commit hash has 8 characters, e.g. 0.3.2+commit.3b6a4117
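When consuming the data, one way to tolerate these three shapes is to treat the commit hash as optional and accept either length. The sketch below is a minimal example that only handles the forms listed above; `parse_vyper_build` is a hypothetical helper, and other build-string variants (for example pre-releases) would need extra handling.

```python
import re
from typing import Optional, Tuple

# major.minor.patch, optionally followed by "+commit." and a 7- or 8-character hex hash
_VYPER_BUILD = re.compile(r"^(\d+\.\d+\.\d+)(?:\+commit\.([0-9a-f]{7,8}))?$")


def parse_vyper_build(build: str) -> Tuple[str, Optional[str]]:
    """Split a Vyper build string into (version, commit hash or None)."""
    match = _VYPER_BUILD.match(build)
    if match is None:
        raise ValueError(f"unrecognized Vyper build string: {build!r}")
    return match.group(1), match.group(2)


assert parse_vyper_build("0.3.0+commit.8a23feb") == ("0.3.0", "8a23feb")
assert parse_vyper_build("0.3.1") == ("0.3.1", None)
assert parse_vyper_build("0.3.2+commit.3b6a4117") == ("0.3.2", "3b6a4117")
```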
Download
See Download the Dataset for instructions on downloading the database in Parquet format.
BigQuery Dataset
We also provide a public BigQuery dataset for convenient querying and exploration:
The dataset is updated continuously as new contracts are verified. You need a Google account to access it.