Sourcify Database
Sourcify Database is the main storage backend for Sourcify. It is a PostgreSQL database that uses the Verifier Alliance Schema as its base, with a few modifications.
At a high level, these modifications are:
- Sourcify DB accepts contracts without deployment details such as block_number and transaction_hash, as well as without an onchain creation bytecode (contracts.creation_code_hash).
- It stores the Solidity metadata separately in the sourcify_matches table.
- It introduces tables for other purposes.
See the services/database/migrations folder for the initial schema and the changes made to it. Note that these migrations record every change made to the schema over time; they are not a list of the differences between Sourcify DB and the Verifier Alliance Schema.
Schema
You can access the live schema of the database here.
In short:
- Every verified contract is a coupling between a deployed contract (contract_deployments) and a compilation (compiled_contracts).
- "Transformations" are applied to the bytecode from a compilation to reach the final matching onchain bytecode.
- Bytecodes and sources are deduplicated. The bytecode and the sources of a popular contract like ERC20.sol are stored only once, in code and sources respectively.
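Deduplication of this kind is typically done by content addressing: a row's key is a hash of its content, so inserting the same bytes twice reuses the existing row. The sketch below models that idea in memory. ContentStore is a hypothetical stand-in for the code and sources tables, and SHA-256 is chosen here for illustration; check the schema for the actual hash columns.

```python
import hashlib


class ContentStore:
    """Hypothetical in-memory stand-in for a content-addressed table like code or sources."""

    def __init__(self) -> None:
        self.rows = {}  # digest (bytes) -> content (bytes)

    def put(self, content: bytes) -> bytes:
        """Store content once and return its key; repeated content reuses the existing row."""
        key = hashlib.sha256(content).digest()
        # setdefault only inserts if the key is new, so duplicates are never stored twice.
        self.rows.setdefault(key, content)
        return key
```

Many verified contracts can then point at the same key, which is why a widely reused file occupies a single row.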
If the contract has "unlinked libraries", placeholder strings like __$53ae...a537$__ in its bytecode are normalized to 0000...0000s. This is required because the code column is of type bytea in the DB, and the placeholder characters ($ and _) are not valid hex.
Therefore, the bytecode from the DB will not be identical to the compilation output. You can "de-normalize" it by looking at the library transformations and filling the placeholders with the library identifier.
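The normalization and its reverse can be sketched as follows. This assumes placeholders use the __$<34 hex chars>$__ form (40 characters, the width of a 20-byte address) emitted by newer Solidity versions, and that a library transformation looks like {"type": "replace", "reason": "library", "offset": <byte offset>, "id": <library id>}. The function names and the placeholders mapping (library id to placeholder string) are hypothetical, not Sourcify APIs.

```python
import re

# "__$" + 34 hex characters + "$__" = 40 characters, the hex width of a 20-byte address.
PLACEHOLDER_RE = re.compile(r"__\$[0-9a-fA-F]{34}\$__")
ADDRESS_HEX_LEN = 40


def normalize_placeholders(bytecode_hex: str) -> str:
    """Replace every library placeholder with 40 zeros so the string becomes valid hex."""
    return PLACEHOLDER_RE.sub("0" * ADDRESS_HEX_LEN, bytecode_hex)


def denormalize(bytecode_hex: str, transformations: list, placeholders: dict) -> str:
    """Write placeholders back at the byte offsets of "library" transformations.

    Assumes `placeholders` maps each transformation id to its 40-character placeholder.
    """
    prefix = "0x" if bytecode_hex.startswith("0x") else ""
    chars = list(bytecode_hex[len(prefix):])
    for t in transformations:
        if t.get("reason") != "library":
            continue
        start = t["offset"] * 2  # byte offset -> hex-character offset
        chars[start:start + ADDRESS_HEX_LEN] = placeholders[t["id"]]
    return prefix + "".join(chars)
```

Note that older Solidity placeholders (library names padded with underscores) are not matched by this regex.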
For more information about the schemas of the JSON fields below, see the Verifier Alliance repo.
JSON fields of the verified_contracts table:
- creation_values
- creation_transformations
- runtime_values
- runtime_transformations
The transformations and values describe the operations applied to the bytecode from a compilation to reach the final matching onchain bytecode.
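A minimal sketch of applying these operations, assuming each transformation has the shape {"type": "replace" | "insert", "reason": ..., "offset": <byte offset>, "id": ...} and that the values JSON keeps libraries, immutables and cborAuxdata as id-keyed maps while constructorArguments and callProtection are single hex strings. These shapes are assumptions to check against the Verifier Alliance schemas; apply_transformations is a hypothetical helper.

```python
# Reasons whose value is looked up by the transformation's "id" (assumed key names).
KEYED = {"library": "libraries", "immutable": "immutables", "cborAuxdata": "cborAuxdata"}
# Reasons with a single value (assumed key names).
SCALAR = {"constructorArguments": "constructorArguments", "callProtection": "callProtection"}


def apply_transformations(code_hex: str, transformations: list, values: dict) -> str:
    """Rebuild onchain bytecode from compiled bytecode.

    Transformations are applied in list order; each offset is assumed to be
    valid for the bytecode as it stands when that transformation is applied.
    """
    code = bytearray.fromhex(code_hex.removeprefix("0x"))
    for t in transformations:
        reason = t["reason"]
        if reason in KEYED:
            value_hex = values[KEYED[reason]][t["id"]]
        else:
            value_hex = values[SCALAR[reason]]
        value = bytes.fromhex(value_hex.removeprefix("0x"))
        offset = t["offset"]
        if t["type"] == "replace":
            # Overwrite len(value) bytes in place, e.g. an immutable or a library address.
            code[offset:offset + len(value)] = value
        elif t["type"] == "insert":
            # Splice the value in without removing bytes, e.g. appended constructor arguments.
            code[offset:offset] = value
        else:
            raise ValueError(f"unknown transformation type: {t['type']!r}")
    return "0x" + code.hex()
```

Applying the creation or runtime transformations to the matching compiled bytecode with the corresponding values should then reproduce the onchain bytecode.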
JSON fields of the compiled_contracts table:
- sources: Source code files of a contract
- compiler_settings
- compilation_artifacts: Fields from the compilation output JSON. Fields: abi, userdoc, devdoc, sources (AST identifiers), storageLayout
- creation_code_artifacts: Fields under the evm.bytecode field. Fields: sourceMap, linkReferences, cborAuxdata
- runtime_code_artifacts: Fields under the evm.deployedBytecode field. Fields: sourceMap, linkReferences, cborAuxdata, immutableReferences
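As an example of reading these fields, the sketch below lists function signatures from the abi inside compilation_artifacts. It assumes the column value is a JSON string (or already decoded text) with a standard Solidity ABI array under "abi"; function_signatures and canonical_type are hypothetical helpers.

```python
import json


def canonical_type(param: dict) -> str:
    """Return the canonical ABI type, expanding tuples into their component types."""
    t = param["type"]
    if t.startswith("tuple"):
        inner = ",".join(canonical_type(c) for c in param["components"])
        # Keep any array suffix such as "[]" or "[3]" after the tuple.
        return f"({inner}){t[len('tuple'):]}"
    return t


def function_signatures(compilation_artifacts: str) -> list:
    """List "name(type,...)" signatures from the abi field of compilation_artifacts."""
    abi = json.loads(compilation_artifacts).get("abi") or []
    return [
        f"{e['name']}({','.join(canonical_type(p) for p in e.get('inputs', []))})"
        for e in abi
        if e.get("type") == "function"
    ]
```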
Notes on the data
For known data issues that we plan to fix, see this issue: https://github.com/argotorg/sourcify/issues/2276
Other known inconsistencies in the data that we do not plan to fix are documented below:
- Compiler versions: Keep in mind that Vyper version build strings are not consistent (details here):
  - version <= 0.3.0: The commit hash has 7 characters, e.g. 0.3.0+commit.8a23feb
  - version 0.3.1: No commit hash, e.g. 0.3.1
  - version >= 0.3.2: The commit hash has 8 characters, e.g. 0.3.2+commit.3b6a4117
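When comparing or joining on compiler versions, it can help to parse these build strings into a consistent form. A minimal sketch, assuming that truncating commit hashes to 7 characters is an acceptable key (a short hash is a prefix of a longer hash of the same commit). parse_vyper_build is a hypothetical helper, not part of Sourcify.

```python
import re
from typing import Optional, Tuple

# Release (with an optional pre-release tag such as "rc1"), then an optional "+commit.<7-8 hex>".
VYPER_BUILD_RE = re.compile(
    r"^(?P<version>\d+\.\d+\.\d+(?:[a-z]+\d*)?)(?:\+commit\.(?P<commit>[0-9a-f]{7,8}))?$"
)


def parse_vyper_build(build: str) -> Tuple[str, Optional[str]]:
    """Split a Vyper build string into (version, 7-character commit or None)."""
    m = VYPER_BUILD_RE.match(build)
    if m is None:
        raise ValueError(f"unrecognized Vyper build string: {build!r}")
    commit = m.group("commit")
    # Truncate so 7-char hashes (<= 0.3.0) and 8-char hashes (>= 0.3.2) compare the same way.
    return m.group("version"), commit[:7] if commit else None
```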
Download
The current parquet download format will be deprecated. A new /v2 endpoint will be introduced with an updated format. Documentation for the new format will be added once it is live. Feel free to use the export in its current form, but be aware that it will be replaced.
We dump the whole database daily in Parquet format and upload it to Cloudflare R2 storage. You can access the manifest file at https://export.sourcify.dev (the .dev domain redirects to the .app domain, which also belongs to Sourcify). The script that does the dump is at sourcifyeth/parquet-export.
export.sourcify.dev redirects to a manifest.json file:
manifest.json
{
"timestamp": 1726030203254,
"dateStr": "2024-09-11T04:50:03.254904Z",
"files": {
"code": [
"code/code_0_100000.parquet",
"code/code_100000_200000.parquet",
...
"code/code_2700000_2800000.parquet"
],
"contracts": [
"contracts/contracts_0_1000000.parquet",
...
"contracts/contracts_4000000_5000000.parquet"
],
"contract_deployments": [
"contract_deployments/contract_deployments_0_1000000.parquet",
...
"contract_deployments/contract_deployments_5000000_6000000.parquet"
],
"compiled_contracts": [
"compiled_contracts/compiled_contracts_0_5000.parquet",
...
"compiled_contracts/compiled_contracts_815000_820000.parquet"
],
"verified_contracts": [
"verified_contracts/verified_contracts_0_1000000.parquet",
...
"verified_contracts/verified_contracts_5000000_6000000.parquet"
],
"sourcify_matches": [
"sourcify_matches/sourcify_matches_0_100000.parquet",
...
"sourcify_matches/sourcify_matches_5300000_5400000.parquet"
]
}
}
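The manifest can be inspected before downloading anything. This sketch assumes "timestamp" is in milliseconds since the Unix epoch (which matches the dateStr in the example above) and that "files" maps each table name to a list of relative paths; summarize_manifest is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def summarize_manifest(manifest: dict) -> tuple:
    """Return the export time (UTC) and the number of Parquet files per table."""
    # timedelta(milliseconds=...) keeps the conversion exact, avoiding float rounding.
    exported_at = EPOCH + timedelta(milliseconds=manifest["timestamp"])
    counts = {table: len(paths) for table, paths in manifest["files"].items()}
    return exported_at, counts
```

Usage: load the file with json.load(open("manifest.json")) and pass the result to summarize_manifest.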
You can download all the files and use a parquet client to query, inspect, or process the data.
- Download the manifest file (-L to follow redirects):
curl -L -O https://export.sourcify.dev/manifest.json
- Download all the tables listed in the manifest:
jq -r '.files | keys[] as $k | .[$k][]' manifest.json | xargs -I {} curl -L -O https://export.sourcify.dev/{}
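Where jq is not available, the same download step can be sketched with the Python standard library. Like curl -O, it saves each file under its base name in out_dir. manifest_urls and download_all are hypothetical helpers; urllib's default opener follows standard HTTP redirects, similar to curl -L.

```python
import json
import os
import urllib.request

BASE_URL = "https://export.sourcify.dev"


def manifest_urls(manifest: dict) -> list:
    """Flatten manifest["files"] (table name -> relative paths) into full URLs."""
    return [f"{BASE_URL}/{path}" for paths in manifest["files"].values() for path in paths]


def download_all(manifest_path: str = "manifest.json", out_dir: str = ".") -> None:
    """Download every Parquet file listed in a local manifest.json."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    for url in manifest_urls(manifest):
        # Keep only the file name, e.g. code_0_100000.parquet, as curl -O does.
        dest = os.path.join(out_dir, url.rsplit("/", 1)[-1])
        urllib.request.urlretrieve(url, dest)
```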
For example, you can install parquet-cli for basic inspection:
brew install parquet-cli
parquet meta compiled_contracts_0_5000.parquet
Alternatively, use your favorite data processing tool or import the data into a database.
BigQuery Datasets
We also provide public BigQuery datasets for convenient querying and exploration: