Migration from Filesystem to Database
This page describes the process of migrating all the contracts from the filesystem based repository to a new storage structure. It is meant for anyone running their own Sourcify instance.
Current repository limitations
Currently Sourcify stores verified contracts into a repository:
- in which there are contracts verified with old verification and storage strategies: #902, #1071
- in which the sources are stored mirroring the original file system, causing naming path problems: #515
- filesystem based, causing a difficult maintenance and expansive advanced searches (e.g. by contract's name)
New storage structure
We cannot move completely to a database dropping the filesystem because we want to keep serving metadata files and sources over IPFS. So the new storage structure includes:
- RepositoryV1 (deprecated): a refined version of the old repository containing contracts stored with new verification and storage strategies.
- RepositoryV2: a filesystem-based repository where sources for each contract are stored in its
sources/
folder, without replicating the contract's original filesystem structure. Instead, it names files usingkeccak256(file: string)
. - Database: a PostgreSQL database based on the schema of the Alliance Database customized to better fit Sourcify. In particular the Alliance Database doesn't allow storing contracts without creation bytecode and it doesn't explicitly support the concept of full vs partial verification.
Migration process
Sourcify will keep storing and serving matches from the old repository until the migration is completed. The migration process is structured as follows:
- Enable RepositoryV2 and Database in
sourcify-server
,sourcify-server
will store the new matches into RepositoryV2 and the Database (including the tablesourcify_sync
).sourcify-server
will continue writing into current_repositoryV1. - Import all the contracts from current_repositoryV1 into
sourcify_sync
- Start the migration script, the script reads all
synced=false
contracts fromsourcify_sync
and sends for each of them a verification request to the dedicatedsourcify-server-migration
instance that writes to:- the same RepositoryV2 as
sourcify-server
- the same Database as
sourcify-server
- the new_repositoryV1
- the same RepositoryV2 as
- Start using the new_repositoryV1: After the migration is completed, on the dedicated
sourcify-server-migration
instance, the new version of repositoryV1 will be produced (new_repositoryV1). Before using new_repositoryV1 insourcify-server
, all the contracts added to the current_repositoryV1 after starting the migration script need to be moved to new_repositoryV1.
A temporary new Sourcify’s server is used in the migration process to not overload the main server with all the previous requests, we call this new server sourcify-server-migration
.
The migration process is tracked and managed by using a table named sourcify_sync
that contains:
chain_id
address
created_at
synced
: the status of the contract's migration, "has the contract been migrated to the new storage structure?"
As explained above, after we complete the migration, RepositoryV1 will be a refined version of the current repository containing contracts stored with new verification and storage strategies. In the process of the migration:
sourcify-server
will keep writing RepositoryV1’s contracts to the current repository, in this document we will refer to the current repository as current_repositoryV1.sourcify-server-migration
will store RepositoryV1’s contracts to a new empty repository, in this document we will refer to this new repository as new_repositoryV1.
1. Enabling RepositoryV2 and Database
a. Start PostgreSQL
b. Migrate the database schema
We first need to write the existing schemas defined under
services/database/migrations
into our fresh new Postgres DB:- I. Clone the Sourcify monorepo and run
npm install
- II.
cd services/database
- III. Edit
database.json
to configure the connection to the PostgreSQL database - IV.
npm run migrate:up -- --env=YOUR_ENV
: Instead ofYOUR_ENV
, put the JSON key under which you defined the database connection indatabase.json
in the previous step.
- I. Clone the Sourcify monorepo and run
c. Enable RepositoryV2
Make sure that in your configuration file inside
services/server/src/config
there is therepositoryV2
configuration. Set the path accordingly:repositoryV2: {
path: "/home/app/repositoryV2",
}d. Enable Database
Make sure to provide the environment variables to enable the database. Set the variables accordingly in sourcify-server’s
.env
file:SOURCIFY_POSTGRES_HOST=postgresql
SOURCIFY_POSTGRES_DB=sourcify
SOURCIFY_POSTGRES_USER=sourcify
SOURCIFY_POSTGRES_PORT=5432
SOURCIFY_POSTGRES_PASSWORD=sourcifye. Check that everything is working
- Restart
sourcify-server
- Make sure that for every match:
- There is a row into
sourcify_matches
,verified_contracts
(and related tables), as well assourcify_sync
- There is the respective folder in
repositoryV2/contracts/:match_type/:chain_id/:address
- There is a row into
- Restart
2. Importing existing contracts
- a. Importing contracts from the old repository
- I.
cd services/database
- II. Make sure to create a
.env
file to configure the connection to the PostgreSQL database. There is already a.env.template
for reference. - III.
npm run sourcify:database import-repo /data/repository/contracts/
: Instead of/data/repository/contracts/
, put the current repository’s path
- I.
- b. Wait
- This process can take hours. It’s normal to not see any output after running the script for a while.
- Wait until the scripts exits with
successfully imported from “:repo_path” :number_imported_contracts contracts.
3. Start the migration instance and the migration script
- a. Start a dedicated
sourcify-server-migration
instance- I. ⚠ You need to checkout 6cdcdce8f186cdf3c90438393fee1efcbe272622 or
git checkout tags/sourcify-server@1.6.0
- II. Configure it accordingly using the same steps described in 1. Enabling RepositoryV2 and Database. Both
sourcify-server
andsourcify-server-migration
should write to the same repoV2 and Database. - III. ⚠ In the configuration of
sourcify-server-migration
, change the path of repositoryV1 to a new path: this is the new_repositoryV1. Don’t write in current_repositoryV1! - IV. Start the new
sourcify-server-migration
instance
- I. ⚠ You need to checkout 6cdcdce8f186cdf3c90438393fee1efcbe272622 or
- b. Find unsupported chains
- I. From
services/server/src/sourcify-chains-default.json
or thesourcify-chains
file that you are using, find all the chainIds of unsupported chains. - II. Prepare them formatted in a list separated by a comma. E.g.
0,3,4,42,77,300,534,421613,4216137055,420,28,43,486217935,356256156,103090,420666,420420,99,592,8217,28528,14,421611,69,167005,167006
- III. ⚠ Remember to include chainId 0 in this list
- I. From
- c. Start the synchronization script
- I.
cd services/database
- II. Edit
database.json
to configure the connection to the PostgreSQL database - III.
npm run sourcify:database sync URL_TO_SOURCIFY_MIGRATION_INSTANCE /data/repository/contracts/ -- -c LIST_OF_UNSUPPORTED_CHAINS --limit 5
- Instead of
/data/repository/contracts/
, put the current repository’s path. The script needs to access the old repo to get the file used to send the verification request. - Instead of
LIST_OF_UNSUPPORTED_CHAINS
, put the formatted list prepared in the previous step. The script will not send verification requests to these chains. - You can configure
--limit 5
: limit is the number of concurrent verifications that are going to be sent to thesourcify-server-migration
for each chain. If you set 5, 5 requests are going to be sent for each chain.
- Instead of
- I.
- d. Wait and monitor
- The script will update migrated contracts into
sourcify_sync
settingsynced=true
. - The script will take days and a small portion of old contracts will not be verified because of the new verification and storage strategies.
- If the script is interrupted for any reason, you can restart it.
- You can use this query to keep track of the current status:
select
chain_id,
case (chain_id in (0,3,4,42,77,300,534,421613,4216137055,420,28,43,486217935,356256156,103090,420666,420420,99,592,8217,28528,14,421611,69,167005,167006)) when true then 'unsupported' else '' end, -- paste here the list of unsupported chains
count(case ss.synced when true then 1 else 1 end) as total,
count(case ss.synced when false then 1 else null end) as to_sync,
count(case ss.synced when true then 1 else null end) as synced
from sourcify_sync ss
where 1=1
group by ss.chain_id
order by chain_id asc; - The script will output a verification per second insight. The verification speed is a good compromise between the amount of concurrent requests sent for each chain and the amount of requests the rpc can handle. If you see that you are receiving a lot of rpc errors decrease the number of concurrent verifications, if the verifications per second are too low, increase the number of concurrent verifications. You can set the number of concurrent verifications using the
--limit
parameter. - When the script was successful, the
sourcify-server-migration
instance can be stopped.
- The script will update migrated contracts into
4. Start using the new deprecated repositoryV1
While the migration script was running on
sourcify-server-migration
, new matches coming fromsourcify-server
got saved in Database, RepositoryV2 and current_repositoryV1 .There are no issues for the Database and RepositoryV2 because
sourcify-server-migration
andsourcify-server
are writing in the same database/fsThere is a complication regarding RepositoryV1.
sourcify-server
is storing new matches into the current_repositoryV1, whilesourcify-server-migration
in the new_repositoryV1.A procedure is needed to transfer all the contracts in current_repositoryV1 that were saved after RepositoryV2 and Database were enabled on
sourcify-server
.sourcify-server
stored the matches insourcify_sync
with the status set toperfect
instead ofperfect_match
andpartial
instead ofpartial_match
. We can use this information to list all the contracts that need to be transferred from current_repositoryV1 to new_repositoryV1.Extract all the contracts described in the previous point from the database in a
rows.txt
file. For each row output the path in the repository. You can use the following SQL query:SELECT
'/contracts/' || CASE WHEN match_type = 'perfect' THEN 'full_match'
WHEN match_type = 'partial' THEN 'partial_match'
ELSE 'partial_match'
END || '/' || chain_id || '/' || convert_from(address, 'UTF8') as path
FROM sourcify_sync
WHERE match_type = 'perfect' OR match_type = 'partial'
order by id desc;The
rows.txt
file will look like this:/contracts/partial_match/11155111/0x6C5f9712ee69E76908D3E3Ef62Ce33c04d748f61
/contracts/full_match/11155111/0x1fDe294eA2C5A81bC24fa117994436A0B30A8F12
/contracts/partial_match/8453/0x85b1180AfC0c45fb074Baee447B8F4aBb5BF84A9
/contracts/full_match/11155111/0x4bf0CE51d78041103eCAA6281503546D00721A2e
...Use the following script to copy all the contracts from current_repositoryV1 to new_repositoryV1:
#!/bin/bash
# Path to the file containing the lines
FILE_PATH="./rows.txt"
# Count the total number of lines in the file
total_lines=$(wc -l < "$FILE_PATH")
echo "Total files to process: $total_lines"
# Variable to keep track of processed lines
processed_lines=0
# Read each line from the file
while IFS= read -r line
do
# Update processed lines count
((processed_lines++))
# Construct source and destination paths
src="/home/app/repository_current$line"
dest="/home/app/repository$line"
# Create the destination directory structure
mkdir -p "$(dirname "$dest")"
# Copy the files from source to destination without overwriting existing files
cp -rn "$src" "$dest"
# Calculate and display the percentage of completion
percent=$(($processed_lines * 100 / $total_lines))
echo "Processed $processed_lines / $total_lines files ($percent%)"
done < "$FILE_PATH"In the script, modify
src
to match your current_repositoryV1 path, anddest
to match your new_repositoryV1 path.
In your configuration file inside
services/server/src/config
, switch the RepositoryV1 config to the new_repositoryV1’s path.repositoryV1: {
path: "/home/app/repositoryV1",
...
}Restart
sourcify-server
What’s next
We are still working on finding solutions for the following points:
- Find a solution for unsupported chains contracts: we migrated all the contracts that could be re-verified. Unfortunately, contracts from non-supported chains cannot be verified because the on-chain’s bytecode is no longer served by RPCs. This is currently work-in-progress and will come soon.
- Using the DB as the source of truth: after the migration the database will be used only as a storing service, the repository APIs are still using RepositoryV1. This is currently work-in-progress and will come soon.