Skip to main content

Migration from Filesystem to Database

info

This page describes the process of migrating all the contracts from the filesystem based repository to a new storage structure. It is meant for anyone running their own Sourcify instance.

Current repository limitations

Currently Sourcify stores verified contracts into a repository:

  • in which there are contracts verified with old verification and storage strategies: #902, #1071
  • in which the sources are stored mirroring the original file system, causing naming path problems: #515
  • filesystem based, causing a difficult maintenance and expansive advanced searches (e.g. by contract's name)

New storage structure

We cannot move completely to a database dropping the filesystem because we want to keep serving metadata files and sources over IPFS. So the new storage structure includes:

  • RepositoryV1 (deprecated): a refined version of the old repository containing contracts stored with new verification and storage strategies.
  • RepositoryV2: a filesystem-based repository where sources for each contract are stored in its sources/ folder, without replicating the contract's original filesystem structure. Instead, it names files using keccak256(file: string).
  • Database: a PostgreSQL database based on the schema of the Alliance Database customized to better fit Sourcify. In particular the Alliance Database doesn't allow storing contracts without creation bytecode and it doesn't explicitly support the concept of full vs partial verification.

Migration process

Sourcify will keep storing and serving matches from the old repository until the migration is completed. The migration process is structured as follows:

  1. Enable RepositoryV2 and Database in sourcify-server, sourcify-server will store the new matches into RepositoryV2 and the Database (including the table sourcify_sync). sourcify-server will continue writing into current_repositoryV1.
  2. Import all the contracts from current_repositoryV1 into sourcify_sync
  3. Start the migration script, the script reads all synced=false contracts from sourcify_sync and sends for each of them a verification request to the dedicated sourcify-server-migration instance that writes to:
    1. the same RepositoryV2 as sourcify-server
    2. the same Database as sourcify-server
    3. the new_repositoryV1
  4. Start using the new_repositoryV1: After the migration is completed, on the dedicated sourcify-server-migration instance, the new version of repositoryV1 will be produced (new_repositoryV1). Before using new_repositoryV1 in sourcify-server, all the contracts added to the current_repositoryV1 after starting the migration script need to be moved to new_repositoryV1.
sourcify-server-migration

A temporary new Sourcify’s server is used in the migration process to not overload the main server with all the previous requests, we call this new server sourcify-server-migration.

sourcify_sync

The migration process is tracked and managed by using a table named sourcify_sync that contains:

  • chain_id
  • address
  • created_at
  • synced: the status of the contract's migration, "has the contract been migrated to the new storage structure?"
current_repositoryV1 vs new_repositoryV1

As explained above, after we complete the migration, RepositoryV1 will be a refined version of the current repository containing contracts stored with new verification and storage strategies. In the process of the migration:

  • sourcify-server will keep writing RepositoryV1’s contracts to the current repository, in this document we will refer to the current repository as current_repositoryV1.
  • sourcify-server-migration will store RepositoryV1’s contracts to a new empty repository, in this document we will refer to this new repository as new_repositoryV1.

1. Enabling RepositoryV2 and Database

  • a. Start PostgreSQL

  • b. Migrate the database schema

    We first need to write the existing schemas defined under services/database/migrations into our fresh new Postgres DB:

    • I. Clone the Sourcify monorepo and run npm install
    • II. cd services/database
    • III. Edit database.json to configure the connection to the PostgreSQL database
    • IV. npm run migrate:up -- --env=YOUR_ENV: Instead of YOUR_ENV, put the JSON key under which you defined the database connection in database.json in the previous step.
  • c. Enable RepositoryV2

    Make sure that in your configuration file inside services/server/src/config there is the repositoryV2 configuration. Set the path accordingly:

    repositoryV2: {
    path: "/home/app/repositoryV2",
    }
  • d. Enable Database

    Make sure to provide the environment variables to enable the database. Set the variables accordingly in sourcify-server’s .env file:

    SOURCIFY_POSTGRES_HOST=postgresql
    SOURCIFY_POSTGRES_DB=sourcify
    SOURCIFY_POSTGRES_USER=sourcify
    SOURCIFY_POSTGRES_PORT=5432
    SOURCIFY_POSTGRES_PASSWORD=sourcify
  • e. Check that everything is working

    • Restart sourcify-server
    • Make sure that for every match:
      • There is a row into sourcify_matches , verified_contracts (and related tables), as well as sourcify_sync
      • There is the respective folder in repositoryV2/contracts/:match_type/:chain_id/:address

2. Importing existing contracts

  • a. Importing contracts from the old repository
    • I. cd services/database
    • II. Make sure to create a .env file to configure the connection to the PostgreSQL database. There is already a .env.template for reference.
    • III. npm run sourcify:database import-repo /data/repository/contracts/: Instead of /data/repository/contracts/, put the current repository’s path
  • b. Wait
    • This process can take hours. It’s normal to not see any output after running the script for a while.
    • Wait until the scripts exits with successfully imported from “:repo_path” :number_imported_contracts contracts.

3. Start the migration instance and the migration script

  • a. Start a dedicated sourcify-server-migration instance
    • I. ⚠ You need to checkout 6cdcdce8f186cdf3c90438393fee1efcbe272622 or git checkout tags/sourcify-server@1.6.0
    • II. Configure it accordingly using the same steps described in 1. Enabling RepositoryV2 and Database. Both sourcify-server and sourcify-server-migration should write to the same repoV2 and Database.
    • III. ⚠ In the configuration of sourcify-server-migration, change the path of repositoryV1 to a new path: this is the new_repositoryV1. Don’t write in current_repositoryV1!
    • IV. Start the new sourcify-server-migration instance
  • b. Find unsupported chains
    • I. From services/server/src/sourcify-chains-default.json or the sourcify-chains file that you are using, find all the chainIds of unsupported chains.
    • II. Prepare them formatted in a list separated by a comma. E.g. 0,3,4,42,77,300,534,421613,4216137055,420,28,43,486217935,356256156,103090,420666,420420,99,592,8217,28528,14,421611,69,167005,167006
    • III. ⚠ Remember to include chainId 0 in this list
  • c. Start the synchronization script
    • I. cd services/database
    • II. Edit database.json to configure the connection to the PostgreSQL database
    • III. npm run sourcify:database sync URL_TO_SOURCIFY_MIGRATION_INSTANCE /data/repository/contracts/ -- -c LIST_OF_UNSUPPORTED_CHAINS --limit 5
      • Instead of /data/repository/contracts/, put the current repository’s path. The script needs to access the old repo to get the file used to send the verification request.
      • Instead of LIST_OF_UNSUPPORTED_CHAINS, put the formatted list prepared in the previous step. The script will not send verification requests to these chains.
      • You can configure --limit 5: limit is the number of concurrent verifications that are going to be sent to the sourcify-server-migration for each chain. If you set 5, 5 requests are going to be sent for each chain.
  • d. Wait and monitor
    • The script will update migrated contracts into sourcify_sync setting synced=true.
    • The script will take days and a small portion of old contracts will not be verified because of the new verification and storage strategies.
    • If the script is interrupted for any reason, you can restart it.
    • You can use this query to keep track of the current status:
      select 
      chain_id,
      case (chain_id in (0,3,4,42,77,300,534,421613,4216137055,420,28,43,486217935,356256156,103090,420666,420420,99,592,8217,28528,14,421611,69,167005,167006)) when true then 'unsupported' else '' end, -- paste here the list of unsupported chains
      count(case ss.synced when true then 1 else 1 end) as total,
      count(case ss.synced when false then 1 else null end) as to_sync,
      count(case ss.synced when true then 1 else null end) as synced
      from sourcify_sync ss
      where 1=1
      group by ss.chain_id
      order by chain_id asc;
    • The script will output a verification per second insight. The verification speed is a good compromise between the amount of concurrent requests sent for each chain and the amount of requests the rpc can handle. If you see that you are receiving a lot of rpc errors decrease the number of concurrent verifications, if the verifications per second are too low, increase the number of concurrent verifications. You can set the number of concurrent verifications using the --limit parameter.
    • When the script was successful, the sourcify-server-migration instance can be stopped.

4. Start using the new deprecated repositoryV1

  • While the migration script was running on sourcify-server-migration, new matches coming from sourcify-server got saved in Database, RepositoryV2 and current_repositoryV1 .

    • There are no issues for the Database and RepositoryV2 because sourcify-server-migration and sourcify-server are writing in the same database/fs

    • There is a complication regarding RepositoryV1. sourcify-server is storing new matches into the current_repositoryV1, while sourcify-server-migration in the new_repositoryV1.

    • A procedure is needed to transfer all the contracts in current_repositoryV1 that were saved after RepositoryV2 and Database were enabled on sourcify-server.

      • sourcify-server stored the matches in sourcify_sync with the status set to perfect instead of perfect_match and partial instead of partial_match. We can use this information to list all the contracts that need to be transferred from current_repositoryV1 to new_repositoryV1.

      • Extract all the contracts described in the previous point from the database in a rows.txt file. For each row output the path in the repository. You can use the following SQL query:

        SELECT
        '/contracts/' || CASE WHEN match_type = 'perfect' THEN 'full_match'
        WHEN match_type = 'partial' THEN 'partial_match'
        ELSE 'partial_match'
        END || '/' || chain_id || '/' || convert_from(address, 'UTF8') as path
        FROM sourcify_sync
        WHERE match_type = 'perfect' OR match_type = 'partial'
        order by id desc;
      • The rows.txt file will look like this:

        /contracts/partial_match/11155111/0x6C5f9712ee69E76908D3E3Ef62Ce33c04d748f61
        /contracts/full_match/11155111/0x1fDe294eA2C5A81bC24fa117994436A0B30A8F12
        /contracts/partial_match/8453/0x85b1180AfC0c45fb074Baee447B8F4aBb5BF84A9
        /contracts/full_match/11155111/0x4bf0CE51d78041103eCAA6281503546D00721A2e
        ...
      • Use the following script to copy all the contracts from current_repositoryV1 to new_repositoryV1:

        #!/bin/bash

        # Path to the file containing the lines
        FILE_PATH="./rows.txt"

        # Count the total number of lines in the file
        total_lines=$(wc -l < "$FILE_PATH")

        echo "Total files to process: $total_lines"

        # Variable to keep track of processed lines
        processed_lines=0

        # Read each line from the file
        while IFS= read -r line
        do
        # Update processed lines count
        ((processed_lines++))

        # Construct source and destination paths
        src="/home/app/repository_current$line"
        dest="/home/app/repository$line"

        # Create the destination directory structure
        mkdir -p "$(dirname "$dest")"

        # Copy the files from source to destination without overwriting existing files
        cp -rn "$src" "$dest"

        # Calculate and display the percentage of completion
        percent=$(($processed_lines * 100 / $total_lines))
        echo "Processed $processed_lines / $total_lines files ($percent%)"
        done < "$FILE_PATH"

        In the script, modify src to match your current_repositoryV1 path, and dest to match your new_repositoryV1 path.

  • In your configuration file inside services/server/src/config, switch the RepositoryV1 config to the new_repositoryV1’s path.

    repositoryV1: {
    path: "/home/app/repositoryV1",
    ...
    }
  • Restart sourcify-server

What’s next

We are still working on finding solutions for the following points:

  • Find a solution for unsupported chains contracts: we migrated all the contracts that could be re-verified. Unfortunately, contracts from non-supported chains cannot be verified because the on-chain’s bytecode is no longer served by RPCs. This is currently work-in-progress and will come soon.
  • Using the DB as the source of truth: after the migration the database will be used only as a storing service, the repository APIs are still using RepositoryV1. This is currently work-in-progress and will come soon.