Machine learning depends on large datasets. Researchers usually obtain them by scraping data from large websites or by downloading them from public repositories.
Multi-Chain Storage (MCS) is another good fit for collecting and storing datasets. MCS uses decentralized storage, so users can track the history of changes to their data and keep the data secure. MCS also supports large files, which Git does not handle well.
The Python MCS SDK lets users retrieve datasets from MCS and upload files to MCS efficiently, and it provides a convenient interface to the MCS API.
The dataset used for training and testing the model can be downloaded from here.
A simple two-step setup is required before using MCS for machine learning:
● Set up a MetaMask wallet
● Install the MCS SDK
Step 1: Retrieve the machine learning dataset. Datasets can be fetched from an IPFS server or the Filecoin network, which gives easy access to the data.
import os
import sys

import requests


def data_fetch(ipfs_uri, dataset_path, file_name, dir_name, extract=False):
    # Skip the download if the dataset is already on the local drive.
    isdir = os.path.isdir(dataset_path)
    if isdir:
        print("Dataset found on local drive: ", dataset_path)
        return
    print("Start downloading dataset")
    link = ipfs_uri
    with open(file_name, "wb") as f:
        print("Downloading %s" % file_name)
        response = requests.get(link, stream=True)
        total_length = response.headers.get('content-length')
        if total_length is None:  # no content length header
            f.write(response.content)
        else:
            dl = 0
            total_length = int(total_length)
            for data in response.iter_content(chunk_size=4096):
                dl += len(data)
                f.write(data)
                # Draw a 50-character progress bar.
                done = int(50 * dl / total_length)
                sys.stdout.write("\r[%s%s]" % ('=' * done, ' ' * (50 - done)))
                sys.stdout.flush()
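The `extract` and `dir_name` parameters suggest that the downloaded file may be an archive that is unpacked after the download. That part of the function is not shown in this post, so the following is only a sketch of one way to do it with the standard library. `extract_archive` is a hypothetical helper, not part of the MCS SDK.

```python
import os
import shutil


def extract_archive(file_name, dir_name):
    """Unpack a downloaded archive (zip, tar, ...) into dir_name.

    Hypothetical helper: how data_fetch handles extract=True is not
    shown in the post, so this is an assumption.
    """
    os.makedirs(dir_name, exist_ok=True)
    # shutil infers the archive format from the file extension.
    shutil.unpack_archive(file_name, dir_name)
    # Return the top-level entries so the caller can check the result.
    return sorted(os.listdir(dir_name))
```

A caller could run `extract_archive(file_name, dir_name)` once the download has finished and `extract` is true.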
Step 2: Optionally encrypt the data before uploading to keep it secure. The secret key must be available to read the data afterwards. This is demonstrated in the monkey classification demo.
from Crypto.Cipher import AES  # assumed to come from the PyCryptodome package


def encrypt_file(key, filename, chunk_size=64 * 1024):
    print("Data %s encryption starts " % filename)
    file_to_encrypt = filename
    buffer_size = chunk_size  # 64 KB by default
    cipher_encrypt = AES.new(key, AES.MODE_CFB)
    with open(file_to_encrypt, 'rb') as input_file, \
            open(file_to_encrypt + '.encrypted', 'wb') as output_file:
        # Store the random IV first; it is needed again for decryption.
        output_file.write(cipher_encrypt.iv)
        buffer = input_file.read(buffer_size)
        while len(buffer) > 0:
            ciphered_bytes = cipher_encrypt.encrypt(buffer)
            output_file.write(ciphered_bytes)
            buffer = input_file.read(buffer_size)
    print("Data %s encrypted: %s " % (filename, file_to_encrypt + '.encrypted'))
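Because the data can only be read back with the secret key, the key has to be created and kept somewhere safe. AES keys must be 16, 24 or 32 bytes long. The sketch below shows one possible way to generate and reload such a key using only the standard library. `generate_key`, `load_key` and the hex-encoded key file are assumptions for illustration and are not taken from the demo.

```python
import os
import secrets

VALID_AES_KEY_SIZES = (16, 24, 32)  # bytes: AES-128, AES-192, AES-256


def generate_key(key_path, size=32):
    """Create a random AES key and store it hex-encoded at key_path."""
    if size not in VALID_AES_KEY_SIZES:
        raise ValueError("AES keys must be 16, 24 or 32 bytes long")
    key = secrets.token_bytes(size)
    with open(key_path, "w") as f:
        f.write(key.hex())
    # Restrict the key file to its owner (limited effect on Windows).
    os.chmod(key_path, 0o600)
    return key


def load_key(key_path):
    """Read a hex-encoded key back from key_path."""
    with open(key_path) as f:
        key = bytes.fromhex(f.read().strip())
    if len(key) not in VALID_AES_KEY_SIZES:
        raise ValueError("stored key has an invalid length")
    return key
```

The bytes returned by either function can be passed as the `key` argument of `encrypt_file`.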
Step 3: Upload the file using the Python MCS SDK.
def upload_to_mcs(wallet_address, private_key, web3_api, file_path):
    # upload to mcs
    w3_api = ContractAPI(web3_api)
    api = McsAPI()
    w3_api.approve_usdc(wallet_address, private_key, "1")
    # upload file to mcs
    father_path = os.path.abspath(os.path.dirname(__file__))
    upload_file = api.upload_file(wallet_address, father_path + file_path)
    file_data = upload_file["data"]
    payload_cid = file_data['payload_cid']
    source_file_upload_id = file_data['source_file_upload_id']
    nft_uri = file_data['ipfs_url']
    file_size = file_data['file_size']
    w_cid = file_data['w_cid']
    # get the global variable
    params = api.get_params()["data"]
    # get Filecoin price
    rate = api.get_price_rate()["data"]
    # test upload_file_pay contract
    w3_api.upload_file_pay(wallet_address, private_key, file_size, w_cid, \
    return payload_cid, source_file_upload_id
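`upload_to_mcs` builds the upload path by concatenating the script directory with `file_path`, so `file_path` is expected to begin with a path separator (for example `/data/train.zip`). A more forgiving variant could look like the sketch below. `resolve_upload_path` is a hypothetical helper, not part of the MCS SDK.

```python
import os


def resolve_upload_path(base_dir, file_path):
    """Join base_dir and file_path whether or not file_path starts with a separator."""
    # os.path.join discards base_dir when the second argument is absolute,
    # so strip leading separators before joining.
    relative = file_path.lstrip("/\\")
    return os.path.normpath(os.path.join(base_dir, relative))
```

With this helper, `resolve_upload_path(father_path, file_path)` gives the same result for `"/data/train.zip"` and `"data/train.zip"`.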
Step 4: MCS lets users track deal information and the deal log using the source file upload id.
deal_detail = api.get_deal_detail(wallet_address, source_file_upload_id)
download_url = deal_detail["data"]["source_file_upload_deal"]["ipfs_url"]
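The lookup above raises a `KeyError` if any level of the response is missing, for example while a deal is still being processed. A defensive version might look like the following sketch. The helper name `get_download_url` and the sample response are assumptions; only the three key names come from the snippet above.

```python
def get_download_url(deal_detail):
    """Return the IPFS URL from a deal-detail response, or None if it is absent."""
    data = deal_detail.get("data") or {}
    deal = data.get("source_file_upload_deal") or {}
    return deal.get("ipfs_url")


# Hypothetical response shape, with a placeholder URL.
sample = {"data": {"source_file_upload_deal": {"ipfs_url": "https://example.com/ipfs/placeholder"}}}
download_url = get_download_url(sample)
```

The returned URL can then be handed to `data_fetch` as its `ipfs_uri` argument.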
Step 5: Lastly, the model and training results can also be stored securely on the blockchain using MCS. The files can then be tracked using the payload CID and the source file upload id.
payload_cid, source_file_upload_id = upload_to_mcs(wallet_address, private_key, web3_api, filepath)
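Training results have to be written to a file before they can be uploaded. As a minimal sketch, assuming the results are a plain dictionary of metrics, they could be saved as JSON first. `save_results` and the metric names are hypothetical and are not taken from the demo.

```python
import json


def save_results(metrics, path):
    """Write training results to a JSON file and return the file path."""
    with open(path, "w") as f:
        json.dump(metrics, f, indent=2, sort_keys=True)
    return path
```

The resulting file can then be uploaded in the same way as the dataset, keeping in mind that `upload_to_mcs` resolves `file_path` relative to the script directory.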
For encrypted file storage
1. Fetch datasets from MCS.
2. Encrypt data for security.
3. Upload encrypted data to the blockchain using MCS.
4. Pay gas fee & USDC for storage.
For machine learning
1. Access data information using source file upload id.
2. Retrieve data from the blockchain.
3. Train ML model using fetched data.
4. Apply the trained model to a new set of data and process the results.
5. Upload result to MCS.
6. Pay gas fee & USDC for storage.
MCS lets users store large ML datasets on the blockchain and keep easy access to those files. Users can track the history of changes to their data and make sure that important datasets are not corrupted.
You can check out a simple demonstration of using the Python MCS SDK to support the machine learning process.