"ERROR: Chunk hash mismatch" while attempting multi-part upload

  • Updated 4 weeks ago
I am using Python and the ShareFile REST API to do a multi-part upload. I was able to successfully upload a test file (a CSV containing randomly generated text data) using a chunk size of 1200000 bytes. However, attempts to upload files containing actual data with this code, or to upload the test file with a different chunk size, always fail. The POST for each chunk returns a 200 response with the message "ERROR: Chunk hash mismatch." I can't figure out why the hashes wouldn't match.
import http.client
import json
import urllib.parse
from hashlib import md5


def upload_file(token, folder_id, data_str, filename):
    """ Uploads a file using the streamed upload method (method=streamed, raw=true), sending each chunk as a POST body.
 
    Args:
    dict json token acquired from authenticate function
    string folder_id - where to upload the file
    string data_str - string containing the file contents to be uploaded
    string filename - the name of the file to be stored in ShareFile """
     
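    # Percent-encode the file name so spaces and reserved characters are safe in the query string
    uri_path = '/sf/v3/Items(%s)/Upload?method=streamed&raw=true&fileName=%s'%(folder_id,urllib.parse.quote(filename))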
    http_ = http.client.HTTPSConnection(get_hostname(token))
    
    http_.request('POST', uri_path, headers=get_authorization_header(token))
 
    response = http_.getresponse()
    upload_config = json.loads(response.read())
    if 'ChunkUri' in upload_config:
        upload_response = multipart_form_post_upload(token, upload_config['ChunkUri'], data_str, filename)
        print(upload_response.read(), upload_response.status, upload_response.reason)
    else:
        print('No Upload URL received')


def multipart_form_post_upload(token, url, data_str, filename):
    """ Does a multipart form post upload of a file to a url.
     
    Args:
    dict json token acquired from authenticate function
    string url - the url to upload file to
    string data_str - string containing the file contents to be uploaded
    string filename - the name of the file to be stored in ShareFile
     
    Returns:
    the http response """
    # CHUNK_LENGTH = 4 * 1024 * 1024
    CHUNK_LENGTH = 1200000
    data_byte = data_str.encode('utf-8')
    fileHash = md5(data_byte).hexdigest()

    headers = get_authorization_header(token)
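    # Note: the chunk bytes are sent below as the bare request body (raw=true), without MIME multipart framing
    headers['content-type'] = 'multipart/form-data'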
    index = 0
    size = 0
    while len(data_byte) > CHUNK_LENGTH:
        chunk = data_byte[:CHUNK_LENGTH]
        data_byte = data_byte[CHUNK_LENGTH:]
        headers['content-length'] = CHUNK_LENGTH
        fullUrl = url+'&index={index}&byteOffset={offset}&hash={chunkHash}'.format(index=index,offset=index*CHUNK_LENGTH,chunkHash=md5(chunk).hexdigest())
        uri = urllib.parse.urlparse(fullUrl)
        http_ = http.client.HTTPSConnection(uri.netloc)
        http_.putrequest('POST', '%s?%s'%(uri.path, uri.query))
        for hdr_name, hdr_value in headers.items():
            http_.putheader(hdr_name, hdr_value)
        http_.endheaders()
        http_.send(chunk)
        r = http_.getresponse()
        print(r.read().decode())
        size += CHUNK_LENGTH
        index += 1

    lastChunk = data_byte
    size += len(lastChunk)
    lastChunkHash = md5(lastChunk).hexdigest()
    headers['content-length'] = len(lastChunk)
    fullUrl = url+'&index={index}&byteOffset={offset}&hash={chunkHash}&filehash={fileHash}&fileSize={fileSize}&finish=true'.format(index=index,offset=index*CHUNK_LENGTH,chunkHash=lastChunkHash,fileSize=size,fileHash=fileHash)
    uri = urllib.parse.urlparse(fullUrl)
    http_ = http.client.HTTPSConnection(uri.netloc)
    http_.putrequest('POST', '%s?%s'%(uri.path, uri.query))
    for hdr_name, hdr_value in headers.items():
        http_.putheader(hdr_name, hdr_value)
    http_.endheaders()
    http_.send(lastChunk)
    return http_.getresponse()
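
One way to narrow this down locally is to check that the chunking itself is sound: that the chunks reassemble to the original bytes and that each MD5 is computed over exactly the bytes that would be sent. The following is a minimal sketch under that assumption. `iter_chunks` is a hypothetical helper, not part of the ShareFile API, and passing this check does not by itself explain the server's response.

```python
from hashlib import md5


def iter_chunks(data, chunk_length):
    """Yield (index, byte_offset, chunk, md5_hex) for each slice of `data`.

    `data` must be bytes, so offsets and hashes refer to the encoded
    content rather than to characters. Empty input yields nothing.
    """
    for index, offset in enumerate(range(0, len(data), chunk_length)):
        chunk = data[offset:offset + chunk_length]
        yield index, offset, chunk, md5(chunk).hexdigest()


if __name__ == "__main__":
    # Non-ASCII sample: the byte length differs from the character count
    sample = "id,name\n1,Zoë\n2,José\n".encode("utf-8")
    parts = list(iter_chunks(sample, 8))

    # The chunks must reassemble to the original bytes
    assert b"".join(chunk for _, _, chunk, _ in parts) == sample

    for index, offset, chunk, chunk_hash in parts:
        print(index, offset, len(chunk), chunk_hash)
    print("file hash:", md5(sample).hexdigest(), "file size:", len(sample))
```

Because the helper slices the encoded bytes, `byteOffset` and `fileSize` stay consistent even when the real data contains multi-byte characters.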

Nick DiGiulio


Posted 4 weeks ago
