Transforming Unstructured Data

Unstructured Data

Unstructured data, such as images, is transformed by sending a transaction to the /transform-unstructured endpoint.

  • Multiple Images: If your data spans several images, include all of them in a single API call. We’ll process them together to generate the desired output.

  • File Size Limit: Each image or document page should be under 4MB. We’ll attempt to resize larger images, but files exceeding this limit may result in an error.

  • Processing Time: Images are typically processed in 2 to 4 minutes. Extremely large schemas (many keys or deeply nested data structures), or unstructured data that produces extremely large structures, may take much longer.
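The size guideline above can be checked locally before uploading. The following is a minimal sketch, not part of the API: the helper name `oversized_files` is hypothetical, and it assumes "4MB" means 4 × 1024 × 1024 bytes, which the documentation does not specify.

```python
from pathlib import Path

# Assumption: "4MB" is read as 4 MiB; the documentation does not give an exact byte count.
MAX_BYTES = 4 * 1024 * 1024


def oversized_files(paths, limit=MAX_BYTES):
    """Return the paths whose size on disk is at or above the limit."""
    return [p for p in paths if Path(p).stat().st_size >= limit]
```

Because the service may still try to resize larger images, a non-empty result is best treated as a warning to shrink or split the files rather than as a guaranteed failure.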

Getting the parameters together

  1. Pipeline ID: From your dashboard, select the workspace that contains your existing pipeline for unstructured data. That workspace lists any pipelines you have created. Copy the pipeline ID of the specific unstructured data pipeline you want to process data through.

  2. API Key: Your current API key, which covers all of your workspaces, is on the account settings page, available from the user menu in the top right. Pass it as a header under the "Ocp-Apim-Subscription-Key" key.

  3. Raw unstructured data: For documents that are not raw strings, you may use multi-part upload to pass the raw data to the endpoint, regardless of file type. For string input you may simply use the “text_input” parameter instead. Refer to the API reference for details.
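To show how the three parameters fit together, here is a rough sketch that builds (but does not send) a request for the string-input case using only the Python standard library. Several details are assumptions, not confirmed by this guide: `BASE_URL` is a placeholder host, the pipeline ID is assumed to travel as a `pipelineId` query parameter, and `text_input` is assumed to be a form field. Check the API reference for the actual transport before relying on any of this.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder host: replace with the real base URL from the API reference.
BASE_URL = "https://api.example.com"


def build_text_request(pipeline_id, api_key, text):
    """Build a POST request for string input without sending it."""
    # Assumption: the pipeline ID is passed as a "pipelineId" query parameter.
    query = urlencode({"pipelineId": pipeline_id})
    url = f"{BASE_URL}/transform-unstructured?{query}"
    # Assumption: "text_input" is sent as a URL-encoded form field.
    body = urlencode({"text_input": text}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={
            # Header name taken from the guide.
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. For images or other files, the multi-part upload described above is needed instead, which this sketch does not cover.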

Responses:

The first response to your submission will include the following details:

{ "transactionId": "…", "status": "…", "message": "…", "docCount": 1, "textCount": 1 }

Be sure to store the “transactionId”, as this is how you will retrieve the data for this specific input once processing is complete.
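A small sketch of pulling the “transactionId” out of that first response so it can be stored. The function name `extract_transaction_id` is hypothetical. Only the field names shown above come from the documentation.

```python
import json


def extract_transaction_id(response_body):
    """Parse the initial response and return its transactionId."""
    payload = json.loads(response_body)
    transaction_id = payload.get("transactionId")
    if not transaction_id:
        # Without this ID the result cannot be fetched later, so fail loudly.
        raise ValueError("response did not contain a transactionId")
    return transaction_id
```

Persisting the returned value alongside the pipeline ID (for example in a database row or a job record) keeps both identifiers available for the retrieval step.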

Retrieving Processed Image Data

Once the images are processed, you can retrieve the results by sending a GET request to the /get-unstructured-result endpoint. You’ll need to include the correct pipelineId and transactionId.

  • In Progress?: If the image is still being processed, you’ll receive a 202 response, indicating the data is not yet ready.

  • Already Retrieved?: If the image data has already been retrieved, you’ll receive a 200 response with a “retrieved” status, but no image data. This occurs because Arbizal deletes the data once it has been successfully retrieved.

Note: Since processed data is deleted once it’s retrieved, you won’t be able to access the same transaction again. Be sure to store the output when you first retrieve it.
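The retrieval rules above suggest a polling loop: wait while the endpoint answers 202, return the data on the first 200, and treat a “retrieved” status as a sign that the output was already collected. The sketch below is one way to express that logic, under stated assumptions: `fetch` is a hypothetical caller-supplied function that performs the GET to /get-unstructured-result and returns a `(status_code, payload)` pair, and the “retrieved” status is assumed to appear in a `status` field of the JSON body. The interval and attempt count are arbitrary defaults.

```python
import time


class AlreadyRetrieved(Exception):
    """Raised when the result was fetched before and has since been deleted."""


def poll_result(fetch, interval_s=15.0, max_attempts=40, sleep=time.sleep):
    """Call fetch() until the result is ready, then return its payload."""
    for _ in range(max_attempts):
        status_code, payload = fetch()
        if status_code == 202:
            # Still processing: wait and try again.
            sleep(interval_s)
            continue
        if status_code == 200:
            # Assumption: the "retrieved" marker lives in a "status" field.
            if payload.get("status") == "retrieved":
                raise AlreadyRetrieved("data was already retrieved and deleted")
            return payload
        raise RuntimeError(f"unexpected status code {status_code}")
    raise TimeoutError("result not ready after polling")
```

Since the data is deleted after the first successful retrieval, the payload returned here should be written to durable storage immediately, before any further processing that might fail.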