Building a pipeline for unstructured data
How to start transforming your first datasets
For each new data source (whether it’s a table, file, or endpoint), you’ll start by creating a pipeline that transforms the data into a specific output. The process begins in the Arbizal dashboard, inside a data workspace.
Accessing or adding a workspace
- Once logged in, head to the Arbizal dashboard at https://arbizal.com/arbizal-dashboard or click the “Arbizal Dashboard” link in the upper-right user menu.
- To create a new workspace, click the “Add Workspace” button.
- To access an existing workspace, click the “Manage” button next to the workspace instance you want to load.
Creating a new pipeline
- On the workspace management page, select “Add New Unstructured Pipeline.” This will open the configuration form for creating a pipeline for unstructured data.
- Begin by naming your pipeline and specifying a schema. If you want to use an existing schema, select “Import Existing Schema” and choose from your list of previously created schemas.
- If you’re creating a new schema, click “Add Root Key.” Every schema is structured as a dictionary. As you add keys, you’ll be prompted to provide:
  - A key name (double-click to edit the name)
  - A data type (choose a dictionary if you need to nest data)
  - A description, which helps our system understand the kind of information the key will contain
- Saving and updating your pipeline: Once you’ve defined the schema, you can save the pipeline. If you’re updating an existing pipeline, simply submit the changes. Once saved, unstructured data pipelines are immediately ready for use.
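To make the key name, data type, and description structure concrete, here is a minimal Python sketch of a schema written as a nested dictionary. The field names (`type`, `description`, `keys`), the type labels, and the invoice example are all assumptions for illustration. This page does not specify how Arbizal stores schemas internally, and the `validate_schema` helper is hypothetical.

```python
# Hypothetical schema layout: every key carries a data type and a description.
# The field names "type", "description", and "keys" are illustrative assumptions,
# not Arbizal's actual storage format.
invoice_schema = {
    "vendor_name": {
        "type": "string",
        "description": "Legal name of the company that issued the invoice.",
    },
    "total_amount": {
        "type": "number",
        "description": "Grand total including tax, in the invoice currency.",
    },
    "billing_address": {
        "type": "dictionary",  # a dictionary type nests further keys
        "description": "Address the invoice was billed to.",
        "keys": {
            "city": {"type": "string", "description": "City of the billing address."},
            "postal_code": {"type": "string", "description": "Postal or ZIP code."},
        },
    },
}


def validate_schema(schema: dict, path: str = "") -> list[str]:
    """Return a list of problems: keys that lack a type or a description."""
    problems = []
    for name, spec in schema.items():
        full_name = f"{path}.{name}" if path else name
        if not spec.get("type"):
            problems.append(f"{full_name}: missing type")
        if not spec.get("description"):
            problems.append(f"{full_name}: missing description")
        if spec.get("type") == "dictionary":
            # Recurse into nested keys so the whole tree is checked.
            problems.extend(validate_schema(spec.get("keys", {}), full_name))
    return problems
```

Calling `validate_schema(invoice_schema)` returns an empty list when every key, including nested ones, has both a type and a description. Writing specific descriptions matters here because the description is what tells the system which information belongs in each key.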
Output preview and under the hood
As you build your schema, you’ll see a small preview of what the output will look like. This helps you visualize what to expect from your transformation process.
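As a rough illustration of how a preview can follow from a schema, the Python sketch below builds an empty output skeleton with the same key structure. It is not Arbizal's implementation: the schema layout (`type`, `description`, `keys`), the type labels, the placeholder values, and the `build_preview` function are all assumptions made for this example.

```python
import json

# Placeholder value shown for each (hypothetical) data type.
PLACEHOLDERS = {"string": "", "number": 0, "boolean": False}


def build_preview(schema: dict) -> dict:
    """Build an empty output skeleton that mirrors the schema's key structure."""
    preview = {}
    for name, spec in schema.items():
        if spec["type"] == "dictionary":
            # Nested dictionaries are expanded recursively.
            preview[name] = build_preview(spec.get("keys", {}))
        else:
            # Unknown types fall back to None (rendered as null in JSON).
            preview[name] = PLACEHOLDERS.get(spec["type"])
    return preview


# Hypothetical schema used only for this illustration.
article_schema = {
    "title": {"type": "string", "description": "Headline of the article."},
    "author": {
        "type": "dictionary",
        "description": "Details about the article's author.",
        "keys": {
            "name": {"type": "string", "description": "Author's full name."},
            "verified": {"type": "boolean", "description": "Whether the author is verified."},
        },
    },
}

print(json.dumps(build_preview(article_schema), indent=2))
```

The printed skeleton has a `title` key and a nested `author` object with `name` and `verified`, which is the kind of shape a schema preview lets you check before running any data through the pipeline.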