Process files in batches by using the Unstructured Ingest Python library
The Unstructured Ingest Python library enables you to use Python code to send files in batches to Unstructured API services for processing, and to tell Unstructured API services where to deliver the processed data.
The following 3-minute video shows how to use the Unstructured Ingest Python library to send multiple PDFs from a local directory in batches to be ingested by Unstructured API services for processing:
Installation
One approach to get started quickly with the Unstructured Ingest Python library is to install Python and then run the following command:
pip install unstructured-ingest
This default installation option enables the ingestion of file types that do not require any extra dependencies: plain text, HTML, XML, JSON, and email files. It also enables you to specify local source and destination locations.
You might also need to install additional dependencies, depending on your needs. Learn more.
For additional installation options, and information about v2 and v1 implementations in this library, see the Unstructured Ingest Python library in the Ingest section.
If you previously installed this library by using pip install unstructured, see the migration guide.
Usage
For example, to use the Unstructured Ingest Python library to ingest files from a local source (input) location and to deliver the processed data to an Azure Storage account destination (output) location:
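A minimal sketch of such a pipeline follows. It rests on several assumptions that are not stated on this page: the environment variable names are illustrative, the import paths follow the library's v2 module layout (which can differ between releases, so check them against your installed version), and the Azure connector's extra dependencies are already installed. The library imports are deferred into the function that runs the pipeline, so the settings check can be exercised on its own.

```python
import os

# Assumed environment variable names; rename them to match your own setup.
REQUIRED_ENV_VARS = (
    "UNSTRUCTURED_API_KEY",
    "UNSTRUCTURED_API_URL",
    "LOCAL_FILE_INPUT_DIR",
    "AZURE_STORAGE_ACCOUNT_NAME",
    "AZURE_STORAGE_ACCOUNT_KEY",
    "AZURE_STORAGE_REMOTE_URL",
)


def load_settings(env=None):
    """Collect the required settings, failing early if any are missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_ENV_VARS if not env.get(name)]
    if missing:
        raise ValueError("Missing settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_ENV_VARS}


def run_local_to_azure(settings):
    """Send local files to Unstructured and deliver the results to Azure Storage."""
    # Assumption: v2 module layout. Verify these paths for your installed release.
    from unstructured_ingest.v2.interfaces import ProcessorConfig
    from unstructured_ingest.v2.pipeline.pipeline import Pipeline
    from unstructured_ingest.v2.processes.connectors.fsspec.azure import (
        AzureAccessConfig,
        AzureConnectionConfig,
        AzureUploaderConfig,
    )
    from unstructured_ingest.v2.processes.connectors.local import (
        LocalConnectionConfig,
        LocalDownloaderConfig,
        LocalIndexerConfig,
    )
    from unstructured_ingest.v2.processes.partitioner import PartitionerConfig

    Pipeline.from_configs(
        context=ProcessorConfig(),
        # Source (input): a directory on the local file system.
        indexer_config=LocalIndexerConfig(
            input_path=settings["LOCAL_FILE_INPUT_DIR"]
        ),
        downloader_config=LocalDownloaderConfig(),
        source_connection_config=LocalConnectionConfig(),
        # Processing: partition the files by calling Unstructured API services.
        partitioner_config=PartitionerConfig(
            partition_by_api=True,
            api_key=settings["UNSTRUCTURED_API_KEY"],
            partition_endpoint=settings["UNSTRUCTURED_API_URL"],
        ),
        # Destination (output): an Azure Storage account.
        destination_connection_config=AzureConnectionConfig(
            access_config=AzureAccessConfig(
                account_name=settings["AZURE_STORAGE_ACCOUNT_NAME"],
                account_key=settings["AZURE_STORAGE_ACCOUNT_KEY"],
            )
        ),
        uploader_config=AzureUploaderConfig(
            remote_url=settings["AZURE_STORAGE_REMOTE_URL"]
        ),
    ).run()
```

To run the sketch, set the environment variables and then call run_local_to_azure(load_settings()). Validating the settings first means that a missing key or path is reported before any files are sent.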
To learn how to use the Unstructured Ingest Python library to work with a specific source (input) and destination (output) location, see the Python code examples for the source and destination connectors that are available for you to choose from.
See also the ingest configuration settings that enable you to further control how batches are sent and processed.