Neo4j
Batch process all your records to store structured outputs in a Neo4j account.
The requirements are as follows.
- A Neo4j deployment, such as a Neo4j Aura deployment.
- The username and password for the user who has access to the Neo4j deployment. The default user is typically neo4j.
  - For a Neo4j Aura instance, the default user’s password is typically set when the instance is created.
  - For an AWS Marketplace, Microsoft Azure Marketplace, or Google Cloud Marketplace deployment of Neo4j, the default user is typically set during the deployment process.
  - For a local Neo4j deployment, you can set the default user’s initial password or recover an admin user and its password.
- The connection URI for the Neo4j deployment, which starts with neo4j://, neo4j+s://, bolt://, or bolt+s://; followed by localhost or the host name; and sometimes ending with a colon and the port number (such as :7687). For example:
  - For a Neo4j Aura deployment, browse to the target Neo4j instance in the Neo4j Aura account and click Connect > Drivers to get the connection URI, which follows the format neo4j+s://<host-name>. A port number is not used or needed.
  - For an AWS Marketplace, Microsoft Azure Marketplace, or Google Cloud Marketplace deployment of Neo4j, see Neo4j on AWS, Neo4j on Azure, or Neo4j on GCP for details about how to get the connection URI.
  - For a local Neo4j deployment, the URI is typically bolt://localhost:7687.
  - For other Neo4j deployment types, see the deployment provider’s documentation.
- The name of the target database in the Neo4j deployment. A default Neo4j deployment typically contains two standard databases: one named neo4j for user data and another named system for system data and metadata. Some Neo4j deployment types support more than these two databases per deployment; Neo4j Aura instances do not.
  - Create additional databases for a local Neo4j deployment that uses Enterprise Edition; or for Neo4j on AWS, Neo4j on Azure, or Neo4j on GCP deployments.
  - Get a list of additional available databases for a local Neo4j deployment that uses Enterprise Edition; or for Neo4j on AWS, Neo4j on Azure, or Neo4j on GCP deployments.
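The connection URI format described in the requirements can be checked before it is handed to a connector. The following is a minimal sketch that uses only the Python standard library; the function name check_neo4j_uri is hypothetical and not part of any Unstructured or Neo4j package, and the accepted schemes are limited to the four listed above.

```python
from urllib.parse import urlsplit

# Only the four schemes named in the requirements above (an assumption of
# this sketch; it does not claim to cover every scheme a driver accepts).
ALLOWED_SCHEMES = {"neo4j", "neo4j+s", "bolt", "bolt+s"}


def check_neo4j_uri(uri: str) -> dict:
    """Split a connection URI into scheme, host, and optional port.

    Raises ValueError if the scheme is not one of the four listed above
    or if the host name is missing.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("missing host name")
    # parts.port is None when the URI has no ":<port>" suffix.
    return {"scheme": parts.scheme, "host": parts.hostname, "port": parts.port}


print(check_neo4j_uri("bolt://localhost:7687"))
# → {'scheme': 'bolt', 'host': 'localhost', 'port': 7687}
```

For a Neo4j Aura style URI without a port number, the returned port is None.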
The Neo4j connector dependencies:
You might also need to install additional dependencies, depending on your needs. Learn more.
The following environment variables:
- NEO4J_USERNAME - The name of the target user with access to the target Neo4j deployment, represented by --username (CLI) or username (Python).
- NEO4J_PASSWORD - The user’s password, represented by --password (CLI) or password (Python).
- NEO4J_URI - The connection URI for the deployment, represented by --uri (CLI) or uri (Python).
- NEO4J_DATABASE - The name of the database in the deployment, represented by --database (CLI) or database (Python).
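One way to collect these four environment variables in Python before building a pipeline is sketched below. The helper name load_neo4j_settings is hypothetical; the dictionary keys simply mirror the Python parameter names listed above (username, password, uri, database). The optional env argument is an assumption added so the helper can be exercised without touching the real environment.

```python
import os

# The four variables described above.
REQUIRED_VARS = ("NEO4J_USERNAME", "NEO4J_PASSWORD", "NEO4J_URI", "NEO4J_DATABASE")


def load_neo4j_settings(env=None) -> dict:
    """Read the Neo4j settings from environment variables.

    `env` defaults to os.environ; pass a plain dict to test the helper.
    Raises RuntimeError naming every variable that is unset or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "username": env["NEO4J_USERNAME"],
        "password": env["NEO4J_PASSWORD"],
        "uri": env["NEO4J_URI"],
        "database": env["NEO4J_DATABASE"],
    }
```

Failing early with the full list of missing names tends to be easier to debug than a connection error raised later in the run.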
Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the supported ones. This example uses the local source connector.
This example sends files to Unstructured API services for processing by default. To process files locally instead, see the instructions at the end of this page.
For the Unstructured Ingest CLI and the Unstructured Ingest Python library, you can use the --partition-by-api option (CLI) or partition_by_api (Python) parameter to specify where files are processed:
- To do local file processing, omit --partition-by-api (CLI) or partition_by_api (Python), or explicitly specify partition_by_api=False (Python). Local file processing does not use an Unstructured API key or API URL, so you can also omit the following, if they appear:
  - --api-key $UNSTRUCTURED_API_KEY (CLI) or api_key=os.getenv("UNSTRUCTURED_API_KEY") (Python)
  - --partition-endpoint $UNSTRUCTURED_API_URL (CLI) or partition_endpoint=os.getenv("UNSTRUCTURED_API_URL") (Python)
  - The environment variables UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL
- To send files to Unstructured API services for processing, specify --partition-by-api (CLI) or partition_by_api=True (Python). Unstructured API services also require an Unstructured API key and API URL, which you provide by adding the following:
  - --api-key $UNSTRUCTURED_API_KEY (CLI) or api_key=os.getenv("UNSTRUCTURED_API_KEY") (Python)
  - --partition-endpoint $UNSTRUCTURED_API_URL (CLI) or partition_endpoint=os.getenv("UNSTRUCTURED_API_URL") (Python)
  - The environment variables UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL, representing your API key and API URL, respectively.
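The choice between local and API-based processing can be captured in a small helper. This is only a sketch: the function name partition_options is hypothetical, and the returned keys reuse the Python parameter names mentioned above (partition_by_api, api_key, partition_endpoint). Where these keyword arguments are passed depends on how your pipeline is configured, which this sketch does not assume.

```python
import os


def partition_options(partition_by_api: bool, env=None) -> dict:
    """Return keyword arguments for the chosen processing location.

    `env` defaults to os.environ; pass a plain dict to test the helper.
    """
    env = os.environ if env is None else env
    if not partition_by_api:
        # Local processing: no API key or API URL is needed.
        return {"partition_by_api": False}
    api_key = env.get("UNSTRUCTURED_API_KEY")
    api_url = env.get("UNSTRUCTURED_API_URL")
    if not api_key or not api_url:
        raise RuntimeError(
            "UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL must both be set"
        )
    return {
        "partition_by_api": True,
        "api_key": api_key,
        "partition_endpoint": api_url,
    }
```

Calling partition_options(False) returns only the partition_by_api flag, which matches the guidance that the API key and API URL can be omitted for local processing.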
Graph Output
The graph output of the Neo4j destination connector is represented in the following diagram:
In the preceding diagram:
- The Document node represents the source file.
- The UnstructuredElement nodes represent the source file’s UnstructuredElement objects, before chunking.
- The Chunk nodes represent the source file’s UnstructuredElement objects, after chunking.
- Each UnstructuredElement node has a PART_OF_DOCUMENT relationship with the Document node.
- Each Chunk node also has a PART_OF_DOCUMENT relationship with the Document node.
- Each UnstructuredElement node has a PART_OF_CHUNK relationship with a Chunk node.
- Each Chunk node, except for the “last” Chunk node, has a NEXT_CHUNK relationship with its “next” Chunk node.
Learn more about document elements and chunking.
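The relationships listed above can be illustrated with a small in-memory model. This sketch does not talk to Neo4j and is not the connector's implementation; the function name build_relationships and the node identifiers are hypothetical, and the direction of each triple (from the node that "has" the relationship to the node it is "with") is an assumption based on the wording above.

```python
def build_relationships(document_id, chunks):
    """Build (start, relationship_type, end) triples for one document.

    `chunks` is an ordered list of (chunk_id, [element_id, ...]) pairs,
    where each element belongs to exactly one chunk.
    """
    rels = []
    for chunk_id, element_ids in chunks:
        # Every Chunk node is part of the Document node.
        rels.append((chunk_id, "PART_OF_DOCUMENT", document_id))
        for element_id in element_ids:
            # Every UnstructuredElement node is part of the Document node
            # and part of the Chunk node that contains it.
            rels.append((element_id, "PART_OF_DOCUMENT", document_id))
            rels.append((element_id, "PART_OF_CHUNK", chunk_id))
    # Every chunk except the last one points to its successor.
    for (current_id, _), (next_id, _) in zip(chunks, chunks[1:]):
        rels.append((current_id, "NEXT_CHUNK", next_id))
    return rels


for triple in build_relationships("doc", [("c1", ["e1", "e2"]), ("c2", ["e3"])]):
    print(triple)
```

With two chunks holding three elements in total, the sketch yields five PART_OF_DOCUMENT triples, three PART_OF_CHUNK triples, and a single NEXT_CHUNK triple from c1 to c2.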