Kafka
Ingest your files into Unstructured from Kafka.
The requirements are as follows.
- A Kafka cluster, such as ones offered by the following providers:
  - Confluent Cloud (Create a cluster.) A video showing how to set up a Kafka cluster in Confluent Cloud is also available.
  - Amazon Managed Streaming for Apache Kafka (Amazon MSK) (Create an Amazon MSK Provisioned cluster, or create an Amazon MSK Serverless cluster.)
  - Apache Kafka in Azure HDInsight (Create an Apache Kafka cluster in Azure HDInsight.)
  - Google Cloud Managed Service for Apache Kafka (Create a cluster.)
- The hostname and port number of the bootstrap Kafka cluster to connect to.
  - For Confluent Cloud, get the hostname and port number.
  - For Amazon MSK, get the hostname and port number.
  - For Apache Kafka in Azure HDInsight, get the hostname and port number.
  - For Google Cloud Managed Service for Apache Kafka, get the hostname and port number.
- The name of the topic to read messages from and write messages to on the cluster.
  - For Confluent Cloud, create a topic and access available topics.
  - For Amazon MSK, create a topic on an Amazon MSK Provisioned cluster, or create a topic on an Amazon MSK Serverless cluster.
  - For Apache Kafka in Azure HDInsight, create a topic and access available topics.
  - For Google Cloud Managed Service for Apache Kafka, create a topic and access available topics.
- If you use Kafka API keys and secrets for authentication, the key and secret values.
  - For Confluent Cloud, create an API key and secret.
  - For Amazon MSK, create an API key and secret.
  - For Apache Kafka in Azure HDInsight, create an API key and secret.
  - For Google Cloud Managed Service for Apache Kafka, create an API key and secret.
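Before you configure the connector, you can optionally confirm that the hostname, port, topic, and credentials you gathered above actually work. The following is a minimal sketch using the confluent-kafka Python client, assuming SASL/PLAIN authentication with an API key and secret (the typical Confluent Cloud setup); all values shown are placeholders, and other providers may require different security settings.

```python
# Connectivity check with the confluent-kafka Python client.
# Placeholder values; SASL/PLAIN authentication with an API key and secret is assumed.
from confluent_kafka.admin import AdminClient

conf = {
    "bootstrap.servers": "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092",  # hostname:port
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<KAFKA_API_KEY>",
    "sasl.password": "<KAFKA_API_SECRET>",
}

admin = AdminClient(conf)
metadata = admin.list_topics(timeout=10)  # raises if the cluster is unreachable

# Confirm that the topic the connector will read from exists on the cluster.
topic = "<TOPIC_NAME>"
print(f"Topic '{topic}' found:", topic in metadata.topics)
```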
To create the source connector:
- On the sidebar, click Connectors.
- Click Sources.
- Click New or Create Connector.
- Give the connector a unique Name.
- In the Provider area, click Kafka.
- Click Continue.
- Follow the on-screen instructions to fill in the fields as described later on this page.
- Click Save and Test.
Fill in the following fields:
- Name (required): A unique name for this connector.
- Bootstrap Server (required): The hostname of the bootstrap Kafka cluster to connect to.
- Port: The port number of the cluster.
- Group ID: The ID of the consumer group, if any, that is associated with the target Kafka cluster. (A consumer group is a way to allow a pool of consumers to divide the consumption of data over topics and partitions.) The default is default_group_id if not otherwise specified.
- Topic (required): The unique name of the topic to read messages from and write messages to on the cluster.
- Number of messages to consume: The maximum number of messages to get from the topic. The default is 1 if not otherwise specified.
- Batch Size: The maximum number of messages to send in a single batch.
- API Key (required): The Kafka API key value.
- Secret (required): The secret value for the Kafka API key.
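If Save and Test reports a connection problem, it can help to check the same field values outside of Unstructured. The following is a minimal round-trip sketch with the confluent-kafka Python client that writes one test message to the topic and reads it back using the same Group ID and credentials the connector will use; all names and values are placeholders, and SASL/PLAIN authentication is assumed.

```python
# Round-trip check: produce one test message to the topic, then consume it
# back with the consumer group the connector will join.
# Placeholder values; confluent-kafka client and SASL/PLAIN auth are assumed.
from confluent_kafka import Consumer, Producer

conf = {
    "bootstrap.servers": "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<KAFKA_API_KEY>",
    "sasl.password": "<KAFKA_API_SECRET>",
}
topic = "<TOPIC_NAME>"

# Produce a single test message.
producer = Producer(conf)
producer.produce(topic, value=b"connector smoke test")
producer.flush(10)

# Consume it back using the same Group ID entered in the connector fields.
consumer = Consumer({**conf, "group.id": "default_group_id", "auto.offset.reset": "earliest"})
consumer.subscribe([topic])
msg = consumer.poll(15.0)
print("No message received" if msg is None else f"Received: {msg.value()}")
consumer.close()
```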