Generate a JSON schema for a file
Task
You want to generate a schema for a JSON file that Unstructured produces, so that you can validate, test, and document related JSON files across your systems.
Approach
Use a Python package such as genson to generate schemas for your JSON files.
The genson package is not owned or supported by Unstructured. For questions and requests, see the Issues tab of the genson repository in GitHub.
Generate a schema from the terminal
Install jq
By default, genson generates the JSON schema as a single string without any line breaks or indentation.
To pretty-print the schema that genson produces, install the jq utility.
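To illustrate what pretty-printing changes, here is a small sketch that uses only Python's standard json module as a stand-in for jq. The schema string is a hypothetical example, not actual genson output for an Unstructured file.

```python
import json

# Hypothetical schema, written as a single line with no whitespace,
# similar in shape to the compact output described above.
compact = (
    '{"$schema":"http://json-schema.org/schema#",'
    '"type":"array","items":{"type":"object"}}'
)

# Parse the string and re-serialize it with two-space indentation.
# This is roughly the effect of piping the schema through jq.
pretty = json.dumps(json.loads(compact), indent=2)

print(pretty)
```

Both strings describe the same schema; only the whitespace differs, so either form can be used for validation.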
The jq utility is not owned or supported by Unstructured. For questions and requests, see the Issues tab of the jq repository in GitHub.
Generate the schema
- Run the genson command, specifying the path to the input (source) JSON file and the path to the output (target) JSON schema file to be generated. Use jq to pretty-print the schema’s content into the file to be generated.
- You can find the generated JSON schema file in the output path that you specified.
Generate a schema from Python code
Install dependencies
In your Python project, install the genson package.
Add and run the schema generation code
- Set the following local environment variables:
  - Set LOCAL_FILE_INPUT_PATH to the local path to the input (source) JSON file.
  - Set LOCAL_FILE_OUTPUT_PATH to the local path to the output (target) JSON schema file to be generated.
- Add a Python code file to your project that generates the schema from the input file and writes it to the output path.
- Run the Python code file.
- Check the path specified by LOCAL_FILE_OUTPUT_PATH for the generated JSON schema file.