AI Generated Metadata Enrichments for Unstructured Data with IBM Spectrum Discover & watsonx.ai

Generative AI has high utility for generating metadata for both structured and unstructured data and is relevant in the storage domain where data discoverability drives the value of data across the enterprise including for downstream AI projects.

In a recent IBM Client Engineering project we extended IBM Fusion with the Spectrum Discover Fusion SDK to create a data pipeline for AI generated metadata. We created a metadata policy in IBM Fusion to filter images with missing metadata tags and published the image reference to a Kafka topic for the Spectrum Discover Application to consume. We used the watson machine learning SDK with a basic prompt to generate metadata tags associated with the image that catalogued in IBM Fusion. We integrated IBM Knowledge Catalog for enterprise wide data cataloging and watsonx.ai for querying and to enable downstream AI building.

We deployed the IBM Spectrum Discover Application to OpenShift for a highly scalable, high-throughput data pipeline.

System View

system view

IBM Spectrum Discover Query Builder

Example IBM Spectrum Discover Application

https://github.com/IBM/Spectrum_Discover_Example_Application

James
James
Architect | AI / ML Engineer | BSI ART1 Artificial Intelligence Committee Member / Expert | Follow me for updates on building trustworthy AI

My research interests include Artificial Intelligence, Semantic Models and Distributed Systems.