Meteor — A metadata collection framework


What is Metadata?

Metadata is information that describes one or more aspects of a piece of data. It summarises basic facts about the data, which makes particular instances of data easier to find and work with. Every system, database, and message queue has its own definition and structure for metadata. Take the data engineering (DE) ecosystem at Gojek as an example: for a message queuing system such as Kafka, which streams real-time data to data stores, the metadata is the topic information. For a database, it is the tables that group the data. Similarly, for Metabase and Grafana, it is the dashboards, together with information about the charts they contain.
Metadata answers questions such as where the data comes from, who created a table, which tables are similar, and when a table was last updated. This information helps you understand the data efficiently.

About Meteor

Collecting metadata effectively was a major challenge for the team, because no available tool was easy to use, reusable, portable, and flexible enough to be adapted to generic use cases. This led to the creation of Meteor. But the challenges did not end there. The old Meteor, which ran as a service, could not keep up with new demands: it was non-trivial to move it to an approach that offers a single solution for every available data source, whether a bucket, topic, database, dashboard, or table, and that can process the metadata and sink it easily to a destination. This led to the decision to move to a plugin-based metadata extractor.

Basic workflow of Meteor
  1. Extraction: The process of extracting metadata from a source and transforming it into a format that can be consumed by the agent.
  2. Processing: The process of transforming the extracted metadata, for example by enriching it, before it is sent on.
  3. Sink: The process of sending the processed metadata to one or more destinations, as defined in recipes.
  • Job: A metadata extraction task from a single data source.
  • Recipe: A set of instructions and configurations defined by the user in a YAML file. It defines how a particular job is performed and should specify the source from which metadata is fetched, the metadata processors to apply, and the sinks that act as destinations for the metadata.
  • Extractor: The type of plugin that extracts metadata from a source. Plugins are currently available for various sources, including databases, dashboards, and topics. A single job in Meteor can have only one source.
  • Processor: The type of plugin that works in the processing stage, enriching or otherwise processing the metadata after extraction. A single recipe can have multiple processors.
  • Sink: The type of plugin that acts as a destination, sending metadata to a variety of third-party APIs and catalog services, including Columbus, HTTP, BigQuery, Kafka, and many others.
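
To make the workflow above concrete, here is a minimal Go sketch of a single job: one extractor, a chain of processors, and one or more sinks. Everything in it is an assumption made for illustration. The Record type, the Extractor, Processor and Sink interfaces, and the RunJob function are hypothetical names and are not Meteor's actual plugin API.

```go
package main

import "fmt"

// Record is a hypothetical, simplified metadata record.
type Record struct {
	URN    string
	Labels map[string]string
}

// Illustrative plugin interfaces (assumed shapes, not Meteor's real API).
type Extractor interface {
	Extract() ([]Record, error)
}

type Processor interface {
	Process(r Record) (Record, error)
}

type Sink interface {
	Write(records []Record) error
}

// staticExtractor pretends to read table metadata from a database.
type staticExtractor struct{ tables []string }

func (e staticExtractor) Extract() ([]Record, error) {
	records := make([]Record, 0, len(e.tables))
	for _, t := range e.tables {
		records = append(records, Record{URN: "db::" + t, Labels: map[string]string{}})
	}
	return records, nil
}

// labelProcessor enriches every record with a fixed key/value label.
type labelProcessor struct{ key, value string }

func (p labelProcessor) Process(r Record) (Record, error) {
	r.Labels[p.key] = p.value
	return r, nil
}

// memorySink collects records in memory so the result can be inspected.
type memorySink struct{ received []Record }

func (s *memorySink) Write(records []Record) error {
	s.received = append(s.received, records...)
	return nil
}

// RunJob runs one job: one source, any number of processors, one or more sinks.
func RunJob(src Extractor, procs []Processor, sinks []Sink) error {
	records, err := src.Extract()
	if err != nil {
		return fmt.Errorf("extract: %w", err)
	}
	for i := range records {
		for _, p := range procs {
			if records[i], err = p.Process(records[i]); err != nil {
				return fmt.Errorf("process: %w", err)
			}
		}
	}
	for _, s := range sinks {
		if err := s.Write(records); err != nil {
			return fmt.Errorf("sink: %w", err)
		}
	}
	return nil
}

func main() {
	sink := &memorySink{}
	err := RunJob(
		staticExtractor{tables: []string{"orders", "drivers"}},
		[]Processor{labelProcessor{key: "team", value: "data-engineering"}},
		[]Sink{sink},
	)
	if err != nil {
		panic(err)
	}
	for _, r := range sink.received {
		fmt.Println(r.URN, r.Labels["team"])
	}
}
```

In this sketch, a recipe would correspond to the three arguments passed to RunJob: which source to read, which processors to apply in order, and which sinks receive the result.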

Key Features

  • No Dependency: Written in Go. It compiles into a single binary with no external dependency.
  • Extensible: Plugin system allows new sources and sinks to be easily added.
  • Ecosystem: Extract metadata from many popular services with a wide range of service plugins.
  • Customisable: Add your own processors and sinks to suit your use cases.
  • Runtime: Meteor can run inside VMs or containers with minimal memory footprint.
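
One common way to get the extensibility described above is a registry that maps a plugin name to a factory function, so that adding a new sink only means registering one more entry. The Go sketch below shows that general pattern under stated assumptions: Registry, SinkFactory and consoleSink are hypothetical names chosen for this example, and the code does not claim to mirror how Meteor registers its plugins internally.

```go
package main

import "fmt"

// Sink is an illustrative plugin interface (assumed, not Meteor's real API).
type Sink interface {
	Write(urn string) error
}

// SinkFactory builds a sink from free-form configuration.
type SinkFactory func(config map[string]string) (Sink, error)

// Registry maps a plugin name, as a recipe might reference it, to its factory.
type Registry struct {
	factories map[string]SinkFactory
}

func NewRegistry() *Registry {
	return &Registry{factories: map[string]SinkFactory{}}
}

// Register adds a plugin and rejects duplicate names.
func (r *Registry) Register(name string, f SinkFactory) error {
	if _, exists := r.factories[name]; exists {
		return fmt.Errorf("sink %q is already registered", name)
	}
	r.factories[name] = f
	return nil
}

// Build looks up a plugin by name and constructs it from the given config.
func (r *Registry) Build(name string, config map[string]string) (Sink, error) {
	f, ok := r.factories[name]
	if !ok {
		return nil, fmt.Errorf("sink %q not found", name)
	}
	return f(config)
}

// consoleSink is a toy sink that prints each URN with a configurable prefix.
type consoleSink struct{ prefix string }

func (s consoleSink) Write(urn string) error {
	_, err := fmt.Println(s.prefix + urn)
	return err
}

func main() {
	reg := NewRegistry()
	err := reg.Register("console", func(cfg map[string]string) (Sink, error) {
		return consoleSink{prefix: cfg["prefix"]}, nil
	})
	if err != nil {
		panic(err)
	}

	sink, err := reg.Build("console", map[string]string{"prefix": "sink> "})
	if err != nil {
		panic(err)
	}
	if err := sink.Write("db::orders"); err != nil {
		panic(err)
	}
}
```

The same pattern would apply to extractors and processors: each plugin type gets its own registry, and the pipeline only ever refers to plugins by name.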

About ODPF



Aditya Singh Sisodiya

Incoming Product Engineer @Gojek | FullStack Developer