1.5 Data Pipeline

A data pipeline is a sequence of one or more data processing units executed in order. For example, a data pipeline may contain one or more ingesters, a transformer, custom processors, and an emitter.

To create a data pipeline:

  1. Create one or more ingesters. See instructions here.
  2. Create a transformer that may contain one or more SQL statements. Only one transformer per pipeline is allowed, so all relevant SQL statements must be included in a single transformer. See instructions on how to create a transformer containing multiple SQL statements.
  3. If your data processing needs a custom processor, create one to be included in the pipeline. See instructions on how to create a processor.
  4. Create an emitter if the processed data needs to be stored outside of Momentum storage (for example, indexed in Impulse DW, or written to MongoDB, MySQL, Oracle, etc.). See instructions on how to create an emitter.
  5. Create a data pipeline and add all required components to it. See below for more details.

A few example pipelines:

  • one or more ingesters --> one transformer --> one or more processors --> one emitter
  • one or more ingesters --> emitter
  • one transformer --> emitter
  • one transformer --> one or more processors --> emitter
  • a single ingester --> emitter

If the emitter is omitted, the processed data of the pipeline is stored in the distributed file system (e.g. HDFS) that Momentum is running on.
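
The composition rules above can be sketched in a few lines of Python. This is only an illustration, not a Momentum API: the class name PipelineSpec, its fields, and the unit names are hypothetical. The one-transformer limit comes from the steps above; the requirement that a pipeline starts from at least one ingester or a transformer is an assumption inferred from the example pipelines.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PipelineSpec:
    """Hypothetical description of a pipeline's units (not a Momentum API)."""
    ingesters: List[str] = field(default_factory=list)
    transformers: List[str] = field(default_factory=list)
    processors: List[str] = field(default_factory=list)
    emitter: Optional[str] = None

    def validate(self) -> None:
        # Rule from the guide: only one transformer per pipeline.
        if len(self.transformers) > 1:
            raise ValueError("only one transformer per pipeline is allowed")
        # Assumption: every example pipeline starts with ingesters or a transformer.
        if not self.ingesters and not self.transformers:
            raise ValueError("a pipeline needs at least one ingester or a transformer")

    def describe(self) -> str:
        """Render the pipeline in the 'a --> b --> c' notation used above."""
        self.validate()
        stages: List[str] = []
        if self.ingesters:
            stages.append(" + ".join(self.ingesters))
        stages.extend(self.transformers)
        stages.extend(self.processors)
        # With no emitter, output stays in the distributed file system.
        stages.append(self.emitter or "distributed file system (e.g. HDFS)")
        return " --> ".join(stages)


if __name__ == "__main__":
    spec = PipelineSpec(ingesters=["orders_csv", "customers_db"],
                        transformers=["join_orders"])
    print(spec.describe())
```

Passing two transformer names to PipelineSpec makes validate() raise a ValueError, which mirrors the advice to put all SQL statements into a single transformer.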

Creating A Data Pipeline

  1. Expand the "Data Pipeline" menu (under the ETL section) in the main menu --> click "Pipeline Home".
  2. Click "Create New Pipeline" in the top menu options.
  3. Fill out the form fields:
    1. Name: a user defined unique name to identify the pipeline
    2. Core: Number of cluster cores used to execute the pipeline job in distributed, parallel mode. For a large dataset or a complex pipeline, allocate as many cores as you have available to speed up execution.
    3. Memory: RAM per core. The 4GB default works for most cases. Tune if required.
    4. Output Format: If no emitter is attached to this pipeline, the data is stored within Momentum's distributed file system (HDFS). Specify the output file format.
    5. Run Mode:
      1. On demand: You will need to manually execute the pipeline by clicking the "Run" button.
      2. Scheduled: Specify a Linux-style cron expression to schedule automated execution of the pipeline. Here is an online tool to create cron expressions.
    6. Storage mode: Used only if no emitter is attached to this pipeline.
    7. Log Input and Output Count: If you select yes, the pipeline generates counts of the processed data for auditing and inventory purposes. Counting is expensive and should be avoided if the counts are not needed.
    8. Submit the form to save it.
  4. Once the pipeline form is submitted, add processing units to it. Here are the steps:
    1. Add one or more ingesters: expand the ingester menu --> click on the ingester you want to add --> a rectangular widget is added on the main canvas.
    2. Add a transformer: expand the transformer menu --> click on the transformer you want to add --> a rectangular widget is added on the main canvas.
    3. To add a new processor (not already created): click the "Add Processor" button located at the top of the pipeline canvas. Fill out the form to add it to the canvas.
    4. To add an existing processor: expand the processor menu --> click on the processor you want to add --> a rectangular widget is added on the main canvas.
    5. To add a new emitter (not already created): click the "Add Emitter" button located at the top of the pipeline canvas. Fill out the form to add it to the canvas. For details on the form fields, see the Emitter section of this wiki.
    6. To add an existing emitter: expand the emitter menu --> click on the emitter you want to add --> a rectangular widget is added on the main canvas.
  5. If needed, move the widgets around to organize them. Widgets may overlap if the canvas is small. Drag overlapping widgets apart to separate them.
  6. Once all widgets are laid out on the canvas, connect the output ("out") tip of each widget to the input ("in") tip of the next widget. See Figure 2 below for an example pipeline with connected units.
  7. To connect two units, click on the "out" tip of the first, drag the arrow, and click on the "in" tip of the second.
  8. Save the pipeline by clicking the "Save" button. You may need to scroll down to see the "Save" button.
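
As a rough aid for filling out the form, the sketch below mirrors its fields in Python and checks them before submission. It is not part of Momentum: the class PipelineSettings, its field names, and the "parquet" output format are hypothetical. It also assumes the scheduler accepts the standard five-field cron layout (minute, hour, day of month, month, day of week); confirm this against your installation. Because memory is specified per core, the total memory requested is the core count multiplied by the per-core value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineSettings:
    """Hypothetical mirror of the "Create New Pipeline" form fields."""
    name: str
    cores: int
    memory_gb_per_core: int = 4      # the form's default is 4GB per core
    output_format: str = "parquet"   # assumed value; used only without an emitter
    run_mode: str = "on_demand"      # "on_demand" or "scheduled"
    cron: Optional[str] = None       # needed only when run_mode is "scheduled"
    log_counts: bool = False         # counting is expensive, so off by default

    def total_memory_gb(self) -> int:
        # Memory is RAM per core, so the job asks for cores * memory per core.
        return self.cores * self.memory_gb_per_core

    def validate(self) -> None:
        if not self.name.strip():
            raise ValueError("name must not be empty")
        if self.cores < 1 or self.memory_gb_per_core < 1:
            raise ValueError("cores and memory per core must be positive")
        if self.run_mode not in ("on_demand", "scheduled"):
            raise ValueError("run_mode must be 'on_demand' or 'scheduled'")
        if self.run_mode == "scheduled":
            # Assumption: five whitespace-separated cron fields.
            if self.cron is None or len(self.cron.split()) != 5:
                raise ValueError("scheduled pipelines need a five-field cron expression")


if __name__ == "__main__":
    # "0 2 * * *" is a standard cron expression for 02:00 every day.
    settings = PipelineSettings(name="nightly_load", cores=8,
                                run_mode="scheduled", cron="0 2 * * *")
    settings.validate()
    print(settings.total_memory_gb(), "GB requested in total")
```

The check only counts cron fields; it does not verify that each field holds a legal value, so an online cron tool is still useful for building the expression itself.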

Running Data Pipeline

To run the data pipeline:

  1. From the pipeline home page, click on the checkbox corresponding to the pipeline you want to run.
  2. Click the "Run" button in the top menu bar.
  3. When the pipeline starts running, it shows the execution status of each unit included in the pipeline. When all units complete execution, the pipeline status shows as "complete" and the result as "success".
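
The completion rule in the last step can be written as a small function. This is a sketch only: the function name, the unit names, and the "running" and "pending" labels are hypothetical placeholders. Only the "complete" status and "success" result come from the description above.

```python
from typing import Dict, Tuple


def pipeline_status(unit_statuses: Dict[str, str]) -> Tuple[str, str]:
    """Return (status, result) for a pipeline given each unit's status.

    Only ("complete", "success") is taken from the guide; the fallback
    labels "running" and "pending" are placeholders for illustration.
    """
    if unit_statuses and all(s == "complete" for s in unit_statuses.values()):
        return ("complete", "success")
    return ("running", "pending")


if __name__ == "__main__":
    units = {"orders_csv": "complete", "join_orders": "running"}
    print(pipeline_status(units))
    units["join_orders"] = "complete"
    print(pipeline_status(units))
```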

Figure 1: Screen showing pipeline home and menu options

Figure 2: Example pipeline with the connected units
