A Pentaho DI Project Readme

Install Java

Make sure a Java runtime is installed before installing Pentaho.

Install Pentaho Community Edition

Download a released build and unzip it.

Download page: https://wiki.pentaho.com/display/COM/Community+Edition+Downloads

Direct download link: https://jaist.dl.sourceforge.net/project/pentaho/Pentaho 8.0/client-tools/pdi-ce-8.0.0.0-28.zip

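P.S.:

To connect to a MySQL database, you need to download the Java connector (JDBC driver) package and place it in the Pentaho installation.

Download it from https://dev.mysql.com/downloads/connector/j/.

Direct download link: https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.zip

Unzip it and put the connector JAR into Pentaho's lib directory.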
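The driver installation step can be sketched as a small shell helper. This is only an illustration: the function name `install_mysql_driver` and both paths in the usage note are hypothetical, and the location of the lib directory should be checked against your own unzipped Pentaho build.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pentaho): copy a Connector/J JAR into
# the lib directory of an unzipped PDI installation.
# Usage: install_mysql_driver <path-to-connector-jar> <pdi-install-dir>
install_mysql_driver() {
    jar=$1
    pdi_dir=$2
    # Refuse to continue if the JAR is missing.
    if [ ! -f "$jar" ]; then
        echo "connector JAR not found: $jar" >&2
        return 1
    fi
    # Refuse to continue if the installation has no lib directory.
    if [ ! -d "$pdi_dir/lib" ]; then
        echo "lib directory not found under: $pdi_dir" >&2
        return 1
    fi
    cp "$jar" "$pdi_dir/lib/"
}
```

Example call, with assumed paths: `install_mysql_driver ~/Downloads/mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar ~/data-integration`. Restart Spoon or Kitchen afterwards so the new JAR is picked up.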

Configuration

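  1. Pentaho environment configuration

Create a .kettle directory in the project, and create the following files in it:

  • kettle.properties: defines the environment variables;

  • repositories.xml: defines the repository information; in particular, change the repository path to the project directory.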
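As a rough sketch of this setup on Linux, the helper below creates the .kettle directory and a starter kettle.properties. The function name `init_kettle_home` and the variables `DB_HOST` and `DB_PORT` are hypothetical placeholders, not values from this project. repositories.xml is left out on purpose: it is probably safest to let Spoon generate it and then edit the repository path.

```shell
#!/bin/sh
# Hypothetical helper: scaffold <project-dir>/.kettle with a starter
# kettle.properties. An existing kettle.properties is left untouched.
# Usage: init_kettle_home <project-dir>
init_kettle_home() {
    project_dir=$1
    mkdir -p "$project_dir/.kettle" || return 1
    props="$project_dir/.kettle/kettle.properties"
    if [ ! -f "$props" ]; then
        cat > "$props" <<'EOF'
# Placeholder variables (assumptions); jobs can reference them as ${DB_HOST}.
DB_HOST=localhost
DB_PORT=3306
EOF
    fi
    # Print the directory that now holds the configuration.
    echo "$project_dir/.kettle"
}
```

Keeping the properties file inside the project means the configuration travels with the jobs instead of living in a user's home directory.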

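Running jobs from the command line with Kitchen

First set the OS environment variable KETTLE_HOME to the project directory, then run the corresponding command.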
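On Linux, setting the variable might look like the sketch below. PDI reads its configuration from `$KETTLE_HOME/.kettle`, so the helper checks that the directory exists before exporting anything. The function name `use_project_kettle_home` and the path in the usage note are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: point KETTLE_HOME at a project directory, but only
# if that directory actually contains a .kettle configuration directory.
# Usage: use_project_kettle_home <project-dir>
use_project_kettle_home() {
    project_dir=$1
    if [ ! -d "$project_dir/.kettle" ]; then
        echo "no .kettle directory in: $project_dir" >&2
        return 1
    fi
    KETTLE_HOME=$project_dir
    export KETTLE_HOME
}
```

Example call, with an assumed path: `use_project_kettle_home /home/me/pentaho_project1`, followed by the Kitchen command in the same shell session.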

Windows

kitchen.bat /file:C:\Users\shawnwang\Downloads\pentaho_project1\crm_init_job.kjb /level:Basic /param "p_use_local_tmp=0"

With a repository:

kitchen.bat /rep:test_repo /job:crm_init_job /param "p_use_local_tmp=0"

Linux

kitchen.sh -file=/PRD/updateWarehouse.kjb -level=Minimal -param:p_use_local_tmp=0

With a repository:

kitchen.sh -rep:test_repo -job:job_test1 -param:p_use_local_tmp=0
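If repository jobs are run often, the Linux command above can be wrapped in a small function. This is a sketch, not part of Pentaho: the name `run_repo_job` and the `KITCHEN` override variable are hypothetical, and it assumes kitchen.sh is on the PATH unless `KITCHEN` says otherwise.

```shell
#!/bin/sh
# Hypothetical wrapper around the repository form of the Kitchen command.
# Usage: run_repo_job <repo> <job> [name=value ...]
# Set KITCHEN to override the executable (assumed default: kitchen.sh on PATH).
run_repo_job() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_repo_job <repo> <job> [name=value ...]" >&2
        return 2
    fi
    repo=$1
    job=$2
    shift 2
    # Turn each remaining name=value argument into -param:name=value.
    n=$#
    for p in "$@"; do
        set -- "$@" "-param:$p"
    done
    shift "$n"
    # The function returns whatever exit status Kitchen returns.
    "${KITCHEN:-kitchen.sh}" "-rep:$repo" "-job:$job" "$@"
}
```

For example, `run_repo_job test_repo job_test1 p_use_local_tmp=0` runs the same command as the line above.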

Jobs in our case

  • J_exp_salesforce_files_data

Only pulls the Salesforce file data into the local temporary table (whether files are generated at the same time is configurable).

kitchen.bat /rep:test_repo /job:J_exp_salesforce_files_data /param "p_gen_file=1"

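Parameters:

  1. p_gen_file:

1: generate files while importing the Files data
0 (default): do not generate files; only import the Files data into the temporary table

  2. p_inc_mode:

1 (default): incremental mode; look up the maximum time in the temporary table and start from that time
0: full mode; start from a very old default time

For example: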

./kitchen.sh -rep:test_repo -job:J_exp_salesforce_files_data -level=Basic
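The p_inc_mode behaviour can be illustrated with a small shell sketch. The real logic lives inside the Pentaho job, so this only mirrors the description: the function name `resolve_start_time`, the default timestamp, and the fallback for an empty temporary table are all assumptions.

```shell
#!/bin/sh
# Illustration only; the actual selection happens inside the PDI job.
# Assumed "very old" default start time used in full mode.
DEFAULT_START_TIME='1970-01-01 00:00:00'

# Usage: resolve_start_time <p_inc_mode> [max-time-in-tmp-table]
resolve_start_time() {
    inc_mode=${1:-1}
    max_time=${2:-}
    if [ "$inc_mode" = "1" ] && [ -n "$max_time" ]; then
        # Incremental mode: continue from the newest row already stored.
        printf '%s\n' "$max_time"
    else
        # Full mode, or incremental mode with an empty table (assumed fallback).
        printf '%s\n' "$DEFAULT_START_TIME"
    fi
}
```

With p_inc_mode=0 the stored maximum is ignored, so every file record is fetched again.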
  • J_gen_files_from_salesforce_tmp_table

Generates files from the data in the local Salesforce temporary table.

Parameters:

  1. p_begin_id: the minimum id from which to start processing (default: 0)

kitchen.bat /rep:test_repo /job:J_gen_files_from_salesforce_tmp_table
  • crm_init_job

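Performs a full pull of Salesforce data into the local CRM system.

Parameters:

  1. p_use_local_tmp:

1 (default): use the existing local Salesforce tmp file data
0: first do a full pull of the Salesforce file data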

kitchen.bat /rep:test_repo /job:crm_init_job /param "p_use_local_tmp=0"
Last updated 4 years ago