I have some Hive external tables. I want to import the Hive data into Splice Machine, perform some updates/inserts on the data, and export it back to Hive. My questions are:
1. What is the best way to perform the import and export between Hive and Splice Machine?
2. Do I really need to do an import/export? Is there any other way in Splice Machine to expose the Hive data so that inserts/updates/deletes can be performed with Splice Machine?
In the current release, you have to export from Hive and import into Splice.
We are working on adding new features in upcoming releases:
– VTI support for both ORC and Parquet files. This will allow you to read Hive data directly from Splice, and to import data into Splice from Hive ORC or Parquet files.
– Support for External Tables. This will allow you to create external tables in Splice that point to files as the source of data. The supported file formats are Text, ORC, and Parquet. Note that data can be inserted into Splice external tables, but CANNOT be updated or deleted. So you can create an external table in Splice that points to the Hive data in either ORC or Parquet format, and insert data from Splice; a sketch of what that might look like follows this list.
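For illustration only, here is a rough sketch of what that could look like. The STORED AS / LOCATION syntax and the warehouse path are assumptions, since the feature is not yet released; the column list matches the sample_01 table used in the walkthrough below.

splice> -- hypothetical: an external table over Hive-managed ORC files
splice> CREATE EXTERNAL TABLE hivetest.sample_01_orc (code varchar(20), description varchar(250), total_emp int, salary int) STORED AS ORC LOCATION '/user/hive/warehouse/sample_01';
splice> -- inserts would be allowed; updates and deletes would not
splice> INSERT INTO hivetest.sample_01_orc VALUES ('99-0000', 'Example occupation', 1, 50000);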
Will the Splice import utility SYSCS_UTIL.IMPORT_DATA help to copy data from Hive to Splice? In our current system, data transfer between Hive and Oracle is done using Sqoop. We are planning to replace Oracle with Splice, since Sqoop is taking a lot of time for the data transfer.
We also want to move data from Splice to Hive. Will the EXPORT command help to do that?
Note: These instructions are for the cluster deployment of Splice Machine on CDH.
Here are the steps to import/update/export Hive table sample_01.
Assuming that table sample_01 has the following columns: code, description, total_emp, salary.
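For reference, the corresponding Hive DDL would look like this (a reconstruction based on the Splice and Hive table definitions used later in these steps):

> CREATE TABLE sample_01 (code string, description string, total_emp int, salary int);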
The data in the Hive table will be exported in delimited text format to a folder in HDFS. First create the folder and make the Hive user its owner:
> sudo -su hdfs hadoop fs -mkdir /hiveout
> sudo -su hdfs hadoop fs -chown scott:supergroup /hiveout
Connect to Hive with beeline and export the table:
> !connect jdbc:hive2://stl-colo-srv132:10000 scott tiger
jdbc:hive2://stl-colo-srv132:10000> INSERT OVERWRITE DIRECTORY '/hiveout/120916_02' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' NULL DEFINED AS '' select * from sample_01;
Note: The Ctrl-A (\001) delimiter is used, since a comma may be part of the data.
Save the following command in a file named import_hive_data.sql (it will be run from the Splice shell below):
CALL SYSCS_UTIL.IMPORT_DATA('hivetest', 'sample_01', NULL, '/hiveout/120916_02', '^A', NULL, NULL, NULL, NULL, -1, '/BAD', TRUE, NULL);
Note: Make sure that the Ctrl-A character in the command is a literal Ctrl-A and not the two characters ^ followed by A. If in doubt, delete it and retype it by typing Ctrl-V followed by Ctrl-A.
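For reference, here is the same call with each argument annotated (the argument order follows the SYSCS_UTIL.IMPORT_DATA procedure signature):

CALL SYSCS_UTIL.IMPORT_DATA(
  'hivetest',           -- schema name
  'sample_01',          -- table name
  NULL,                 -- insert column list (NULL = all columns)
  '/hiveout/120916_02', -- input file or directory in HDFS
  '^A',                 -- column delimiter (a literal Ctrl-A character)
  NULL,                 -- character (quote) delimiter
  NULL, NULL, NULL,     -- timestamp, date, and time formats (defaults)
  -1,                   -- bad records allowed (-1 = unlimited)
  '/BAD',               -- directory for bad-record files
  TRUE,                 -- records are one per line
  NULL);                -- character set (default)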
In the Splice command shell, create the target table and run the import script:
splice> create table hivetest.sample_01 (code varchar(20), description varchar(250), total_emp int, salary int);
splice> run 'import_hive_data.sql';
Note: If you saved the file with a different name or path, provide the correct path and file name.
splice> select * from hivetest.sample_01;
You can now update the data using the SQL UPDATE command from the Splice command shell. (To launch the Splice command shell, run sqlshell.sh from any region server.) See the hypothetical update below.
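This is only an illustration; the code value in the WHERE clause is made up, and the column names follow the sample_01 table above.

splice> -- hypothetical: give every row with this occupation code a raise
splice> UPDATE hivetest.sample_01 SET salary = salary + 1000 WHERE code = '00-0000';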
To export the updated data back to Hive, first create an HDFS folder for the Splice export:
> sudo -su hdfs hadoop fs -mkdir /hivein
> sudo -su hdfs hadoop fs -chown hbase:hbase /hivein
splice> EXPORT('/hivein', false, null, 'utf-8', '|', null) SELECT * from hivetest.sample_01;
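For reference, here is the same command with each EXPORT argument annotated (the order follows the Splice EXPORT syntax):

splice> EXPORT(
  '/hivein',   -- export directory in HDFS
  false,       -- compress output (false = plain text)
  null,        -- HDFS replication count (null = cluster default)
  'utf-8',     -- file encoding
  '|',         -- field separator
  null)        -- quote character (null = none)
  SELECT * from hivetest.sample_01;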
Note: This will create files in the HDFS folder /hivein; depending on the size of the table, there may be one or more files.
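You can list the folder to see the exported files, along with the _SUCCESS marker that the export writes:

> sudo -su hdfs hadoop fs -ls /hivein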
Remove the _SUCCESS marker file so that only data files remain in the export folder:
> sudo -su hdfs hadoop fs -rm /hivein/_SUCCESS
In Hive, create an external table that points to the exported folder:
> CREATE EXTERNAL TABLE sample_01_ext ( code string, description string, total_emp int, salary int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LOCATION '/hivein' TBLPROPERTIES('serialization.null.format'='');
Note: Use a table name that is different from the original table (here we just appended a suffix of _ext). '|' is used as the delimiter to match the EXPORT command above.
Then copy the data back into the original Hive table:
> INSERT OVERWRITE TABLE sample_01 select * from sample_01_ext;
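Finally, you can spot-check that the updated data landed back in the Hive table, for example:

> select * from sample_01 limit 10;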