Hive To Splice Machine



This topic contains 3 replies, has 3 voices, and was last updated by  Community Manager 3 months, 2 weeks ago.

Viewing 4 posts - 1 through 4 (of 4 total)
  • #766227

    kshitizj
    Participant

    I have some Hive external tables. I want to import the Hive data into Splice Machine, perform some updates/inserts on the data, and export it back to Hive. My questions are:
    1. What is the best way to perform the import and export between Hive and Splice Machine?
    2. Do I really need to do import/export? Is there another way in Splice Machine to expose the Hive data so that I can perform inserts/updates/deletes using Splice Machine?

    #766235

    In the current release, you have to export from Hive and import into Splice.

    We are working on adding new features in our next releases:
    – Add VTI support for both ORC and Parquet files. This will allow you to read Hive data directly from Splice, and to import data into Splice from Hive ORC or Parquet files.
    – Support for External Tables. This allows you to create External Tables in Splice that point to files as the source of data. The supported file formats are Text, ORC, and Parquet. Note that data can be inserted into Splice External Tables but CANNOT be updated or deleted. So you can create an External Table in Splice that points to the Hive data in either ORC or Parquet format, and insert data from Splice.
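    For illustration, here is a rough sketch of what such an External Table definition over Hive-produced ORC files might look like once the feature is available (the exact syntax may differ in the released version; the table name and HDFS path below are made up for the example):

    splice> create external table hivetest.sample_01_orc (code varchar(20), description varchar(250), total_emp int, salary int) stored as orc location '/user/hive/warehouse/sample_01_orc';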

    #766245

    dhanyavv
    Participant

    Will the Splice import utility syscs_util.import_data help copy data from Hive to Splice? In our current system, data transfer between Hive and Oracle is done using Sqoop, and we are planning to replace Oracle with Splice because Sqoop takes a lot of time for the transfer.
    We also want to move data from Splice to Hive. Will the export command help do that?

    #766256

    Note: These instructions are for the Cluster deployment of Splice Machine, in CDH.

    Here are the steps to import/update/export Hive table sample_01.

    Assuming that table sample_01 has the following columns:

    code string,
    description string,
    total_emp int,
    salary int

    Export data from Hive table

    Data in the Hive table will be exported as delimited text to a folder in HDFS.

    1. Create a folder in HDFS to export data to, say /hiveout, using the following command from one of the cluster servers:
      > sudo -su hdfs hadoop fs -mkdir /hiveout
    2. Make sure that the Hive database user has permission to write to HDFS. If the Hive db user is scott, you can change the owner of the folder with:
      > sudo -su hdfs hadoop fs -chown scott:supergroup /hiveout
    3. Now start the Hive prompt by typing beeline, and connect to the database. For example, to connect from the Hive prompt:
      > !connect jdbc:hive2://stl-colo-srv132:10000 scott tiger
    4. From the Hive prompt, run the insert overwrite command to export the data:
      jdbc:hive2://stl-colo-srv132:10000> INSERT OVERWRITE DIRECTORY '/hiveout/120916_01' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' NULL DEFINED AS '' select * from sample_01;

    Note: The Ctrl-A (\001) delimiter is used because a comma may appear in the data.

    Import data into Splice

    1. First, create the import command file. On one of the region servers, create a file named import_hive_data.sql with the following line:
    CALL SYSCS_UTIL.IMPORT_DATA('hivetest', 'sample_01', NULL, '/hiveout/120916_01', '^A', NULL, NULL, NULL, NULL, -1, '/BAD', TRUE, NULL);

    Note: Make sure that the Ctrl-A character in the command is an actual Ctrl-A character and not the two characters ^ followed by A. If in doubt, delete it and retype it by pressing Ctrl-V followed by Ctrl-A to insert the literal control character.

    2. Launch the Splice command line from any region server:
    > sqlshell.sh
    3. Create the table in Splice by typing the following command in sqlshell:
    splice> create table hivetest.sample_01 (code varchar(20), description varchar(250), total_emp int, salary int);
    4. Now, to import the data, run the command in the file:
    splice> run 'import_hive_data.sql';

    Note: If you saved the file with a different name or path, provide the correct path and file name.

    5. Now check that the table has data by doing a select *:
    splice> select * from hivetest.sample_01;

    Update data

    You can update the data using the UPDATE SQL command from the Splice command shell. To launch the Splice command shell, type sqlshell.sh from any region server. A sample update is shown below.
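    For illustration, here is a minimal sketch of such an update against the sample table (the code value and salary adjustment are made up for the example):

    splice> update hivetest.sample_01 set salary = salary + 1000 where code = '15-0000';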

    Export Data from Splice

    1. Create a folder in HDFS to export data to, say /hivein, using the following command from one of the cluster servers:
    > sudo -su hdfs hadoop fs -mkdir /hivein
    2. Make sure that the hbase user has permission to write to this folder by changing the owner:
    > sudo -su hdfs hadoop fs -chown hbase:hbase /hivein
    3. From the Splice sqlshell prompt, run the export command:
    splice> EXPORT('/hivein', false, null, 'utf-8', '|', null) SELECT * from hivetest.sample_01;

    Note: This will create files in the HDFS folder /hivein; depending on the size of the table, there may be one or more files.

    4. Export creates an additional status file; remove it:
    > sudo -su hdfs hadoop fs -rm /hivein/_SUCCESS

    Import Back into Hive

    1. From the Hive prompt (start beeline and connect to the Hive database), create an external table pointing to the data files exported from Splice:
    > CREATE EXTERNAL TABLE sample_01_ext
      (
        code string,
        description string,
        total_emp int,
        salary int
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
      LOCATION '/hivein'
      TBLPROPERTIES('serialization.null.format'='');

    Note: Use a table name that is different from the original table (here we just appended a suffix of _ext). '|' is used as the delimiter.

    2. Use the insert command to import the data back into the original table. Note that the command overwrites the data:
    > INSERT OVERWRITE TABLE sample_01 select * from sample_01_ext;
