Create an account

Very important

  • To access the important data of the forums, you must be active in each forum and especially in the leaks and database leaks section, send data and after sending the data and activity, data and important content will be opened and visible for you.
  • You will only see chat messages from people who are at or below your level.
  • More than 500,000 database leaks and millions of account leaks are waiting for you, so access and view with more activity.
  • Many important data are inactive and inaccessible for you, so open them with activity. (This will be done automatically)


Thread Rating:
  • 514 Vote(s) - 3.47 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How do I output the results of a HiveQL query to CSV?

#1
we would like to put the results of a Hive query to a CSV file. I thought the command should look like this:

insert overwrite directory '/home/output.csv' select books from table;

When I run it, it says it completeld successfully but I can never find the file. How do I find this file or should I be extracting the data in a different way?
Reply

#2
You should use CREATE TABLE AS SELECT (CTAS) statement to create a directory in HDFS with the files containing the results of the query. After that you will have to export those files from HDFS to your regular disk and merge them into a single file.

You also might have to do some trickery to convert the files from '\001' - delimited to CSV. You could use a custom CSV SerDe or postprocess the extracted file.
Reply

#3
If you want a CSV file then you can modify Lukas' solutions as follows (assuming you are on a linux box):

hive -e 'select books from table' | sed 's/[[:space:]]\+/,/g' > /home/lvermeer/temp.csv

Reply

#4
If you are using HUE this is fairly simple as well. Simply go to the Hive editor in HUE, execute your hive query, then save the result file locally as XLS or CSV, or you can save the result file to HDFS.
Reply

#5
I was looking for a similar solution, but the ones mentioned here would not work. My data had all variations of whitespace (space, newline, tab) chars and commas.

To make the column data tsv safe, I replaced all \t chars in the column data with a space, and executed python code on the commandline to generate a csv file, as shown below:

hive -e 'tab_replaced_hql_query' | python -c 'exec("import sys;import csv;reader = csv.reader(sys.stdin, dialect=csv.excel_tab);writer = csv.writer(sys.stdout, dialect=csv.excel)\nfor row in reader: writer.writerow(row)")'

This created a perfectly valid csv. Hope this helps those who come looking for this solution.
Reply

#6
You can use hive string function `CONCAT_WS( string delimiter, string str1, string str2...strn )`

for ex:

hive -e 'select CONCAT_WS(',',cola,colb,colc...,coln) from Mytable' > /home/user/Mycsv.csv

Reply

#7
You can use `INSERT` … `DIRECTORY` …, as in this example:

INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ca_employees'
SELECT name, salary, address
FROM employees
WHERE se.state = 'CA';

`OVERWRITE` and `LOCAL` have the same interpretations as before and paths are interpreted following the usual rules. One or more files will be written to `/tmp/ca_employees`, depending on the number of reducers invoked.
Reply

#8
The default separator is "`^A`". In python language, it is "`\x01`".

When I want to change the delimiter, I use SQL like:

SELECT col1, delimiter, col2, delimiter, col3, ..., FROM table

Then, regard delimiter+"`^A`" as a new delimiter.
Reply

#9
I had a similar issue and this is how I was able to address it.

**Step 1** - Loaded the data from Hive table into another table as follows


DROP TABLE IF EXISTS TestHiveTableCSV;
CREATE TABLE TestHiveTableCSV
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n' AS
SELECT Column List FROM TestHiveTable;

**Step 2** - Copied the blob from Hive warehouse to the new location with appropriate extension

Start-AzureStorageBlobCopy
-DestContext $destContext
-SrcContainer "Source Container"
-SrcBlob "hive/warehouse/TestHiveTableCSV/000000_0"
-DestContainer "Destination Container"
-DestBlob "CSV/TestHiveTable.csv"
Reply

#10
Similar to Ray's answer above, Hive View 2.0 in Hortonworks Data Platform also allows you to run a Hive query and then save the output as csv.
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

©0Day  2016 - 2023 | All Rights Reserved.  Made with    for the community. Connected through