How do I output the results of a HiveQL query to CSV?

#11
If you are doing this from Windows, you can use the Python script hivehoney to extract table data to a local CSV file.

It will:

1. Log in to the bastion host.
2. Run pbrun.
3. Run kinit.
4. Run beeline (with your query) -- see the sketch after the commands below.
5. Save the beeline output to a file on Windows.

Execute it like this:

```
set PROXY_HOST=your_bastion_host
set SERVICE_USER=your_func_user
set LINUX_USER=your_SOID
set LINUX_PWD=your_pwd
python hh.py --query_file=query.sql
```
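
For reference, the beeline step in that flow could look roughly like this; a minimal sketch, not part of hivehoney itself, with a placeholder JDBC URL and the same credential placeholders as above:

```
beeline -u "jdbc:hive2://your_hive_host:10000/default" \
    -n your_SOID -p your_pwd \
    --outputformat=csv2 --silent=true \
    -f query.sql > result.csv
```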


#12
I tried various options, but this is one of the simplest solutions for `Python`/`Pandas`:

hive -e 'select books from table' | grep "|" > temp.csv

df=pd.read_csv("temp.csv",sep='|')

You can also use `tr "|" ","` to convert "|" to ","
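
For completeness, a minimal Python sketch of the Pandas step (the file name and separator follow the commands above; `books.csv` is just an illustrative output name):

```
import pandas as pd

# Read the pipe-delimited dump produced by the hive command above
df = pd.read_csv("temp.csv", sep="|")

# Write it back out as a regular comma-separated CSV
df.to_csv("books.csv", index=False)
```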

#13
Just to cover a few more follow-up steps after kicking off the query:
```
INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select books from table;
```

In my case, the generated data under the temp folder is in `deflate` format,
and it looks like this:
```
$ ls
000000_0.deflate
000001_0.deflate
000002_0.deflate
000003_0.deflate
000004_0.deflate
000005_0.deflate
000006_0.deflate
000007_0.deflate
```
Here's the command to decompress the deflate files and put everything into one CSV file:

```
hadoop fs -text "file:///home/lvermeer/temp/*" > /home/lvermeer/result.csv
```
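
Alternatively, if compressed output isn't wanted in the first place, you can (assuming you are allowed to change session-level settings) disable output compression before the INSERT, so Hive writes plain text files instead of `.deflate` ones:

```
SET hive.exec.compress.output=false;

INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select books from table;
```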

#14

hive --outputformat=csv2 -e "select * from yourtable" > my_file.csv

or

hive --outputformat=csv2 -e "select * from yourtable" > [your_path]/file_name.csv


For TSV, just change `csv2` to `tsv2` in the commands above and run your query, as shown below.
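
For example, the tab-separated equivalent of the first command above should be:

```
hive --outputformat=tsv2 -e "select * from yourtable" > my_file.tsv
```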



#15
I may be late to this one, but this might help with the answer:

echo "COL_NAME1|COL_NAME2|COL_NAME3|COL_NAME4" > SAMPLE_Data.csv
hive -e '
select distinct concat(COL_1, "|",
COL_2, "|",
COL_3, "|",
COL_4)
from table_Name <where clause, if required>;' >> SAMPLE_Data.csv
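
As a side note, `concat_ws` can express the same projection a bit more compactly; a sketch with the same placeholder names, assuming the columns are strings (or are cast to strings):

```
hive -e '
select distinct concat_ws("|", COL_1, COL_2, COL_3, COL_4)
from table_Name;' >> SAMPLE_Data.csv
```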

#16
This shell command writes the query output in CSV format to `output.txt`, without the column headers.

```
$ hive --outputformat=csv2 -f 'hivedatascript.hql' --hiveconf hive.cli.print.header=false > output.txt
```
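
Conversely, if you do want the column headers included, flipping that same property should work (same script and output names as above):

```
$ hive --outputformat=csv2 -f 'hivedatascript.hql' --hiveconf hive.cli.print.header=true > output.txt
```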

#17
This is the most CSV-friendly way I found to output the results of HiveQL.
You don't need any grep or sed commands to format the data; Hive supports it natively, you just need to add the extra `--outputformat` flag.

hive --outputformat=csv2 -e 'select * from <table_name> limit 20' > /path/toStore/data/results.csv

#18
Use the command:

**hive -e "use [database_name]; select * from [table_name] LIMIT 10;" > /path/to/file/my_file_name.csv**


I had a huge dataset whose details I was trying to organize, to determine the types of attacks and the number of each type. An example that I used in my practice that worked (and had a little more detail) goes something like this:



```
hive -e "use DataAnalysis;
select attack_cat,
case when attack_cat == 'Backdoor' then 'Backdoors'
when length(attack_cat) == 0 then 'Normal'
when attack_cat == 'Backdoors' then 'Backdoors'
when attack_cat == 'Fuzzers' then 'Fuzzers'
when attack_cat == 'Generic' then 'Generic'
when attack_cat == 'Reconnaissance' then 'Reconnaissance'
when attack_cat == 'Shellcode' then 'Shellcode'
when attack_cat == 'Worms' then 'Worms'
when attack_cat == 'Analysis' then 'Analysis'
when attack_cat == 'DoS' then 'DoS'
when attack_cat == 'Exploits' then 'Exploits'
when trim(attack_cat) == 'Fuzzers' then 'Fuzzers'
when trim(attack_cat) == 'Shellcode' then 'Shellcode'
when trim(attack_cat) == 'Reconnaissance' then 'Reconnaissance' end,
count(*) from actualattacks group by attack_cat;" > /root/data/output/results2.csv
```
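
As an aside, the same normalization can arguably be written more compactly with `trim()` and a subquery; a sketch, not tested against that dataset:

```
hive -e "use DataAnalysis;
select normalized_cat, count(*)
from (
  select case when length(trim(attack_cat)) = 0 then 'Normal'
              when trim(attack_cat) = 'Backdoor' then 'Backdoors'
              else trim(attack_cat) end as normalized_cat
  from actualattacks
) t
group by normalized_cat;" > /root/data/output/results2.csv
```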



#19
Although it is possible to use `INSERT OVERWRITE` to get data out of Hive, it might not be the best method for your particular case. First let me explain what `INSERT OVERWRITE` does, then I'll describe the method I use to get tsv files from Hive tables.

According to the Hive manual, your query will store the data in a directory in HDFS. The format will not be CSV.

> Data written to the filesystem is serialized as text with columns separated by ^A and rows separated by newlines. If any of the columns are not of primitive type, then those columns are serialized to JSON format.

A slight modification (adding the `LOCAL` keyword) will store the data in a local directory.

INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' select books from table;

When I run a similar query, here's what the output looks like.

```
[lvermeer@hadoop temp]$ ll
total 4
-rwxr-xr-x 1 lvermeer users 811 Aug 9 09:21 000000_0
[lvermeer@hadoop temp]$ head 000000_0
"row1""col1"1234"col3"1234FALSE
"row2""col1"5678"col3"5678TRUE
```
(The ^A column separators are non-printing characters, which is why the fields appear to run together.)

Personally, I usually run my query directly through Hive on the command line for this kind of thing, and pipe it into the local file like so:

hive -e 'select books from table' > /home/lvermeer/temp.tsv

That gives me a tab-separated file that I can use. Hope that is useful for you as well.
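
If you need commas rather than tabs, a quick follow-up sketch (assuming none of the column values themselves contain tabs or commas) is to pipe through `tr`:

```
hive -e 'select books from table' | tr '\t' ',' > /home/lvermeer/temp.csv
```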

Based on patch HIVE-3682, I suspect a better solution is available when using Hive 0.11, but I am unable to test this myself. The new syntax should allow the following.

```
INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select books from table;
```



