Some Bash magic, graph, csv


Unix shells are powerful. Scripts are useful.

I am working on a project where, before processing CSV files, I want to know whether their master data is already present in my graph.

So the idea is to get all the distinct values from a given column across all the CSV files and count the nodes in the graph that already hold them.

From this list, I then create an array that I use in a Cypher query like this:

MATCH (n:Label) 
WHERE n.property IN [The Array I created with the values]
RETURN count(n)
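
To turn the list of values into that array, something like the following could work. It is only a minimal sketch: the function name to_cypher_list is hypothetical, and it assumes one value per line on standard input, with every value treated as a string.

```shell
#!/bin/bash
# Hypothetical helper: read one value per line from stdin and print
# a Cypher list literal such as ["a", "b"]. Empty lines are skipped.
to_cypher_list() {
  local line out="" sep=""
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    line=${line//\\/\\\\}   # escape backslashes first
    line=${line//\"/\\\"}   # then escape double quotes
    out+="${sep}\"${line}\""
    sep=", "
  done
  printf '[%s]\n' "$out"
}
```

For example, printf 'FR\nBE\n' | to_cypher_list prints a list that can be pasted after IN in the query above.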

Here is the script, in Bash. The trick is to build an awk command for each file. It takes an integer as a parameter: 1 for the first column and so on (not 0, because $0 is the whole line in awk).

#!/bin/bash
# Get the unique values in a given CSV column by providing a column number (starts at 1 in awk)
# $1 is the first parameter of this script and should be an integer
# Together with the $ in the string, it forms the awk column reference (again $1 = first column, because $0 is the whole line)
# awk 'BEGIN{FS=";"}{print $3}END{}' $file >> sumFile.txt
theline="BEGIN{FS=\";\"}{print $"$1"}END{}"
filename=/tmp/getUniqueValues_sh.$(date +%F)
touch $filename
for file in *_*.csv;
do
   command="/usr/bin/awk '$theline' \"$file\" >> \"$filename\""
   eval "$command"
done
# display unique values
sort -u "$filename" | more
rm "$filename"

The script writes to a temporary file in /tmp, which is removed after use.

Find the file at https://drive.google.com/file/d/1Jo5tY7IaEek4NFRTZWea4TvSw8KdKC45/view?usp=sharing

The column separator is ; and you can change it easily by editing the FS value in the awk command.

Execute it like this: “getUniqueValues.sh 2” for the second column.
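
If you would rather avoid the temporary file and the eval, a variant along these lines might do the same job. This is a sketch under assumptions, not the script from the download: the function name unique_column_values is hypothetical, the separator is passed as a second argument and is assumed to be a single character, and the column number goes to awk as a variable through -v.

```shell
#!/bin/bash
# Hypothetical variant: print the distinct values of one column
# across the given CSV files.
# Usage: unique_column_values COLUMN SEPARATOR FILE...
unique_column_values() {
  local col=$1 sep=$2
  shift 2
  # Accept only a positive integer without leading zeros,
  # because column 0 would select the whole line in awk.
  case $col in
    ''|*[!0-9]*|0*) echo "column must be a positive integer" >&2; return 1 ;;
  esac
  # -F sets the field separator and -v passes the column number as an
  # awk variable, so no command string has to be built or eval'ed.
  awk -F "$sep" -v col="$col" '{ print $col }' "$@" | sort -u
}
```

Called as unique_column_values 2 ';' *_*.csv, it should list the distinct values of the second column, one per line.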
