Unix shells are powerful. Scripts are useful.
I am working on a project in which I want to know whether the master data referenced in the CSV files I'm about to process is already present in my graph.
So the idea is to get all the distinct values of a given column across all the CSV files, and count how many of them are already in the graph.
From this list, I then build an array that I use in a Cypher query such as:
MATCH (n:Label) WHERE n.property IN [The Array I created with the values] RETURN count(n)
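The list-to-array step can be scripted as well. Here is a minimal sketch, assuming the unique values arrive one per line (as printed by sort | uniq) and should be compared as strings. build_cypher_query is a hypothetical helper name, and Label and property are the placeholders from the query above.

```shell
#!/bin/bash
# Hypothetical helper: reads one value per line on stdin and prints a Cypher
# query whose IN list holds those values as double-quoted string literals.
build_cypher_query() {
  local label="$1" property="$2"
  local list
  # 1) drop empty lines  2) escape \ and " for Cypher strings
  # 3) wrap each value in double quotes  4) join all lines with commas
  list=$(sed -e '/^$/d' -e 's/[\\"]/\\&/g' -e 's/.*/"&"/' | paste -sd, -)
  printf 'MATCH (n:%s) WHERE n.%s IN [%s] RETURN count(n)\n' \
    "$label" "$property" "$list"
}

# Example with values as produced by "sort | uniq"
printf 'FR\nDE\nIT\n' | build_cypher_query Label property
# → MATCH (n:Label) WHERE n.property IN ["FR","DE","IT"] RETURN count(n)
```

If property holds numbers rather than strings, the quoting step would have to be dropped. Assuming each value matches at most one node, a count equal to the number of unique values means all of them are already in the graph.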
Here is the script, in Bash. The trick is to build an awk command for each file. The script takes an integer as a parameter: 1 for the first column, 2 for the second, and so on (not 0, since $0 is the whole line in awk).
#!/bin/bash
# Get the unique values of a given column in the CSV files of the current directory (column numbers start at 1 in awk)
# $1 is the first argument of this script; it must be an integer
# Combined with the $ in the string below, it forms the awk field reference ($1 = first column, because $0 is the whole line)
# e.g. for column 3: awk 'BEGIN{FS=";"}{print $3}END{}' $file >> sumFile.txt
theline="BEGIN{FS=\";\"}{print $"$1"}END{}"
filename=/tmp/getUniqueValues_sh.$(date +%F)
# create the temp file, or empty it if an earlier run today was interrupted
: > "$filename"
for file in *_*.csv;
do
# pass the awk program as a single argument: no eval needed, and file names with spaces are safe
/usr/bin/awk "$theline" "$file" >> "$filename"
done
# display unique values
sort "$filename" | uniq | more
rm "$filename"
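For reference, here is how the column number given to the script maps to awk fields, checked on a hypothetical sample line (line and col are illustration-only names):

```shell
#!/bin/bash
# Hypothetical sample line using ";" as the separator
line='id;name;country'

# In awk, $1 is the first field, $2 the second, and so on
echo "$line" | awk 'BEGIN{FS=";"}{print $2}'   # prints: name
# $0 is the whole line, not a column
echo "$line" | awk 'BEGIN{FS=";"}{print $0}'   # prints: id;name;country

# The script assembles the same kind of program text from its numeric argument
col=3
theline="BEGIN{FS=\";\"}{print \$$col}"        # BEGIN{FS=";"}{print $3}
echo "$line" | awk "$theline"                  # prints: country
```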
The script writes to a temporary file in /tmp, which is removed after use.
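If the script stops before its last line, that file stays behind. A common pattern is to let mktemp pick a unique name and have a trap delete the file. Here is a minimal sketch of the same loop written that way. get_unique_values is a hypothetical name; the function body is a subshell, so its EXIT trap fires when the function returns.

```shell
#!/bin/bash
# Same loop as getUniqueValues.sh, but the temporary file gets a unique name
# from mktemp and an EXIT trap deletes it. The body uses ( ) instead of { },
# so it runs in a subshell and the trap fires when the function returns.
# get_unique_values is a hypothetical name; $1 is the column number.
get_unique_values() (
  theline="BEGIN{FS=\";\"}{print \$$1}"       # e.g. BEGIN{FS=";"}{print $2}
  tmpfile=$(mktemp /tmp/getUniqueValues_sh.XXXXXX) || exit 1
  trap 'rm -f "$tmpfile"' EXIT
  for file in *_*.csv; do
    [ -e "$file" ] || continue                # no match: the glob stays literal
    awk "$theline" "$file" >> "$tmpfile"
  done
  sort "$tmpfile" | uniq
)
```

Usage: get_unique_values 2, run from the directory containing the CSV files.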
The script file is available at https://drive.google.com/file/d/1Jo5tY7IaEek4NFRTZWea4TvSw8KdKC45/view?usp=sharing
The column separator is ";". You can easily change it in the FS assignment inside theline.
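If the separator changes often, it could be passed as an argument instead of being edited in the script. Here is a minimal sketch of that variant. It swaps in a different technique from the original: awk's -F option and a -v variable replace the assembled program text, and sort -u replaces the temp file. unique_column_values and its optional second parameter are hypothetical.

```shell
#!/bin/bash
# Variant, not the original script: column and separator are handed to awk
# as parameters, so no program text is assembled and no temp file is needed.
# Hypothetical usage: unique_column_values COLUMN [SEPARATOR]
unique_column_values() {
  local col="$1" sep="${2:-;}"          # separator defaults to ";" as before
  local files=( *_*.csv )
  [ -e "${files[0]}" ] || return 1      # no file matched the *_*.csv pattern
  # $col in awk turns the string "2" into the field reference $2
  awk -F"$sep" -v col="$col" '{print $col}' "${files[@]}" | sort -u
}
```

For comma-separated files, this would be called as unique_column_values 2 ','. Note that a separator longer than one character is treated by awk as a regular expression.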
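Run the script as “getUniqueValues.sh 2” to get the values of the second column. Run it from the directory that contains the CSV files, since it only reads the files matching *_*.csv in the current directory.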
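Because awk columns start at 1, a call such as “getUniqueValues.sh 0” or “getUniqueValues.sh abc” would print whole lines or nothing useful. Here is a sketch of an argument check that could go at the top of the script. is_column_number is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only for a positive integer without leading
# zeros, i.e. a valid awk column number ($0 would be the whole line).
is_column_number() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    0*)          return 1 ;;   # 0 itself, or a leading zero
    *)           return 0 ;;
  esac
}

# At the top of getUniqueValues.sh, one could then write:
#   is_column_number "$1" || { echo "Usage: $0 COLUMN (1 = first column)" >&2; exit 1; }
is_column_number 2 && echo "2 is a valid column number"
```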