とらりもん - Text data analysis Diff

In watershed management, we deal with many kinds of digital data, such as meteorological, hydrological, and topographic data. Much of it is stored in text files, so the ability to handle text files is important and useful.

You may think, "Oh, I can do it with Excel." Sometimes, yes. But in some cases Excel takes too much effort, and we need an alternative approach. Let's learn one.

Our exercise uses meteorological data observed on our campus at the University of Tsukuba (CRiED):

http://www.ied.tsukuba.ac.jp/~hojyo/archives1.1/monthly/

The data are provided as one file per month. Download the data for all months from 2010 to 2017.

Exercise 1: Get all data of the year 2010.

Exercise 1': Do the same thing for the years from 2010 to 2017.

Exercise 2: The data format is described in this document:

[[http://www.ied.tsukuba.ac.jp/wordpress/wp-content/uploads/other_files/data_list.txt]]

Extract the daily average temperature for each day in August 2010.

Exercise 3: Extract the daily average temperature for each day from January 2010 to December 2010.

Exercise 3': Do the same thing for the years from 2010 to 2017.

Exercise 4: Draw a graph of Exercise 3.

Exercise 4': Do the same thing for the years from 2010 to 2017.

Exercise 5: Get the yearly average, maximum, and minimum daily temperature of 2010.

Exercise 5': Do the same thing for the years from 2010 to 2017.

Exercise 6: Get the yearly average, maximum, and minimum hourly temperature of 2010.


You can do these analyses with Excel. But what if you need to do them for every year from 2010 to 2017? Would you repeat the same steps for each year? The Linux shell and awk can help you!

!!Answer 1'
# seq -w pads the month to two digits (01 ... 12), so all twelve months are fetched.
for y in `seq 10 17`; do for m in `seq -w 1 12`; do wget http://www.ied.tsukuba.ac.jp/~hojyo/archives1.1/monthly/20${y}/D${y}-${m}.V1.1.DAT; done; done
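
Before downloading, it can help to print the URLs and check them. The following is a minimal sketch, assuming the file names follow the pattern in the command above for all twelve months of every year; the helper name `list_urls` is hypothetical.

```shell
#!/bin/bash
# list_urls (hypothetical helper): print one URL per monthly file, 2010-2017.
# The base URL and file-name pattern are copied from the wget command above.
list_urls() {
  local base="http://www.ied.tsukuba.ac.jp/~hojyo/archives1.1/monthly"
  local y m
  for y in $(seq 10 17); do
    for m in $(seq -w 1 12); do      # -w gives 01, 02, ..., 12
      echo "${base}/20${y}/D${y}-${m}.V1.1.DAT"
    done
  done
}

# Dry run: show the first three URLs and the total count.
list_urls | sed -n '1,3p'
list_urls | wc -l
```

Once the list looks right, `list_urls | wget -i -` downloads every file, since `wget -i -` reads URLs from standard input.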

!!Answer 3'
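for y in `seq 10 17`; do
# First awk: keep the rows whose first field is -1 (the daily rows).
# Second awk: print the row number and field 19; a missing value (-99999) is replaced by the previous row's value.
cat D${y}-*csv | awk 'BEGIN{FS=","}$1==-1{print $0}' | awk 'BEGIN{FS=","}$19==-99999{$19=tp}{print NR,$19; tp=$19}' > 20${y}_Tday.txt
done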
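
To see what the missing-value handling does, the same logic can be tried on a few made-up rows. This is only a sketch: the sample values are invented, and `make_row` and `fill_daily` are hypothetical helper names. It assumes, as in the answer above, that daily rows have -1 in the first field, temperature in field 19, and -99999 for a missing value.

```shell
#!/bin/bash
# make_row FIRST TEMP (hypothetical): build a 19-field CSV row with the
# given first field and 19th field; the fields in between are left empty.
make_row() {
  awk -v a="$1" -v b="$2" 'BEGIN { OFS = ","; $1 = a; $19 = b; print }'
}

# fill_daily (hypothetical): read CSV rows on stdin, keep the daily rows,
# and print "index temperature", carrying the previous value over a gap.
fill_daily() {
  awk -F, '
    $1 == -1 {
      n++
      t = $19
      if (t == -99999) t = prev   # missing value: reuse the previous row
      print n, t
      prev = t
    }'
}

# Invented sample: three daily rows (one missing) and one non-daily row.
{ make_row -1 25.1; make_row 1 24.0; make_row -1 -99999; make_row -1 26.3; } | fill_daily
```

Note that if the very first daily row of a year is missing, there is no previous value to reuse and the temperature column is printed empty; such a case would need separate handling.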

!!Answer 4'
for y in `seq 10 17`; do
gnuplot << EOF
set terminal png
set output "20${y}_Tday.png"
set xrange [0:366]
set yrange [-5:30]
set xlabel "DOY"
set ylabel "average daily temperature"
set xtics 30
set ytics 5
set size 0.7,0.5
unset key
plot "20${y}_Tday.txt" using 1:2 w l
EOF
done

!!Answer 5'
for y in `seq 10 17`; do
cat 20${y}_Tday.txt | awk 'BEGIN{tmin=100; tmax=-100}$2>tmax{tmax=$2}$2<tmin{tmin=$2}{a=a+$2; n++}END{print a/n, tmax, tmin}'
done
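
The loop above prints three numbers per year without saying which year they belong to. A possible variant, sketched here with a hypothetical helper `year_stats`, passes a label into awk with `-v` and initializes the maximum and minimum from the first record instead of from fixed bounds.

```shell
#!/bin/bash
# year_stats LABEL FILE (hypothetical): print "LABEL mean max min"
# computed from the second column of FILE.
year_stats() {
  awk -v label="$1" '
    NR == 1   { tmax = $2 + 0; tmin = $2 + 0 }   # start from the first record
    $2 > tmax { tmax = $2 + 0 }
    $2 < tmin { tmin = $2 + 0 }
              { sum += $2; n++ }
    END       { if (n > 0) print label, sum / n, tmax, tmin }' "$2"
}

# Small invented example file in the "index temperature" layout.
tmp=$(mktemp)
printf '1 10\n2 20\n3 -3\n' > "$tmp"
year_stats 2010 "$tmp"
rm -f "$tmp"
```

With the files from Answer 3' in place, `for y in $(seq 10 17); do year_stats 20${y} 20${y}_Tday.txt; done` gives one labeled line per year.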

!!Answer 6'
for y in `seq 10 17`; do cat D${y}-*csv | awk 'BEGIN{FS=","}$1>0{print $19}' | awk '$1==-99999{$1=tp}{tp=$1; print $1}' | sort -n | tail -1; done
for y in `seq 10 17`; do cat D${y}-*csv | awk 'BEGIN{FS=","}$1>0{print $19}' | awk '$1==-99999{$1=tp}{tp=$1; print $1}' | sort -n | head -1; done
for y in `seq 10 17`; do cat D${y}-*csv | awk 'BEGIN{FS=","}$1>0{print $19}' | awk '$1==-99999{$1=tp}{tp=$1; print $1}' | awk '{a=a+$1; n++}END{print a/n}'; done

- Draw a graph of year vs. average temperature on DOY = 200.

Exercise 6': Do the same thing for the years from 2010 to 2017.
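
For the graph of year vs. temperature on DOY = 200, one possible approach is to pick the matching row out of each yearly file. The sketch below assumes the `20yy_Tday.txt` files written in Answer 3' (day index in column 1, temperature in column 2); the helper name `doy_series` and the sample values are invented for illustration.

```shell
#!/bin/bash
# doy_series DOY [DIR] (hypothetical): for each year 2010-2017, print
# "year temperature" for the given day of year, reading DIR/<year>_Tday.txt.
doy_series() {
  local doy="$1" dir="${2:-.}" y f
  for y in $(seq 2010 2017); do
    f="${dir}/${y}_Tday.txt"
    [ -f "$f" ] || continue          # skip years that have no file yet
    awk -v y="$y" -v d="$doy" '$1 == d { print y, $2 }' "$f"
  done
}

# Invented example: two small yearly files in a temporary directory.
tmp=$(mktemp -d)
printf '199 24.0\n200 25.5\n201 26.0\n' > "$tmp/2010_Tday.txt"
printf '199 22.0\n200 23.5\n' > "$tmp/2011_Tday.txt"
doy_series 200 "$tmp"
rm -r "$tmp"
```

Saving the output with `doy_series 200 > doy200.txt` gives a two-column file that gnuplot can draw with `plot "doy200.txt" using 1:2 w lp`.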