Using awk to truncate text files

With neuroimaging analyses we deal with a lot of text files. These are often the result of programs or scripts running on the MRI data; they can be statistics, volumes, error logs, or any number of other outputs. Let’s say that we have a large text file with a lot of data in it. Let’s suppose that it is at least organized neatly into rows and columns, which means it could easily be imported into Excel or another spreadsheet application. FreeSurfer’s aseg.stats is a good example of this.

For simplicity, let’s say that we have a 5×5 matrix of data (it’s really a 6×6 but we’ll just focus on the 25 numerical data values):

   X1 X2 X3 X4 X5
Y1 3000 5000 745 122 875
Y2 942 400 263 558 991
Y3 325 584 775 381 545
Y4 654 336 272 883 235
Y5 241 154 782 754 899

Notice that the column headers do not always line up with the rest of the columns; this is nothing to worry about. If you have a file like this (or are creating one), it is helpful to have a blank new line at the end (i.e., a blank extra row). This can help with the display and processing of text files.

Say you want to display the contents of the text file. An easy way to do this is with cat {file}.txt. This will quickly display the contents on the screen. If it’s a long text file, you should use less {file}.txt instead of cat.

If you want to number the lines, you can run cat -n {file}.txt, which prefixes each line with its line number.

Or, you can use awk, which keeps the current line number in the built-in variable FNR: awk '{ print FNR, $0 }' {file}.txt
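
As a rough sketch of line numbering with awk, the snippet below recreates the sample matrix in a file called sample.txt (a name I made up for illustration) and prefixes each line with FNR, awk's built-in count of lines read from the current file. The ": " separator is just my choice of formatting.

```shell
# Recreate the sample matrix from this post (sample.txt is a made-up name).
cat > sample.txt <<'EOF'
   X1 X2 X3 X4 X5
Y1 3000 5000 745 122 875
Y2 942 400 263 558 991
Y3 325 584 775 381 545
Y4 654 336 272 883 235
Y5 241 154 782 754 899
EOF

# FNR is the line number within the current file; $0 is the whole line.
numbered=$(awk '{ print FNR ": " $0 }' sample.txt)
printf '%s\n' "$numbered"
```

The header row comes out as line 1, so the first row of data (Y1) is numbered 2.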

Now, let’s say that instead of the entire file, you just want a row of values or a single value. Here’s a way to display just one row: awk 'BEGIN { RS = "" ; FS = "\n" } ; { print $2 }' {file}.txt

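Setting RS to an empty string makes awk read the whole block of text (up to the first blank line) as a single record, and FS = "\n" makes each line of that block a field. That is why {print $2} prints the second row in the file. This could be changed to $3, $5, $8, or whatever you want, if it’s a valid row number in the file.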
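
To make the row selection concrete, here is a small sketch run against the sample matrix (saved as sample.txt, a file name I am assuming for illustration). It runs the command above and, for comparison, a shorter alternative that matches on awk's built-in record counter NR. The second form is my own addition, not something the approach above depends on.

```shell
# Recreate the sample matrix from this post (sample.txt is a made-up name).
cat > sample.txt <<'EOF'
   X1 X2 X3 X4 X5
Y1 3000 5000 745 122 875
Y2 942 400 263 558 991
Y3 325 584 775 381 545
Y4 654 336 272 883 235
Y5 241 154 782 754 899
EOF

# Approach from the post: the file is one record and each line is a field.
row_a=$(awk 'BEGIN { RS = "" ; FS = "\n" } ; { print $2 }' sample.txt)

# Alternative sketch: print the record whose line number (NR) is 2.
row_b=$(awk 'NR == 2' sample.txt)

echo "$row_a"
echo "$row_b"
```

Both commands print the Y1 row here. They can differ on files that contain blank lines, because the first approach splits the file into separate records at each blank line.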

You could also write that line to a new file with >, or pass it to another command (such as tee) with the pipe |.
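
Here is a minimal sketch of saving the row, again assuming the sample matrix lives in sample.txt. The output names row2.txt and row2_copy.txt are placeholders I picked for the example.

```shell
# Recreate the sample matrix from this post (sample.txt is a made-up name).
cat > sample.txt <<'EOF'
   X1 X2 X3 X4 X5
Y1 3000 5000 745 122 875
Y2 942 400 263 558 991
Y3 325 584 775 381 545
Y4 654 336 272 883 235
Y5 241 154 782 754 899
EOF

# ">" sends awk's output to row2.txt, replacing any existing contents
# (">>" would append instead). Nothing is shown on the screen.
awk 'BEGIN { RS = "" ; FS = "\n" } ; { print $2 }' sample.txt > row2.txt

# "|" hands the output to another program. tee prints it to the screen
# and also writes a copy to row2_copy.txt.
awk 'BEGIN { RS = "" ; FS = "\n" } ; { print $2 }' sample.txt | tee row2_copy.txt
```

The redirect is the quieter option for scripts; piping through tee is handy when you also want to see the value as it is saved.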

Now, what if you want a column from the file?

awk '{ print $3 }' {file}.txt
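
One thing to watch for with the sample matrix: the header row has no row label, so its third field is X3, while the third field of every data row (label, first value, second value) sits under X2. The sketch below shows this with the sample saved as sample.txt (an assumed file name), then skips the header with NR > 1, which is my own addition.

```shell
# Recreate the sample matrix from this post (sample.txt is a made-up name).
cat > sample.txt <<'EOF'
   X1 X2 X3 X4 X5
Y1 3000 5000 745 122 875
Y2 942 400 263 558 991
Y3 325 584 775 381 545
Y4 654 336 272 883 235
Y5 241 154 782 754 899
EOF

# Third whitespace-separated field of every line, header included.
col_all=$(awk '{ print $3 }' sample.txt)
printf '%s\n' "$col_all"      # X3, then 5000 400 584 336 154 (one per line)

# Same column, but only for lines after the header.
col_data=$(awk 'NR > 1 { print $3 }' sample.txt)
printf '%s\n' "$col_data"     # 5000 400 584 336 154 (one per line)
```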

Now, what if you want only a single value from the text file? I know there are shorter ways of doing this, but one way is to do a two-step awk command by combining the column and row commands I already demonstrated:

awk 'BEGIN { RS = "" ; FS = "\n" } ; { print $2 }' {file}.txt | tee -a file_tmp.txt;
awk '{ print $3; exit }' file_tmp.txt | tee -a subject_volume.txt;
rm file_tmp.txt

This displays the second row in the file and writes it to a temporary file; then it reads the temp file, displays the value from the third column, and appends that value to subject_volume.txt. The “exit” part of the second command limits the print command to a single value, should there be multiple values in the column (or row). Lastly, it deletes the temp file. With the sample file above, the value in the second row, third column (counting the row and column labels) is 5000. You could also run this with the column command first, if that’s easier.
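
One of the shorter ways might look like the sketch below: a single awk command that matches the line number and prints the field, with no temp file. It assumes the sample matrix is saved as sample.txt (a name I made up), and it is only one possible shortcut, not a replacement for the two-step version.

```shell
# Recreate the sample matrix from this post (sample.txt is a made-up name).
cat > sample.txt <<'EOF'
   X1 X2 X3 X4 X5
Y1 3000 5000 745 122 875
Y2 942 400 263 558 991
Y3 325 584 775 381 545
Y4 654 336 272 883 235
Y5 241 154 782 754 899
EOF

# On line 2 only, print the third field, then stop reading the file.
value=$(awk 'NR == 2 { print $3; exit }' sample.txt)
echo "$value"     # 5000
```

Adding "| tee -a subject_volume.txt" to the end of the awk command would display the value and append it to a file, as in the two-step version.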

This can save you a lot of time if you have to pull out values from a lot of text files (e.g., aseg.stats) by using a for or while loop. I recently did this to pull values out of some text files. It would have taken hours to pull out the correct values from every participant by hand. With a script (series of “for” loops) it ran in seconds. Then, I had a file with all the values for all the participants, which was then imported into Excel.
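
A loop like that could be sketched as follows. Everything here is hypothetical: the stats_demo folder, the sub01/sub02 file names, and the numbers in the second file are invented so the example runs on its own. Real aseg.stats files are laid out differently, so the row and column numbers would need to be adjusted.

```shell
# Hypothetical layout: one small text file per participant in stats_demo/.
mkdir -p stats_demo
printf '   X1 X2 X3\nY1 3000 5000 745\nY2 942 400 263\n' > stats_demo/sub01.txt
printf '   X1 X2 X3\nY1 3100 5200 750\nY2 950 410 270\n' > stats_demo/sub02.txt

# Start with an empty output file so reruns do not pile up old values.
: > all_subjects.txt

for f in stats_demo/sub*.txt; do
    subject=$(basename "$f" .txt)                    # e.g. sub01
    value=$(awk 'NR == 2 { print $3; exit }' "$f")   # row 2, column 3
    printf '%s %s\n' "$subject" "$value" >> all_subjects.txt
done

cat all_subjects.txt
```

The result is one line per participant (name, then value), which can be imported into a spreadsheet as space-delimited text.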

I am a novice user of awk so specific questions about it are best directed elsewhere. This is just what I’ve figured out by reading various guides online.

About Jared Tanner

I have a PhD in Clinical and Health Psychology with an emphasis in neuropsychology at the University of Florida. I previously studied at Brigham Young University. I am currently a Research Assistant Professor at the University of Florida. I spend the bulk of my research time dealing with structural magnetic resonance images of the brain. My specialty is with traditional structural MR images, such as T1-weighted and T2-weighted images, as well as diffusion weighted images. I also look at the cognitive and behavioral functioning of individuals with PD and older adults undergoing orthopedic surgery. Funding for the images came from NINDS K23NS060660 (awarded to Catherine Price, University of Florida). Brain images may not be used without my written permission (grant and software requirements).