In natural language processing, you often have to reformat your data so that it can be used as input to different systems. These simple Linux commands will help you do that much more quickly, without having to write a script.
1. Merging two files into one file with two columns
Input f1 looks like this:
1
2
3
4
Input f2 looks like this:
a
b
c
d
Output f3 will look like this:
1 a
2 b
3 c
4 d
Command: paste f1 f2 > f3
The default delimiter is a tab. You can specify a different one, for example a comma, with the -d option:
paste -d ',' f1 f2 > f3
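To try the example end to end, the shell session below recreates the input files with printf and runs both variants of paste. It is a minimal sketch assuming a POSIX shell. The name f3.csv for the comma-separated output is only an illustrative choice, not part of the original example.

```shell
#!/bin/sh
# Recreate the two input files from the example (one item per line).
printf '1\n2\n3\n4\n' > f1
printf 'a\nb\nc\nd\n' > f2

# Merge them line by line; columns are separated by a tab by default.
paste f1 f2 > f3
cat f3

# Same merge with a comma as the separator.
# f3.csv is an illustrative (hypothetical) output file name.
paste -d ',' f1 f2 > f3.csv
cat f3.csv
```

If one input file has fewer lines than the other, paste does not stop early; it leaves the missing column empty on the remaining lines.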
2. Adding a line number to each line of a text file
Suppose you want to index each line of a text file, i.e. insert a line number followed by a tab before the content of each line:
Input f1:
a
b
c
d
Output f2:
1 a
2 b
3 c
4 d
Command: nl f1 > f2
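The output of nl is not exactly "1 a": by default it right-aligns the number in a six-character field before the tab, and it leaves empty lines unnumbered. The sketch below, assuming a POSIX nl, shows the default next to two options. The files f2_compact, f1_blank and f2_all are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# Input from the example above.
printf 'a\nb\nc\nd\n' > f1

# Default: number right-aligned in a 6-character field, then a tab.
nl f1 > f2

# -w1 drops the padding, giving lines like "1<TAB>a".
# f2_compact is a hypothetical output name.
nl -w1 f1 > f2_compact

# By default nl skips empty lines; -ba numbers every line, blank or not.
# f1_blank and f2_all are hypothetical file names.
printf 'a\n\nb\n' > f1_blank
nl -ba -w1 f1_blank > f2_all
```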
3. Joining two files with a common field
Input f1:
1 aaa
2 bbb
3 ccc
4 ddd
Input f2:
1 a
2 b
3 c
4 d
Output f3 (joining on the first field):
1 aaa a
2 bbb b
3 ccc c
4 ddd d
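Command: join f1 f2 > f3
The join command can also join on other columns: -1 N and -2 M select field N of the first file and field M of the second. Either way, both files must first be sorted on the join field. Further instructions about join can be found in its manual page (man join).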
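The sketch below walks through the sort-then-join workflow on shuffled copies of the inputs, plus a join on a different column. It assumes a POSIX shell. The files g1 and g3 and the *.sorted intermediates are hypothetical, and LC_ALL=C is set so that sort and join compare keys the same way.

```shell
#!/bin/sh
# Byte-wise locale so sort and join agree on the key ordering.
export LC_ALL=C

# Inputs from the example above, deliberately written out of order.
printf '3 ccc\n1 aaa\n4 ddd\n2 bbb\n' > f1
printf '2 b\n1 a\n4 d\n3 c\n' > f2

# join expects both inputs sorted on the join field (field 1 by default).
sort -k1,1 f1 > f1.sorted
sort -k1,1 f2 > f2.sorted
join f1.sorted f2.sorted > f3

# Hypothetical variant g1 with the key in column 2 instead of column 1.
printf 'ccc 3\naaa 1\nddd 4\nbbb 2\n' > g1
sort -k2,2 g1 > g1.sorted
# Join field 2 of g1 with field 1 of f2; the key is printed first.
join -1 2 -2 1 g1.sorted f2.sorted > g3
```

By default join prints only keys that appear in both files; adding -a 1 also prints the unpaired lines from the first file.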
To extract specific columns (e.g., columns 3 and 5) from a space-separated text file data.txt, you can use the cut command:
cut -d' ' -f3,5 < data.txt
This is usually much faster than processing the file line by line in a shell script.
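A small, self-contained version of the cut tip follows. The contents of data.txt are made up for illustration, and the awk line is a swapped-in alternative (not part of the original tip) for files where columns are separated by more than one space, a case cut handles poorly because it treats every single space as a field boundary.

```shell
#!/bin/sh
# Made-up (hypothetical) space-separated sample with five columns.
printf 'w1 w2 w3 w4 w5\nx1 x2 x3 x4 x5\n' > data.txt

# Keep columns 3 and 5; cut keeps the input delimiter between them.
cut -d' ' -f3,5 < data.txt > cols.txt

# With a double space, cut sees an empty field 2, so -f3,5 shifts to w2 w4.
# awk splits on runs of blanks instead, so $3 and $5 stay w3 and w5.
printf 'w1  w2 w3 w4 w5\n' > data_spaces.txt
awk '{print $3, $5}' data_spaces.txt > cols_awk.txt
```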