In natural language processing, you will often have to reformat your data so that it can be used as input to different systems. In such cases, these simple Linux commands will help you do it much more quickly, without having to write a script.

1. Merging two files into one file with two columns

Input f1 looks like this:

    1
    2
    3
    4

Input f2 looks like this:

    a
    b
    c
    d

Output f3 will look like this:

    1    a
    2    b
    3    c
    4    d

Command:

    paste f1 f2 > f3

The default delimiter is a tab. You can also set it yourself (for example, to a comma) as follows:

    paste -d ',' f1 f2 > f3

2. Adding a line number to each line of a text file

Assume that you want to index each line of a text file, i.e. insert a line number and then a tab before the content of each line:

Input f1:

    a
    b
    c
    d

Output f2:

    1    a
    2    b
    3    c
    4    d

Command:

    nl f1 > f2

3. Joining two files on a common field

Input f1:

    1 aaa
    2 bbb
    3 ccc
    4 ddd

Input f
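To try the paste and nl commands from sections 1 and 2 without touching any existing files, a minimal sketch like the one below may help. It recreates the sample inputs in a scratch directory. The output names f3_comma, f2_numbered and f2_compact are made up for illustration, and the -w option to nl is an addition that does not appear in the examples above.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample inputs: one item per line.
printf '%s\n' 1 2 3 4 > f1
printf '%s\n' a b c d > f2

# Section 1: merge the files line by line (tab-separated by default).
paste f1 f2 > f3

# Same merge, with a comma as the delimiter.
paste -d ',' f1 f2 > f3_comma

# Section 2: number each line of f2.
# By default nl right-aligns the number in a 6-character field,
# so each line starts with padding spaces before the number.
nl f2 > f2_numbered

# Assumption: if the padding is unwanted, a field width of 1 removes it.
nl -w 1 f2 > f2_compact

# Show the results.
cat f3 f3_comma f2_numbered f2_compact
```

Note that nl skips empty lines by default when numbering; if blank lines should be counted too, `nl -b a` numbers every line.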