.ig >> TDHkit: pjoin(1)


Tabular data handling toolkit  


.>>
.TH pjoin(1) TDH "22-SEP-2003 TDH scg@jax.org"
.SH NAME
pjoin(1) \- relational join on two files
.ig >>


.>>
.SH SYNOPSIS
\fCpjoin [\fIoptions\fC] \fIfile1\fC \fIkeyfields1\fC \fIfile2\fC \fIkeyfields2\fR
.ig >>


.>>
.SH DESCRIPTION
\fBpjoin\fR performs a relational join on two sets of whitespace-delimited
tabular data records.
\fIfile1\fR and \fIfile2\fR are the input files.
One of them may be \fC-\fR to indicate that data are to be read from standard input.
The result is written to standard output and uses the same field delimitation
as the input files.
Comment lines (beginning with \fC//\fR) and blank lines are skipped on input
and do not appear in the output.
.LP
\fIkeyfields1\fR specifies one or more fields in \fIfile1\fR to consider when
performing the join; it may be a single field specifier or a list of field
specifiers delimited by commas.
\fIkeyfields2\fR does the same for \fIfile2\fR.
Fields may be specified by number, e.g. \fC2\fR specifies the second field.
If a file has a field name header, fields may also be specified by name,
e.g. \fCid\fR.
In that case the \fC-h1\fR and/or \fC-h2\fR options must be given so that
\fBpjoin\fR knows to expect a field name header in \fIfile1\fR or \fIfile2\fR
respectively.
.ig >>
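.>>
.LP
The key field specifiers described above can be illustrated with a short
Python sketch. This is not \fBpjoin\fR's own code; the function name
\fCresolve_keyfields\fR and its \fCheader\fR parameter are hypothetical,
chosen only to show how a comma-delimited list of numbers or names could be
mapped to column positions.
.nf
```python
def resolve_keyfields(spec, header=None):
    """Turn a spec such as "1,2" or "id,3" into zero-based column indexes.

    header is the list of names from a field name header line, or None
    when the file has no such header (hypothetical helper, for illustration).
    """
    indexes = []
    for item in spec.split(","):
        item = item.strip()
        if item.isdigit() and int(item) >= 1:
            # Numeric specifiers are 1-based: "2" means the second field.
            indexes.append(int(item) - 1)
        elif header is not None and item in header:
            # Names can only be resolved when a header is available.
            indexes.append(header.index(item))
        else:
            raise ValueError("cannot resolve field specifier: " + item)
    return indexes


print(resolve_keyfields("1,2"))
print(resolve_keyfields("id,3", header=["id", "type", "color"]))
```
.fi
.LP
In this sketch a name given without a header raises an error, which mirrors
the requirement that \fC-h1\fR or \fC-h2\fR be used before fields can be
specified by name.
.ig >>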


.>>
.SH OPTIONS
.LP
\fB-l\fR
.IP \0
Do a left join.
Prevents records from being omitted from the left side (\fIfile1\fR) as the
result of the join.
Where a left-side record has no match, the missing right-side fields are
filled with placeholder characters.
.ig >>

.>>
.LP
\fB-r\fR
.IP \0
Do a right join.
Prevents records from being omitted from the right side (\fIfile2\fR) as the
result of the join.
Where a right-side record has no match, the missing left-side fields are
filled with placeholder characters.
.LP
\fBNote:\fR \fB-l\fR and \fB-r\fR may both be used to produce a loss-less join.
.ig >>

.>>
.LP
\fB-i\fR
.IP \0
Make comparisons case-insensitive.
Normally they are case-sensitive.
.ig >>

.>>
.LP
\fB-dup1\fR
.IP \0
Allow multiple instances of the same key in \fIfile1\fR.
The matching record from \fIfile2\fR will be replicated for each instance.
.ig >>

.>>
.LP
\fB-dup2\fR
.IP \0
Allow multiple instances of the same key in \fIfile2\fR.
The matching record from \fIfile1\fR will be replicated for each instance.
.ig >>

.>>
.LP
\fB-q\fR
.IP \0
Quick option.
Do not sort the inputs; assume they are already in sort order.
Normally the inputs are piped through an appropriate sort(1) command to sort
on the key fields.
With this option, \fBpjoin\fR will not give correct results if the inputs
are unsorted.
.ig >>

.>>
.LP
\fB-rml\fR
.IP \0
Remove the lefthand portion of the result, leaving only the records
from \fIfile2\fR.
.ig >>

.>>
.LP
\fB-rmr\fR
.IP \0
Remove the righthand portion of the result, leaving only the records
from \fIfile1\fR.
.ig >>

.>>
.LP
\fB-H\fR
.IP \0
Both \fIfile1\fR and \fIfile2\fR have field name headers,
and a field name header will also be written to the output.
Equivalent to \fC-h1 -h2 -ho\fR.
.ig >>

.>>
.LP
\fB-h1\fR
.IP \0
Indicates that \fIfile1\fR has a field name header.
This allows fields in \fIfile1\fR to be specified by name.
.ig >>

.>>
.LP
\fB-h2\fR
.IP \0
Indicates that \fIfile2\fR has a field name header.
This allows fields in \fIfile2\fR to be specified by name.
.ig >>

.>>
.LP
\fB-ho\fR
.IP \0
Write a field name header as the first line of output.
At least one of the input files must have a field name header.
If one of the files does not have a field name header, placeholder fill
characters will be written for that portion of the output header.
.ig >>

.>>
.LP
\fB-t\fR
.IP \0
Indicates that input and output are tab delimited.
Normally the join result uses a space between the left side and right side;
with \fC-t\fR a tab is used instead.
.ig >>

.>>
.LP
\fB-f\fIc\fR
.IP \0
Set the placeholder fill character to \fIc\fR.
Normally it is \fC-\fR.
.ig >>


.>>
.SH NOTES
Using \fC-l\fR and \fC-r\fR together results in a "loss-less" join.
.LP
\fC-dup1\fR and \fC-dup2\fR cannot be used together.
.LP
\fC-rmr\fR and \fC-rml\fR cannot be used together.
.ig >>


.>>
.SH EXAMPLE
Suppose \fCfile1\fR looks like this:
.nf
001 A red
001 B red
002 A blue
003 C yellow
.fi
and \fCfile2\fR looks like this:
.nf
001 A Jean
002 A Jan
.fi
We could perform an ordinary join by issuing the command:
\fCpjoin file1 1,2 file2 1,2\fR
.nf
001 A red 001 A Jean
002 A blue 002 A Jan
.fi
Or we could perform a left join by issuing this command:
\fCpjoin -l file1 1,2 file2 1,2\fR
.nf
001 A red 001 A Jean
001 B red --- - ----
002 A blue 002 A Jan
003 C yellow --- - ---
.fi
.ig >>
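.>>
.LP
For readers who want to experiment with the join semantics without the tool
at hand, the following Python sketch reproduces the ordinary and left joins
on the sample data. It is an illustration only, not \fBpjoin\fR's
implementation: the names \fCparse\fR, \fCjoin\fR and \fCkeep_left\fR are
hypothetical, it uses a dictionary lookup instead of sorted inputs, it assumes
keys are unique in the second file, and it pads each missing field with a
single fill character, which may differ from the widths \fBpjoin\fR prints.
.nf
```python
def parse(text):
    """Split whitespace-delimited records, skipping blank and // comment lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("//"):
            continue
        rows.append(line.split())
    return rows


def join(left, right, lkeys, rkeys, keep_left=False, fill="-"):
    """Join two record lists on zero-based key columns.

    keep_left=True keeps unmatched left records (the idea behind a left
    join) and pads the right side with the fill character.
    Assumes each key occurs at most once on the right side.
    """
    index = {tuple(rec[k] for k in rkeys): rec for rec in right}
    width = len(right[0]) if right else 0
    out = []
    for rec in left:
        match = index.get(tuple(rec[k] for k in lkeys))
        if match is not None:
            out.append(rec + match)
        elif keep_left:
            out.append(rec + [fill] * width)
    return out


file1 = parse("""001 A red
001 B red
002 A blue
003 C yellow""")
file2 = parse("""001 A Jean
002 A Jan""")

# Ordinary join on the first two fields of each file.
for rec in join(file1, file2, [0, 1], [0, 1]):
    print(" ".join(rec))

# Left join: unmatched records from file1 are kept and padded.
for rec in join(file1, file2, [0, 1], [0, 1], keep_left=True):
    print(" ".join(rec))
```
.fi
.LP
The ordinary join yields the two matched records, and the left join yields
all four records from \fCfile1\fR, with placeholders where \fCfile2\fR has
no match.
.ig >>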


.>>
.SH AUTHOR
Steve Grubb, with portions developed by Sandra Reynolds and Marv Newhouse.
.ig >>


Copyright Steve Grubb
.>>