![]() |
Merge Sort |
The function merge_sort() will sort virtually any kind of file, using read, write and comparison functions supplied by the calling program.
The function call is as follows:
result = merge_sort(unsorted_file, sorted_file, read, write, compare, pointer, max_record_size, block_size, pcount);
The calling program must supply the following parameters:
- FILE *unsorted_file;
- A file pointer for the file to be sorted. The file must have been successfully opened for reading, and the file pointer must be positioned at the beginning of the file.
- FILE *sorted_file;
- A file pointer for the sorted file. The file must have been successfully opened for writing, and the file pointer must be positioned at the beginning of the file.
- int (*read)();
- A function that reads a single record.
- int (*write)();
- A function that writes a single record.
- int (*compare)();
- A function that compares two records.
- void *pointer;
- A user-defined pointer to be passed to the (*read)(), (*write)() and (*compare)() functions when they are called.
- unsigned max_record_size;
- The maximum number of bytes in a record, when it is read into memory. This need not be the same as the size of the record when it resides in a file.
- unsigned long block_size;
- The number of records to be sorted in memory, or 1L if memory sorting is not to be used.
- unsigned long *pcount;
- A pointer to a variable to receive the record count (if the sort is successful), or NULL if this information is not to be returned.
The function returns the following values:
- int result;
- Result code:
- 0 for a successful sort
- 1 for insufficient memory
- 2 for a file creation error
- 3 for a file write error
The unsorted and sorted files will not be rewound or closed. Each file will be left open, with the file pointer positioned at the end of the file. However, if the two file pointers are identical, the file will be rewound and the sorted records will be written over it.
The (*read)() function is called by merge_sort() as follows, and must read one record each time it is called (except when the end of the unsorted file is detected):
n = (*read)(fp, buffer, pointer);
- FILE *fp;
- A file pointer for the unsorted file or a temporary file created by calling tmpfile().
- void *buffer;
- A pointer to a buffer to receive one record. This buffer can hold at most max_record_size bytes.
- void *pointer;
- A copy of the pointer passed as an argument to merge_sort().
- int n;
- The number of bytes in the record, or zero if an attempt was made to read past the end of the unsorted file.
When this function returns zero, it will not be called again for the same file. No attempt will be made to read past the end of a temporary file.
A record larger than the specified maximum size must be truncated or dealt with in some other suitable manner. The function will fail catastrophically if the buffer is allowed to overflow, or if the value returned by (*read)() is larger than the maximum record size.
The record format may be changed when it is read into memory, provided that:
For example, the line terminator \r\n in DOS/Windows might be converted to \n when it is read into memory.
The (*write)() function is called as follows, and must write one record each time it is called:
n = (*write)(fp, buffer, pointer);
- FILE *fp;
- A file pointer for the sorted file or a temporary file created by calling tmpfile().
- void *buffer;
- A pointer to a buffer holding one record, as read into memory by (*read)().
- void *pointer;
- A copy of the pointer passed as an argument to merge_sort().
- int n;
- Nonzero for a successful write; zero for insufficient disk space.
Notice that the record length is not passed as a parameter. It must be contained, explicitly or implicitly, in the record itself or in some other place accessible to the (*write)() function.
The (*compare)() function is called to compare two records, as follows:
n = (*compare)(p, q, pointer);
- void *p;
- A pointer to a buffer containing the first record, as read into memory by (*read)().
- void *q;
- A pointer to a buffer containing the second record, as read into memory by (*read)().
- void *pointer;
- A copy of the pointer passed as an argument to merge_sort().
- int n;
- The comparison result, as follows:
- >0 if *p is to be after *q in sorted order
- <0 if *p is to be before *q in sorted order
- 0 if the order of *p and *q is irrelevant
The sort proceeds as follows:
The sort can fail if
Whether the sort is successful or not, all allocated memory blocks will be deallocated, and all temporary files will be closed and deleted. If the sort fails, the file pointers for the unsorted and sorted files will be left wherever they happened to be.
The merge_sort() function is reentrant if the functions that it calls are reentrant:
Under DOS/Windows, tmpfile() creates a BINARY file. However, this usually creates no problems, even if the unsorted and sorted files are text files, because DOS/Windows handles the conversions in a consistent manner.
If disk space is extremely tight, the sort can be modified to use the space occupied by the unsorted file. The unsorted file, temporary files, and sorted file can then fit into a space approximately twice as large as the unsorted file. To make this modification, just insert code to delete the unsorted file after it has been read. Notice that if this technique is used, the sorted and unsorted files must be different, and the argument passing structure may become a little less elegant because the function call to delete a file usually requires file specifications, not merely a file pointer.
The merge_sort() function is not stable; i.e., it does not preserve the relative positions of two records for which the (*compare)() function returns zero. The function stable_merge_sort() does have this property. However, the additional feature carries a price: a four-byte record number is added to each record while it is in memory or in a temporary file. This increases the disk and memory requirements, by a substantial amount if the records are small.
The SORT utility illustrates the use of stable_merge_sort(). It is a DOS or UNIX filter which sorts the lines of a text file. The maximum line length is MAX_LINE characters, not including the carriage return and line feed at the end of each line. The utility will abort the sort and display an error message if it reads a line containing more than MAX_LINE characters. The value of MAX_LINE is set at 255, but it can be changed by recompiling the utility.
The utility is called as follows:
SORT <unsorted_file >sorted_file M N
The sort includes columns number M to N, inclusive, where the leftmost column is number zero. If N is omitted, the sort includes column M and all columns to the right of it. If both M and N are omitted, all columns are used in the sort.
A line containing fewer than N+1 columns is sorted as though it were padded with nulls at the right end.
Characters are ranked by their ASCII codes, considered as UNSIGNED bytes.
This package calls on another package:
Source code in text format:
An Indonesian translation of this page has been posted at www.chameleonjohn.com/translations/mergesor-Indonesian as part of ChameleonJohn.
A Romanian translation of this page has been posted at http://www.dontpayfull.com/page/merge-sort by Irina Vasilescu.