Rearrange CSV columns alphabetically from CLI

We can use tools like KDiff3 to compare two CSV files. But, it is difficult to identify the diff when the columns are not in the same order.

For example, look at the below output of 2 simple csv files.

kdiff3-csv-compare

Even though it highlights the diff, it is difficult to identify the diff because the columns are not in the same order. Here is the same diff after rearranging the columns alphabetically.

kdiff3-csv-compare-sorted

Now, it is easy to identify the diff.

Rearrange CSV columns alphabetically

We can write a simple python script using Pandas1 as follows.

#! /usr/bin/env python3

"""
re-arrange columns in alphabetical order
"""
import sys

import pandas as pd


def colsort(df):
    cols = list(df.columns)
    cols.sort()
    return df[cols]


def main():
    input_file = sys.argv[1]
    try:
        output_file = sys.argv[2]
    except IndexError:
        output_file = input_file
    df = pd.read_csv(input_file)
    df = colsort(df)
    df.to_csv(output_file, index=False)


if __name__ == '__main__':
    main()

We can use this script as follows.

$ python3 rearrange_csv_columns.py input.csv output.csv

Instead of writing a script by ourselves, we can use miller2 tool. Miller can perform various operations on CSV files. We can use sort-within-records to sort the columns.

$ mlr --csv sort-within-records -f input.csv > output.csv

Conclusion

We can use miller to sort the columns in a CSV file. This will help us to identify the diff easily when comparing two CSV files.


Need further help with this? Feel free to send a message.