20100620

CsvFilter

"This tool provides better filtering options for CSV files. User can add unlimited amount of filters in multiple columns. E.g. MS Excel provides only 2 options for filtering a single column."

http://csv-filter.sourceforge.net/
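
To make the "unlimited filters across multiple columns" idea concrete, here is a minimal Python sketch. It is not CsvFilter's code or interface; the function name filter_rows, the sample data and the predicates are all made up for illustration. Each filter is a (column, predicate) pair, and a column may appear more than once.

```python
import csv
import io


def filter_rows(rows, filters):
    """Yield rows (dicts) that satisfy every predicate in `filters`.

    `filters` is a list of (column_name, predicate) pairs. A column may
    appear more than once, so any number of conditions can be stacked.
    """
    for row in rows:
        if all(pred(row[col]) for col, pred in filters):
            yield row


# Hypothetical sample data, invented for this illustration.
SAMPLE = """name,city,age
Ann,Oslo,34
Bob,Oslo,19
Cid,Rome,52
Dee,Oslo,61
"""

# Three filters over two columns; "age" is filtered twice.
filters = [
    ("city", lambda v: v == "Oslo"),
    ("age", lambda v: int(v) >= 30),
    ("age", lambda v: int(v) < 60),
]

reader = csv.DictReader(io.StringIO(SAMPLE))
matches = list(filter_rows(reader, filters))
for row in matches:
    print(row["name"], row["city"], row["age"])
```

Because the filters are just a list, adding a fourth or fifth condition is a one-line change.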

regexxer

"regexxer is a nifty GUI search/replace tool featuring Perl-style regular expressions. If you need project-wide substitution and you’re tired of hacking sed command lines together, then you should definitely give it a try."

http://regexxer.sourceforge.net/
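
For comparison, a project-wide substitution can also be roughed out in a few lines of Python. This is only a sketch under stated assumptions: replace_in_tree is a hypothetical helper, the files are assumed to be UTF-8 text, and Python's re syntax is close to, but not identical with, Perl-style regular expressions. It rewrites files in place, so the demo runs in a throwaway directory.

```python
import re
import tempfile
from pathlib import Path


def replace_in_tree(root, pattern, replacement, glob="*.txt"):
    """Apply a regex substitution to every matching file under `root`.

    Returns a dict mapping each changed file to its substitution count.
    Files are rewritten in place and assumed to be UTF-8 text.
    """
    regex = re.compile(pattern)
    changed = {}
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        new_text, count = regex.subn(replacement, text)
        if count:
            path.write_text(new_text, encoding="utf-8")
            changed[path] = count
    return changed


# Demo on made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("colour and colours\n", encoding="utf-8")
    (root / "sub" / "b.txt").write_text("no match here\n", encoding="utf-8")
    result = replace_in_tree(root, r"colou(r)", r"colo\1")
    demo_counts = {p.name: n for p, n in result.items()}
    demo_text = (root / "a.txt").read_text(encoding="utf-8")

print(demo_counts)
print(demo_text, end="")
```

Unlike a GUI tool, this gives no preview of each match before it is replaced, which is a large part of what regexxer offers.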

20100619

CSVfix

"CSVfix is a command-line stream editor specifically designed to deal with CSV data. With it, you can:

Convert fixed format, multi-line and DSV files to CSV
Reorder, remove, split and merge fields
Convert case, trim leading & trailing spaces
Search for specific content using regular expressions
Filter out duplicate data or data on exclusion lists
Perform sed/perl style editing
Enrich with data from other sources
Add sequence numbers and file source information
Split large CSV files into smaller files based on field contents
Perform arithmetic calculations on individual fields
Validate CSV data against a collection of validation rules
Convert from CSV to fixed format, XML, SQL, DSV
Summarise CSV data, calculating averages, modes, frequencies etc."


http://code.google.com/p/csvfix/
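
Two of the operations in that list, filtering out duplicates and splitting by field contents, are easy to sketch in Python. This does not show CSVfix's commands or options; unique_rows and split_by_field are hypothetical helpers, the data is made up, and the groups are kept in memory rather than written to separate files.

```python
import csv
import io
from collections import defaultdict


def unique_rows(rows):
    """Yield each row the first time it is seen, dropping later duplicates."""
    seen = set()
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            yield row


def split_by_field(rows, index):
    """Group rows by the value in column `index` (one group per output file)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[index]].append(row)
    return dict(groups)


# Hypothetical sample data, invented for this illustration.
SAMPLE = """apple,fruit,3
carrot,veg,5
apple,fruit,3
pear,fruit,2
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
deduped = list(unique_rows(rows))
groups = split_by_field(deduped, 1)

# Print each group as its own small CSV document.
for kind, members in groups.items():
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(members)
    print(f"--- {kind} ---")
    print(out.getvalue(), end="")
```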

20100618

csvtool

csvtool is a handy command-line tool from Merjis for handling CSV files from shell scripts. It should be available from the package repositories of most *nix distributions.

http://merjis.com/developers/csv

20100612

GNU Diffutils

"GNU Diffutils is a package of several programs related to finding differences between files.

Computer users often find occasion to ask how two files differ. Perhaps one file is a newer version of the other file. Or maybe the two files started out as identical copies but were changed by different people.

You can use the diff command to show differences between two files, or each corresponding file in two directories. diff outputs differences between files line by line in any of several formats, selectable by command line options. This set of differences is often called a ‘diff’ or ‘patch’. For files that are identical, diff normally produces no output; for binary (non-text) files, diff normally reports only that they are different.

You can use the cmp command to show the offsets and line numbers where two files differ. cmp can also show all the characters that differ between the two files, side by side.

You can use the diff3 command to show differences among three files. When two people have made independent changes to a common original, diff3 can report the differences between the original and the two changed versions, and can produce a merged file that contains both persons' changes together with warnings about conflicts.

You can use the sdiff command to merge two files interactively."


http://www.gnu.org/software/diffutils/
http://www.gnu.org/software/diffutils/manual/
http://gnuwin32.sourceforge.net/packages/diffutils.htm
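
The behaviour described for diff and cmp can be approximated with Python's standard library. This is a rough sketch, not a reimplementation of the GNU tools: line_diff wraps difflib.unified_diff, and first_difference is a hypothetical helper that mimics cmp's 1-based byte offset and line number. As a simplification, when one input is a prefix of the other it returns the position just past the shorter input instead of reporting end of file.

```python
import difflib


def line_diff(old_text, new_text, old_name="old", new_name="new"):
    """Return a unified-format diff of two strings as a list of lines."""
    return list(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    ))


def first_difference(a, b):
    """Return (byte_offset, line_number) of the first differing byte.

    Both values are 1-based. Returns None when the inputs are identical.
    """
    line = 1
    for offset, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return offset, line
        if x == 0x0A:  # newline byte: the next byte starts a new line
            line += 1
    if len(a) != len(b):
        return min(len(a), len(b)) + 1, line
    return None


# Made-up inputs that differ in one character on the second line.
old = "alpha\nbeta\ngamma\n"
new = "alpha\nBeta\ngamma\n"

diff_lines = line_diff(old, new)
print("".join(diff_lines), end="")
print(first_difference(old.encode("utf-8"), new.encode("utf-8")))
```

As with diff itself, identical inputs produce no output here: line_diff returns an empty list.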

20100610

note on data sets

I will no longer track data sets on this blog. They are far too numerous, they are easy to find, and they fall outside the narrow scope I had originally defined for myself.