2011-05-23

Large files and multiple cores - 2

In November 2010, I wrote a small utility program, in Go, to view and search for molecules in an MDL SDF format file. I wrote about it here. As the program turned out to be quite useful, I cleaned it up and made it more streamlined and modular. Here is the current version of the function that prints its help text.
// printHelp prints the man-page style usage text for stsearchsdf.
func printHelp() {
    str := `
NAME
    stsearchsdf - view, extract from and search an SDF file

SYNOPSIS - GENERAL FORM
    stsearchsdf command [params]

SYNOPSIS - SPECIFIC FORMS
    stsearchsdf help

    stsearchsdf show in=filename [from=m] [to=n]

    stsearchsdf copy in=filename out=filename [from=m] [to=n]

    stsearchsdf searcha in=filename (symbol=count)... [comp=(eq|gt|lt)]
    [from=m] [to=n] [mx=c]

    stsearchsdf searcht in=filename (tag=tagname tagval=tagvalue)... [from=m]
    [to=n] [mx=c]

    stsearchsdf counta in=filename (symbol=count)... [comp=(eq|gt|lt)]
    [from=m] [to=n] [mx=c]

    stsearchsdf countt in=filename (tag=tagname tagval=tagvalue)... [from=m]
    [to=n] [mx=c]

DESCRIPTION
    stsearchsdf is a program that can be used to view or extract parts of an
    SDF file.  It can also be used to search an SDF file for molecules that
    contain atoms of given elements in specified number, or those with
    specified tag values.

COMMANDS
    help
        Display this help text, and exit.

    show
        Display the sequence of molecules in the specified serial number
        range.

    copy
        Copy the sequence of molecules in the specified serial number range
        to the given output file.

    searcha
        Search for and display the first matching molecule containing specified
        numbers of given atom types.

    searcht
        Search for and display the first matching molecule containing specified
        tag values.

    counta
        Display the number of matching molecules containing specified numbers
        of given atom types.

    countt
        Display the number of matching molecules containing specified tag
        values.

OPTIONS
    in
        Input SDF file name.

    out
        Output SDF file name.  Any existing output file will be overwritten.
        Applicable to only the command 'copy'.

    from
        Number of the molecule where processing should start.  Defaults to 1.

    to
        Number of the molecule where processing should stop.  Defaults to the
        last molecule in the file.

    symbol
        Atomic symbol.  This should be given in proper case, e.g. C, Na.

    comp
        The numeric comparison to be used.  Use 'eq' for equality check, 'gt'
        for greater than, and 'lt' for less than.  Defaults to 'eq'.

    tag
        Name of the tag whose value follows.

    tagval
        Value of the tag field whose name this follows.

    mx
        Maximum number of threads to use for processing.  Defaults to 2.
        Applicable to only search and count commands.
`
    fmt.Println(str)
}
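
The help text describes a key=value parameter style with a few documented defaults (from=1, comp=eq, mx=2). As a rough illustration of how such parameters might be parsed, here is a minimal sketch. The `params` struct and the `parseParams` and `matches` functions are hypothetical names chosen for this example, not the actual code of stsearchsdf, and the sketch handles only the atom-count form (symbol=count), not the tag/tagval pairs.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// params is a hypothetical holder for the parsed key=value arguments.
type params struct {
	in, out  string
	from, to int            // to == 0 stands for "through the last molecule"
	mx       int            // maximum number of threads
	comp     string         // "eq", "gt" or "lt"
	atoms    map[string]int // atomic symbol -> required count, e.g. C=6
}

// parseParams splits each argument on the first '=' and applies the
// defaults stated in the help text: from=1, comp=eq, mx=2.
func parseParams(args []string) (params, error) {
	p := params{from: 1, mx: 2, comp: "eq", atoms: map[string]int{}}
	for _, arg := range args {
		kv := strings.SplitN(arg, "=", 2)
		if len(kv) != 2 || kv[0] == "" {
			return p, fmt.Errorf("malformed parameter %q", arg)
		}
		key, val := kv[0], kv[1]
		switch key {
		case "in":
			p.in = val
		case "out":
			p.out = val
		case "comp":
			if val != "eq" && val != "gt" && val != "lt" {
				return p, fmt.Errorf("unknown comparison %q", val)
			}
			p.comp = val
		default:
			// Every remaining key takes an integer value.
			n, err := strconv.Atoi(val)
			if err != nil {
				return p, fmt.Errorf("parameter %q needs an integer value", key)
			}
			switch key {
			case "from":
				p.from = n
			case "to":
				p.to = n
			case "mx":
				p.mx = n
			default:
				// Assumption: any other key is an atomic symbol.
				p.atoms[key] = n
			}
		}
	}
	return p, nil
}

// matches applies the comparison named by comp to an observed atom
// count (have) and the requested count (want).
func matches(comp string, have, want int) bool {
	switch comp {
	case "gt":
		return have > want
	case "lt":
		return have < want
	default:
		return have == want
	}
}

func main() {
	p, err := parseParams([]string{"in=mols.sdf", "C=6", "N=2", "comp=gt", "to=500"})
	if err != nil {
		fmt.Println(err)
		return
	}
	// Prints: mols.sdf 1 500 2 gt 6 2
	fmt.Println(p.in, p.from, p.to, p.mx, p.comp, p.atoms["C"], p.atoms["N"])
}
```

Treating every unrecognised key as an atomic symbol keeps the sketch short. A real parser would probably validate symbols against the periodic table and check the parameters against the chosen command.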
