on Manufactured Functions


1 Context   rdb sql

Let me introduce the context, and move on to the class of functions I'm manufacturing. Where and how can these be used?

Before jumping in, note that this is part of my Commonplace Book.

My local database is /RDB, "Slash …". You can find it in the literature. It's built on two notions: databases are built from a collection of tables, these being TAB-separated text files; and the Unix shell is the only query language you'll need, given a few string-handling primitives. Mine are in awk; many implementations of the few commands are in Perl. The point is that only a few commands are written in one of these tools, while the query language is the shell itself. For example:

column this that other count < table.rdb | row 'that ~ /label/ && count < 1'  | justify

where "column", "row", and "justify" are written in Perl, awk, or the like, and are quite simple. "table.rdb" is a tab-separated text file which may have some, but needn't have all, of the columns named as arguments to the "column" command.
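To make "quite simple" concrete, here is a minimal sketch of two such commands in awk. These are not the /RDB sources: sketch_column and sketch_row are hypothetical names, sketch_row takes a column name and a regular expression rather than a full expression, and the table layout (a line of column names, a line of dashes, then data) is an assumption.

```shell
# Hypothetical stand-ins for the /RDB "column" and "row" commands.
# Assumed table layout: line 1 = TAB-separated column names,
# line 2 = dashes, remaining lines = data.

# sketch_column NAME...: project the named columns, in the order given.
# A name that is not in the table is silently skipped.
sketch_column() {
    awk -F'\t' -v OFS='\t' -v want="$*" '
    NR == 1 {
        n = split(want, names, " ")
        for (i = 1; i <= NF; i++) pos[$i] = i
        for (i = 1; i <= n; i++)
            if (names[i] in pos) keep[++k] = pos[names[i]]
    }
    {
        line = ""
        for (i = 1; i <= k; i++)
            line = line (i > 1 ? OFS : "") $(keep[i])
        print line
    }'
}

# sketch_row NAME REGEX: keep the two header lines plus every data row
# whose NAME column matches REGEX.
sketch_row() {
    awk -F'\t' -v name="$1" -v re="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) c = i }
    NR <= 2 || (c && $c ~ re)'
}
```

A pipeline such as sketch_column that count < table.rdb | sketch_row that label then reads much like the /RDB example, with the shell doing the composition.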

Armed with such a powerful tool, my aim is to write functions using data tables. Since a data table describes things with common attributes, this necessarily narrows the type of function one can write this way.

I don't intend to force-fit every function into this approach, but two things have emerged. First, the class of function is generic enough to be worth the effort. Second, I now have a tool which challenges me to ask: can this function be manufactured? So, without force-feeding every function through this mill, a taxonomy can develop.

2 Class of function   invest

The main use of this method is to create tools: functions to fit a data-flow model of processing. For example, after downloading my wife's and my retirement portfolios as CSV files, I'd like to update the history routinely, and easily. With two browsers open on our separate account pages, where there may be some overlap (we have joint accounts), I push the "download" option on her page, then on mine, and run a single command which updates the current portfolio, with account, symbol, and quantity fields, and also updates the recorded history of our holdings. Since I'm an investor, not a trader, monthly reporting is sufficient. But the report interval needn't bind the update frequency. For example, I can use the functions just before and after the dividend window for any of the stocks.

My first manufactured function does the update. With repeated use, I expect this to take no more than a minute from opening the online account pages to an updated history table.

Here is an early command.lst file:

name	fid2pos
nargs	2
mode	always
output	position.rdb
command	fidToPosition
inputs	~/Downloads/Portfolio_Position_*.csv
dispose	putArchive

name	mfgcmd
nargs	1
mode	always
output	mfgcmdlib
command	cmd2fun
inputs	command.lst
dispose	backup

name	mfglibuse
nargs	1
mode	always
output	mfguselib
command	mfguse
inputs	command.lst
dispose	backup

3 the Fields

Since the above file is an instance of an RDB "list" format, the first field on each line is a field name, and a blank line separates each record.
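As an illustration of how little it takes to read that layout, the following sketch turns such a list into one TAB-separated row per record using awk's paragraph mode. The name list_to_rows is hypothetical, and the sketch assumes exactly the layout shown above (field name, TAB, value); it is not the /RDB conversion itself.

```shell
# list_to_rows FIELD...: read a list on stdin (one "field<TAB>value" pair per
# line, blank line between records) and print one TAB-separated row per
# record, with the values in the order the fields are named.
list_to_rows() {
    awk -v want="$*" '
    BEGIN { RS = ""; FS = "\n"; n = split(want, names, " ") }
    {
        split("", val)                  # forget the previous record
        for (i = 1; i <= NF; i++) {     # each field is one line of the record
            tab = index($i, "\t")
            if (tab) val[substr($i, 1, tab - 1)] = substr($i, tab + 1)
        }
        row = ""
        for (i = 1; i <= n; i++)
            row = row (i > 1 ? "\t" : "") val[names[i]]
        print row
    }'
}
```

For example, list_to_rows name nargs mode output command inputs dispose < command.lst would give one row per function, with an empty column wherever a record omits a field.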

3.1 name

simply names the generated function. It might be tied to the output file, since producing that file is its job, or to the underlying function, which takes inputs and produces an output on the stdout. In a multi-user world it may be possible to have multiple named functions using the same underlying command.

3.2 nargs

is the minimum number of arguments to the underlying command. Usually these are files. Future enhancements may want to take account of the fact that these are files. When the inputs use wildcard names, the pattern may not match any existing names, and the named function may then trip the notargcount report.
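The generated code calls report_notargcount, whose source isn't shown in this note. A minimal sketch of how such a check could behave, inferred only from the way it is called (report_notargcount 2 $# && return 1), might look like this. The function body and the demo_nomatch helper are assumptions for illustration.

```shell
# Hypothetical sketch of report_notargcount; the author's version is not shown.
# Usage: report_notargcount NEEDED ACTUAL
# Returns 0 (and complains on stderr) when ACTUAL < NEEDED, so that
#   report_notargcount 2 $# && return 1
# leaves the calling function early. Returns 1 when the count is sufficient.
report_notargcount() {
    if [ "$2" -lt "$1" ]; then
        echo "need at least $1 argument(s), got $2" >&2
        return 0
    fi
    return 1
}

# Demonstration: a wildcard that matches nothing stays as one literal word
# (the bash default, without nullglob), so the count is 1 and the check fires.
demo_nomatch() {
    set -- /nonexistent-dir/Portfolio_Position_*.csv
    report_notargcount 2 $# && return 1
    echo "would process $# file(s)"
}
```

Note that with nargs set to 1 an unmatched pattern would slip past this test, since the literal pattern still counts as one argument.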

3.3 mode

not implemented yet, but this will account for the real data-flow properties of a process. The planned values are blank or always; then newest, where the update is skipped if none of the inputs are newer than the output; and finally an append mode, where the generated data is appended to the output, such as a log file.
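Since the mode is still only planned, the following is just one way the three values could map onto shell redirection. run_mode is a hypothetical helper, not part of the generated library, and its argument order is an assumption.

```shell
# run_mode MODE OUTPUT COMMAND INPUT...
# Hypothetical dispatcher for the planned mode values.
run_mode() {
    local mode=$1 output=$2 cmd=$3 stale= f
    shift 3
    case $mode in
    ''|always)                      # blank or always: rewrite the output
        "$cmd" "$@" > "$output" ;;
    append)                         # append: add to the output, as for a log
        "$cmd" "$@" >> "$output" ;;
    newest)                         # newest: rebuild only when out of date
        [ -e "$output" ] || stale=1
        for f in "$@"; do
            [ "$f" -nt "$output" ] && stale=1
        done
        if [ -n "$stale" ]; then "$cmd" "$@" > "$output"; fi ;;
    *)
        echo "run_mode: unknown mode '$mode'" >&2
        return 1 ;;
    esac
}
```

For instance, run_mode newest position.rdb fidToPosition ~/Downloads/Portfolio_Position_*.csv would leave position.rdb alone unless a download is newer than it.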

3.4 output

the name of a file, which needn't exist when the command runs, but which receives the stdout of the command argument, contingent on the state of the mode.

3.5 command

this must exist as a shell function. At the moment, it must fail the notfunction test; that is, a command found only on the user's PATH won't cut it for the command name.
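The notfunction test itself isn't listed in this note. A minimal bash sketch, inferred from the call report_notfunction NAME && return 2 in the generated code, could be the following. The body is an assumption, built on bash's declare -F, which succeeds only for defined shell functions.

```shell
# Hypothetical sketch of report_notfunction (bash); the author's is not shown.
# Returns 0 (and complains on stderr) when NAME is NOT a shell function,
# so that "report_notfunction NAME && return 2" leaves the caller early.
# Returns 1 when NAME is a defined function.
report_notfunction() {
    if declare -F "$1" > /dev/null; then
        return 1
    fi
    echo "$1 is not a function" >&2
    return 0
}
```

With this version, an executable such as ls is reported even though it is on the PATH, which matches the rule stated above.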

3.6 inputs   browser re

the name of a file, files, or a regular expression (RE) to identify a file or files. (In the generated code the pattern is expanded by the shell in set --, so in practice it is a shell wildcard pattern.) In the current incarnation, my investment house downloads CSV files, on my iMac running Yosemite, into files named ~/Downloads/Portfolio_Position_{something}.csv, where {something} is a browser-dependent date format. Our investment house uses the format Mon-dd-YYYY …, where the … is the browser-dependent part. For subsequent downloads (remember, my wife and I) Chrome uses " - (N)", and Safari is easier to deal with: "-N". In both cases, N > 1. Attempts to write a more specific RE proved more work than needed.

3.7 dispose

When the inputs have been digested and the output updated, there may be some useful post-processing. That's for the dispose function. Its first argument is the output, followed by the inputs. backup is a convenient practice for the output. An empty dispose field is ignored.
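To show the calling convention (output first, then the inputs), here is a sketch of a dispose function. It is not the putArchive or backup used here, whose sources aren't shown; archive_inputs and the ARCHIVE_DIR variable are hypothetical.

```shell
# Hypothetical dispose function.
# Usage: archive_inputs OUTPUT INPUT...
# Leaves OUTPUT in place and moves each existing INPUT into
# ${ARCHIVE_DIR:-./archive}. Refuses to archive anything if OUTPUT is
# missing or empty, so a failed run does not swallow the downloads.
archive_inputs() {
    local output=$1 dir=${ARCHIVE_DIR:-./archive} f
    shift
    [ -s "$output" ] || return 1
    mkdir -p "$dir" || return 1
    for f in "$@"; do
        [ -e "$f" ] && mv -- "$f" "$dir/"
    done
    return 0
}
```

Quoting "$@" and "$f" matters here, because browser-generated names for repeat downloads can contain spaces.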

4 source code

4.1 mfgcmdlib

This library is generated from the table above. Compare the code to the data in the table. The whole purpose of the table is to hide the considerable amount of boiler-plate code from routine editing, and to make it easier to maintain the data-flow of the task at hand. Of the functions here, only fid2pos is a user function. The other two, mfgcmd and mfglibuse, are central tools, for use by other sites or data-flow processes.

function fid2pos
{
	set -- ~/Downloads/Portfolio_Position_*.csv
	report_notargcount 2 $#		&& return 1
	report_notfunction fidToPosition     	&& return 2
	fidToPosition "$@" > position.rdb
	report_notfunction putArchive     	&& return 3
	putArchive position.rdb "$@"
	
}
function mfgcmd
{
	set -- command.lst
	report_notargcount 1 $#		&& return 1
	report_notfunction cmd2fun     	&& return 2
	cmd2fun "$@" > mfgcmdlib
	report_notfunction backup     	&& return 3
	backup mfgcmdlib "$@"
}
function mfglibuse
{
	set -- command.lst
	report_notargcount 1 $#		&& return 1
	report_notfunction mfguse     	&& return 2
	mfguse "$@" > mfguselib
	report_notfunction backup     	&& return 3
	backup mfguselib "$@"
}

4.2 mfguselib

Each of these functions pairs up with a manufactured function; e.g., fidToPosition is used in fid2pos. This little table shows the /RDB layout from this command:

$ column < command.txt | column name mode output inputs | jm
name            mode    output          inputs                              
----            ----    ------          ------                              
fid2pos         always  position.rdb    ~/Downloads/Portfolio_Position_*.csv
mfgcmd          always  mfgcmdlib       command.txt                         
mfglibuse       always  mfguselib       command.txt   

It seems a convention might be established between the name and output fields. Given an output file, a function might be given a name:

$ name=$(basename "${output%.*}")_mfg

We'll see.
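As a worked example of that naming idea, the sketch below wraps it in a function. mfg_name is a hypothetical name, and it takes the basename before stripping the extension, so a dot in a directory name cannot be mistaken for one.

```shell
# mfg_name OUTPUT: derive a function name from an output file name.
mfg_name() {
    local base
    base=$(basename "$1")               # drop any leading directories
    printf '%s_mfg\n' "${base%.*}"      # drop the last extension, add _mfg
}
```

Under this convention, position.rdb would yield the name position_mfg.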

function fidToPosition
{ 
    rdb_hdr account symbol quantity;
    cat "$@" | fidPosition | grep -E -v '(Account|download)' | sed 's/, */	/g; s/\*\*//'
}
function cmd2fun
{ 
    cat ${*:--} | column name nargs mode output command inputs dispose | tail -n +3 | tawk '
    function onl(tx,dn)     { printf "\t" tx "\n", dn; }
    function tnl(tx,dn,en)  { printf "\t" tx "\n", dn, en; }
    # 1 name    -- MUST have
    # 2 nargs
    # 3 mode:   always, append, newest, 
    # 4 output
    # 5 command
    # 6 inputs, an RE will do
    # 7 dispose
    $1 ~ /^$/  { next; }
               {
		 rc = 1;
                 onl("\nfunction %s\n{", $1)
                 onl("set -- %s",        $6)
                 tnl("report_notargcount %d $#\t\t&& return %d", $2, rc++)
                 tnl("report_notfunction %s     \t&& return %d", $5, rc++)
                 tnl("%s \"$@\" > %s",   $5,$4)
               }
   $7 !~ /^$/  {
                 tnl("report_notfunction %s     \t&& return %d", $7, rc++)
                 tnl("%s %s \"$@\"", $7, $4) }
               { onl("\n}", "") }
   '
}
function mfguse
{ 
    fbdy $(column command < "$1" | tail -n +3)
}

Note that these latter functions are wrapped by their command.lst stored data, and that only the first command in either library is a user command, namely fidToPosition and its manufactured user function, fid2pos. You may infer which investment house I use.

5 cmd2fun

The cmd2fun function is the heart of the method. It reads an /RDB list or table format, either as named files or on the stdin, where the fields are called out in a specific order in the column command, and the two header lines are trimmed by tail before the rows are piped to a Tab-separated AWK (tawk) command. This command formats each function from the available fields.

Care is taken to test for a non-blank name. Notice the error return code is set to 1, and incremented for each potential error.

Where the dispose field is non-blank it is taken to be the name of a function which expects the single output file followed by any number of input files.
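For readers without /RDB at hand, the same generation step can be sketched with plain awk. This is a simplified stand-in, not the cmd2fun above: gen_functions is a hypothetical name, it expects headerless TAB-separated rows on stdin, and its output spacing differs slightly from the library shown earlier.

```shell
# gen_functions: stdin rows are name, nargs, mode, output, command, inputs,
# dispose, separated by TABs, with no header lines. Field 3 (mode) is ignored.
gen_functions() {
    awk -F'\t' '
    $1 == "" { next }                   # a record without a name is skipped
    {
        rc = 1                          # each check gets its own return code
        printf "function %s\n{\n", $1
        printf "\tset -- %s\n", $6
        printf "\treport_notargcount %d $# && return %d\n", $2, rc++
        printf "\treport_notfunction %s && return %d\n", $5, rc++
        printf "\t%s \"$@\" > %s\n", $5, $4
    }
    $7 != "" {                          # dispose: output first, then inputs
        printf "\treport_notfunction %s && return %d\n", $7, rc++
        printf "\t%s %s \"$@\"\n", $7, $4
    }
    { print "}" }
    '
}
```

Feeding it the fid2pos row from the table produces a function with the same checks and return codes 1, 2, and 3 as the listing in mfgcmdlib.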

Note, in this code, the mode (#3) is not used. Inspecting the table, you'll see it's currently always. That's the convenient default. This field anticipates using newest, the feature of the make paradigm: "if any dependent file is newer than the target, bring the target up-to-date by the following process":

newest {output} {inputs} || command {inputs} > {output}

Author: Marty

Created: 2016-02-20 Sat 15:50

Emacs 24.4.1 (Org mode 8.2.10)