
Writing

Since reading the first two references, I want to become a better writer. They encourage a writer to track how much they write, and then make each word count.

I've been thinking about counting my written words for at least two years, so that's where I'll start. Maybe I've even written in my diary about this.

So first, here are some rules:

  1. everything I write will be in an Org Mode file,
  2. with accommodations for WordPress, Men's Club minutes, and email; everything but Tweets,
  3. I'll either write in Org and post to WordPress, email, Google Docs, etc., or vice versa,
  4. I still need to think about my Moleskine diary, which is paper.

a plan

The first task before me is a Do It Yourself word count tally.

Highlighting Dorian's (How to write ..) points:

  1. This has nothing to do with talent,
  2. Use the power of Kaizen, improvement in tiny steps
  3. To battle inertia, back down to basics, and
  4. Gamify your writing experience

This last one got my attention. He bought a writing tool, Scrivener, unreferenced here since (a) it's Windows-only, and (b) it doesn't get past my strong Emacs preference. Which leads back to the rules: I'll find it easier to use Org Mode on my computer as the organized way to collect the daily word count:

$ wc somefile | diff - yesterday's version

is a high-level way to tally my word count.
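That sketch can be made concrete. Here's a minimal, runnable version of the idea, assuming the count is snapshotted to a file each day; somefile, yesterday.wc, and today.wc are stand-in names for this demo, not part of the real tooling:

```shell
# Hypothetical sketch: compare today's word count against a saved snapshot.
printf 'one two three\n' > somefile              # the file as of yesterday
wc -w < somefile > yesterday.wc                  # snapshot: 3 words

printf 'one two three\nfour five\n' > somefile   # today's edit adds 2 words
wc -w < somefile > today.wc                      # snapshot: 5 words

delta=$(( $(cat today.wc) - $(cat yesterday.wc) ))
echo "words added: $delta"
```

The real tool works per-file over the whole Org tree, but the arithmetic is no more than this.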

So, now, how to?

First, I've a tool, recorg, that lists all my Org files and then reports the long listing of each file's latest change. It then formats the data into a master file ../../recorg.org.

So, how's this sound? The master list is here: ../../recorg.out, where the first few lines suggest what I might do:

-rw-r--r--@ 1 applemcg  staff       0 Nov  2 23:44 recorg.org
-rw-r--r--@ 1 applemcg  staff   14252 Nov  2 18:09 Family/invest/sandp.org
-rw-r--r--  1 applemcg  staff     533 Oct 31 18:03 ../talk/.steps.org
-rw-r--r--@ 1 applemcg  staff   30959 Oct 27 17:27 commonplace/software/swdiary-2016.org
-rw-r--r--@ 1 applemcg  staff  138854 Oct 23 07:07 commonplace/software/swdiary.org

Recorg.org itself is the newest file in the list. Any file newer than that one has been changed since it was written. Here's a sketch of an algorithm, a recipe to collect the daily word count:

  1. routinely publish a word-count for each file
  2. for files newer than the last published date, collect the updated version, subtract the prior data on the file, and publish the changes, logging the additions and subtractions.

An important feature is that the tool must not require being run at any specific time. I hope it will encourage a routine of daily usage, but it won't require that in order to report Words Per Day. A daily average will be sufficient for starters.
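The "files newer than the last published date" test is exactly what find's -newer predicate does. A sketch, where marker.org stands in for the master file written on the last run; all the file names here are invented for the demo:

```shell
# Hypothetical demo: pick out Org files modified after the marker file.
mkdir -p newer-demo && cd newer-demo
touch -t 202001010000 stale.org   # unchanged since the last run
touch marker.org                  # stands in for recorg.org, written last run
sleep 1
touch fresh.org                   # edited after the last run
find . -name '*.org' -newer marker.org
cd ..
```

Only fresh.org is reported; stale.org and the marker itself are skipped.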

So, first inspect the recorg function:

$ whfn recorg; fbdy recorg

A quick inspection of the tool suggests a simple solution. I have an existing tool, /RDB, which can really simplify the process. Some steps:

  1. pick a format for the data, a schema, probably the output from wc
  2. routinely collect the data for each file, and here's the easy part,
  3. record the data in a history file.

A tool I've used for twenty years, rdput, records the times a record was inserted into and deleted from an RDB table. So, any file whose record is among the most recent updates needs a report: what are the differences since the most recent update?

This reduces the bookkeeping to subtracting the prior word count from the current record. It's a feature of rdput to leave unchanged records unaltered: in the case of wc, a file's record only updates when the number of lines, words, or characters changes.
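The same idea without /RDB: given two wc snapshots in the lines-words-chars-filename layout, one awk pass can report only the files whose word count changed. prev.wc and curr.wc here are made-up sample data, not the real table:

```shell
# Hypothetical sketch: per-file word-count deltas between two snapshots.
cat > prev.wc <<'EOF'
10 100 600 a.org
5 50 300 b.org
EOF
cat > curr.wc <<'EOF'
12 130 700 a.org
5 50 300 b.org
EOF
# first pass stores the prior word counts; second pass prints only changes
awk 'NR==FNR { prev[$4] = $2; next }
     $2 != prev[$4] { printf "%s %+d\n", $4, $2 - prev[$4] }' prev.wc curr.wc
```

b.org is unchanged, so it produces no output, mirroring how rdput leaves unaltered records alone.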

data format

record schema:

  1. filename – relative to the Dropbox base
  2. wc output – lines, words, characters

In total, four fields.

The history adds fields {insert,delete}_time.

While the wc output places the counts first and the filename last, an RDB table report probably wants to put the history fields first.
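Here's a guess at what one record looks like as a tab-separated /RDB-style table: a header row, a dashed rule, then the four fields with the filename last, as wc emits them. sample.org is an invented file for the demo:

```shell
# Hypothetical demo of the record schema as a tab-separated table.
printf 'one two three\n' > sample.org
{
    printf 'lines\twords\tchars\tfilename\n'
    printf -- '-----\t-----\t-----\t--------\n'
    wc sample.org | awk '{ printf "%s\t%s\t%s\t%s\n", $1, $2, $3, $4 }'
}
```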

the report schema:

  1. date&time – of last update to record schema
  2. file – the filename from the record
  3. wcinfo – words changed

first look

The general concept looks good. My expectation that wc's "total" and its change would be sufficient was too optimistic. A few filters are needed:

  1. a wholesale copy of a file, either external or a local name change,
  2. copied file fragments, or hard-linked files,
  3. and code: how should it be counted, as it creeps into my writing?

Do It Yourself

recording the Orgs – recorg

The first step recorg's the Org files in a "long listing" (ls -l):

recorg () 
{ 
    : date 2016-10-02;
    : date 2016-10-23;
    : date 2016-10-31;
    : date 2016-11-03;
    : date 2016-12-02;
    : exclude files, recorgy, with identical basename, wc results;
    : date 2016-12-04;
    function forg () 
    { 
        : date 2016-10-23;
        find ${*-.} -name '*.org'
    };
    function recorgx () 
    { 
        : date 2016-10-31;
        forg ../talk;
        forg $(ls | grep -v ' ') ../{doc,stonebridge,git} | nvn
    };
    function recorgy () 
    { 
        awk ' { b=$NF; gsub(/.*\//,"",b); }

            !printed[$1,$2,$3,b]++
        '
    };
    function recorgt () 
    { 
        rdb_hdr lines words chars filename;
        cat ${*:--} | field NF | xargs wc | sed 's/^  *//' | sps2tabs
    };
    pushd ~/Dropbox;
    set recorg.{out,org,${1:-rdb},cut};
    : "cut" or grep -v file needs an entry;
    [[ -f $4 ]] || echo $2 > $4;
    recorgx | xargs ls -lt | tee $1 | awk -v year=$(date +%Y) -f $(awk_file) > $2;
    grep -v -f $4 $1 | recorgt | recorgy > $3;
    rdput $3;
    popd;
    unset recorg{t,x,y} forg
}

collecting the report

Here are the reporting tools to "gamify" the writing, with a summary through the first days:

daily_report () 
{ 
    : date 2016-12-02;
    report_notpipe && return 1;
    rdb_hdr day words file;
    : dawn of time for word-counting;
    row 'time > 161202114600' | tail +3 | awk -f $(awk_file)
}
daily_totalwords () 
{ 
    : date 2016-12-03;
    latest_report | daily_report | row 'file ~ /total/' | ncolumn file | quietly rd sort -r
}
daily_mvag () 
{ 
    daily_totalwords | rd sort | addcol mvag | compute '
        d = 7;
        n = 1/d;
        o = 1-n;
        mvag = o*ovag + ((ovag)?n:1)*words;
        ovag = mvag
    '
}

daily_mvag is at the top of the heap. So, the routine task is:

$ recorg     # updates "recorg.rdb" and its history
$ daily_mvag # produces a report which looks like:
day    words  mvag
---    -----  ------
161202   819  819
161203   812  818
161204   362  752.857
161205  2458  996.449
161206 -1198  682.956

The big hit today (-1198) is because in yesterday's mail I had mistakenly cut and pasted, duplicating the whole file. Note the "dawn of time" in daily_report. That marks the time when I first collected a record of the RDB data table.
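The compute step is /RDB-specific, but the moving average itself is plain arithmetic. The same recurrence in standalone awk, fed the words column from the table above, reproduces the mvag column:

```shell
# 7-day exponential moving average, seeded with the first value,
# exactly as the compute block does: mvag = (6/7)*prior + (1/7)*words.
printf '819\n812\n362\n2458\n-1198\n' |
awk 'BEGIN { d = 7; n = 1/d; o = 1 - n }
     { mvag = o * ovag + (ovag ? n : 1) * $1
       ovag = mvag
       printf "%g\n", mvag }'
```

This prints 819, 818, 752.857, 996.449, and 682.956, matching the report.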

what about deleted files

When a file gets deleted, (not merely changed), it would be nice to remove it from the active history.

Sort by delete time and file name so all deletes appear before the undeleted. When a file appears with no delete time, clear the deleted flag for that file. Files with uncleared deleted flags are absolutely deleted. Report them.

For the moment, my fix is to count the changes to files rather than the totals for each day. I'd been taking the difference between yesterday's and today's total field. I'm not sure how a deleted file fares under this treatment. Here are the functions supporting daily_mvag:

daily_totalwords () 
{ 
    : date 2016-12-03;
    : date 2016-12-09;
    rdb_hdr day words;
    latest_report | daily_report | tail +3 | awk -f $(awk_file)
}
daily_mvag () 
{ 
    : date 2016-12-04;
    : add results to journaled RDB table;
    : date 2016-12-06;
    : date 2016-12-09;
    daily_totalwords | rd sort | addcol mvag | compute '
        d = 7;
        n = 1/d;
        o = 1-n;
        mvag = o*ovag + ((ovag)?n:1)*words;
        ovag = mvag
    ' | tee ~/Dropbox/mvag.rdb;
    rdput ~/Dropbox/mvag.rdb
}
latest_report () 
{ 
    : date 2016-12-02;
    : date 2016-12-05;
    : date 2016-12-09;
    rdb_hdr time words file;
    prepare_report | tail +3 | awk -f $(awk_file) 2> report.err
}
daily_report () 
{ 
    : date 2016-12-02;
    : date 2016-12-09;
    report_notpipe && return 1;
    rdb_hdr day words file;
    : dawn of time for word-counting;
    row 'time > 161202114600' | row 'file !~ /total/' | tail +3 | awk -f $(awk_file)
}
awk_file () 
{ 
    : date 2016-09-30;
    : date 2016-11-25;
    trace_call $*;
    local awk=${1:-$(myname 2)}.awk;
    for lib in $(lib_paths) {.,..}/lib;
    do
        [[ -f $lib/$awk ]] && { 
            echo $lib/$awk;
            return 0
        };
    done;
    return 1
}

A recurring theme in these is this fragment:

 ... | tail +3 | awk -f $(awk_file)

from which you may be able to tell that awk_file finds a file with the .awk suffix whose name matches either the calling function or the first argument. For example, the latest_report function uses an awk_file found on the lib_paths, which is just the list of directories on the user's PATH after replacing a trailing /bin with /lib. A function aff is the upward-compatible version of ff.

aff () 
{ 
    : date 2016-12-09;
    function _aff () 
    { 
        set $1 $(awk_file $1);
        ff $1;
        case $# in 
            2)
                cat $2
            ;;
        esac
    };
    foreach _aff $*;
    unset _aff
}

which adds the awk_file to the output for any function using one. This solves a problem I'd noted some time ago: at what size is an awk script better kept in a separate file, rather than as part of the function?
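The lib_paths mentioned above might be sketched like this; a hypothetical reimplementation from the description, not the real function, and the sample path is invented. It takes PATH (or an argument), one directory per line, rewriting a trailing /bin to /lib:

```shell
# Hypothetical sketch of lib_paths: each PATH entry on its own line,
# with a trailing /bin swapped for /lib; other entries pass through.
lib_paths () {
    echo "${1:-$PATH}" | tr ':' '\n' | sed 's|/bin$|/lib|'
}
lib_paths /usr/local/bin:/usr/bin
```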

recorg_report () 
{ 
    : date 2016-12-09;
    : do NOT leave the history file laying loose
    zcat ~/Dropbox/h.recorg.rdb.Z 
}
mvag_report () 
{ 
    ${*:-echo} recorg daily_{totalwords,mvag} {latest,daily,prepare,recorg,mvag}_report
}

And, to put a wrap on things, recorg_report corrects a mistake I made: don't leave both a compressed file and its uncompressed version lying around. It's more trouble than it's worth to defend against overwriting one or the other at the wrong time.

When a function collection approaches an application, I find it useful to put the collection's names into a self-referencing function. In the case of mvag_report, a trick I like to use is the ${*:-echo} idiom. Its default behavior is to echo, or name, the functions. With arguments, it uses them instead, so the fun alternative:

$ mvag_report ff

produces the function bodies.
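The ${*:-echo} idiom works just as well in isolation. Here it is with made-up member names: no arguments echoes the names; any arguments become the command applied to them:

```shell
# Demo of the ${*:-echo} idiom; alpha_fn and beta_fn are invented names.
demo_report () {
    ${*:-echo} alpha_fn beta_fn   # default command is echo
}
demo_report                 # echoes the two names on one line
demo_report printf '%s\n'   # applies printf to the names instead
```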

and the paper diary

Solution: for today's date in my diary, add a "words" DRAWER to hold the word count for the day. Roll it up with a separate collection from the recorg gathering, but before the daily total and moving average are calculated.

DONE include a recorg callgraph

This shows the calling tree. The major functions, in order, are:

daily_mvag

recorg

references