From e11ce9bdb2b8a54865edb93dc896479dfafba173 Mon Sep 17 00:00:00 2001 From: Arun Isaac Date: Fri, 5 Nov 2021 23:22:55 +0530 Subject: doc: Add spell check workflow to tutorial. * doc/ccwl.skb (Tutorial)[Let's write a spell check workflow]: New section. * Makefile.am (doc/spell-check.out): New target. (EXTRA_DIST): Add doc/spell-check-text.txt and doc/dictionary. * doc/dictionary, doc/spell-check-text.txt, doc/spell-check-workflow-1.scm, doc/spell-check-workflow-2.scm, doc/spell-check.scm: New files. --- doc/ccwl.skb | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 75 insertions(+), 1 deletion(-) (limited to 'doc/ccwl.skb') diff --git a/doc/ccwl.skb b/doc/ccwl.skb index c460ab4..5efd709 100644 --- a/doc/ccwl.skb +++ b/doc/ccwl.skb @@ -247,7 +247,81 @@ following output.]) (p [The MD5, SHA1 and SHA256 checksums are in the files ,(file "112be1054505027982e64d56b0879049c12737c6"), ,(file "d2f19c786fcd3feb329004c8747803fba581a02d") and -,(file "0d2eaa5619c14b43326101200d0f27b0d8a1a4b1") respectively.])))) +,(file "0d2eaa5619c14b43326101200d0f27b0d8a1a4b1") respectively.]))) + + (section :title [Let's write a spell check workflow] + (p [Finally, let's put together a complex workflow to understand +how everything fits together. The workflow we will be attempting is a +spell check workflow inspired by the founders of Unix,(footnote +["UNIX: Making Computers Easier to Use" has a ,(ref +:url "https://www.youtube.com/watch?v=XvDZLjaCJuw?t=315" +:text "section where Brian Kernighan writes a spell check system using +pipes")]) and by dgsh,(footnote [dgsh, a shell supporting general +directed graph pipelines, has a ,(ref +:url "https://www.spinellis.gr/sw/dgsh/#spell-highlight" :text "spell +check example").]). The workflow is pictured below. Let's start by +coding each of the steps required by the workflow.]) + + (image :file "doc/spell-check.png") + + (p [The first command, ,(code "split-words"), splits up the +input text into words, one per line. It does this by invoking the +,(command "tr") command to replace anything that is not an alphabetic +character with a newline. In addition, it uses the +,(code "--squeeze-repeats") flag to prevent blank lines from appearing +in its output. Notice that no type is specified for the input +,(code "text"). When no type is specified, ccwl assumes a +,(code "File") type.] + (scheme-source-form "doc/spell-check.scm" "\\(define split-words")) + + (p [We want our spell check to be case-insensitive. So, we +downcase all words. This is achieved using another invocation of the +,(command "tr") command.] + (scheme-source-form "doc/spell-check.scm" "\\(define downcase")) + + (p [For easy comparison against a dictionary, we want both our +words and our dictionary sorted and deduplicated. We achieve this by +invoking the ,(command "sort") command with the ,(code "--unique") +flag.] + (scheme-source-form "doc/spell-check.scm" "\\(define sort")) + + (p [Finally, we compare the sorted word list with the sorted +dictionary to identify the misspellings. We do this using the +,(command "comm") command.] + (scheme-source-form "doc/spell-check.scm" + "\\(define find-misspellings")) + + (p [Now, let's wire up the workflow. First, we assemble the +,(code "split-words")-,(code "downcase")-,(code "sort-words") arm of +the workflow. This arm is just a linear chain that can be assembled +using ,(code "pipe"). We will need to invoke the ,(code "sort") +command twice in our workflow. To distinguish the two invocations, CWL +requires us to specify a unique step id for each invocation. We do +this using the second element, ,(code "(sort-words)"). To avoid name +conflicts, we also need to rename the output of the ,(code "sort") +command. The last step, +,(source-ref "ccwl/ccwl.scm" "\\(\\(rename" (code "rename")), a +special ccwl construct that, is used to achieve this. In this case, it +renames the ,(code "sorted") output of the ,(code "sort") command into +,(code "sorted-words").] + (scheme-source "doc/spell-check-workflow-1.scm")) + + (p [Next, we assemble the ,(code "split-dictionary") arm of the +workflow. This arm is just a single step. Then, we connect up both the +arms using a ,(code "tee"). Here too, we have a step id and renaming +of intermediate inputs/outputs.] + (scheme-source "doc/spell-check-workflow-2.scm")) + + (p [And finally, we use the outputs of both the arms of the +workflow together in the ,(code "find-misspellings") step.] + (scheme-source-form "doc/spell-check.scm" "\\(workflow")) + + (p [The complete workflow is as follows.] + (scheme-source "doc/spell-check.scm")) + + (p [When compiled and run with a text file and a dictionary, the +misspelt words appear at the output.] + (prog :line #f (source :file "doc/spell-check.out"))))) (chapter :title [Cookbook] (section :title [Reuse external CWL workflows] -- cgit v1.2.3