guile-email - Guile email parser

Age	Commit message	Author
2020-05-25	base64: Reimplement from scratch.	Arun Isaac
	* email/base64.scm: Replace file.
2020-05-25	utils: Do not match sequence byte by byte in read-bytes-till.	Arun Isaac
	* email/utils.scm (bytevector-match, bytevector-overlap, lookahead-bytevector-n): New functions. (read-bytes-till): Do not match sequence byte by byte. Process blocks of bytes at a time.
2020-05-25	utils: Introduce the not-end-let utility.	Arun Isaac
	* email/utils.scm (not-end-let): New macro. * .dir-locals.el (scheme-mode): Indent not-end-let correctly.
2020-05-25	utils: Do not return eof if matched at beginning.	Arun Isaac
	* email/utils.scm (read-while, read-bytes-till): Do not return eof if matched at beginning. Return empty string or bytevector respectively. * tests/utils.scm ("read-bytes-till returns empty bytevector on match at beginning", "read-while returns empty string on match at beginning"): New tests.
2019-12-16	base64: Import only the required rnrs modules.	Arun Isaac
	* email/base64.scm: Import (rnrs arithmetic bitwise), (rnrs arithmetic fixnums), (rnrs base), (rnrs bytevectors) and (rnrs io ports), not all of (rnrs).
2019-12-04	email: Handle blank Subject headers.	Arun Isaac
	* email/email.scm (post-process-fields): Treat blank Subject headers as having the null string as value. * tests/email.scm ("blank Subject header must be treated as having the null string as value"): New test. Reported-by: Ricardo Wurmus <rekado@elephly.net>
2019-10-09	email: Return keywords header as a list.	Arun Isaac
	* email/email.scm (parse-email-headers): Return keywords header as a list of strings. * tests/email.scm ("keywords header must be a list"): New test.
2019-10-08	Reindent calls to call-with-port.	Arun Isaac
	* email/email.scm (body->mime-entities, email->headers+body): Reindent calls to call-with-port. * email/quoted-printable.scm (quoted-printable-encode, q-encoding-encode): Reindent calls to call-with-port. * tests/utils.scm ("read-bytes-till returns eof-object on end of file"): Reindent call to call-with-port.
2019-10-08	email: Override invalid charset more strongly.	Arun Isaac
	* email/email.scm (post-process-content-type): Use alist-combine to override charset more strongly than just appending to the alist. * tests/email.scm ("tolerate invalid charset"): Update test.
2019-10-08	email: Introduce alist union utility.	Arun Isaac
	* email/utils.scm (alist-combine): New function. (alist-delete): Delete function. * email/email.scm (add-default-headers, add-default-mime-entity-headers): Use alist-combine.
2019-10-08	email: Deduplicate post processing of header fields.	Arun Isaac
	* email/email.scm (post-process-fields): New function. (parse-mime-entity, decode-body): Invoke post-process-fields.
2019-10-02	email: Tolerate decoding errors in body.	Arun Isaac
	* email/email.scm (decode-body): Tolerate decoding errors in the body using the substitute conversion strategy. * tests/email.scm ("tolerate decoding errors in body"): New test.
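A substitute conversion strategy, as named in the entry above, swaps each undecodable byte for a placeholder character instead of raising a decoding error. The following is a rough Python analogue of that idea, offered only as an illustration and not as guile-email's Scheme code; the helper name `decode_tolerantly` is hypothetical.

```python
def decode_tolerantly(raw: bytes, charset: str = "utf-8") -> str:
    """Decode raw bytes, replacing undecodable bytes with U+FFFD.

    Hypothetical helper: errors="replace" is Python's counterpart to a
    substitute-style conversion strategy.
    """
    return raw.decode(charset, errors="replace")


# Latin-1 bytes mislabelled as UTF-8: strict decoding would raise
# UnicodeDecodeError, tolerant decoding keeps the rest of the text.
body = b"caf\xe9 au lait"
print(decode_tolerantly(body))
```

The trade-off is that the offending bytes are lost, but the remainder of the body stays readable.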
2019-10-01	email: Tolerate invalid charset.	Arun Isaac
	* email/email.scm (post-process-content-type): If charset is invalid, assume default UTF-8 as charset. * tests/email.scm ("tolerate invalid charset"): New test. Reported-by: Ricardo Wurmus <rekado@elephly.net>
2019-09-28	email: Tolerate decoding errors in MIME encoded words.	Arun Isaac
	* email/email.scm (decode-mime-encoded-word): Tolerate decoding errors in MIME encoded words using the substitute conversion strategy. * tests/email.scm ("tolerate decoding errors in MIME encoded words"): New test. Reported-by: Christopher Baines <mail@cbaines.net>
2019-09-28	email: Remove duplicate unbracketed-angle-addr definition.	Arun Isaac
	* email/email.scm (unbracketed-angle-addr): Delete duplicate definition.
2019-09-23	email: Update mbox->emails docstring.	Arun Isaac
	The earlier docstring was one meant for read-next-email-in-mbox. * email/email.scm (mbox->emails): Update docstring.
2019-09-23	email: Add read-next-email-in-mbox docstring.	Arun Isaac
	* email/email.scm (read-next-email-in-mbox): Add docstring.
2019-09-23	email: Tolerate non-ASCII non-UTF-8 characters in headers.	Arun Isaac
	* email/email.scm (email->headers+body): If non-ASCII non-UTF-8 characters occur in the headers, do not raise a decoding error. Work around using the substitute conversion strategy. * tests/email.scm ("tolerate non-ASCII characters in headers"): Rename to "decode utf-8 characters in headers". ("tolerate non-ascii non-utf-8 characters in headers"): New test. Reported-by: Christopher Baines <mail@cbaines.net>
2019-09-17	email: Tolerate non-ASCII characters in headers.	Arun Isaac
	We tolerate non-ASCII characters in headers in order to support Emacs message mode parens style addresses. * email/email.scm (email->headers+body): Read headers as UTF-8 characters. * tests/email.scm ("tolerate non-ascii characters in headers"): New tests. Reported-by: Christopher Baines <mail@cbaines.net>
2019-08-07	doc: Document mbox->emails.	Arun Isaac
	* doc/guile-email.texi (Reading Email): New chapter. * email/email.scm (mbox->emails): Add docstring.
2019-08-07	utils: Clarify read-while docstring.	Arun Isaac
	* email/utils.scm (read-while): Clarify docstring.
2019-07-28	email: Improve comment about default charset.	Arun Isaac
	* email/email.scm (post-process-content-type): Mention that RFC6657 specifies UTF-8 as the default charset only for text/* media types.
2019-07-28	email: Read mboxes as bytevectors.	Arun Isaac
	* email/email.scm (read-next-email-in-mbox): Read bytes from mboxes, not characters.
2019-07-28	utils: Return eof-object from read-bytes-till on end of file.	Arun Isaac
	* email/utils.scm (read-bytes-till): Return eof-object, not #vu8(), on end of file. * tests/utils.scm: New file. * Makefile.am (SCM_TESTS): Register it.
2019-07-28	email: Decode MIME entities without headers.	Arun Isaac
	* email/email.scm (email->headers+body): If there are no headers, return "" as headers, not an eof-object. (parse-email-body): Pass headers of parent entity or email to parse-mime-entity. (add-default-mime-entity-headers): New function. (parse-mime-entity): Use add-default-mime-entity-headers instead of add-default-headers. Handle MIME entities without headers. * tests/email.scm ("decode MIME entity without headers"): New test.
2019-07-28	email: Support email with mixed encoding of characters.	Arun Isaac
	Prior to this, parse-email would accept email in the form of a string. A string is constrained to use the same encoding for all its characters whereas an email can have characters encoded using different encoding schemes. Therefore, it is more correct that parse-email deals with bytevectors instead of strings. * email/utils.scm (read-bytes-till): New function. * email/email.scm (body->mime-entities, email->headers+body, decode-body): Deal with emails as bytevectors instead of strings. (parse-mime-entity): Rename text argument to bv. (parse-email, parse-email-body): Overload to handle input in the form of a string or bytevector. * doc/guile-email.texi (Parsing e-mail): Document overloading of parse-email and parse-email-body. * tests/email.scm ("handle truncated multipart message gracefully"): Deal in bytevectors instead of strings. ("email with 8 bit encoding and non UTF-8 charset", "multipart email with a 8 bit encoding and non UTF-8 charset part"): New tests. * tests/email-with-8bit-encoding-and-non-utf8-charset, tests/multipart-email-with-a-8bit-encoding-and-non-utf8-charset-part: New files. Reported-by: Jack Hill <jackhill@jackhill.us>
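The reasoning in the entry above (one message may carry parts in different charsets, so it has to be handled as bytes) can be illustrated with a small Python sketch. This is not guile-email's code; the function `decode_parts` and the sample data are assumptions made up for the example.

```python
def decode_parts(parts):
    """Decode each (charset, raw_bytes) pair with its own charset.

    Hypothetical helper: a real parser would read the charset from each
    part's Content-Type header.
    """
    return [raw.decode(charset) for charset, raw in parts]


# Two parts of one message, each encoded with a different charset.
parts = [
    ("utf-8", "naïve".encode("utf-8")),
    ("iso-8859-1", "café".encode("iso-8859-1")),
]
print(decode_parts(parts))
```

Decoding the concatenated bytes with any single charset would either fail or garble one of the parts, which is why decoding is deferred until each part's charset is known.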
2019-07-26	email: Match mime-entity-fields only against headers.	Arun Isaac
	* email/email.scm (parse-mime-entity): Match mime-entity-fields only against the headers, not the whole email.
2019-07-26	email: Import all of (email utils).	Arun Isaac
	* email/email.scm: Import all of (email utils), not a subset of the exported functions.
2019-07-21	email: Decode MIME encoded words in Subject header.	Arun Isaac
	Prior to this, MIME encoded words in the Subject header were not decoded. * email/email.scm (parse-email-headers): Decode MIME encoded words in Subject header. * tests/email.scm ("decode MIME encoded words in Subject header"): New test. Reported-by: Ricardo Wurmus <rekado@elephly.net>
2019-06-25	email: Fix typo in docstring of parse-mime-entity.	Arun Isaac
	* email/email.scm (parse-mime-entity): Replace "a" with "an" in docstring.
2018-11-13	email: Support emacs message mode parens style addresses.	Arun Isaac
	* email/email.scm (define-comment-pattern, define-cfws-pattern, define-dot-atom-pattern, define-domain-pattern, define-addr-spec-pattern): New macros. (captured-comment, captured-cfws, captured-dot-atom, captured-domain, captured-addr-spec): New patterns. (mailbox): Use captured-addr-spec instead of addr-spec. (post-process-mailbox): Handle emacs message mode parens style addresses.
2018-11-13	email: Discard angle brackets in address fields only.	Arun Isaac
	* email/email.scm (define-angle-addr): New macro. (unbracketed-angle-addr): New pattern. (name-addr): Use unbracketed-angle-addr instead of angle-addr. (post-process-mailbox): Do not trim angle brackets from address. That is now handled by the grammar itself.
2018-11-13	email: Deduplicate email address parsing.	Arun Isaac
	* email/email.scm (post-process-mailbox): New function. (parse-email-address): Call post-process-mailbox instead of reimplementing address parsing using regular expressions. (parse-email-headers): Call post-process-mailbox.
2018-11-13	email: Fix typo in parse-email-address docstring.	Arun Isaac
	* email/email.scm (parse-email-address): Fix typo in examples in parse-email-address docstring. The returned value must be an association list of pairs, not of lists.
2018-10-02	utils: Use else for the default cond clause.	Arun Isaac
	* email/utils.scm (read-while)[read-while-loop]: Use else, instead of #t, for the default cond clause.
2018-10-01	email: Do not discard trace fields.	Arun Isaac
	* email/email.scm (angle-addr): Capture "<" and ">". (parse-email-headers): Do not discard trace fields. Trim "<" and ">" from angle-addr in mailbox, but not from trace fields.
2018-10-01	email: Handle truncated messages gracefully.	Arun Isaac
	* email/email.scm (body->mime-entities)[read-mime-entity]: Check for eof-object so that truncated messages are handled gracefully without raising an error.
2018-09-15	quoted-printable: Use specific rnrs libraries.	Arun Isaac
	* email/quoted-printable.scm: Use (rnrs bytevectors) and (rnrs io ports) instead of (rnrs).
2018-09-15	quoted-printable: Use call-with-bytevector-output-port.	Arun Isaac
	* email/quoted-printable.scm (quoted-printable-decode): Use call-with-bytevector-output-port instead of call-with-port and open-bytevector-output-port.
2018-09-15	quoted-printable: Q-encode #\? and #\_ with their ASCII values.	Arun Isaac
	* email/quoted-printable.scm (%q-encoding-literal-char-set, %quoted-printable-literal-char-set): New variables. (quoted-printable-encode): Move core encoding code to ... (quoted-printable-style-encode): ... this new function. (q-encoding-decode): Call quoted-printable-style-encode with the appropriate literal-char-set instead of calling quoted-printable-encode. * tests/quoted-printable.scm (q-encoding of special characters): Add test to check for this bug.
2018-09-15	quoted-printable: Encode #\= with its ASCII code.	Arun Isaac
	* email/quoted-printable.scm (quoted-printable-encode): Encode #\= with its ASCII code. * tests/quoted-printable.scm (quoted-printable-encoding of =): Add test to check for this bug.
2018-09-14	quoted-printable: Encode printable ASCII characters to themselves.	Arun Isaac
	* email/quoted-printable.scm (quoted-printable-encode): Encode only printable ASCII characters, that is, ASCII characters in the interval [#\space, #\delete), to themselves.
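The literal-character rule in the entry above, combined with the #\= fix recorded under 2018-09-15, can be sketched in Python as follows. This is an illustration under stated assumptions, not guile-email's implementation: the function names are hypothetical, and the sketch deliberately omits line breaks, soft line wrapping at 76 columns, and trailing-whitespace handling.

```python
def qp_encode_byte(byte: int) -> str:
    """Encode one byte in quoted-printable style.

    Printable ASCII in the interval [0x20, 0x7F), except "=", maps to
    itself; every other byte becomes "=" followed by two uppercase hex
    digits.
    """
    if 0x20 <= byte < 0x7F and byte != ord("="):
        return chr(byte)
    return "=%02X" % byte


def qp_encode(data: bytes) -> str:
    """Encode a byte string byte by byte (no line wrapping)."""
    return "".join(qp_encode_byte(b) for b in data)


print(qp_encode("a=b é".encode("utf-8")))
```

Note that 0x7F (delete) falls outside the half-open interval and is therefore escaped rather than emitted literally.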
2018-09-12	quoted-printable: Add q-encoding-encode.	Arun Isaac
	* email/quoted-printable.scm (q-encoding-encode): New function. * tests/quoted-printable.scm (q-encoding wikipedia example): Rename to ... (q-encoding wikipedia example: decoding): ... this. (q-encoding wikipedia example: encoding): New test.
2018-09-12	Untabify and re-indent all sources.	Arun Isaac
	* build-aux/test-driver.scm, email/base64.scm, email/email.scm, email/quoted-printable.scm, email/utils.scm, tests/quoted-printable.scm: Untabify and re-indent.
2018-09-12	quoted-printable: Add quoted-printable-encode.	Arun Isaac
	* email/quoted-printable.scm (quoted-printable-encode): New function. * tests/quoted-printable.scm (quoted-printable wikipedia example): Rename to ... (quoted-printable wikipedia example: decoding): ... this. (quoted-printable wikipedia example: encoding, quoted-printable wikipedia example: encoded output should not be more than 76 columns wide): New tests.
2018-09-12	quoted-printable: Close port after use.	Arun Isaac
	* email/quoted-printable.scm (quoted-printable-decode): Close bytevector port after use. In cond, use else instead of #t for the default clause.
2018-09-08	Initial commit.	Arun Isaac