guile-email - Guile email parser

Age	Commit message	Author
2021-10-24	email: Handle unrecognized Content-Transfer-Encoding headers.	Arun Isaac
	* email/email.scm (handle-invalid-headers): New function. (parse-email-headers): Handle invalid headers. * tests/email.scm ("Assume application/octet-stream Content-Type if Content-Transfer-Encoding is unrecognized"): New test.
2021-10-02	email: Do not use an empty bytevector to test the charset.	Mathieu Othacehe
	Using an empty bytevector no longer throws an exception since Guile commit 5ea8c69e9153a970952bf6f0b32c4fad6a28e839. * email/email.scm (post-process-content-transfer-encoding): Use a bytevector of unit length to test the charset validity. Signed-off-by: Arun Isaac <arunisaac@systemreboot.net>
2021-03-15	email: Use only cfws-captured-words in obs-phrase.	Arun Isaac
	* email/email.scm (obs-phrase): Replace word with cfws-captured-word. * tests/email.scm ("Parse names with more than two words"): New test.
2020-12-05	email: Indent better.	Arun Isaac
	* email/email.scm (define-cfws-pattern): Indent better.
2020-12-05	email: Give higher precedence to obsolete id-left, id-right patterns.	Arun Isaac
	* email/email.scm (id-left, id-right): Give higher precedence to obsolete patterns.
2020-12-05	email: Support remaining obsolete specification.	Arun Isaac
	* email/email.scm (obs-phrase-list, obs-utext, obs-unstruct, obs-optional): New macros. (unstructured, in-reply-to, references, keywords, optional-field): Include obsolete patterns.
2020-12-05	email: Support obsolete Received header.	Arun Isaac
	* email/email.scm (received): Include obsolete pattern. (parse-mime-entity): Post process obsolete received forms.
2020-12-05	email: Do not capture cfws in atoms and dot-atoms.	Arun Isaac
	* email/email.scm (define-atom-pattern): Do not capture cfws unless specified. (atom): Do not specify cfws. (define-dot-atom-pattern): Do not capture cfws. (define-word-pattern): New macro. (cfws-captured-atom, cfws-captured-word): New patterns. (obs-phrase): Use cfws-captured-word. (received-token): Capture all. (parse-mime-entity): Post process received and received-token. * tests/email.scm ("parse email headers"): Fix test.
2020-12-05	email: Support obsolete date and time.	Arun Isaac
	* email/email.scm (obs-day-of-week, obs-day, obs-year, obs-hour, obs-minute, obs-second, obs-zone): New macros. (day-of-week, day, year, hours, minutes, seconds, zone): Include obsolete pattern. (parse-email-headers): Handle obsolete two and three digit years, and alphabetic time zone specifiers. * tests/email.scm ("RFC5322 A.6.2. Obsolete dates"): New test.
2020-12-05	email: Support obsolete addressing.	Arun Isaac
	* email/email.scm (obs-qp, obs-fws, obs-no-ws-ctl, obs-ctext, obs-qtext, obs-phrase, obs-local-part, obs-dtext, obs-domain, obs-domain-list, obs-route, obs-angle-addr, captured-atom, captured-obs-domain, captured-domain, obs-mbox-list, obs-group-list, obs-addr-list, obs-id-left, obs-id-right): New patterns. (quoted-pair, fws, ctext, qtext, phrase, dtext, define-angle-addr-pattern, mailbox-list, group-list, address-list, define-field-pattern, from, sender, bcc, id-left, id-right, resent-from, resent-sender, resent-bcc, obs-resent-rply): Include obsolete pattern. (define-printable-ascii-character-pattern-with-obsolete, define-atom-pattern, define-obs-domain-pattern): New macros. (define-domain-pattern): Accept obs-domain as a new argument. (fields): Include obs-resent-rply. * tests/email.scm ("RFC5322 A.6.1. Obsolete addressing"): New test. ("parse email addresses with period in name"): Mark as passing.
2020-05-25	tests: Test inputs of different lengths.	Arun Isaac
	* tests/base64.scm ("base64 random bytevector: base64-encode and base64-decode are inverses of each other", "base64 random bytevector: encoded output should not be more than 76 columns wide", "base64 random bytevector: encoded output must only consist of characters from the base64 alphabet"): Test inputs of different lengths. * tests/quoted-printable.scm ("quoted-printable random bytevector: quoted-printable-encode and quoted-printable-decode are inverses of each other", "quoted-printable random bytevector: encoded output should not be more than 76 columns wide", "quoted-printable random bytevector: encoded output must only consist of printable ASCII characters", "q-encoding random bytevector: q-encoding-encode and q-encoding-decode are inverses of each other"): Test inputs of different lengths.
2020-05-25	email: Decode base64 bytevector without converting to string.	Arun Isaac
	The new base64 decoder can directly operate on bytevectors in addition to strings. This feature may not remain forever, but it greatly improves performance. So, it stays for now. * email/email.scm (decode-body): Decode base64 encoded body directly without converting to an intermediate string.
2020-05-25	email: Do not filter base64 encoded bytes before decoding.	Arun Isaac
	The new base64 decoder skips invalid characters safely. * email/email.scm (decode-body): Do not filter base64 encoded body to remove invalid base64 characters.
2020-05-25	base64: Reimplement from scratch.	Arun Isaac
	* email/base64.scm: Replace file.
2020-05-25	utils: Do not match sequence byte by byte in read-bytes-till.	Arun Isaac
	* email/utils.scm (bytevector-match, bytevector-overlap, lookahead-bytevector-n): New functions. (read-bytes-till): Do not match sequence byte by byte. Process blocks of bytes at a time.
2020-05-25	utils: Introduce the not-end-let utility.	Arun Isaac
	* email/utils.scm (not-end-let): New macro. * .dir-locals.el (scheme-mode): Indent not-end-let correctly.
2020-05-25	utils: Do not return eof if matched at beginning.	Arun Isaac
	* email/utils.scm (read-while, read-bytes-till): Do not return eof if matched at beginning. Return empty string or bytevector respectively. * tests/utils.scm ("read-bytes-till returns empty bytevector on match at beginning", "read-while returns empty string on match at beginning"): New tests.
2019-12-16	base64: Import only the required rnrs modules.	Arun Isaac
	* email/base64.scm: Import (rnrs arithmetic bitwise), (rnrs arithmetic fixnums), (rnrs base), (rnrs bytevectors) and (rnrs io ports), not all of (rnrs).
2019-12-04	email: Handle blank Subject headers.	Arun Isaac
	* email/email.scm (post-process-fields): Treat blank Subject headers as having the null string as value. * tests/email.scm ("blank Subject header must be treated as having the null string as value"): New test. Reported-by: Ricardo Wurmus <rekado@elephly.net>
2019-10-09	email: Return keywords header as a list.	Arun Isaac
	* email/email.scm (parse-email-headers): Return keywords header as a list of strings. * tests/email.scm ("keywords header must be a list"): New test.
2019-10-08	Reindent calls to call-with-port.	Arun Isaac
	* email/email.scm (body->mime-entities, email->headers+body): Reindent calls to call-with-port. * email/quoted-printable.scm (quoted-printable-encode, q-encoding-encode): Reindent calls to call-with-port. * tests/utils.scm ("read-bytes-till returns eof-object on end of file"): Reindent call to call-with-port.
2019-10-08	email: Override invalid charset more strongly.	Arun Isaac
	* email/email.scm (post-process-content-type): Use alist-combine to override charset more strongly than just appending to the alist. * tests/email.scm ("tolerate invalid charset"): Update test.
2019-10-08	email: Introduce alist union utility.	Arun Isaac
	* email/utils.scm (alist-combine): New function. (alist-delete): Delete function. * email/email.scm (add-default-headers, add-default-mime-entity-headers): Use alist-combine.
2019-10-08	email: Deduplicate post processing of header fields.	Arun Isaac
	* email/email.scm (post-process-fields): New function. (parse-mime-entity, decode-body): Invoke post-process-fields.
2019-10-02	email: Tolerate decoding errors in body.	Arun Isaac
	* email/email.scm (decode-body): Tolerate decoding errors in the body using the substitute conversion strategy. * tests/email.scm ("tolerate decoding errors in body"): New test.
2019-10-01	email: Tolerate invalid charset.	Arun Isaac
	* email/email.scm (post-process-content-type): If charset is invalid, assume default UTF-8 as charset. * tests/email.scm ("tolerate invalid charset"): New test. Reported-by: Ricardo Wurmus <rekado@elephly.net>
2019-09-28	email: Tolerate decoding errors in MIME encoded words.	Arun Isaac
	* email/email.scm (decode-mime-encoded-word): Tolerate decoding errors in MIME encoded words using the substitute conversion strategy. * tests/email.scm ("tolerate decoding errors in MIME encoded words"): New test. Reported-by: Christopher Baines <mail@cbaines.net>
2019-09-28	email: Remove duplicate unbracketed-angle-addr definition.	Arun Isaac
	* email/email.scm (unbracketed-angle-addr): Delete duplicate definition.
2019-09-23	email: Update mbox->emails docstring.	Arun Isaac
	The earlier docstring was one meant for read-next-email-in-mbox. * email/email.scm (mbox->emails): Update docstring.
2019-09-23	email: Add read-next-email-in-mbox docstring.	Arun Isaac
	* email/email.scm (read-next-email-in-mbox): Add docstring.
2019-09-23	email: Tolerate non-ASCII non-UTF-8 characters in headers.	Arun Isaac
	* email/email.scm (email->headers+body): If non-ASCII non-UTF-8 characters occur in the headers, do not raise a decoding error. Work around using the substitute conversion strategy. * tests/email.scm ("tolerate non-ASCII characters in headers"): Rename to "decode utf-8 characters in headers". ("tolerate non-ascii non-utf-8 characters in headers"): New test. Reported-by: Christopher Baines <mail@cbaines.net>
2019-09-17	email: Tolerate non-ASCII characters in headers.	Arun Isaac
	We tolerate non-ASCII characters in headers in order to support Emacs message mode parens style addresses. * email/email.scm (email->headers+body): Read headers as UTF-8 characters. * tests/email.scm ("tolerate non-ascii characters in headers"): New tests. Reported-by: Christopher Baines <mail@cbaines.net>
2019-08-07	doc: Document mbox->emails.	Arun Isaac
	* doc/guile-email.texi (Reading Email): New chapter. * email/email.scm (mbox->emails): Add docstring.
2019-08-07	utils: Clarify read-while docstring.	Arun Isaac
	* email/utils.scm (read-while): Clarify docstring.
2019-07-28	email: Improve comment about default charset.	Arun Isaac
	* email/email.scm (post-process-content-type): Mention that RFC6657 specifies UTF-8 as the default charset only for text/* media types.
2019-07-28	email: Read mboxes as bytevectors.	Arun Isaac
	* email/email.scm (read-next-email-in-mbox): Read bytes from mboxes, not characters.
2019-07-28	utils: Return eof-object from read-bytes-till on end of file.	Arun Isaac
	* email/utils.scm (read-bytes-till): Return eof-object, not #vu8(), on end of file. * tests/utils.scm: New file. * Makefile.am (SCM_TESTS): Register it.
2019-07-28	email: Decode MIME entities without headers.	Arun Isaac
	* email/email.scm (email->headers+body): If there are no headers, return "" as headers, not an eof-object. (parse-email-body): Pass headers of parent entity or email to parse-mime-entity. (add-default-mime-entity-headers): New function. (parse-mime-entity): Use add-default-mime-entity-headers instead of add-default-headers. Handle MIME entities without headers. * tests/email.scm ("decode MIME entity without headers"): New test.
2019-07-28	email: Support email with mixed encoding of characters.	Arun Isaac
	Prior to this, parse-email would accept email in the form of a string. A string is constrained to use the same encoding for all its characters whereas an email can have characters encoded using different encoding schemes. Therefore, it is more correct that parse-email deals with bytevectors instead of strings. * email/utils.scm (read-bytes-till): New function. * email/email.scm (body->mime-entities, email->headers+body, decode-body): Deal with emails as bytevectors instead of strings. (parse-mime-entity): Rename text argument to bv. (parse-email, parse-email-body): Overload to handle input in the form of a string or bytevector. * doc/guile-email.texi (Parsing e-mail): Document overloading of parse-email and parse-email-body. * tests/email.scm ("handle truncated multipart message gracefully"): Deal in bytevectors instead of strings. ("email with 8 bit encoding and non UTF-8 charset", "multipart email with a 8 bit encoding and non UTF-8 charset part"): New tests. * tests/email-with-8bit-encoding-and-non-utf8-charset, tests/multipart-email-with-a-8bit-encoding-and-non-utf8-charset-part: New files. Reported-by: Jack Hill <jackhill@jackhill.us>
2019-07-26	email: Match mime-entity-fields only against headers.	Arun Isaac
	* email/email.scm (parse-mime-entity): Match mime-entity-fields only against the headers, not the whole email.
2019-07-26	email: Import all of (email utils).	Arun Isaac
	* email/email.scm: Import all of (email utils), not a subset of the exported functions.
2019-07-21	email: Decode MIME encoded words in Subject header.	Arun Isaac
	Prior to this, MIME encoded words in the Subject header were not decoded. * email/email.scm (parse-email-headers): Decode MIME encoded words in Subject header. * tests/email.scm ("decode MIME encoded words in Subject header"): New test. Reported-by: Ricardo Wurmus <rekado@elephly.net>
2019-06-25	email: Fix typo in docstring of parse-mime-entity.	Arun Isaac
	* email/email.scm (parse-mime-entity): Replace "a" with "an" in docstring.
2018-11-13	email: Support emacs message mode parens style addresses.	Arun Isaac
	* email/email.scm (define-comment-pattern, define-cfws-pattern, define-dot-atom-pattern, define-domain-pattern, define-addr-spec-pattern): New macros. (captured-comment, captured-cfws, captured-dot-atom, captured-domain, captured-addr-spec): New patterns. (mailbox): Use captured-addr-spec instead of addr-spec. (post-process-mailbox): Handle emacs message mode parens style addresses.
2018-11-13	email: Discard angle brackets in address fields only.	Arun Isaac
	* email/email.scm (define-angle-addr): New macro. (unbracketed-angle-addr): New pattern. (name-addr): Use unbracketed-angle-addr instead of angle-addr. (post-process-mailbox): Do not trim angle brackets from address. That is now handled by the grammar itself.
2018-11-13	email: Deduplicate email address parsing.	Arun Isaac
	* email/email.scm (post-process-mailbox): New function. (parse-email-address): Call post-process-mailbox instead of reimplementing address parsing using regular expressions. (parse-email-headers): Call post-process-mailbox.
2018-11-13	email: Fix typo in parse-email-address docstring.	Arun Isaac
	* email/email.scm (parse-email-address): Fix typo in examples in parse-email-address docstring. The returned value must be an association list of pairs, not of lists.
2018-10-02	utils: Use else for the default cond clause.	Arun Isaac
	* email/utils.scm (read-while)[read-while-loop]: Use else, instead of #t, for the default cond clause.
2018-10-01	email: Do not discard trace fields.	Arun Isaac
	* email/email.scm (angle-addr): Capture "<" and ">". (parse-email-headers): Do not discard trace fields. Trim "<" and ">" from angle-addr in mailbox, but not from trace fields.
2018-10-01	email: Handle truncated messages gracefully.	Arun Isaac
	* email/email.scm (body->mime-entities)[read-mime-entity]: Check for eof-object so that truncated messages are handled gracefully without raising an error.