Age | Commit message | Author |
2023-09-03 | email: Tolerate parentheses in display names.•••* email/email.scm (define-atom-pattern): Support customization of the
atext pattern as well.
(define-phrase-pattern): New macro.
(obs-phrase): Define using define-phrase-pattern.
(liberal-atext, liberal-cfws-captured-atom,
liberal-cfws-captured-word, liberal-phrase): New patterns.
(display-name): Use liberal-phrase instead of phrase.
* tests/email.scm ("tolerate email addresses with parentheses in
name"): New test.
| Arun Isaac |
2023-01-06 | email: Support Date fields with missing seconds.•••* email/email.scm (parse-email-headers): Extend the date-time parser to
match when seconds are missing, defaulting to "0".
* tests/email.scm ("parse Date", "parse Date without seconds"): New
tests.
Signed-off-by: Arun Isaac <arunisaac@systemreboot.net>
| Andrew Whatson |
2023-01-03 | email: Support quoted-printable CR LF sequences.•••* email/quoted-printable.scm (quoted-printable-decode): Ignore "=\r\n"
sequences in the input.
* tests/quoted-printable.scm ("quoted-printable decoding of soft line
breaks (=\\n)", "quoted-printable decoding of soft line
breaks (=\\r\\n)"): New tests.
Signed-off-by: Arun Isaac <arunisaac@systemreboot.net>
| Andrew Whatson |
2021-10-24 | email: Handle Received header with two tokens but no timestamp.•••* email/email.scm (parse-email-headers): Match Received header with
timestamp more precisely.
* tests/email.scm ("Parse Received header with two tokens but no
timestamp"): New test.
| Arun Isaac |
2021-10-24 | email: Handle unrecognized Content-Transfer-Encoding headers.•••* email/email.scm (handle-invalid-headers): New function.
(parse-email-headers): Handle invalid headers.
* tests/email.scm ("Assume application/octet-stream Content-Type if
Content-Transfer-Encoding is unrecognized"): New test.
| Arun Isaac |
2021-10-02 | email: Do not use an empty bytevector to test the charset.•••Using an empty bytevector no longer throws an exception since Guile
commit 5ea8c69e9153a970952bf6f0b32c4fad6a28e839.
* email/email.scm (post-process-content-transfer-encoding): Use a
bytevector of unit length to test the charset validity.
Signed-off-by: Arun Isaac <arunisaac@systemreboot.net>
| Mathieu Othacehe |
2021-03-15 | email: Use only cfws-captured-words in obs-phrase.•••* email/email.scm (obs-phrase): Replace word with cfws-captured-word.
* tests/email.scm ("Parse names with more than two words"): New test.
| Arun Isaac |
2020-12-05 | email: Indent better.•••* email/email.scm (define-cfws-pattern): Indent better.
| Arun Isaac |
2020-12-05 | email: Give higher precedence to obsolete id-left, id-right patterns.•••* email/email.scm (id-left, id-right): Give higher precedence to
obsolete patterns.
| Arun Isaac |
2020-12-05 | email: Support remaining obsolete specification.•••* email/email.scm (obs-phrase-list, obs-utext, obs-unstruct,
obs-optional): New macros.
(unstructured, in-reply-to, references, keywords, optional-field):
Include obsolete patterns.
| Arun Isaac |
2020-12-05 | email: Support obsolete Received header.•••* email/email.scm (received): Include obsolete pattern.
(parse-mime-entity): Post process obsolete received forms.
| Arun Isaac |
2020-12-05 | email: Do not capture cfws in atoms and dot-atoms.•••* email/email.scm (define-atom-pattern): Do not capture cfws unless
specified.
(atom): Do not specify cfws.
(define-dot-atom-pattern): Do not capture cfws.
(define-word-pattern): New macro.
(cfws-captured-atom, cfws-captured-word): New patterns.
(obs-phrase): Use cfws-captured-word.
(received-token): Capture all.
(parse-mime-entity): Post process received and received-token.
* tests/email.scm ("parse email headers"): Fix test.
| Arun Isaac |
2020-12-05 | email: Support obsolete date and time.•••* email/email.scm (obs-day-of-week, obs-day, obs-year, obs-hour,
obs-minute, obs-second, obs-zone): New macros.
(day-of-week, day, year, hours, minutes, seconds, zone): Include
obsolete pattern.
(parse-email-headers): Handle obsolete two and three digit years, and
alphabetic time zone specifiers.
* tests/email.scm ("RFC5322 A.6.2. Obsolete dates"): New test.
| Arun Isaac |
2020-12-05 | email: Support obsolete addressing.•••* email/email.scm (obs-qp, obs-fws, obs-no-ws-ctl, obs-ctext,
obs-qtext, obs-phrase, obs-local-part, obs-dtext, obs-domain,
obs-domain-list, obs-route, obs-angle-addr, captured-atom,
captured-obs-domain, captured-domain, obs-mbox-list, obs-group-list,
obs-addr-list, obs-id-left, obs-id-right): New patterns.
(quoted-pair, fws, ctext, qtext, phrase, dtext,
define-angle-addr-pattern, mailbox-list, group-list, address-list,
define-field-pattern, from, sender, bcc, id-left, id-right,
resent-from, resent-sender, resent-bcc, obs-resent-rply): Include
obsolete pattern.
(define-printable-ascii-character-pattern-with-obsolete,
define-atom-pattern, define-obs-domain-pattern): New macros.
(define-domain-pattern): Accept obs-domain as a new argument.
(fields): Include obs-resent-rply.
* tests/email.scm ("RFC5322 A.6.1. Obsolete addressing"): New test.
("parse email addresses with period in name"): Mark as passing.
| Arun Isaac |
2020-05-25 | tests: Test inputs of different lengths.•••* tests/base64.scm ("base64 random bytevector: base64-encode and
base64-decode are inverses of each other", "base64 random
bytevector: encoded output should not be more than 76 columns wide",
"base64 random bytevector: encoded output must only consist of
characters from the base64 alphabet"): Test inputs of different lengths.
* tests/quoted-printable.scm ("quoted-printable random bytevector:
quoted-printable-encode and quoted-printable-decode are inverses of
each other", "quoted-printable random bytevector: encoded output
should not be more than 76 columns wide", "quoted-printable random
bytevector: encoded output must only consist of printable ASCII
characters", "q-encoding random bytevector: q-encoding-encode and
q-encoding-decode are inverses of each other"): Test inputs of
different lengths.
| Arun Isaac |
2020-05-25 | email: Decode base64 bytevector without converting to string.•••The new base64 decoder can directly operate on bytevectors in addition
to strings. This feature may not remain forever, but it greatly
improves performance. So, it stays for now.
* email/email.scm (decode-body): Decode base64 encoded body directly
without converting to an intermediate string.
| Arun Isaac |
2020-05-25 | email: Do not filter base64 encoded bytes before decoding.•••The new base64 decoder skips invalid characters safely.
* email/email.scm (decode-body): Do not filter base64 encoded body to
remove invalid base64 characters.
| Arun Isaac |
2020-05-25 | base64: Reimplement from scratch.•••* email/base64.scm: Replace file.
| Arun Isaac |
2020-05-25 | utils: Do not match sequence byte by byte in read-bytes-till.•••* email/utils.scm (bytevector-match, bytevector-overlap,
lookahead-bytevector-n): New functions.
(read-bytes-till): Do not match sequence byte by byte. Process blocks
of bytes at a time.
| Arun Isaac |
2020-05-25 | utils: Introduce the not-end-let utility.•••* email/utils.scm (not-end-let): New macro.
* .dir-locals.el (scheme-mode): Indent not-end-let correctly.
| Arun Isaac |
2020-05-25 | utils: Do not return eof if matched at beginning.•••* email/utils.scm (read-while, read-bytes-till): Do not return eof if
matched at beginning. Return empty string or bytevector respectively.
* tests/utils.scm ("read-bytes-till returns empty bytevector on match
at beginning", "read-while returns empty string on match at
beginning"): New tests.
| Arun Isaac |
2019-12-16 | base64: Import only the required rnrs modules.•••* email/base64.scm: Import (rnrs arithmetic bitwise), (rnrs arithmetic
fixnums), (rnrs base), (rnrs bytevectors) and (rnrs io ports), not all
of (rnrs).
| Arun Isaac |
2019-12-04 | email: Handle blank Subject headers.•••* email/email.scm (post-process-fields): Treat blank Subject headers
as having the null string as value.
* tests/email.scm ("blank Subject header must be treated as having the
null string as value"): New test.
Reported-by: Ricardo Wurmus <rekado@elephly.net>
| Arun Isaac |
2019-10-09 | email: Return keywords header as a list.•••* email/email.scm (parse-email-headers): Return keywords header as a
list of strings.
* tests/email.scm ("keywords header must be a list"): New test.
| Arun Isaac |
2019-10-08 | Reindent calls to call-with-port.•••* email/email.scm (body->mime-entities, email->headers+body): Reindent
calls to call-with-port.
* email/quoted-printable.scm (quoted-printable-encode,
q-encoding-encode): Reindent calls to call-with-port.
* tests/utils.scm ("read-bytes-till returns eof-object on end of
file"): Reindent call to call-with-port.
| Arun Isaac |
2019-10-08 | email: Override invalid charset more strongly.•••* email/email.scm (post-process-content-type): Use alist-combine to
override charset more strongly than just appending to the alist.
* tests/email.scm ("tolerate invalid charset"): Update test.
| Arun Isaac |
2019-10-08 | email: Introduce alist union utility.•••* email/utils.scm (alist-combine): New function.
(alist-delete*): Delete function.
* email/email.scm (add-default-headers,
add-default-mime-entity-headers): Use alist-combine.
| Arun Isaac |
2019-10-08 | email: Deduplicate post processing of header fields.•••* email/email.scm (post-process-fields): New function.
(parse-mime-entity, decode-body): Invoke post-process-fields.
| Arun Isaac |
2019-10-02 | email: Tolerate decoding errors in body.•••* email/email.scm (decode-body): Tolerate decoding errors in the body
using the substitute conversion strategy.
* tests/email.scm ("tolerate decoding errors in body"): New test.
| Arun Isaac |
2019-10-01 | email: Tolerate invalid charset.•••* email/email.scm (post-process-content-type): If charset is invalid,
assume default UTF-8 as charset.
* tests/email.scm ("tolerate invalid charset"): New test.
Reported-by: Ricardo Wurmus <rekado@elephly.net>
| Arun Isaac |
2019-09-28 | email: Tolerate decoding errors in MIME encoded words.•••* email/email.scm (decode-mime-encoded-word): Tolerate decoding errors
in MIME encoded words using the substitute conversion strategy.
* tests/email.scm ("tolerate decoding errors in MIME encoded words"):
New test.
Reported-by: Christopher Baines <mail@cbaines.net>
| Arun Isaac |
2019-09-28 | email: Remove duplicate unbracketed-angle-addr definition.•••* email/email.scm (unbracketed-angle-addr): Delete duplicate
definition.
| Arun Isaac |
2019-09-23 | email: Update mbox->emails docstring.•••The earlier docstring was one meant for read-next-email-in-mbox.
* email/email.scm (mbox->emails): Update docstring.
| Arun Isaac |
2019-09-23 | email: Add read-next-email-in-mbox docstring.•••* email/email.scm (read-next-email-in-mbox): Add docstring.
| Arun Isaac |
2019-09-23 | email: Tolerate non-ASCII non-UTF-8 characters in headers.•••* email/email.scm (email->headers+body): If non-ASCII non-UTF-8
characters occur in the headers, do not raise a decoding error. Work
around using the substitute conversion strategy.
* tests/email.scm ("tolerate non-ASCII characters in headers"): Rename
to "decode utf-8 characters in headers".
("tolerate non-ascii non-utf-8 characters in headers"): New test.
Reported-by: Christopher Baines <mail@cbaines.net>
| Arun Isaac |
2019-09-17 | email: Tolerate non-ASCII characters in headers.•••We tolerate non-ASCII characters in headers in order to support Emacs
message mode parens style addresses.
* email/email.scm (email->headers+body): Read headers as UTF-8
characters.
* tests/email.scm ("tolerate non-ascii characters in headers"): New
test.
Reported-by: Christopher Baines <mail@cbaines.net>
| Arun Isaac |
2019-08-07 | doc: Document mbox->emails.•••* doc/guile-email.texi (Reading Email): New chapter.
* email/email.scm (mbox->emails): Add docstring.
| Arun Isaac |
2019-08-07 | utils: Clarify read-while docstring.•••* email/utils.scm (read-while): Clarify docstring.
| Arun Isaac |
2019-07-28 | email: Improve comment about default charset.•••* email/email.scm (post-process-content-type): Mention that RFC6657
specifies UTF-8 as the default charset only for text/* media types.
| Arun Isaac |
2019-07-28 | email: Read mboxes as bytevectors.•••* email/email.scm (read-next-email-in-mbox): Read bytes from mboxes,
not characters.
| Arun Isaac |
2019-07-28 | utils: Return eof-object from read-bytes-till on end of file.•••* email/utils.scm (read-bytes-till): Return eof-object, not #vu8(), on
end of file.
* tests/utils.scm: New file.
* Makefile.am (SCM_TESTS): Register it.
| Arun Isaac |
2019-07-28 | email: Decode MIME entities without headers.•••* email/email.scm (email->headers+body): If there are no headers,
return "" as headers not an eof-object.
(parse-email-body): Pass headers of parent entity or email to
parse-mime-entity.
(add-default-mime-entity-headers): New function.
(parse-mime-entity): Use add-default-mime-entity-headers instead of
add-default-headers. Handle MIME entities without headers.
* tests/email.scm ("decode MIME entity without headers"): New test.
| Arun Isaac |
2019-07-28 | email: Support email with mixed encoding of characters.•••Prior to this, parse-email would accept email in the form of a
string. A string is constrained to use the same encoding for all its
characters whereas an email can have characters encoded using
different encoding schemes. Therefore, it is more correct that
parse-email deals with bytevectors instead of strings.
* email/utils.scm (read-bytes-till): New function.
* email/email.scm (body->mime-entities, email->headers+body,
decode-body): Deal with emails as bytevectors instead of strings.
(parse-mime-entity): Rename text argument to bv.
(parse-email, parse-email-body): Overload to handle input in the form
of a string or bytevector.
* doc/guile-email.texi (Parsing e-mail): Document overloading of
parse-email and parse-email-body.
* tests/email.scm ("handle truncated multipart message gracefully"):
Deal in bytevectors instead of strings.
("email with 8 bit encoding and non UTF-8 charset", "multipart email
with a 8 bit encoding and non UTF-8 charset part"): New tests.
* tests/email-with-8bit-encoding-and-non-utf8-charset,
tests/multipart-email-with-a-8bit-encoding-and-non-utf8-charset-part:
New files.
Reported-by: Jack Hill <jackhill@jackhill.us>
| Arun Isaac |
2019-07-26 | email: Match mime-entity-fields only against headers.•••* email/email.scm (parse-mime-entity): Match mime-entity-fields only
against the headers, not the whole email.
| Arun Isaac |
2019-07-26 | email: Import all of (email utils).•••* email/email.scm: Import all of (email utils), not a subset of the
exported functions.
| Arun Isaac |
2019-07-21 | email: Decode MIME encoded words in Subject header.•••Prior to this, MIME encoded words in the Subject header were not
decoded.
* email/email.scm (parse-email-headers): Decode MIME encoded words in
Subject header.
* tests/email.scm ("decode MIME encoded words in Subject header"): New
test.
Reported-by: Ricardo Wurmus <rekado@elephly.net>
| Arun Isaac |
2019-06-25 | email: Fix typo in docstring of parse-mime-entity.•••* email/email.scm (parse-mime-entity): Replace "a" with "an" in
docstring.
| Arun Isaac |
2018-11-13 | email: Support emacs message mode parens style addresses.•••* email/email.scm (define-comment-pattern, define-cfws-pattern,
define-dot-atom-pattern, define-domain-pattern,
define-addr-spec-pattern): New macros.
(captured-comment, captured-cfws, captured-dot-atom, captured-domain,
captured-addr-spec): New patterns.
(mailbox): Use captured-addr-spec instead of addr-spec.
(post-process-mailbox): Handle emacs message mode parens style addresses.
| Arun Isaac |
2018-11-13 | email: Discard angle brackets in address fields only.•••* email/email.scm (define-angle-addr): New macro.
(unbracketed-angle-addr): New pattern.
(name-addr): Use unbracketed-angle-addr instead of angle-addr.
(post-process-mailbox): Do not trim angle brackets from address. That
is now handled by the grammar itself.
| Arun Isaac |
2018-11-13 | email: Deduplicate email address parsing.•••* email/email.scm (post-process-mailbox): New function.
(parse-email-address): Call post-process-mailbox instead of
reimplementing address parsing using regular expressions.
(parse-email-headers): Call post-process-mailbox.
| Arun Isaac |