From ac83c2a00c13702bc365cd0f3074239fa63d743f Mon Sep 17 00:00:00 2001
From: Arun Isaac
Date: Fri, 26 Jul 2019 01:53:22 +0530
Subject: email: Support email with mixed encoding of characters.

Prior to this, parse-email would accept email in the form of a
string. A string is constrained to use the same encoding for all its
characters whereas an email can have characters encoded using different
encoding schemes. Therefore, it is more correct that parse-email deals
with bytevectors instead of strings.

* email/utils.scm (read-bytes-till): New function.
* email/email.scm (body->mime-entities, email->headers+body,
decode-body): Deal with emails as bytevectors instead of strings.
(parse-mime-entity): Rename text argument to bv.
(parse-email, parse-email-body): Overload to handle input in the form
of a string or bytevector.
* doc/guile-email.texi (Parsing e-mail): Document overloading of
parse-email and parse-email-body.
* tests/email.scm ("handle truncated multipart message gracefully"):
Deal in bytevectors instead of strings.
("email with 8 bit encoding and non UTF-8 charset", "multipart email
with a 8 bit encoding and non UTF-8 charset part"): New tests.
* tests/email-with-8bit-encoding-and-non-utf8-charset,
tests/multipart-email-with-a-8bit-encoding-and-non-utf8-charset-part:
New files.

Reported-by: Jack Hill
---
 doc/guile-email.texi | 34 +++++++++++++++++++++++++++-------
 1 file changed, 27 insertions(+), 7 deletions(-)

(limited to 'doc/guile-email.texi')

diff --git a/doc/guile-email.texi b/doc/guile-email.texi
index b606021..70a4e28 100644
--- a/doc/guile-email.texi
+++ b/doc/guile-email.texi
@@ -53,22 +53,42 @@ RF2047 and RFC2049.
 @node Parsing e-mail
 @chapter Parsing e-mail
 
-@deffn {Scheme Procedure} parse-email email
-Parse string @var{email} and return result as an record.
-@end deffn
+@deftypefn {Scheme Procedure} parse-email (bytevector @var{email})
+@deftypefnx {Scheme Procedure} parse-email (string @var{email})
+Parse bytevector @var{email} and return result as an @code{}
+record.
+
+Parse string @var{email} and return result as an @code{}
+record.
+@end deftypefn
 
 @deffn {Scheme Procedure} parse-email-headers headers
 Parse string @var{headers} as email headers and return an association
 list of header keys and values.
 @end deffn
 
-@deffn {Scheme Procedure} parse-email-body headers body
-Parse @var{body} as email body where @var{headers} is an association
-list of header keys and values as returned by
+@deftypefn {Scheme Procedure} parse-email-body (string @var{headers}) (bytevector @var{body})
+@deftypefnx {Scheme Procedure} parse-email-body (string @var{headers}) (string @var{body})
+Parse bytevector @var{body} as email body where @var{headers} is an
+association list of header keys and values as returned by
 @code{parse-email-headers}. Return a list of records if
 the body is a multipart message. Else, return a single
 record.
-@end deffn
+
+Parse string @var{body} as email body where @var{headers} is an
+association list of header keys and values as returned by
+@code{parse-email-headers}. Return a list of records if
+the body is a multipart message. Else, return a single
+record.
+@end deftypefn
+
+Note that while an email can have characters encoded using different
+schemes, a string is constrained to have all characters encoded using
+the same scheme. Therefore, passing a string to @code{parse-email} or
+@code{parse-email-body} will not always produce correct results. Hence,
+this variant of @code{parse-email} and @code{parse-email-body} will be
+deprecated in the future. This variant is only provided in the interest
+of backward compatibility.
 
 @node Encoding and Decoding
 @chapter Encoding and Decoding
-- 
cgit v1.2.3
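
For illustration, the bytevector variant documented above might be called as in the sketch below. This is a hedged example, not part of the patch: the module name (email email) is assumed from the email/email.scm path in the commit message, and "message.eml" is a hypothetical file name. The file is opened in binary mode so that the bytes reach parse-email undecoded.

```scheme
;; Sketch only: the (email email) module name and the "message.eml"
;; path are assumptions, not taken from the patch itself.
(use-modules (email email)
             (ice-9 binary-ports))

;; Read the raw bytes of the message without decoding them, so that
;; parts written in different charsets are left intact.
(define raw-message
  (call-with-input-file "message.eml" get-bytevector-all #:binary #t))

;; Pass the bytevector (rather than a string) to parse-email, as the
;; updated documentation recommends.
(define parsed (parse-email raw-message))
```

Reading the file as text and passing the resulting string would decode the whole message with a single encoding up front, which is the case the documentation note warns may not always produce correct results.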