missing messages when DC ships bad utf8 #20

Open
opened 2018-11-18 07:09:03 +00:00 by mappu04 · 3 comments
Owner

We're dealing with raw bytes almost everywhere. It's assumed these are UTF-8 strings.

When DC sends us a message that's not valid UTF-8 (e.g. today, a pasted em-dash in Windows-1252 encoding(?)), TG silently(?) drops the message.

We should either

  • better handle error messages returned from the message-submission API, or
  • filter out invalid byte sequences before submitting the message to TG.
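As a rough illustration of catching the problem before submission, the sketch below locates the first invalid byte so it can be logged instead of the message vanishing. The helper name `firstInvalidUTF8` is hypothetical and not part of this project; the `0x97` byte is used on the assumption that the offending character really was a Windows-1252 em-dash.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstInvalidUTF8 returns the byte offset of the first invalid UTF-8
// sequence in b, or -1 if b is entirely valid UTF-8.
// (Hypothetical helper, for illustration only.)
func firstInvalidUTF8(b []byte) int {
	for i := 0; i < len(b); {
		r, size := utf8.DecodeRune(b[i:])
		// DecodeRune reports an invalid encoding as (RuneError, 1).
		// A correctly encoded U+FFFD has size 3, so it is not flagged.
		if r == utf8.RuneError && size == 1 {
			return i
		}
		i += size
	}
	return -1
}

func main() {
	// 0x97 is the Windows-1252 byte for an em-dash; alone it is not valid UTF-8.
	msg := []byte("before \x97 after")
	if off := firstInvalidUTF8(msg); off >= 0 {
		fmt.Printf("invalid UTF-8 at byte %d: %q\n", off, msg)
	}
}
```

Logging the offset and the quoted raw bytes at this point would at least make the failure visible, whichever of the two options is chosen.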
mappu04 added the bug label 2018-11-18 07:09:39 +00:00
Author
Owner

Attempt to parse all incoming messages as UTF-8. If parsing fails, replace the invalid byte sequences with U+FFFD REPLACEMENT CHARACTER.

Author
Owner
2019/04/07 13:02:08 Delivering PM to telegram user: Bad Request: strings must be encoded in UTF-8
Author
Owner
2020/03/26 11:12:28 Delivering public message to group chat: Bad Request: text must be encoded in UTF-8
Reference: code.ivysaur.me/nmdc-telegramfrontend#20