Bad handling of malformed dates coming from BSky #319

@merickson

Dates coming from the BlueSky API are occasionally malformed.

For example, take this BlueSky post (https://bsky.app/profile/did:plc:xplfowvb5fkrduzpw4s6fsrv/post/3k4jng4uxmv2d):

Object {
        "$type": String("app.bsky.feed.post"),
        "createdAt": String("2023-08-09T09:44:46.189371"),
        "embed": Object {
            "$type": String("app.bsky.embed.images"),
            "images": Array [
                Object {
                    "alt": String(""),
                    "image": Object {
                        "cid": String("bafkreiea22et5bwq4ulvc5r6etbqstqlx5onccnfeblve4zwdz62grbdx4"),
                        "mimeType": String("image/jpeg"),
                    },
                },
            ],
        },
        "text": String("🇧🇷Choró-boi\n🌎Taraba major\n\nO choró-boi é uma ave  passeriforme da família Thamnophilidae"),
    }

Note that the createdAt timestamp has no timezone designator (neither a trailing Z nor a numeric UTC offset), so it does not match the regular expression in string.rs that enforces conformance to the union of RFC3339 and ISO8601 formatting:

.get_or_init(|| Regex::new(r"^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?(Z|(\+[0-9]{2}|\-[0-9][1-9]):[0-9]{2})$").unwrap())
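To make the mismatch concrete, here is a minimal std-only sketch of the part of the check that fails. The helper name `has_timezone_designator` is hypothetical and is not part of the library; it only mirrors the trailing `(Z|±HH:MM)` group of the regular expression above, not the whole pattern.

```rust
/// Hypothetical helper (not from the library): returns true when a
/// timestamp ends with a timezone designator, i.e. `Z` or a numeric
/// offset such as `+09:00` or `-03:00`.
fn has_timezone_designator(ts: &str) -> bool {
    // Only inspect the part after `T`, so the `-` separators in the
    // date portion are never mistaken for a negative offset.
    let time = match ts.split_once('T') {
        Some((_date, time)) => time,
        None => return false,
    };
    if time.ends_with('Z') {
        return true;
    }
    // A numeric offset occupies the final six bytes: sign, HH, ':', MM.
    let bytes = time.as_bytes();
    if bytes.len() < 6 {
        return false;
    }
    let off = &bytes[bytes.len() - 6..];
    (off[0] == b'+' || off[0] == b'-')
        && off[1].is_ascii_digit()
        && off[2].is_ascii_digit()
        && off[3] == b':'
        && off[4].is_ascii_digit()
        && off[5].is_ascii_digit()
}
```

With this helper, the createdAt value from the post above is reported as having no designator, while the same value with a trailing Z would pass.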

Furthermore, because RecordData::try_from_unknown() calls unwrap() on the deserialization result, the failure panics instead of being returned to the caller as an error:

Ok(serde_json::from_slice(&json).unwrap())
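The difference between unwrapping and propagating can be sketched with the standard library alone. This is an illustration of the pattern, not the library's code: `std::str::from_utf8` stands in for `serde_json::from_slice`, and both function names are hypothetical.

```rust
use std::str::Utf8Error;

/// Mirrors the shape of the line above: the signature promises a
/// `Result`, but a failure panics inside `unwrap()` and the caller
/// never sees an `Err`.
fn decode_panicking(bytes: &[u8]) -> Result<&str, Utf8Error> {
    Ok(std::str::from_utf8(bytes).unwrap())
}

/// Same signature, but `?` returns the error to the caller, who can
/// then log it, retry, or attempt a repair.
fn decode_propagating(bytes: &[u8]) -> Result<&str, Utf8Error> {
    let text = std::str::from_utf8(bytes)?;
    Ok(text)
}
```

Given invalid input such as the single byte 0xff, `decode_propagating` returns an `Err` that the caller can match on, whereas `decode_panicking` aborts the current thread.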

I'm not the best Rustacean at present, but this is the workaround I've come up with so far. It assumes r is an individual post coming out of BskyAgent.api.app.bsky.feed.get_posts():

// Round-trip the record through JSON so it can be deserialized without
// going through the panicking path in the library.
let inter_json = serde_json::to_vec(&r.record).unwrap();
let record: Object<RecordData> = match serde_json::from_slice(inter_json.as_slice()) {
    Ok(o) => o,
    Err(e) => {
        warn!("Got error {}", e);
        warn!("Attempting to fix date...");

        // Re-parse as an untyped JSON value so createdAt can be rewritten.
        let mut inter_obj: serde_json::value::Value = serde_json::from_slice(inter_json.as_slice()).unwrap();
        let date_str = inter_obj["createdAt"].as_str().unwrap();
        // Parse the timestamp as a naive (timezone-less) date-time.
        let new_date = chrono::NaiveDateTime::parse_from_str(date_str, "%Y-%m-%dT%H:%M:%S%.f").unwrap();
        debug!("new_date: {}", new_date);
        // Re-emit it with millisecond precision and a trailing Z, i.e. treat it as UTC.
        inter_obj["createdAt"] = serde_json::to_value(new_date.format("%Y-%m-%dT%H:%M:%S%.3fZ").to_string()).unwrap();
        debug!("{:#?}", inter_obj);
        serde_json::from_slice(serde_json::to_vec(&inter_obj).unwrap().as_slice()).unwrap()
    }
};
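The date repair step in the workaround above can also be sketched without any unwrap() calls and without chrono. This is only a sketch under two assumptions: a timestamp with no designator is meant as UTC (which the workaround above also assumes when it appends Z), and the fractional seconds are kept as-is rather than truncated to milliseconds. The function name `normalize_created_at` is hypothetical.

```rust
/// Hypothetical helper: appends `Z` to a timestamp that has no timezone
/// designator, ASSUMING it is meant as UTC. Timestamps that already
/// carry `Z` or a numeric offset are returned unchanged. Returns `None`
/// when the input has no `T` separator or the time part contains
/// unexpected characters.
fn normalize_created_at(ts: &str) -> Option<String> {
    let (_date, time) = ts.split_once('T')?;
    // `Z`, `+` or `-` in the time part means a designator is present.
    if time.ends_with('Z') || time.contains('+') || time.contains('-') {
        return Some(ts.to_string());
    }
    // A naive time part should consist only of digits, ':' and '.'.
    let looks_naive = !time.is_empty()
        && time.chars().all(|c| c.is_ascii_digit() || c == ':' || c == '.');
    if !looks_naive {
        return None;
    }
    Some(format!("{}Z", ts))
}
```

For the post above this yields 2023-08-09T09:44:46.189371Z, which should satisfy the regular expression quoted earlier since it allows any number of fractional digits. Whether the rest of the library accepts six fractional digits is not verified here.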

When I have a little more time, I'm happy to contribute a fix to the library where appropriate, but I wanted to make sure this was logged for future reference.
