
Multipart body should not be map[string][]byte #822

@abiriadev


First of all, thanks for the wonderful project! colly has saved our team a lot of time!!

Context

ref: #8, #33

According to RFC7578 section 4.3:

4.3. Multiple Files for One Form Field

The form data for a form field might include multiple files.
[RFC2388] suggested that multiple files for a single form field be transmitted using a nested "multipart/mixed" part. This usage is deprecated.

To match widely deployed implementations, multiple files MUST be sent by supplying each file in a separate part but all with the same "name" parameter.

Receiving applications intended for wide applicability (e.g., multipart/form-data parsing libraries) SHOULD also support the older method of supplying multiple files.

This practice is, unsurprisingly, common, and I am running into exactly this case.
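To make the RFC wording concrete, here is a minimal sketch using only Go's standard `mime/multipart` package (not colly). The helper names `buildRepeatedField` and `countFiles`, the field name `files`, and the file names are made up for illustration. It writes two files as separate parts that share one `name`, then parses the body back to confirm both arrive under the same field.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
)

// buildRepeatedField (hypothetical helper) writes every payload as its own
// part, all carrying the same field name, as RFC 7578 section 4.3 describes.
func buildRepeatedField(field string, filenames []string, contents [][]byte) (*bytes.Buffer, string, error) {
	if len(filenames) != len(contents) {
		return nil, "", fmt.Errorf("got %d filenames for %d payloads", len(filenames), len(contents))
	}
	body := &bytes.Buffer{}
	w := multipart.NewWriter(body)
	for i, content := range contents {
		// CreateFormFile emits Content-Disposition: form-data; name=...; filename=...
		part, err := w.CreateFormFile(field, filenames[i])
		if err != nil {
			return nil, "", err
		}
		if _, err := part.Write(content); err != nil {
			return nil, "", err
		}
	}
	// Close writes the terminating boundary.
	if err := w.Close(); err != nil {
		return nil, "", err
	}
	return body, w.Boundary(), nil
}

// countFiles (hypothetical helper) parses the body and reports how many
// files were received under the given field name.
func countFiles(body *bytes.Buffer, boundary, field string) (int, error) {
	form, err := multipart.NewReader(body, boundary).ReadForm(1 << 20)
	if err != nil {
		return 0, err
	}
	defer form.RemoveAll()
	return len(form.File[field]), nil
}

func main() {
	body, boundary, err := buildRepeatedField(
		"files",
		[]string{"a.txt", "b.txt"},
		[][]byte{[]byte("first"), []byte("second")},
	)
	if err != nil {
		panic(err)
	}
	n, err := countFiles(body, boundary, "files")
	if err != nil {
		panic(err)
	}
	fmt.Println("files received under one name:", n)
}
```

Note that the parsed form exposes the repeated field as a slice (`form.File[field]`), which is the shape a sender-side API would need to mirror.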

The issue

The name field does not have to be unique. There are a few common cases where a duplicated name is required (e.g., when uploading an array of files), and they should be handled properly.

colly/colly.go

Lines 551 to 559 in 99b7fb1

// PostMultipart starts a collector job by creating a Multipart POST request
// with raw binary data. PostMultipart also calls the previously provided callbacks
func (c *Collector) PostMultipart(URL string, requestData map[string][]byte) error {
	boundary := randomBoundary()
	hdr := http.Header{}
	hdr.Set("Content-Type", "multipart/form-data; boundary="+boundary)
	hdr.Set("User-Agent", c.UserAgent)
	return c.scrape(URL, "POST", 1, createMultipartReader(boundary, requestData), nil, hdr, true)
}

colly/colly.go

Lines 1461 to 1469 in 99b7fb1

	buffer.WriteString("Content-type: multipart/form-data; boundary=" + boundary + "\n\n")
	for contentType, content := range data {
		buffer.WriteString(dashBoundary + "\n")
		buffer.WriteString("Content-Disposition: form-data; name=" + contentType + "\n")
		buffer.WriteString(fmt.Sprintf("Content-Length: %d \n\n", len(content)))
		buffer.Write(content)
		buffer.WriteString("\n")
	}
	buffer.WriteString(dashBoundary + "--\n\n")

Unfortunately, the current implementation accepts map[string][]byte, which forces each name to be unique and leaves the order of the parts undefined.

Suggestion

Maybe we can accept []Subpart so that:

  1. The order of subparts is guaranteed
  2. filename and other metadata can be optionally included
  3. Duplicate name fields are allowed

and so on.

I would love to hear your opinion! If you think this is feasible, I will start working on it.
