Proposition | New form of Zip: MultiZip #1077

Open
@the-black-wolf

@atifaziz, as instructed

I used this in financial analytics. It's a form of a single Reduce step of the Map/Reduce pattern. It implements Zip over a variable number of sources, with the caveat that they must all be of the same type.

The current values of all source streams are packed into an array and sent to the caller-supplied resultSelector reducer, which processes them into a TResult.

It takes several parameters:

public static IEnumerable<TResult> MultiZip<TSource, TResult>(
    IEnumerable<IEnumerable<TSource>> sourceList,
    Func<TSource[], bool[], TResult> resultSelector,
    MultiZipMissingSource missingSourceAction = MultiZipMissingSource.Remove,
    MultiZipMode operatingMode = MultiZipMode.Shortest,
    bool immutableResultSource = false)

sourceList is the list of streams. resultSelector is the reducer. missingSourceAction and operatingMode define how the cycling is done (see below). I've taken the naming scheme from the three separate Zip* methods you already have, which take a fixed number of streams of varying types. The defaults are to remove missing sources and to use Shortest mode (exit on the first exhausted source).
immutableResultSource controls allocation. When it is false (the default), the arrays passed to resultSelector are reused, which can make a difference in a scenario with a large number of sources. Reuse is fine for most uses; if the caller wants to store the arrays somewhere, they can flip this to true.
resultSelector is also supplied with a bool map showing which elements were sourced from streams and which are default(TSource) stand-ins for exhausted streams in Longest mode.
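To illustrate why the array reuse matters, here is a minimal sketch. `BufferReuseDemo.Steps` and its `freshArrayPerStep` flag are hypothetical stand-ins, not part of the proposal; the flag assumes that immutableResultSource = true means "allocate a new array for every step".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BufferReuseDemo
{
    // Hypothetical stand-in for MultiZip's inner loop: three steps over two
    // fixed "streams", handing the current values to the selector.
    public static IEnumerable<TResult> Steps<TResult>(
        Func<int[], TResult> selector, bool freshArrayPerStep)
    {
        var buffer = new int[2];
        for (var step = 0; step < 3; step++)
        {
            // Assumed meaning of immutableResultSource = true: allocate per step.
            if (freshArrayPerStep) buffer = new int[2];
            buffer[0] = step;
            buffer[1] = step * 10;
            yield return selector(buffer);
        }
    }
}
```

A selector that reduces immediately, such as `Steps(v => v.Sum(), false)`, is unaffected by the reuse. A selector that stores the array, such as `Steps(v => v, false).ToList()`, ends up with three references to one array holding only the last step's values, which is the case where the flag would need to be true.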

Explanation of enums:

        /// <summary>
        /// Determines the behavior of MultiZip extension when the list of sources contains a missing (null) source.
        /// </summary>
        public enum MultiZipMissingSource
        {
            /// <summary>
            /// Removes all missing (null) sources from the sourceList. This will affect the ordering of values sent to resultSelector. This is the default MultiZip behavior.
            /// </summary>
            Remove,
            /// <summary>
            /// Ignores missing (null) sources and treats them as empty, serving default values and indicating an exhausted source. Keep in mind that missing sources will prevent processing if the operating mode is Shortest or Equi.
            /// </summary>
            Ignore,
            /// <summary>
            /// Throws InvalidOperationException if any of the sources is missing (null). Will not throw exception on valid but empty sources.
            /// </summary>
            Error
        }

        /// <summary>
        /// Determines the behavior of MultiZip when one or more sources are exhausted. Effectively implements Shortest, Longest and Equi variants of the MultiZip implementation.
        /// </summary>
        public enum MultiZipMode
        {
            /// <summary>
            /// Stops processing as soon as any of the sources is exhausted. The boolean map sent to resultSelector will always contain `true` values and can be ignored. This is the default MultiZip behavior and is the same as MultiZip-Shortest.
            /// </summary>
            Shortest,
            /// <summary>
            /// Continues processing until all sources are exhausted. Exhausted sources will continue serving the default value for TSource. Use the boolean map sent to resultSelector to determine which values are sourced and which are default. This behavior is the same as MultiZip-Longest.
            /// </summary>
            Longest,
            /// <summary>
            /// Will throw InvalidOperationException if any (but not all) sources are exhausted. If all sources are exhausted at the same time, it simply ends processing. The boolean map sent to resultSelector will always contain `true` values and can be ignored. This behavior is the same as MultiZip-Equi.
            /// </summary>
            Equi
        }
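Putting the signature and the two enums together, the behavior described above could be sketched roughly as follows. This is not the code from the proposal, only an illustration under stated assumptions: the class name `MultiZipSketch` and the private `Iterate` helper are made up, immutableResultSource = true is assumed to mean "allocate new arrays for every step", and every live source is advanced on each step even when an earlier one has just run out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum MultiZipMissingSource { Remove, Ignore, Error }
public enum MultiZipMode { Shortest, Longest, Equi }

public static class MultiZipSketch
{
    public static IEnumerable<TResult> MultiZip<TSource, TResult>(
        IEnumerable<IEnumerable<TSource>> sourceList,
        Func<TSource[], bool[], TResult> resultSelector,
        MultiZipMissingSource missingSourceAction = MultiZipMissingSource.Remove,
        MultiZipMode operatingMode = MultiZipMode.Shortest,
        bool immutableResultSource = false)
    {
        // Validate arguments eagerly; everything else runs lazily in Iterate.
        if (sourceList == null) throw new ArgumentNullException(nameof(sourceList));
        if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
        return Iterate(sourceList, resultSelector, missingSourceAction,
                       operatingMode, immutableResultSource);
    }

    private static IEnumerable<TResult> Iterate<TSource, TResult>(
        IEnumerable<IEnumerable<TSource>> sourceList,
        Func<TSource[], bool[], TResult> resultSelector,
        MultiZipMissingSource missingSourceAction,
        MultiZipMode operatingMode,
        bool immutableResultSource)
    {
        var sources = sourceList.ToList();
        if (missingSourceAction == MultiZipMissingSource.Error && sources.Any(s => s == null))
            throw new InvalidOperationException("A source is missing (null).");
        if (missingSourceAction == MultiZipMissingSource.Remove)
            sources.RemoveAll(s => s == null); // shifts the indices of later sources

        // Ignore: a null source keeps its slot and behaves like an empty one.
        var enumerators = sources
            .Select(s => (s ?? Enumerable.Empty<TSource>()).GetEnumerator())
            .ToList();
        try
        {
            var count = enumerators.Count;
            var values = new TSource[count];
            var present = new bool[count];
            while (true)
            {
                if (immutableResultSource)
                {
                    // Assumption: "immutable" means new arrays for every step.
                    values = new TSource[count];
                    present = new bool[count];
                }
                var live = 0;
                for (var i = 0; i < count; i++)
                {
                    present[i] = enumerators[i].MoveNext();
                    values[i] = present[i] ? enumerators[i].Current : default(TSource);
                    if (present[i]) live++;
                }
                if (live == 0) yield break; // all sources exhausted together
                if (live < count)
                {
                    if (operatingMode == MultiZipMode.Shortest) yield break;
                    if (operatingMode == MultiZipMode.Equi)
                        throw new InvalidOperationException("Sources have different lengths.");
                }
                yield return resultSelector(values, present);
            }
        }
        finally
        {
            foreach (var e in enumerators) e.Dispose();
        }
    }
}
```

With sources `{1, 2, 3}`, `{10, 20}` and `{100, 200, 300}` and a selector `(v, _) => v.Sum()`, this sketch yields 111 and 222 in Shortest mode, and 111, 222 and 303 in Longest mode (the exhausted middle source contributes default(int), flagged `false` in the bool map). In Equi mode it yields the first two results and then throws when the third step is requested.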

The version I have does not support async, but an async variant could be written. Also, several overloads could be added to support different forms of result selector.

OK, let me know. I can post this relatively quickly: I already have the code and just need to write some tests.
