Description
I want a translation of the exercise custom-set to C.
But I want YOUR input before I submit a PR.
Q: Which approaches do we want to allow/encourage?
The three most "obvious" approaches are:
- Use a sorted (dynamically allocated) array.
  Pro: Probably the easiest to implement, cache-friendly.
  Con: insert() is in O(n).
- Use a binary tree (e.g. a red-black tree).
  Pro: In my experience, implementing one correctly is a great achievement.
  Con: That's hard to implement and time-consuming to mentor.
- Use a hash table.
  Pro: insert() is in amortized O(1), contains() is in expected O(1).
  Con: That's a hard task for beginners: they have to choose a hash function and a collision resolution strategy, and they might even have to resize the table. It's probably hard to mentor, too.
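To make the trade-off of the first approach concrete, here is a minimal sketch of a sorted, growable array of int. The names int_set, int_set_insert() and int_set_contains() are hypothetical and not taken from either translation; the doubling growth policy is also just an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal set of ints backed by a sorted, growable array. */
struct int_set {
   int *items;      /* sorted ascending, no duplicates */
   size_t size;     /* number of elements in use */
   size_t capacity; /* number of elements allocated */
};

/* Binary search: index of the first element that is >= value. */
static size_t lower_bound(const struct int_set *set, int value)
{
   size_t lo = 0, hi = set->size;
   while (lo < hi) {
      size_t mid = lo + (hi - lo) / 2;
      if (set->items[mid] < value)
         lo = mid + 1;
      else
         hi = mid;
   }
   return lo;
}

/* O(log n) lookup thanks to the sorted order. */
bool int_set_contains(const struct int_set *set, int value)
{
   assert(set != NULL);
   size_t pos = lower_bound(set, value);
   return pos < set->size && set->items[pos] == value;
}

/* Returns true if value is in the set afterwards, false if allocation
 * failed. Finding the slot is O(log n), but shifting the tail with
 * memmove() makes the whole insert O(n). */
bool int_set_insert(struct int_set *set, int value)
{
   assert(set != NULL);
   size_t pos = lower_bound(set, value);
   if (pos < set->size && set->items[pos] == value)
      return true; /* already present, nothing to do */

   if (set->size == set->capacity) {
      /* Assumed growth policy: start at 4, then double. */
      size_t new_capacity = set->capacity ? 2 * set->capacity : 4;
      int *grown = realloc(set->items, new_capacity * sizeof *grown);
      if (!grown)
         return false;
      set->items = grown;
      set->capacity = new_capacity;
   }

   memmove(&set->items[pos + 1], &set->items[pos],
           (set->size - pos) * sizeof *set->items);
   set->items[pos] = value;
   set->size++;
   return true;
}
```

A zero-initialized struct int_set is a valid empty set in this sketch, and the caller is responsible for calling free() on items when done.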
The tests and the initial .h file can be written in a way that encourages/allows/disallows those three approaches, e.g. if custom_set_create() takes a hash function or if the struct custom_set is defined in the .h file, that would steer the students in a certain direction.
Q: Is there a fixed limit for the size of the set?
- Yes: That's much easier to implement.
- No: IMHO that's closer to many "real-life" applications.
Q: Do we want to allow/encourage/enforce arbitrary element types?
- Yes: That's a nice thing to learn, and it should give some insight into how to implement these kinds of (old-school) generic data structures and how qsort() and bsearch() work.
- No: The canonical tests only use integers. That's much easier to implement.
(BTW: The tests mention "mixed-type sets" in a comment, but I wouldn't know how to do that properly in C without hard-coding the allowed types or having some sort of "base struct".)
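For readers less familiar with the qsort()/bsearch() style of genericity mentioned above, here is a small sketch. The helpers sort_elements() and sorted_contains() and both comparison functions are hypothetical names made up for illustration; only qsort(), bsearch() and strcmp() are standard library calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Comparison function for int elements. Both arguments point at
 * elements, so they are cast back to the real element type. */
int compare_ints(const void *lhs, const void *rhs)
{
   int a = *(const int *)lhs;
   int b = *(const int *)rhs;
   return (a > b) - (a < b); /* avoids the overflow that a - b can cause */
}

/* Comparison function for elements of type const char *. Here the
 * arguments are pointers to pointers to char. */
int compare_strings(const void *lhs, const void *rhs)
{
   const char *a = *(const char *const *)lhs;
   const char *b = *(const char *const *)rhs;
   return strcmp(a, b);
}

/* Hypothetical helper: sorts an array of any element type in place. */
void sort_elements(void *elements, size_t count, size_t elem_size,
                   int (*cmp)(const void *, const void *))
{
   qsort(elements, count, elem_size, cmp);
}

/* Hypothetical helper: membership test on an array that is already
 * sorted according to cmp. */
bool sorted_contains(const void *elements, size_t count, size_t elem_size,
                     int (*cmp)(const void *, const void *),
                     const void *key)
{
   return bsearch(key, elements, count, elem_size, cmp) != NULL;
}
```

The same two helpers work for both element types; only the element size and the comparison function change. Note that the key passed to sorted_contains() is a pointer to an element, so for strings it is the address of a const char * variable.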
The canonical tests require these operations:
some sort of creation, add, empty, contains, subset, disjoint, equal, intersection, difference, and union.
Q: Do we want anything further, e.g. size or some sort of access to the elements?
And last but not least, Q: Should we have this exercise at all in the C track?
Please be candid, I'm open to all sorts of constructive criticism.
I already did some of the work and created two translations. Both have a Python script that reads the canonical-data.json and prints the complete test_custom_set.c. Both implement the struct custom_set with a dynamically allocated sorted array.
In the first one the type of the elements is int, and the function custom_set_create() takes a size and a variable argument list (probably a first in the C track, right?). The .h file looks like this:
struct custom_set;
struct custom_set *custom_set_create(size_t size, ...);
void custom_set_destroy(struct custom_set *cs);
size_t custom_set_size(struct custom_set const *cs);
bool custom_set_is_empty(struct custom_set const *cs);
bool custom_set_contains(struct custom_set const *cs, int element);
bool custom_set_is_subset_of(struct custom_set const *subset, struct custom_set const *superset);
bool custom_set_disjoint(struct custom_set const *cs1, struct custom_set const *cs2);
bool custom_set_equal(struct custom_set const *cs1, struct custom_set const *cs2);
bool custom_set_add(struct custom_set *cs, int element);
struct custom_set *custom_set_intersection(struct custom_set const *cs1, struct custom_set const *cs2);
struct custom_set *custom_set_difference(struct custom_set const *cs1, struct custom_set const *cs2);
struct custom_set *custom_set_union(struct custom_set const *cs1, struct custom_set const *cs2);
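To illustrate how a variadic custom_set_create() could work, here is a minimal sketch of a few of the functions declared above. It is not the implementation from my translation: the struct layout is made up, and it assumes that size is the number of int arguments that follow.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout; the real header keeps the struct opaque. */
struct custom_set {
   int *items;  /* sorted ascending, no duplicates */
   size_t size; /* number of elements stored */
};

/* Assumption: size is the number of int arguments that follow. */
struct custom_set *custom_set_create(size_t size, ...)
{
   struct custom_set *cs = malloc(sizeof *cs);
   if (!cs)
      return NULL;
   cs->size = 0;
   cs->items = size ? malloc(size * sizeof *cs->items) : NULL;
   if (size && !cs->items) {
      free(cs);
      return NULL;
   }

   va_list args;
   va_start(args, size);
   for (size_t i = 0; i < size; i++) {
      int value = va_arg(args, int);

      /* Linear scan for the slot that keeps the array sorted. */
      size_t pos = 0;
      while (pos < cs->size && cs->items[pos] < value)
         pos++;
      if (pos < cs->size && cs->items[pos] == value)
         continue; /* skip duplicates */

      memmove(&cs->items[pos + 1], &cs->items[pos],
              (cs->size - pos) * sizeof *cs->items);
      cs->items[pos] = value;
      cs->size++;
   }
   va_end(args);
   return cs;
}

void custom_set_destroy(struct custom_set *cs)
{
   if (cs) {
      free(cs->items);
      free(cs);
   }
}

size_t custom_set_size(struct custom_set const *cs)
{
   assert(cs != NULL);
   return cs->size;
}

bool custom_set_contains(struct custom_set const *cs, int element)
{
   assert(cs != NULL);
   for (size_t i = 0; i < cs->size; i++)
      if (cs->items[i] == element)
         return true;
   return false;
}
```

A test could then build its input with a single call such as custom_set_create(4, 3, 1, 2, 3). The downside of the variadic interface is that the compiler cannot check the count or the types of the trailing arguments, so passing too few values or a non-int value is undefined behavior.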
The second one is more generic: its custom_set_create() takes the size of the element type and a comparison function. The .h file looks like this:
struct custom_set;
struct custom_set *custom_set_create(
size_t elem_size,
int (*cmp)(void const *, void const *));
struct custom_set *custom_set_create_with_elems(
size_t elem_size,
int (*cmp)(void const *, void const *),
size_t nelems, void const *elements);
void custom_set_destroy(struct custom_set *cs);
size_t custom_set_size(struct custom_set const *cs);
bool custom_set_is_empty(struct custom_set const *cs);
bool custom_set_contains(struct custom_set const *cs, void const *element);
bool custom_set_is_subset_of(struct custom_set const *subset, struct custom_set const *superset);
bool custom_set_disjoint(struct custom_set const *cs1, struct custom_set const *cs2);
bool custom_set_equal(struct custom_set const *cs1, struct custom_set const *cs2);
bool custom_set_add(struct custom_set *cs, void const *element);
struct custom_set *custom_set_intersection(struct custom_set const *cs1, struct custom_set const *cs2);
struct custom_set *custom_set_difference(struct custom_set const *cs1, struct custom_set const *cs2);
struct custom_set *custom_set_union(struct custom_set const *cs1, struct custom_set const *cs2);
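As a rough idea of what sits behind the generic interface, here is a sketch of four of the declared functions. Again, this is not the code from my translation: the struct layout and the compare_doubles() helper are assumptions for illustration, and the sketch relies on qsort() and bsearch() to do the heavy lifting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout; the real header keeps the struct opaque. */
struct custom_set {
   size_t elem_size;                       /* size of one element in bytes */
   int (*cmp)(void const *, void const *); /* ordering of the elements */
   size_t size;                            /* number of elements stored */
   unsigned char *data;                    /* sorted, duplicate-free elements */
};

struct custom_set *custom_set_create_with_elems(
    size_t elem_size,
    int (*cmp)(void const *, void const *),
    size_t nelems, void const *elements)
{
   assert(elem_size > 0 && cmp != NULL);
   struct custom_set *cs = malloc(sizeof *cs);
   if (!cs)
      return NULL;
   cs->elem_size = elem_size;
   cs->cmp = cmp;
   cs->size = 0;
   cs->data = NULL;
   if (nelems == 0)
      return cs;

   cs->data = malloc(nelems * elem_size);
   if (!cs->data) {
      free(cs);
      return NULL;
   }
   memcpy(cs->data, elements, nelems * elem_size);
   qsort(cs->data, nelems, elem_size, cmp);

   /* After sorting, duplicates are adjacent; keep one copy of each. */
   size_t kept = 1;
   for (size_t i = 1; i < nelems; i++) {
      unsigned char *candidate = cs->data + i * elem_size;
      unsigned char *last_kept = cs->data + (kept - 1) * elem_size;
      if (cmp(last_kept, candidate) != 0) {
         memmove(cs->data + kept * elem_size, candidate, elem_size);
         kept++;
      }
   }
   cs->size = kept;
   return cs;
}

void custom_set_destroy(struct custom_set *cs)
{
   if (cs) {
      free(cs->data);
      free(cs);
   }
}

size_t custom_set_size(struct custom_set const *cs)
{
   assert(cs != NULL);
   return cs->size;
}

bool custom_set_contains(struct custom_set const *cs, void const *element)
{
   assert(cs != NULL && element != NULL);
   if (cs->size == 0)
      return false;
   return bsearch(element, cs->data, cs->size, cs->elem_size, cs->cmp) != NULL;
}

/* Hypothetical comparison function a test file might supply. */
int compare_doubles(void const *lhs, void const *rhs)
{
   double a = *(double const *)lhs;
   double b = *(double const *)rhs;
   return (a > b) - (a < b);
}
```

With this design the set copies the elements it is given, so the caller's array can go out of scope afterwards. Elements are passed by address, e.g. custom_set_contains(cs, &value), which is a bit more awkward in the tests than the int-only version.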
Both translations compile and pass both the tests and make memcheck.
Let me know what you think.