@@ -15981,7 +15981,7 @@ generate_dataset(schema: 'Schema', n: 'int' = 100, seed: 'int | None' = None, ou
1598115981 ),
1598215982 )
1598315983
15984- pb.generate_dataset(schema, n=50, seed=23)
15984+ pb.preview(pb.generate_dataset(schema, n=50, seed=23))
1598515985 ```
1598615986
1598715987
@@ -16819,7 +16819,7 @@ duration_field(min_duration: 'str | timedelta | None' = None, max_duration: 'str
1681916819 ),
1682016820 )
1682116821
16822- pb.generate_dataset(schema, n=100, seed=23)
16822+ pb.preview(pb.generate_dataset(schema, n=100, seed=23))
1682316823 ```
1682416824
1682516825 Colon-separated strings can also be used for quick duration definitions:
@@ -16830,7 +16830,7 @@ duration_field(min_duration: 'str | timedelta | None' = None, max_duration: 'str
1683016830 break_time=pb.duration_field(min_duration="0:05:00", max_duration="0:30:00"),
1683116831 )
1683216832
16833- pb.generate_dataset(schema, n=30, seed=23)
16833+ pb.preview(pb.generate_dataset(schema, n=30, seed=23))
1683416834 ```
1683516835
1683616836 Optional durations can be created with `nullable=True`, and duration fields work well
@@ -16850,7 +16850,106 @@ duration_field(min_duration: 'str | timedelta | None' = None, max_duration: 'str
1685016850 ),
1685116851 )
1685216852
16853- pb.generate_dataset(schema, n=30, seed=7)
16853+ pb.preview(pb.generate_dataset(schema, n=30, seed=7))
16854+ ```
16855+
16856+
16857+ profile_fields(*, set: "Literal['minimal', 'standard', 'full']" = 'standard', split_name: 'bool' = True, include: 'list[str] | None' = None, exclude: 'list[str] | None' = None, prefix: 'str | None' = None) -> 'dict[str, StringField]'
16858+
16859+ Create a dict of string field specifications representing a person profile.
16860+
16861+ Returns a dictionary of `StringField` objects suitable for `**`-unpacking into a `Schema()`.
16862+ Each field uses a preset that participates in the existing coherence system, so generated
16863+ data will have coherent names, emails, addresses, and phone numbers within each row.
16864+
16865+ Parameters
16866+ ----------
16867+ set
16868+ The base set of profile fields to include. Options are `"minimal"` (name, email, phone;
16869+ 3-4 columns depending on `split_name=`), `"standard"` (name, email, city, state,
16870+ postcode, phone; 6-7 columns), and `"full"` (name, email, address, city, state,
16871+ postcode, phone, company, job; 9-10 columns). Default is `"standard"`.
16872+ split_name
16873+ Whether to split the name into separate `first_name` and `last_name` columns (`True`,
16874+ the default) or use a single combined `name` column (`False`).
16875+ include
16876+ List of additional preset names to add to the base set. For example,
16877+ `include=["company"]` adds a company column to the `"standard"` set. Presets already
16878+ in the base set are silently ignored.
16879+ exclude
16880+ List of preset names to remove from the (possibly augmented) set. For example,
16881+ `exclude=["postcode"]` removes the postcode column. Presets not in the set are silently
16882+ ignored.
16883+ prefix
16884+ Optional string to prepend to every column name. For example, `prefix="customer_"`
16885+ produces keys like `"customer_first_name"`, `"customer_email"`, etc.
16886+
16887+ Returns
16888+ -------
16889+ dict[str, StringField]
16890+ A dictionary mapping column names to `StringField` objects, ordered logically (name fields
16891+ first, then contact, address, phone, business).
16892+
16893+ Raises
16894+ ------
16895+ ValueError
16896+ If `set=` is not one of `"minimal"`, `"standard"`, or `"full"`; if `include=` or `exclude=`
16897+ contains unknown preset names; if a preset appears in both `include=` and `exclude=`; or if
16898+ `include=` contains name presets incompatible with the `split_name=` setting.
16899+
16900+ Examples
16901+ --------
16902+ The default call returns the `"standard"` set of profile columns. The `**` operator unpacks the
16903+ returned dictionary directly into `Schema()`, as if each `string_field()` call had been written
16904+ by hand. All coherence rules apply automatically: emails are derived from names, and
16905+ city/state/postcode/phone are internally consistent.
16906+
16907+ ```python
16908+ import pointblank as pb
16909+
16910+ schema = pb.Schema(
16911+ user_id=pb.int_field(unique=True),
16912+ **pb.profile_fields(),
16913+ )
16914+
16915+ pb.preview(pb.generate_dataset(schema, n=100, seed=23))
16916+ ```
16917+
16918+ Use `set=` to control how many columns are generated. The `"minimal"` set includes only `name`,
16919+ `email`, and `phone`, while `"full"` extends `"standard"` with `address`, `company`, and `job`. Setting
16920+ `split_name=False` collapses `first_name` and `last_name` into a single combined `name` column:
16921+
16922+ ```python
16923+ schema = pb.Schema(
16924+ **pb.profile_fields(set="minimal", split_name=False),
16925+ balance=pb.float_field(min_val=0, max_val=10000),
16926+ )
16927+
16928+ pb.preview(pb.generate_dataset(schema, n=50, seed=23))
16929+ ```
16930+
16931+ The `include=` and `exclude=` parameters let you customize the column set without switching to a
16932+ different base set. Here we start from the `"full"` set, drop the business columns, and pass `country="DE"` to `generate_dataset()`:
16933+
16934+ ```python
16935+ schema = pb.Schema(
16936+ **pb.profile_fields(set="full", exclude=["company", "job"]),
16937+ )
16938+
16939+ pb.preview(pb.generate_dataset(schema, n=50, seed=23, country="DE"))
16940+ ```
16941+
16942+ The `prefix=` parameter prepends a string to every column name, which is especially useful when
16943+ a schema needs two independent profiles (e.g., a sender and a recipient). Each prefixed group
16944+ maintains its own coherence:
16945+
16946+ ```python
16947+ schema = pb.Schema(
16948+ **pb.profile_fields(set="minimal", prefix="sender_"),
16949+ **pb.profile_fields(set="minimal", prefix="recipient_"),
16950+ )
16951+
16952+ pb.preview(pb.generate_dataset(schema, n=50, seed=23))
1685416953 ```
1685516954
1685616955
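
The column-selection rules documented for `profile_fields()` (base set, `split_name=`, `include=`, `exclude=`, `prefix=`) can be checked with a small pure-Python sketch. This is not pointblank's implementation: `profile_columns`, `BASE_SETS`, and `ORDER` are hypothetical names, the column names are read off the docstring above and may differ from the library's actual preset names, and the rule about name presets being incompatible with `split_name=` is not modeled.

```python
# Illustrative sketch only: names and column lists are assumptions taken from
# the docstring text, not from pointblank's source code.
BASE_SETS = {
    "minimal": ["name", "email", "phone"],
    "standard": ["name", "email", "city", "state", "postcode", "phone"],
    "full": ["name", "email", "address", "city", "state",
             "postcode", "phone", "company", "job"],
}

# Documented ordering: name fields, then contact, address, phone, business.
ORDER = ["name", "email", "address", "city", "state",
         "postcode", "phone", "company", "job"]


def profile_columns(set_name="standard", split_name=True,
                    include=None, exclude=None, prefix=None):
    """Return the column names the docstring describes for one configuration."""
    if set_name not in BASE_SETS:
        raise ValueError(f"unknown set: {set_name!r}")

    include = list(include or [])
    exclude = list(exclude or [])

    unknown = [p for p in include + exclude if p not in ORDER]
    if unknown:
        raise ValueError(f"unknown presets: {unknown}")

    both = sorted(set(include) & set(exclude))
    if both:
        raise ValueError(f"presets in both include and exclude: {both}")

    # Augment the base set first, then remove exclusions.
    chosen = (set(BASE_SETS[set_name]) | set(include)) - set(exclude)

    columns = []
    for preset in ORDER:
        if preset not in chosen:
            continue
        if preset == "name" and split_name:
            columns += ["first_name", "last_name"]
        else:
            columns.append(preset)

    return [f"{prefix or ''}{col}" for col in columns]


print(profile_columns("minimal", split_name=False))
print(profile_columns("full", exclude=["company", "job"]))
print(profile_columns("minimal", prefix="sender_"))
```

Under these assumptions, the column counts match the documented ranges: 3-4 for `"minimal"`, 6-7 for `"standard"`, and 9-10 for `"full"`, depending on `split_name=`.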