Commit 0703cfc

Merge pull request #256 from xKDR/v0.1.1
Version 0.1.1 into main
2 parents f9aa828 + 31a80ef commit 0703cfc

19 files changed

+700
-48
lines changed

CONTRIBUTING.md

Lines changed: 56 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,17 @@
22

33
# Contributing to Survey.jl
44

5+
* [Overview](#overview)
6+
* [Reporting Issues](#reporting-issues)
7+
* [Recommended workflow setup](#recommended-workflow-setup)
8+
* [Modifying an existing docstring in `src/`](#modifying-an-existing-docstring-in--src--)
9+
* [Adding a new docstring to `src/`](#adding-a-new-docstring-to--src--)
10+
* [Doctests](#doctests)
11+
* [Integration with exisiting API](#integration-with-exisiting-api)
12+
* [Contributing](#contributing)
13+
* [Style Guidelines](#style-guidelines)
14+
* [Git Recommendations For Pull Requests](#git-recommendations-for-pull-requests)
15+
516
## Overview
617
Thank you for thinking about making contributions to Survey.jl!
718
We aim to keep consistency in contribution guidelines to DataFrames.jl, which is the main upstream dependency for the project.
@@ -16,6 +27,46 @@ Reading through the ColPrac guide for collaborative practices is highly recommen
1627
(`Pkg.add(name="Survey", rev="main")`) is a good gut check and can streamline the process,
1728
along with including the first two lines of output from `versioninfo()`
1829

30+
## Setting up development workflow
31+
32+
The tutorial below uses Windows Subsystem for Linux (WSL) and VSCode. Linux/macOS/BSD users can ignore the WSL-specific steps.
33+
34+
1. Install Ubuntu on WSL from the [Ubuntu website](https://ubuntu.com/wsl) or the Microsoft Store
35+
2. Create a fork of the [Survey.jl repository](https://github.com/xKDR/Survey.jl). You will only ever work on this fork and submit Pull Requests to the main repo.
36+
3. Copy the SSH link from your fork by clicking the green `<> Code` icon and then `SSH`.
37+
- You must already have SSH set up for this to work. If you don't, look up a tutorial on how to clone a GitHub repository using SSH.
38+
4. Open a WSL terminal, and run :
39+
- `curl -fsSL https://install.julialang.org | sh`
40+
- `git clone git@github.com:your_username/Survey.jl.git` -- replace `your_username` with your GitHub username
41+
- `julia`
42+
5. You are now in the Julia REPL, run:
43+
- `import Pkg; Pkg.add("Revise")`
44+
- `import Pkg; Pkg.add("Survey")`
45+
- `import Pkg; Pkg.add("Test")`
46+
- `] dev .`
47+
6. Open VSCode and install the following extensions:
48+
- WSL
49+
- Julia
50+
7. Go back to your WSL terminal, navigate to the folder of your repo, and run `code .` to open VSCode in that folder
51+
8. Create a `dev` folder (only if you want, it is gitignored by default), and a `test.jl` file in it. Paste this block of code and save:
52+
53+
```julia
54+
using Revise, Survey, Test
55+
56+
@testset "ratio.jl" begin
57+
apiclus1 = load_data("apiclus1")
58+
dclus1 = SurveyDesign(apiclus1; clusters=:dnum, strata=:stype, weights=:pw)
59+
@test ratio(:api00, :enroll, dclus1).ratio[1] ≈ 1.17182 atol = 1e-4
60+
end
61+
```
62+
63+
9. In the WSL terminal (not Julia REPL), run `julia dev/test.jl`
64+
✅ If you get no errors, your setup is now complete !
65+
66+
You can keep working in the `dev` folder, which is .gitignored.
67+
Once you have working code and tests, you can move them to the appropriate folders, commit, push, and submit a Pull Request.
68+
Make sure to read the rest of this document so you can learn the best practices and guidelines for this project.
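
As a hedged aside on the tolerance check used in the test file above: `@test a ≈ b atol = tol` is shorthand for `@test isapprox(a, b; atol = tol)`. The sketch below uses only the `Test` standard library and made-up numbers (`estimate` and `reference` are hypothetical, not Survey.jl output) to show which comparisons pass.

```julia
using Test

# Hypothetical values for illustration only; these are not Survey.jl results.
estimate = 1.17185
reference = 1.17182

@testset "atol comparison" begin
    # Passes: the absolute difference is about 3e-5, which is within atol = 1e-4.
    @test estimate ≈ reference atol = 1e-4
    # Equivalent explicit form.
    @test isapprox(estimate, reference; atol = 1e-4)
    # A tighter tolerance rejects the same pair.
    @test !isapprox(estimate, reference; atol = 1e-6)
end
```

When `atol` is given, the default relative tolerance is zero, so only the absolute difference is compared.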
69+
1970
## Modifying an existing docstring in `src/`
2071

2172
All docstrings are written inline above the methods or types they are associated with and can
@@ -94,7 +145,7 @@ This way you are modifying as little as possible of previously written code, and
94145
* If you want to propose a new functionality it is strongly recommended to open an issue first and reach a decision on the final design.
95146
Then a pull request serves an implementation of the agreed way how things should work.
96147
* If you are a new contributor and would like to get a guidance on what area
97-
you could focus your first PR please do not hesitate to ask and JuliaData members
148+
you could focus your first PR, please do not hesitate to ask; community members
98149
will help you with picking a topic matching your experience.
99150
* Feel free to open, or comment on, an issue and solicit feedback early on,
100151
especially if you're unsure about aligning with design goals and direction,
@@ -104,22 +155,15 @@ This way you are modifying as little as possible of previously written code, and
104155
* Aim for atomic commits, if possible, e.g. `change 'foo' behavior like so` &
105156
`'bar' handles such and such corner case`,
106157
rather than `update 'foo' and 'bar'` & `fix typo` & `fix 'bar' better`.
107-
* Pull requests are tested against release and development branches of Julia,
108-
so using `Pkg.test("DataFrames")` as you develop can be helpful.
158+
* Pull requests are tested against release branches of Julia,
159+
so using `Pkg.test("Survey")` as you develop can be helpful.
109160
* The style guidelines outlined below are not the personal style of most contributors,
110161
but for consistency throughout the project, we've adopted them.
111-
* It is recommended to disable GitHub Actions on your fork; check Settings > Actions.
112162
* If a PR adds a new exported name then make sure to add a docstring for it and
113163
add a reference to it in the documentation.
114164
* A PR with breaking changes should have `[BREAKING]` as a first part of its name.
115-
* If a PR changes or adds functionality please update NEWS.md file accordingly as
116-
a part of the PR (along with the link to the PR); please do not add entries
117-
to NEWS.md for changes that are bug fixes or are not user visible, such as
118-
adding tests, updating documentation or improving code layout.
119-
* If you make a PR please try to avoid pushing many small commits to GitHub in
120-
a sequence as each such commit triggers a separate CI job, which takes over
121-
an hour. This has a consequence of making other PRs in packages from the JuliaData
122-
ecosystem wait for such CI jobs to finish as hey share a common pool of CI resources.
165+
* A PR which is still draft or work in progress should have `WIP:` as a first part of its name.
166+
* If you make a PR please try to avoid pushing many small commits to GitHub in a sequence, as each such commit triggers a separate CI job, which takes computational time and is not a good use of the small pool of CI resources.
123167

124168
## Style Guidelines
125169

Project.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
name = "Survey"
22
uuid = "c1a98b4d-6cd2-47ec-b9e9-69b59c35373c"
33
authors = ["Ayush Patnaik <[email protected]>"]
4-
version = "0.1.0"
4+
version = "0.2.0"
55

66
[deps]
77
AlgebraOfGraphics = "cbdf2221-f076-402e-a563-3d30da359d67"

README.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -99,7 +99,8 @@ cluster: none
9999
popsize: [6190.0, 6190.0, 6190.0 6190.0]
100100
sampsize: [200, 200, 200 200]
101101
weights: [31.0, 31.0, 31.0 31.0]
102-
probs: [0.0323, 0.0323, 0.0323 0.0323]
102+
allprobs: [0.0323, 0.0323, 0.0323 0.0323]
103+
type: bootstrap
103104
replicates: 1000
104105

105106
julia> mean(:api00, bootsrs)

docs/Project.toml

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,7 @@
11
[deps]
2+
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
3+
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
24
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
3-
Survey = "c1a98b4d-6cd2-47ec-b9e9-69b59c35373c"
45
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
56
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
7+
Survey = "c1a98b4d-6cd2-47ec-b9e9-69b59c35373c"

docs/src/api.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,8 @@ SurveyDesign
1414
ReplicateDesign
1515
load_data
1616
bootweights
17+
jackknifeweights
18+
jackknife_variance
1719
mean
1820
total
1921
quantile

src/Survey.jl

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,7 @@ include("boxplot.jl")
2525
include("show.jl")
2626
include("ratio.jl")
2727
include("by.jl")
28+
include("jackknife.jl")
2829

2930
export load_data
3031
export AbstractSurveyDesign, SurveyDesign, ReplicateDesign
@@ -35,5 +36,6 @@ export hist, sturges, freedman_diaconis
3536
export boxplot
3637
export bootweights
3738
export ratio
39+
export jackknifeweights, jackknife_variance
3840

3941
end

src/SurveyDesign.jl

Lines changed: 200 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -126,14 +126,117 @@ end
126126
"""
127127
ReplicateDesign <: AbstractSurveyDesign
128128
129-
Survey design obtained by replicating an original design using [`bootweights`](@ref).
129+
Survey design obtained by replicating an original design using [`bootweights`](@ref). If
130+
replicate weights are available, then they can be used to directly create a `ReplicateDesign`.
130131
131-
```jldoctest
132+
# Constructors
133+
134+
```julia
135+
ReplicateDesign(
136+
data::AbstractDataFrame,
137+
replicate_weights::Vector{Symbol};
138+
clusters::Union{Nothing,Symbol,Vector{Symbol}} = nothing,
139+
strata::Union{Nothing,Symbol} = nothing,
140+
popsize::Union{Nothing,Symbol} = nothing,
141+
weights::Union{Nothing,Symbol} = nothing
142+
)
143+
144+
ReplicateDesign(
145+
data::AbstractDataFrame,
146+
replicate_weights::UnitRange{Int};
147+
clusters::Union{Nothing,Symbol,Vector{Symbol}} = nothing,
148+
strata::Union{Nothing,Symbol} = nothing,
149+
popsize::Union{Nothing,Symbol} = nothing,
150+
weights::Union{Nothing,Symbol} = nothing
151+
)
152+
153+
ReplicateDesign(
154+
data::AbstractDataFrame,
155+
replicate_weights::Regex;
156+
clusters::Union{Nothing,Symbol,Vector{Symbol}} = nothing,
157+
strata::Union{Nothing,Symbol} = nothing,
158+
popsize::Union{Nothing,Symbol} = nothing,
159+
weights::Union{Nothing,Symbol} = nothing
160+
)
161+
```
162+
163+
# Arguments
164+
165+
The constructor has the same arguments as [`SurveyDesign`](@ref). The only additional argument is `replicate_weights`, which can
166+
be of one of the following types.
167+
168+
- `Vector{Symbol}`: In this case, each `Symbol` in the vector should represent a column of `data` containing the replicate weights.
169+
- `UnitRange{Int}`: For instance, this could be `UnitRange(5:10)`. This will mean that the replicate weights are contained in columns 5 through 10.
170+
- `Regex`: In this case, all the columns of `data` which match this `Regex` will be treated as the columns containing the replicate weights.
171+
172+
All the columns containing the replicate weights will be renamed to the form `replicate_i`, where `i` ranges from 1 to the number of columns containing the replicate weights.
173+
174+
# Examples
175+
176+
Here is an example where the [`bootweights`](@ref) function is used to create a `ReplicateDesign`.
177+
178+
```jldoctest replicate-design; setup = :(using Survey, CSV, DataFrames)
132179
julia> apistrat = load_data("apistrat");
133180
134181
julia> dstrat = SurveyDesign(apistrat; strata=:stype, weights=:pw);
135182
136-
julia> bootstrat = bootweights(dstrat; replicates=1000)
183+
julia> bootstrat = bootweights(dstrat; replicates=1000) # creating a ReplicateDesign using bootweights
184+
ReplicateDesign:
185+
data: 200×1044 DataFrame
186+
strata: stype
187+
[E, E, E … H]
188+
cluster: none
189+
popsize: [4420.9999, 4420.9999, 4420.9999 … 755.0]
190+
sampsize: [100, 100, 100 … 50]
191+
weights: [44.21, 44.21, 44.21 … 15.1]
192+
allprobs: [0.0226, 0.0226, 0.0226 … 0.0662]
193+
type: bootstrap
194+
replicates: 1000
195+
196+
```
197+
198+
If the replicate weights are given to us already, then we can directly pass them to the `ReplicateDesign` constructor. For instance, in
199+
the above example, suppose we had the `bootstrat` data as a CSV file (for this example, we also rename the columns containing the replicate weights to the form `r_i`).
200+
201+
```jldoctest replicate-design
202+
julia> using CSV;
203+
204+
julia> DataFrames.rename!(bootstrat.data, ["replicate_"*string(index) => "r_"*string(index) for index in 1:1000]);
205+
206+
julia> CSV.write("apistrat_withreplicates.csv", bootstrat.data);
207+
208+
```
209+
210+
We can now pass the replicate weights directly to the `ReplicateDesign` constructor, either as a `Vector{Symbol}`, a `UnitRange` or a `Regex`.
211+
212+
```jldoctest replicate-design
213+
julia> bootstrat_direct = ReplicateDesign(CSV.read("apistrat_withreplicates.csv", DataFrame), [Symbol("r_"*string(replicate)) for replicate in 1:1000]; strata=:stype, weights=:pw)
214+
ReplicateDesign:
215+
data: 200×1044 DataFrame
216+
strata: stype
217+
[E, E, E … H]
218+
cluster: none
219+
popsize: [4420.9999, 4420.9999, 4420.9999 … 755.0]
220+
sampsize: [100, 100, 100 … 50]
221+
weights: [44.21, 44.21, 44.21 … 15.1]
222+
allprobs: [0.0226, 0.0226, 0.0226 … 0.0662]
223+
type: bootstrap
224+
replicates: 1000
225+
226+
julia> bootstrat_unitrange = ReplicateDesign(CSV.read("apistrat_withreplicates.csv", DataFrame), UnitRange(45:1044);strata=:stype, weights=:pw)
227+
ReplicateDesign:
228+
data: 200×1044 DataFrame
229+
strata: stype
230+
[E, E, E … H]
231+
cluster: none
232+
popsize: [4420.9999, 4420.9999, 4420.9999 … 755.0]
233+
sampsize: [100, 100, 100 … 50]
234+
weights: [44.21, 44.21, 44.21 … 15.1]
235+
allprobs: [0.0226, 0.0226, 0.0226 … 0.0662]
236+
type: bootstrap
237+
replicates: 1000
238+
239+
julia> bootstrat_regex = ReplicateDesign(CSV.read("apistrat_withreplicates.csv", DataFrame), r"r_\\d";strata=:stype, weights=:pw)
137240
ReplicateDesign:
138241
data: 200×1044 DataFrame
139242
strata: stype
@@ -143,8 +246,11 @@ popsize: [4420.9999, 4420.9999, 4420.9999 … 755.0]
143246
sampsize: [100, 100, 100 … 50]
144247
weights: [44.21, 44.21, 44.21 … 15.1]
145248
allprobs: [0.0226, 0.0226, 0.0226 … 0.0662]
249+
type: bootstrap
146250
replicates: 1000
251+
147252
```
253+
148254
"""
149255
struct ReplicateDesign <: AbstractSurveyDesign
150256
data::AbstractDataFrame
@@ -155,5 +261,96 @@ struct ReplicateDesign <: AbstractSurveyDesign
155261
weights::Symbol # Effective weights in case of singlestage approx supported
156262
allprobs::Symbol # Right now only singlestage approx supported
157263
pps::Bool
264+
type::String
158265
replicates::UInt
266+
replicate_weights::Vector{Symbol}
267+
268+
# default constructor
269+
function ReplicateDesign(
270+
data::DataFrame,
271+
cluster::Symbol,
272+
popsize::Symbol,
273+
sampsize::Symbol,
274+
strata::Symbol,
275+
weights::Symbol,
276+
allprobs::Symbol,
277+
pps::Bool,
278+
type::String,
279+
replicates::UInt,
280+
replicate_weights::Vector{Symbol}
281+
)
282+
new(data, cluster, popsize, sampsize, strata, weights, allprobs,
283+
pps, type, replicates, replicate_weights)
284+
end
285+
286+
# constructor with given replicate_weights
287+
function ReplicateDesign(
288+
data::AbstractDataFrame,
289+
replicate_weights::Vector{Symbol};
290+
clusters::Union{Nothing,Symbol,Vector{Symbol}} = nothing,
291+
strata::Union{Nothing,Symbol} = nothing,
292+
popsize::Union{Nothing,Symbol} = nothing,
293+
weights::Union{Nothing,Symbol} = nothing
294+
)
295+
# rename the replicate weights if needed
296+
rename!(data, [replicate_weights[index] => "replicate_"*string(index) for index in 1:length(replicate_weights)])
297+
298+
# call the SurveyDesign constructor
299+
base_design = SurveyDesign(
300+
data;
301+
clusters=clusters,
302+
strata=strata,
303+
popsize=popsize,
304+
weights=weights
305+
)
306+
new(
307+
base_design.data,
308+
base_design.cluster,
309+
base_design.popsize,
310+
base_design.sampsize,
311+
base_design.strata,
312+
base_design.weights,
313+
base_design.allprobs,
314+
base_design.pps,
315+
"bootstrap",
316+
length(replicate_weights),
317+
replicate_weights
318+
)
319+
end
320+
321+
# replicate weights given as a range of columns
322+
ReplicateDesign(
323+
data::AbstractDataFrame,
324+
replicate_weights::UnitRange{Int};
325+
clusters::Union{Nothing,Symbol,Vector{Symbol}} = nothing,
326+
strata::Union{Nothing,Symbol} = nothing,
327+
popsize::Union{Nothing,Symbol} = nothing,
328+
weights::Union{Nothing,Symbol} = nothing
329+
) =
330+
ReplicateDesign(
331+
data,
332+
Symbol.(names(data)[replicate_weights]);
333+
clusters=clusters,
334+
strata=strata,
335+
popsize=popsize,
336+
weights=weights
337+
)
338+
339+
# replicate weights given as regular expression
340+
ReplicateDesign(
341+
data::AbstractDataFrame,
342+
replicate_weights::Regex;
343+
clusters::Union{Nothing,Symbol,Vector{Symbol}} = nothing,
344+
strata::Union{Nothing,Symbol} = nothing,
345+
popsize::Union{Nothing,Symbol} = nothing,
346+
weights::Union{Nothing,Symbol} = nothing
347+
) =
348+
ReplicateDesign(
349+
data,
350+
Symbol.(names(data)[findall(name -> occursin(replicate_weights, name), names(data))]);
351+
clusters=clusters,
352+
strata=strata,
353+
popsize=popsize,
354+
weights=weights
355+
)
159356
end
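
The three `replicate_weights` forms accepted above all reduce to a list of column names, which is then renamed to `replicate_1`, `replicate_2`, and so on. The following is a minimal sketch of that resolution in plain Julia, operating on a hypothetical vector of column names rather than a real `DataFrame`. The names `colnames`, `by_symbols`, `by_range`, `by_regex`, and `renames` are illustrative and not part of Survey.jl.

```julia
# Hypothetical column names standing in for `names(data)`.
colnames = ["stype", "pw", "r_1", "r_2", "r_3"]

# Vector{Symbol}: the caller lists the replicate weight columns explicitly.
by_symbols = [Symbol("r_" * string(i)) for i in 1:3]

# UnitRange{Int}: columns are picked by position, as in `names(data)[replicate_weights]`.
by_range = Symbol.(colnames[3:5])

# Regex: every column whose name matches is picked, mirroring the `occursin` filter.
by_regex = Symbol.(filter(name -> occursin(r"r_\d", name), colnames))

# All three selections agree for this column layout.
@assert by_symbols == by_range == by_regex

# Rename pairs of the kind passed to `rename!`: old name => "replicate_i".
renames = [by_symbols[i] => "replicate_" * string(i) for i in 1:length(by_symbols)]
```

Note that `occursin` matches anywhere in a name, so a pattern such as `r"r_\d"` would also select a column called `other_1`; anchoring the pattern with `^` narrows the match.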
