Public Interface
BazerData Module
BazerData.panel_fill! — Method
panel_fill!(
    df::DataFrame,
    id_var::Symbol,
    time_var::Symbol,
    value_var::Union{Symbol, Vector{Symbol}};
    gap::Union{Int, DatePeriod} = 1,
    method::Symbol = :backwards,
    uniquecheck::Bool = true,
    flag::Bool = false,
    merge::Bool = false
)
Arguments
- df::AbstractDataFrame: a panel dataset
- id_var::Symbol: the individual index dimension of the panel
- time_var::Symbol: the time index dimension of the panel (must be an integer or a date)
- value_var::Union{Symbol, Vector{Symbol}}: the set of columns we would like to fill
Keywords
- gap::Union{Int, DatePeriod} = 1: the interval size for which we want to fill data
- method::Symbol = :backwards: the interpolation method used to fill the data. Options are :backwards (default), :forwards, :linear, :nearest. Email me for other interpolations (anything from Interpolations.jl is possible)
- uniquecheck::Bool = true: check if panel is clean
- flag::Bool = false: flag the interpolated values
Returns
AbstractDataFrame:
Examples
- See tests
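To make concrete what filling a panel involves, the sketch below locates the periods that are absent from each unit's integer time index. It is an illustration only: `missing_periods` is a hypothetical helper, not part of BazerData, and it says nothing about how panel_fill! interprets gap or method.

```julia
# Hypothetical helper (not part of BazerData): for each id, list the integer
# periods between its first and last observation that are not observed.
function missing_periods(ids::AbstractVector, times::AbstractVector{<:Integer})
    # Collect the observed periods of every unit.
    observed = Dict{eltype(ids), Vector{eltype(times)}}()
    for (id, t) in zip(ids, times)
        push!(get!(observed, id, eltype(times)[]), t)
    end
    # Periods inside the observed span that have no row.
    gaps = Dict{eltype(ids), Vector{eltype(times)}}()
    for (id, ts) in observed
        gaps[id] = setdiff(minimum(ts):maximum(ts), ts)
    end
    return gaps
end

ids   = [1, 1, 1, 2, 2]
times = [1, 2, 5, 3, 4]
gaps  = missing_periods(ids, times)
```

Here unit 1 is observed at periods 1, 2 and 5, so periods 3 and 4 are the rows a fill step would have to create, while unit 2 has no gap.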
BazerData.panel_fill — Method
panel_fill(...)
Same as panel_fill! but without modifying the input in place.
BazerData.tabulate — Method
tabulate(df::AbstractDataFrame, cols::Union{Symbol, Array{Symbol}};
    reorder_cols=true, out::Symbol=:stdout)
This was forked from TexTables.jl and was inspired by https://github.com/matthieugomez/statar
Arguments
- df::AbstractDataFrame: Input DataFrame to analyze
- cols::Union{Symbol, Vector{Symbol}}: Single column name or vector of column names to tabulate
- group_type::Union{Symbol, Vector{Symbol}}=:value: Specifies how to group each column:
  - :value: Group by the actual values in the column
  - :type: Group by the type of values in the column
  - Vector{Symbol}: Vector combining :value and :type for different columns
- reorder_cols::Bool=true: Whether to sort the output by sortable columns
- format_tbl::Symbol=:long: How to present the results, long or wide (Stata twoway)
- format_stat::Symbol=:freq: Which statistics to present for format, :freq or :pct
- skip_stat::Union{Nothing, Symbol, Vector{Symbol}}=nothing: Do not print out all statistics (only for string)
- out::Symbol=:stdout: Output format:
  - :stdout: Print formatted table to standard output (returns nothing)
  - :df: Return the result as a DataFrame
  - :string: Return the formatted table as a string
Returns
- Nothing if out=:stdout
- DataFrame if out=:df
- String if out=:string
Output Format
The resulting table contains the following columns:
- Specified grouping columns (from cols)
- freq: Frequency count
- pct: Percentage of total
- cum: Cumulative percentage
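The three statistics listed above can be sketched for a single column in plain Julia. This is not how tabulate is implemented: `freq_table` is a hypothetical helper, and the 0 to 100 percentage scale and the sort order of the values are assumptions made for the illustration.

```julia
# Hypothetical helper (not part of BazerData): frequency, percentage and
# cumulative percentage of each distinct value in a vector.
function freq_table(values::AbstractVector)
    counts = Dict{eltype(values), Int}()
    for v in values
        counts[v] = get(counts, v, 0) + 1
    end
    levels = sort(collect(keys(counts)))   # assumption: rows sorted by value
    freq = [counts[k] for k in levels]
    pct  = 100 .* freq ./ length(values)   # assumption: percentages on a 0-100 scale
    cum  = cumsum(pct)
    return (value = levels, freq = freq, pct = pct, cum = cum)
end

tbl = freq_table(["FR", "US", "US", "DE"])
```

The cumulative column is a running sum of pct, so its last entry is always 100 on this scale.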
TO DO
allow user to specify order of columns (reorder = false flag)
Examples
See the README for more examples
# Simple frequency table for one column
tabulate(df, :country)
# Group by value type
tabulate(df, :age, group_type=:type)
# Multiple columns with mixed grouping
tabulate(df, [:country, :age], group_type=[:value, :type])
# Return as DataFrame instead of printing
result_df = tabulate(df, :country, out=:df)
BazerData.tlag — Method
tlag(x, t_vec; n = nothing, checksorted = true, verbose = false)
Create a lagged version of array x based on time vector t_vec, where each element is shifted backward in time by a specified amount n.
Arguments
- x: Array of values to be lagged
- t_vec: Vector of time points corresponding to each element in x
Keyword Arguments
- n: Time gap for lagging. If nothing (default), uses the minimal unit difference between time points.
- checksorted: If true (default), verifies that t_vec is sorted in ascending order
- verbose: If true, prints informational messages about the process
Returns
- An array of the same length as x where each element is the value of x from n time units ago, or missing if no corresponding past value exists
Notes
- Time vectors must be strictly sorted (ascending order)
- The time gap n must be positive
- Uses linear scan to match time points
- For Date types, no type checking is performed on n
- Elements at the beginning will be missing if they don't have values from n time units ago
- See PanelShift.jl for original implementation
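The difference between a time-based lag and a positional shift shows up when the time vector has holes. The sketch below reproduces the documented behaviour for integer time points. It is not the package's code: `lag_by_time` is a hypothetical name, and it matches time points with a Dict lookup rather than the linear scan mentioned above.

```julia
# Hypothetical helper (not the BazerData implementation): value of x observed
# exactly n time units earlier, or missing when no such observation exists.
function lag_by_time(x::AbstractVector, t::AbstractVector{<:Integer}; n::Integer = 1)
    length(x) == length(t) || throw(ArgumentError("x and t must have the same length"))
    n > 0 || throw(ArgumentError("n must be positive"))
    # Map every time point to its position (swapped in for the linear scan).
    position_of = Dict(tp => i for (i, tp) in enumerate(t))
    out = Vector{Union{Missing, eltype(x)}}(missing, length(x))
    for (i, tp) in enumerate(t)
        j = get(position_of, tp - n, nothing)
        if j !== nothing
            out[i] = x[j]
        end
    end
    return out
end

lagged = lag_by_time([1, 2, 3], [1, 2, 4]; n = 1)
```

With t = [1, 2, 4] the last element stays missing because nothing was observed at time 3, whereas a positional shift would have returned 2 there.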
Errors
- If t_vec is not sorted and checksorted=true
- If n is not positive
- If x and t_vec have different lengths
- If n has a type that doesn't match the difference type of t_vec
Examples
julia> tlag([1, 2, 3], [1, 2, 3], n = 1)
3-element Vector{Union{Missing, Int64}}:
missing
1
2
BazerData.tlead — Method
tlead(x, t_vec; n = nothing, checksorted = true, verbose = false)
Create a leading version of array x based on time vector t_vec, where each element is shifted forward in time by a specified amount n.
Arguments
- x: Array of values to be led
- t_vec: Vector of time points corresponding to each element in x
Keyword Arguments
- n: Time gap for leading. If nothing (default), uses the minimal unit difference between time points.
- checksorted: If true (default), verifies that t_vec is sorted in ascending order
- verbose: If true, prints informational messages about the process
Returns
- An array of the same length as x where each element is the value of x from n time units in the future, or missing if no corresponding future value exists
Notes
- Time vectors must be strictly sorted (ascending order)
- The time gap n must be positive
- Uses linear scan to match time points
- For Date types, no type checking is performed on n
- Elements at the end will be missing if they don't have values from n time units in the future
- See PanelShift.jl for original implementation
Errors
- If t_vec is not sorted and checksorted=true
- If n is not positive
- If x and t_vec have different lengths
- If n has a type that doesn't match the difference type of t_vec
Examples
julia> tlead([1, 2, 3], [8, 9, 10], n = 1)
3-element Vector{Union{Missing, Int64}}:
2
3
missing
BazerData.tshift — Method
tshift(x, t_vec; n = nothing, kwargs...)
Create a shifted version of array x based on time vector t_vec, where each element is shifted by a specified amount n. Acts as a unified interface to tlag and tlead.
Arguments
- x: Array of values to be shifted
- t_vec: Vector of time points corresponding to each element in x
Keyword Arguments
- n: Time gap for shifting. If positive, performs a lag operation (backward in time); if negative, performs a lead operation (forward in time). If nothing (default), defaults to a lag operation with minimal unit difference.
- kwargs...: Additional keyword arguments passed to either tlag or tlead
Returns
- An array of the same length as x where each element is the value of x shifted by n time units, or missing if no corresponding value exists at that time point
Notes
- Positive n values call tlag (backward shift in time)
- Negative n values call tlead (forward shift in time)
- If n is not specified, issues a warning and defaults to a lag operation
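The sign convention in these notes can be captured by a single lookup of the time point t - n: a positive n looks into the past, a negative n into the future. The following sketch assumes an integer, sorted time vector and uses a binary search. `shift_by_time` is a hypothetical name, and unlike tshift it does not dispatch to separate lag and lead functions.

```julia
# Hypothetical helper (not the BazerData implementation): for each time point,
# return the value of x observed at time t - n, or missing if there is none.
# Positive n behaves like a lag, negative n like a lead.
function shift_by_time(x::AbstractVector, t::AbstractVector{<:Integer}; n::Integer)
    length(x) == length(t) || throw(ArgumentError("x and t must have the same length"))
    issorted(t) || throw(ArgumentError("t must be sorted in ascending order"))
    out = Vector{Union{Missing, eltype(x)}}(missing, length(x))
    for i in 1:length(t)
        target = t[i] - n
        j = searchsortedfirst(t, target)   # binary search on the sorted times
        if j <= length(t) && t[j] == target
            out[i] = x[j]
        end
    end
    return out
end

lagged = shift_by_time([1, 2, 3], [-3, -2, -1]; n = 1)
led    = shift_by_time([1, 2, 3], [-3, -2, -1]; n = -1)
```

On the inputs used in the examples of this section, the sketch gives the same vectors as the documented tshift calls with n = 1 and n = -1.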
Examples
julia> tshift([1, 2, 3], [-3, -2, -1], n = 1)
3-element Vector{Union{Missing, Int64}}:
missing
1
2
julia> tshift([1, 2, 3], [-3, -2, -1], n = -1)
3-element Vector{Union{Missing, Int64}}:
2
3
missing
BazerData.winsorize — Method
winsorize(
    x::AbstractVector;
    probs::Union{Tuple{Real, Real}, Nothing} = nothing,
    cutpoints::Union{Tuple{Real, Real}, Nothing} = nothing,
    replace::Symbol = :missing,
    verbose::Bool=false
)
Arguments
x::AbstractVector: a vector of values
Keywords
- probs::Union{Tuple{Real, Real}, Nothing}: A pair of probabilities that can be used instead of cutpoints
- cutpoints::Union{Tuple{Real, Real}, Nothing}: Cutpoints below and above which values are treated as outliers. Default is (median - five times interquartile range, median + five times interquartile range). Compared to bottom and top percentile, this takes into account the whole distribution of the vector
- replace_value::Tuple: Values by which outliers are replaced. Defaults to cutpoints. A frequent alternative is missing.
- IQR::Real: multiplier applied to the interquartile range when inferring cutpoints (median ± IQR * (q75 - q25))
- verbose::Bool: printing level
Returns
AbstractVector: A vector the size of x with substituted values
Examples
- See tests
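As a rough illustration of the default rule described under Keywords (cutpoints at median ± 5 × IQR, outliers replaced by the cutpoints), here is a minimal sketch using only the Statistics standard library. `winsorize_sketch` and `iqr_mult` are hypothetical names, and the package function may differ in details such as missing-value handling or the quantile definition.

```julia
using Statistics

# Hypothetical helper (not the BazerData implementation): clamp values to
# median ± iqr_mult * (q75 - q25), replacing outliers by the cutpoints.
function winsorize_sketch(x::AbstractVector{<:Real}; iqr_mult::Real = 5)
    q25, q50, q75 = quantile(x, [0.25, 0.5, 0.75])
    lower = q50 - iqr_mult * (q75 - q25)
    upper = q50 + iqr_mult * (q75 - q25)
    return clamp.(x, lower, upper)
end

y = winsorize_sketch([1.0, 2.0, 3.0, 4.0, 5.0, 1000.0])
```

Only the extreme value 1000.0 lies outside the cutpoints here, so it is pulled down to the upper cutpoint while the other five values are unchanged.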
This code is based on Matthieu Gomez's winsorize function in the statar R package
BazerData.xtile — Method
xtile(data::Vector{T}, n_quantiles::Integer,
    weights::Union{Vector{Float64}, Nothing}=nothing)::Vector{Int} where T <: Real
Create quantile groups using Julia's built-in weighted quantile functionality.
Arguments
- data: Values to group
- n_quantiles: Number of groups
- weights: Optional weights of weight type (StatsBase)
Examples
sales = rand(10_000);
a = xtile(sales, 10);
b = xtile(sales, 10, weights=Weights(repeat([1], length(sales))) );
@assert a == b
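For intuition about what a quantile group is, the unweighted case can be sketched with the Statistics standard library. `quantile_groups` is a hypothetical helper, and the boundary rule (a value equal to a cutpoint falls in the lower group) is an assumption that may not match xtile.

```julia
using Statistics

# Hypothetical helper (not the BazerData implementation): assign each value
# to one of n_groups bins delimited by the interior quantiles of the data.
function quantile_groups(data::AbstractVector{<:Real}, n_groups::Integer)
    n_groups >= 2 || throw(ArgumentError("n_groups must be at least 2"))
    # Interior cutpoints at probabilities 1/n, 2/n, ..., (n-1)/n.
    cuts = quantile(data, (1:n_groups-1) ./ n_groups)
    # Group index = position of the first cutpoint that is >= the value.
    return [searchsortedfirst(cuts, v) for v in data]
end

groups = quantile_groups(collect(1.0:10.0), 2)
```

With two groups the only cutpoint is the median, so the values 1 to 5 land in group 1 and the values 6 to 10 in group 2.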