Public Interface

BazerData Module

BazerData.panel_fill! Method
panel_fill!(
    df::DataFrame,
    id_var::Symbol, 
    time_var::Symbol, 
    value_var::Union{Symbol, Vector{Symbol}};
    gap::Union{Int, DatePeriod} = 1, 
    method::Symbol = :backwards, 
    uniquecheck::Bool = true,
    flag::Bool = false,
    merge::Bool = false
)

Arguments

  • df::AbstractDataFrame: a panel dataset
  • id_var::Symbol: the individual index dimension of the panel
  • time_var::Symbol: the time index dimension of the panel (must be integer or a date)
  • value_var::Union{Symbol, Vector{Symbol}}: the set of columns we would like to fill

Keywords

  • gap::Union{Int, DatePeriod} = 1 : the interval size for which we want to fill data
  • method::Symbol = :backwards: the interpolation method used to fill the data. Options are :backwards (default), :forwards, :linear, and :nearest. Contact the author for other interpolations (anything from Interpolations.jl is possible).
  • uniquecheck::Bool = true: check if panel is clean
  • flag::Bool = false: flag the interpolated values

Returns

  • AbstractDataFrame:

Examples

  • See tests
BazerData.tabulate Method
tabulate(df::AbstractDataFrame, cols::Union{Symbol, Array{Symbol}};
    reorder_cols=true, out::Symbol=:stdout)

This was forked from TexTables.jl and was inspired by https://github.com/matthieugomez/statar

Arguments

  • df::AbstractDataFrame: Input DataFrame to analyze
  • cols::Union{Symbol, Vector{Symbol}}: Single column name or vector of column names to tabulate
  • group_type::Union{Symbol, Vector{Symbol}}=:value: Specifies how to group each column:
    • :value: Group by the actual values in the column
    • :type: Group by the type of values in the column
    • Vector{Symbol}: Vector combining :value and :type for different columns
  • reorder_cols::Bool=true: Whether to sort the output by sortable columns
  • format_tbl::Symbol=:long: How to present the results, long or wide (as in Stata's twoway tabulation)
  • format_stat::Symbol=:freq: Which statistic to present for the format, :freq or :pct
  • skip_stat::Union{Nothing, Symbol, Vector{Symbol}}=nothing: Statistics to leave out of the printed output (only for string output)
  • out::Symbol=:stdout: Output format:
    • :stdout Print formatted table to standard output (returns nothing)
    • :df Return the result as a DataFrame
    • :string Return the formatted table as a string

Returns

  • Nothing if out=:stdout
  • DataFrame if out=:df
  • String if out=:string

Output Format

The resulting table contains the following columns:

  • Specified grouping columns (from cols)
  • freq: Frequency count
  • pct: Percentage of total
  • cum: Cumulative percentage
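
To make the three statistics concrete, here is a minimal sketch in plain Julia that computes them for a single vector. It is an illustration only, not how tabulate is implemented: `freq_table` is a hypothetical helper, and sorting the rows by value is an assumption made for this example.

```julia
# Illustrative sketch only; `freq_table` is a hypothetical helper, not part of BazerData.
function freq_table(values::Vector)
    # Count how often each distinct value occurs
    counts = Dict{eltype(values), Int}()
    for v in values
        counts[v] = get(counts, v, 0) + 1
    end

    total = length(values)
    rows = NamedTuple[]
    running = 0.0
    # Assumption: rows are ordered by value; the cumulative column depends on this order
    for k in sort(collect(keys(counts)))
        pct = 100 * counts[k] / total   # percentage of total
        running += pct                  # cumulative percentage
        push!(rows, (value = k, freq = counts[k], pct = pct, cum = running))
    end
    return rows
end

rows = freq_table(["FR", "US", "US", "DE"])
for r in rows
    println(r.value, "  ", r.freq, "  ", r.pct, "  ", r.cum)
end
```

The cumulative column reaches 100 on the last row by construction, which is a quick sanity check on any frequency table.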

TO DO

Allow the user to specify the order of columns (reorder = false flag)

Examples

See the README for more examples

# Simple frequency table for one column
tabulate(df, :country)

# Group by value type
tabulate(df, :age, group_type=:type)

# Multiple columns with mixed grouping
tabulate(df, [:country, :age], group_type=[:value, :type])

# Return as DataFrame instead of printing
result_df = tabulate(df, :country, out=:df)
BazerData.tlag Method
tlag(x, t_vec; n = nothing, checksorted = true, verbose = false)

Create a lagged version of array x based on time vector t_vec, where each element is shifted backward in time by a specified amount n.

Arguments

  • x: Array of values to be lagged
  • t_vec: Vector of time points corresponding to each element in x

Keyword Arguments

  • n: Time gap for lagging. If nothing (default), uses the minimal unit difference between time points.
  • checksorted: If true (default), verifies that t_vec is sorted in ascending order
  • verbose: If true, prints informational messages about the process

Returns

  • An array of the same length as x where each element is the value of x from n time units ago, or missing if no corresponding past value exists

Notes

  • Time vectors must be strictly sorted (ascending order)
  • The time gap n must be positive
  • Uses linear scan to match time points
  • For Date types, no type checking is performed on n
  • Elements at the beginning will be missing if they don't have values from n time units ago
  • See PanelShift.jl for original implementation

Errors

  • If t_vec is not sorted and checksorted=true
  • If n is not positive
  • If x and t_vec have different lengths
  • If n has a type that doesn't match the difference type of t_vec

Examples

julia> tlag([1, 2, 3], [1, 2, 3], n = 1)
3-element Vector{Union{Missing, Int64}}:
  missing
 1
 2
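
The difference between a time-based lag and a positional shift only shows up when the time vector has gaps. The sketch below illustrates the idea in plain Julia with a linear scan over integer time points. It is not the package's code: `naive_tlag` is a hypothetical helper written for this illustration, and it assumes integer times and a positive integer n.

```julia
# Illustrative sketch only; `naive_tlag` is a hypothetical helper, not part of BazerData.
function naive_tlag(x::Vector, t::Vector, n)
    length(x) == length(t) || throw(ArgumentError("x and t must have the same length"))
    n > 0 || throw(ArgumentError("n must be positive"))
    all(t[i] < t[i + 1] for i in 1:(length(t) - 1)) ||
        throw(ArgumentError("t must be strictly increasing"))

    out = Vector{Union{Missing, eltype(x)}}(missing, length(x))
    j = 1  # candidate index for the lagged observation; only ever moves forward
    for i in eachindex(t)
        target = t[i] - n
        # Advance j until t[j] reaches the target time (or we hit the current row)
        while j < i && t[j] < target
            j += 1
        end
        # Keep the value only if an observation exists exactly n units earlier
        if j < i && t[j] == target
            out[i] = x[j]
        end
    end
    return out
end

# Time 3 is absent, so the observation at time 4 has no value one unit earlier
lagged = naive_tlag([10, 20, 30, 40], [1, 2, 4, 5], 1)
```

A positional shift would have paired the value at time 4 with the value at time 2; matching on time leaves it missing instead.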
BazerData.tlead Method
tlead(x, t_vec; n = nothing, checksorted = true, verbose = false)

Create a leading version of array x based on time vector t_vec, where each element is shifted forward in time by a specified amount n.

Arguments

  • x: Array of values to be led
  • t_vec: Vector of time points corresponding to each element in x

Keyword Arguments

  • n: Time gap for leading. If nothing (default), uses the minimal unit difference between time points.
  • checksorted: If true (default), verifies that t_vec is sorted in ascending order
  • verbose: If true, prints informational messages about the process

Returns

  • An array of the same length as x where each element is the value of x from n time units in the future, or missing if no corresponding future value exists

Notes

  • Time vectors must be strictly sorted (ascending order)
  • The time gap n must be positive
  • Uses linear scan to match time points
  • For Date types, no type checking is performed on n
  • Elements at the end will be missing if they don't have values from n time units in the future
  • See PanelShift.jl for original implementation

Errors

  • If t_vec is not sorted and checksorted=true
  • If n is not positive
  • If x and t_vec have different lengths
  • If n has a type that doesn't match the difference type of t_vec

Examples

julia> tlead([1, 2, 3], [8, 9, 10], n = 1)
3-element Vector{Union{Missing, Int64}}:
 2
 3
  missing
BazerData.tshift Method
tshift(x, t_vec; n = nothing, kwargs...)

Create a shifted version of array x based on time vector t_vec, where each element is shifted by a specified amount n. Acts as a unified interface to tlag and tlead.

Arguments

  • x: Array of values to be shifted
  • t_vec: Vector of time points corresponding to each element in x

Keyword Arguments

  • n: Time gap for shifting. If positive, performs a lag operation (backward in time); if negative, performs a lead operation (forward in time). If nothing (default), defaults to a lag operation with minimal unit difference.
  • kwargs...: Additional keyword arguments passed to either tlag or tlead

Returns

  • An array of the same length as x where each element is the value of x shifted by n time units, or missing if no corresponding value exists at that time point

Notes

  • Positive n values call tlag (backward shift in time)
  • Negative n values call tlead (forward shift in time)
  • If n is not specified, issues a warning and defaults to a lag operation

Examples

julia> tshift([1, 2, 3], [-3, -2, -1], n = 1)
3-element Vector{Union{Missing, Int64}}:
  missing
 1
 2

julia> tshift([1, 2, 3], [-3, -2, -1], n = -1)
3-element Vector{Union{Missing, Int64}}:
 2
 3
  missing

See also: tlag, tlead
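
The sign convention can be summarized as "look up the value observed at time t - n". The following sketch expresses that with a dictionary lookup in plain Julia. It is only an illustration of the convention: `naive_tshift` is a hypothetical helper, it assumes unique time points, and it does not reproduce the package's checks or warnings.

```julia
# Illustrative sketch only; `naive_tshift` is a hypothetical helper, not part of BazerData.
function naive_tshift(x::Vector, t::Vector, n)
    # Map each time point to its value (assumes time points are unique)
    lookup = Dict(zip(t, x))
    # Positive n looks backward in time (lag); negative n looks forward (lead)
    return [get(lookup, ti - n, missing) for ti in t]
end

lag_result  = naive_tshift([1, 2, 3], [-3, -2, -1], 1)
lead_result = naive_tshift([1, 2, 3], [-3, -2, -1], -1)
```

With the inputs from the examples above, both calls reproduce the documented results.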

BazerData.winsorize Method
winsorize(
    x::AbstractVector; 
    probs::Union{Tuple{Real, Real}, Nothing} = nothing,
    cutpoints::Union{Tuple{Real, Real}, Nothing} = nothing,
    replace::Symbol = :missing,
    verbose::Bool=false
)

Arguments

  • x::AbstractVector: a vector of values

Keywords

  • probs::Union{Tuple{Real, Real}, Nothing}: A tuple of (lower, upper) probabilities that can be used instead of cutpoints
  • cutpoints::Union{Tuple{Real, Real}, Nothing}: Cutpoints below and above which values are treated as outliers. Default is (median - five times interquartile range, median + five times interquartile range). Compared to bottom and top percentiles, this takes into account the whole distribution of the vector
  • replace_value::Tuple: Values by which outliers are replaced. Defaults to cutpoints. A frequent alternative is missing.
  • IQR::Real: when inferring cutpoints, the multiplier applied to the interquartile range around the median (median ± IQR * (q75-q25))
  • verbose::Bool: printing level

Returns

  • AbstractVector: A vector the size of x with substituted values

Examples

  • See tests
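
As a rough illustration of the cutpoint rule described under Keywords, the sketch below winsorizes a vector in plain Julia using median ± k * (q75 - q25) and replaces outliers by the cutpoints. It is not the package's implementation: `naive_winsorize` and its multiplier argument `k` are hypothetical names chosen for this example, and the quantiles come from the Statistics standard library with its default definition.

```julia
using Statistics

# Illustrative sketch only; `naive_winsorize` and `k` are hypothetical, not part of BazerData.
function naive_winsorize(x::Vector{<:Real}, k::Real)
    q25, q75 = quantile(x, [0.25, 0.75])
    m = median(x)
    # Cutpoints: median minus/plus k times the interquartile range
    lower = m - k * (q75 - q25)
    upper = m + k * (q75 - q25)
    # Replace values outside the cutpoints by the cutpoints themselves
    return clamp.(x, lower, upper)
end

x = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
w = naive_winsorize(x, 3)
```

Only the extreme value is pulled in here; observations inside the cutpoints are left unchanged.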

This code is based on Matthieu Gomez's winsorize function in the statar R package.

BazerData.xtile Method
xtile(data::Vector{T}, n_quantiles::Integer, 
             weights::Union{Vector{Float64}, Nothing}=nothing)::Vector{Int} where T <: Real

Create quantile groups using Julia's built-in weighted quantile functionality.

Arguments

  • data: Values to group
  • n_quantiles: Number of groups
  • weights: Optional weights of weight type (StatsBase)

Examples

using StatsBase  # provides Weights

sales = rand(10_000);
a = xtile(sales, 10);
# Equal weights should reproduce the unweighted grouping
b = xtile(sales, 10, weights=Weights(repeat([1], length(sales))));
@assert a == b
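
For intuition about what quantile groups are, here is a minimal unweighted sketch in plain Julia. It is not the package's code: `naive_xtile` is a hypothetical helper, it ignores weights, and sending values that equal a cutpoint to the lower group is an assumption of this example that may differ from xtile.

```julia
using Statistics

# Illustrative sketch only; `naive_xtile` is a hypothetical helper, not part of BazerData.
function naive_xtile(data::Vector{<:Real}, n_quantiles::Integer)
    # Interior cutpoints at probabilities 1/n, 2/n, ..., (n-1)/n
    cuts = quantile(data, (1:(n_quantiles - 1)) ./ n_quantiles)
    # Group index = position of the first cutpoint that is >= the value,
    # so values above every cutpoint land in group n_quantiles
    return [searchsortedfirst(cuts, v) for v in data]
end

groups = naive_xtile(collect(1.0:8.0), 4)
```

With eight evenly spaced values and four groups, each group receives two observations.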