Public Interface
BazerData
Module
BazerData.panel_fill! — Method
panel_fill!(
df::DataFrame,
id_var::Symbol,
time_var::Symbol,
value_var::Union{Symbol, Vector{Symbol}};
gap::Union{Int, DatePeriod} = 1,
method::Symbol = :backwards,
uniquecheck::Bool = true,
flag::Bool = false,
merge::Bool = false
)
Arguments
- df::AbstractDataFrame: a panel dataset
- id_var::Symbol: the individual index dimension of the panel
- time_var::Symbol: the time index dimension of the panel (must be an integer or a date)
- value_var::Union{Symbol, Vector{Symbol}}: the set of columns we would like to fill
Keywords
- gap::Union{Int, DatePeriod} = 1: the interval size for which we want to fill data
- method::Symbol = :backwards: the interpolation method used to fill the data. Options are :backwards (default), :forwards, :linear, and :nearest. Email me for other interpolations (anything from Interpolations.jl is possible).
- uniquecheck::Bool = true: check whether the panel is clean
- flag::Bool = false: flag the interpolated values
Returns
- AbstractDataFrame
Examples
- See tests
BazerData.panel_fill — Method
panel_fill(...)
Same as panel_fill! but without modifying the input in place.
BazerData.tabulate — Method
tabulate(df::AbstractDataFrame, cols::Union{Symbol, Array{Symbol}};
reorder_cols=true, out::Symbol=:stdout)
This was forked from TexTables.jl and was inspired by https://github.com/matthieugomez/statar
Arguments
- df::AbstractDataFrame: Input DataFrame to analyze
- cols::Union{Symbol, Vector{Symbol}}: Single column name or vector of column names to tabulate
- group_type::Union{Symbol, Vector{Symbol}}=:value: Specifies how to group each column
  - :value: Group by the actual values in the column
  - :type: Group by the type of values in the column
  - Vector{Symbol}: Vector combining :value and :type for different columns
- reorder_cols::Bool=true: Whether to sort the output by sortable columns
- format_tbl::Symbol=:long: How to present the results, long or wide (Stata twoway)
- format_stat::Symbol=:freq: Which statistic to present, :freq or :pct
- skip_stat::Union{Nothing, Symbol, Vector{Symbol}}=nothing: Statistics to leave out of the printed output (only for string)
- out::Symbol=:stdout: Output format
  - :stdout: Print formatted table to standard output (returns nothing)
  - :df: Return the result as a DataFrame
  - :string: Return the formatted table as a string
Returns
- Nothing if out=:stdout
- DataFrame if out=:df
- String if out=:string
Output Format
The resulting table contains the following columns:
- Specified grouping columns (from cols)
- freq: Frequency count
- pct: Percentage of total
- cum: Cumulative percentage
TO DO
allow user to specify order of columns (reorder = false flag)
Examples
See the README for more examples
# Simple frequency table for one column
tabulate(df, :country)
# Group by value type
tabulate(df, :age, group_type=:type)
# Multiple columns with mixed grouping
tabulate(df, [:country, :age], group_type=[:value, :type])
# Return as DataFrame instead of printing
result_df = tabulate(df, :country, out=:df)
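To make the freq, pct, and cum columns described under Output Format concrete, here is a minimal sketch in plain Julia that computes them for a single vector. The helper name freq_table_sketch is hypothetical and not part of BazerData, and the 0 to 100 percentage scale is an assumption; the package's own implementation and output layout may differ.

```julia
# Hypothetical helper (not part of BazerData): one-way frequency table.
# Assumes percentages are reported on a 0-100 scale.
function freq_table_sketch(values::AbstractVector)
    counts = Dict{eltype(values), Int}()
    for v in values
        counts[v] = get(counts, v, 0) + 1      # tally each distinct value
    end
    levels = sort(collect(keys(counts)))       # sortable grouping column
    freq = [counts[k] for k in levels]         # frequency count
    pct = 100 .* freq ./ length(values)        # percentage of total
    cum = cumsum(pct)                          # cumulative percentage
    return (value = levels, freq = freq, pct = pct, cum = cum)
end

tbl = freq_table_sketch(["a", "b", "a", "a"])
println(tbl.freq)   # counts per distinct value, in sorted order
```

The named tuple returned here only mimics the column names listed above; with out=:df the package returns a DataFrame instead.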
BazerData.tlag — Method
tlag(x, t_vec; n = nothing, checksorted = true, verbose = false)
Create a lagged version of array x based on time vector t_vec, where each element is shifted backward in time by a specified amount n.
Arguments
- x: Array of values to be lagged
- t_vec: Vector of time points corresponding to each element in x
Keyword Arguments
- n: Time gap for lagging. If nothing (default), uses the minimal unit difference between time points.
- checksorted: If true (default), verifies that t_vec is sorted in ascending order
- verbose: If true, prints informational messages about the process
Returns
- An array of the same length as x where each element is the value of x from n time units ago, or missing if no corresponding past value exists
Notes
- Time vectors must be strictly sorted (ascending order)
- The time gap n must be positive
- Uses linear scan to match time points
- For Date types, no type checking is performed on n
- Elements at the beginning will be missing if they don't have values from n time units ago
- See PanelShift.jl for original implementation
Errors
- If t_vec is not sorted and checksorted=true
- If n is not positive
- If x and t_vec have different lengths
- If n has a type that doesn't match the difference type of t_vec
Examples
julia> tlag([1, 2, 3], [1, 2, 3], n = 1)
3-element Vector{Union{Missing, Int64}}:
missing
1
2
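The Notes mention a linear scan over a sorted time vector. The sketch below shows one way such a scan could work, and why a gap in t_vec yields missing rather than the previous row. The function tlag_sketch is a hypothetical stand-in written for illustration, not the package's code; it assumes an explicit positive n and plain 1-indexed vectors.

```julia
# Hypothetical illustration of a time-aware lag (not BazerData's implementation).
function tlag_sketch(x::Vector, t::Vector, n)
    length(x) == length(t) || throw(ArgumentError("x and t must have the same length"))
    issorted(t) || throw(ArgumentError("t must be sorted in ascending order"))
    n > zero(n) || throw(ArgumentError("n must be positive"))
    out = Vector{Union{Missing, eltype(x)}}(missing, length(x))
    j = 1                                  # candidate position of the lagged observation
    for i in eachindex(t)
        target = t[i] - n                  # time point we want the value from
        while j < i && t[j] < target       # move j forward; it never moves back
            j += 1
        end
        if j < i && t[j] == target         # exact match on time, not on position
            out[i] = x[j]
        end
    end
    return out
end

println(tlag_sketch([1, 2, 3], [1, 2, 3], 1))      # regular spacing
println(tlag_sketch([10, 20, 30], [1, 2, 4], 1))   # t = 3 is absent
```

In the second call the last element has no observation at time 3, so the result there is missing even though a previous row exists. Matching on time rather than position is the point of the function.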
BazerData.tlead — Method
tlead(x, t_vec; n = nothing, checksorted = true, verbose = false)
Create a leading version of array x based on time vector t_vec, where each element is shifted forward in time by a specified amount n.
Arguments
- x: Array of values to be led
- t_vec: Vector of time points corresponding to each element in x
Keyword Arguments
- n: Time gap for leading. If nothing (default), uses the minimal unit difference between time points.
- checksorted: If true (default), verifies that t_vec is sorted in ascending order
- verbose: If true, prints informational messages about the process
Returns
- An array of the same length as x where each element is the value of x from n time units in the future, or missing if no corresponding future value exists
Notes
- Time vectors must be strictly sorted (ascending order)
- The time gap n must be positive
- Uses linear scan to match time points
- For Date types, no type checking is performed on n
- Elements at the end will be missing if they don't have values from n time units in the future
- See PanelShift.jl for original implementation
Errors
- If t_vec is not sorted and checksorted=true
- If n is not positive
- If x and t_vec have different lengths
- If n has a type that doesn't match the difference type of t_vec
Examples
julia> tlead([1, 2, 3], [8, 9, 10], n = 1)
3-element Vector{Union{Missing, Int64}}:
2
3
missing
BazerData.tshift — Method
tshift(x, t_vec; n = nothing, kwargs...)
Create a shifted version of array x based on time vector t_vec, where each element is shifted by a specified amount n. Acts as a unified interface to tlag and tlead.
Arguments
- x: Array of values to be shifted
- t_vec: Vector of time points corresponding to each element in x
Keyword Arguments
- n: Time gap for shifting. If positive, performs a lag operation (backward in time); if negative, performs a lead operation (forward in time). If nothing (default), defaults to a lag operation with minimal unit difference.
- kwargs...: Additional keyword arguments passed to either tlag or tlead
Returns
- An array of the same length as x where each element is the value of x shifted by n time units, or missing if no corresponding value exists at that time point
Notes
- Positive n values call tlag (backward shift in time)
- Negative n values call tlead (forward shift in time)
- If n is not specified, issues a warning and defaults to a lag operation
Examples
julia> tshift([1, 2, 3], [-3, -2, -1], n = 1)
3-element Vector{Union{Missing, Int64}}:
missing
1
2
julia> tshift([1, 2, 3], [-3, -2, -1], n = -1)
3-element Vector{Union{Missing, Int64}}:
2
3
missing
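The sign convention can be sketched with a single lookup: for each time point t, fetch the value observed at t - n, so a positive n looks backward and a negative n looks forward. The function tshift_sketch below is a hypothetical illustration built on a dictionary lookup rather than the package's tlag and tlead; it assumes unique time points and requires n to be given explicitly (it does not reproduce the warning-and-default behavior for n = nothing).

```julia
using Dates

# Hypothetical illustration (not BazerData's implementation).
# Assumes the time points in t are unique.
function tshift_sketch(x::AbstractVector, t::AbstractVector; n)
    length(x) == length(t) || throw(ArgumentError("x and t must have the same length"))
    lookup = Dict(zip(t, x))                         # time point => observed value
    # n > 0 reads the past (lag); n < 0 reads the future (lead)
    return [get(lookup, ti - n, missing) for ti in t]
end

println(tshift_sketch([1, 2, 3], [-3, -2, -1]; n = 1))    # lag
println(tshift_sketch([1, 2, 3], [-3, -2, -1]; n = -1))   # lead

# The same idea with dates and a period as the gap; January 3 is absent.
dates = Date(2024, 1, 1) .+ Day.([0, 1, 3])
println(tshift_sketch([10, 20, 30], dates; n = Day(1)))
```

A dictionary keeps the sketch short; the documented functions instead rely on a sorted time vector and a linear scan.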
BazerData.winsorize — Method
winsorize(
x::AbstractVector;
probs::Union{Tuple{Real, Real}, Nothing} = nothing,
cutpoints::Union{Tuple{Real, Real}, Nothing} = nothing,
replace::Symbol = :missing,
verbose::Bool=false
)
Arguments
- x::AbstractVector: a vector of values
Keywords
- probs::Union{Tuple{Real, Real}, Nothing}: A pair of probabilities (lower, upper) that can be used instead of cutpoints
- cutpoints::Union{Tuple{Real, Real}, Nothing}: Cutpoints below and above which values are treated as outliers. Default is (median - five times interquartile range, median + five times interquartile range). Compared to bottom and top percentiles, this takes into account the whole distribution of the vector
- replace_value::Tuple: Values by which outliers are replaced. Defaults to the cutpoints. A frequent alternative is missing.
- IQR::Real: Multiplier applied to the interquartile range when inferring cutpoints from the median (median ± IQR * (q75-q25))
- verbose::Bool: printing level
Returns
- AbstractVector: A vector the size of x with substituted values
Examples
- See tests
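As a rough illustration of the default cutpoint rule described under Keywords (median ± 5 times the interquartile range, with outliers replaced by the cutpoints), here is a minimal sketch using only the Statistics standard library. The function winsorize_sketch and its keyword k are hypothetical names chosen for this example; the quantile definition and the handling of missing values in the package may differ.

```julia
using Statistics

# Hypothetical illustration (not BazerData's implementation).
# k plays the role of the interquartile-range multiplier.
function winsorize_sketch(x::AbstractVector{<:Real}; k::Real = 5)
    q25, q75 = quantile(x, [0.25, 0.75])
    m = median(x)
    lo = m - k * (q75 - q25)        # lower cutpoint
    hi = m + k * (q75 - q25)        # upper cutpoint
    return clamp.(x, lo, hi)        # replace outliers by the nearest cutpoint
end

x = [1.0, 2.0, 3.0, 4.0, 1000.0]
println(winsorize_sketch(x))        # the extreme value is pulled in to the upper cutpoint
```

Here the median is 3 and the interquartile range is 2, so the cutpoints are -7 and 13 and only the last element changes.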
This code is based on Matthieu Gomez's winsorize function in the statar R package.
BazerData.xtile — Method
xtile(data::Vector{T}, n_quantiles::Integer,
weights::Union{Vector{Float64}, Nothing}=nothing)::Vector{Int} where T <: Real
Create quantile groups using Julia's built-in weighted quantile functionality.
Arguments
- data: Values to group
- n_quantiles: Number of groups
- weights: Optional weights of a weight type (StatsBase)
Examples
sales = rand(10_000);
a = xtile(sales, 10);
b = xtile(sales, 10, weights=Weights(repeat([1], length(sales))) );
@assert a == b
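For intuition about what a quantile group is, the unweighted case can be sketched with the Statistics standard library: compute the n - 1 interior quantile cut points, then assign each value the index of the first cut point that is greater than or equal to it. The function xtile_sketch is a hypothetical illustration; how the package treats ties at the cut points and how it applies weights are not shown here and may differ.

```julia
using Statistics

# Hypothetical illustration (not BazerData's implementation), unweighted only.
# Assumes a value equal to a cut point falls in the lower group.
function xtile_sketch(data::AbstractVector{<:Real}, n::Integer)
    cuts = quantile(data, (1:n-1) ./ n)                 # n - 1 interior cut points
    # values above every cut point get group n
    return [searchsortedfirst(cuts, v) for v in data]
end

groups = xtile_sketch(collect(1.0:10.0), 2)
println(groups)     # lower half in group 1, upper half in group 2
```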