Package: sparklyr 1.8.6.9001

Edgar Ruiz

sparklyr: R Interface to Apache Spark

R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.

Authors: Javier Luraschi [aut], Kevin Kuo [aut], Kevin Ushey [aut], JJ Allaire [aut], Samuel Macedo [ctb], Hossein Falaki [aut], Lu Wang [aut], Andy Zhang [aut], Yitao Li [aut], Jozef Hajnala [ctb], Maciej Szymkiewicz [ctb], Wil Davis [ctb], Edgar Ruiz [aut, cre], RStudio [cph], The Apache Software Foundation [aut, cph]

Source: sparklyr_1.8.6.9001.tar.gz
Windows: sparklyr_1.8.6.9001.zip (r-4.5), sparklyr_1.8.6.9001.zip (r-4.4), sparklyr_1.8.6.9001.zip (r-4.3)
macOS: sparklyr_1.8.6.9001.tgz (r-4.4-any), sparklyr_1.8.6.9001.tgz (r-4.3-any)
Linux: sparklyr_1.8.6.9001.tar.gz (r-4.5-noble), sparklyr_1.8.6.9001.tar.gz (r-4.4-noble)
WebAssembly: sparklyr_1.8.6.9001.tgz (r-4.4-emscripten), sparklyr_1.8.6.9001.tgz (r-4.3-emscripten)
sparklyr.pdf | sparklyr.html
sparklyr/json (API)
NEWS

# Install 'sparklyr' in R:
install.packages('sparklyr', repos = c('https://sparklyr.r-universe.dev', 'https://cloud.r-project.org'))
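Once installed, the package can be exercised with a short local session that touches the three capabilities named in the description: connecting to Spark, using the 'dplyr' back-end, and collecting results into R. This is a minimal sketch that assumes Java and a local Spark distribution are already available (for example, installed beforehand with `spark_install()`); the table name `"mtcars"` and the per-cylinder aggregation are illustrative choices only.

```r
library(sparklyr)
library(dplyr)

# Assumes Java and a local Spark installation are present (e.g. via spark_install())
sc <- spark_connect(master = "local")

# Copy a built-in R data frame into Spark; the result is a tbl_spark reference
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# dplyr verbs are translated to Spark SQL and run inside Spark;
# collect() brings the small aggregated result back into R as a tibble
by_cyl <- mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl) %>%
  collect()

print(by_cyl)

# Release the connection when done
spark_disconnect(sc)
```

Only the final `collect()` moves data into R; everything before it stays lazy and executes in Spark.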


Bug tracker: https://github.com/sparklyr/sparklyr/issues


apache-spark, distributed, dplyr, ide, livy, machine-learning, remote-clusters, spark, sparklyr

414 exports, 946 stars, 9.57 score, 38 dependencies, 21 dependents, 4 mentions, 3.6k scripts, 118.1k downloads

Last updated 7 days ago from: f3bae8d0d1. Checks: OK: 7. Indexed: yes.

Target           Result  Date
Doc / Vignettes  OK      Sep 10 2024
R-4.5-win        OK      Sep 10 2024
R-4.5-linux      OK      Sep 10 2024
R-4.4-win        OK      Sep 10 2024
R-4.4-mac        OK      Sep 10 2024
R-4.3-win        OK      Sep 10 2024
R-4.3-mac        OK      Sep 10 2024

Exports: %->%, %>%, arrow_enabled_object, augment, collect, collect_from_rds, compile_package_jars, connection_config, connection_is_open, connection_spark_shinyapp, copy_to, distinct, download_scalac, fill, filter, find_scalac, ft_binarizer, ft_bucketed_random_projection_lsh, ft_bucketizer, ft_chisq_selector, ft_count_vectorizer, ft_dct, ft_discrete_cosine_transform, ft_dplyr_transformer, ft_elementwise_product, ft_feature_hasher, ft_hashing_tf, ft_idf, ft_imputer, ft_index_to_string, ft_interaction, ft_max_abs_scaler, ft_min_max_scaler, ft_minhash_lsh, ft_ngram, ft_normalizer, ft_one_hot_encoder, ft_one_hot_encoder_estimator, ft_pca, ft_polynomial_expansion, ft_quantile_discretizer, ft_r_formula, ft_regex_tokenizer, ft_robust_scaler, ft_sql_transformer, ft_standard_scaler, ft_stop_words_remover, ft_string_indexer, ft_string_indexer_model, ft_tokenizer, ft_vector_assembler, ft_vector_indexer, ft_vector_slicer, ft_word2vec, full_join, get_spark_sql_catalog_implementation, glance, hive_context, hive_context_config, hof_aggregate, hof_array_sort, hof_exists, hof_filter, hof_forall, hof_map_filter, hof_map_zip_with, hof_transform, hof_transform_keys, hof_transform_values, hof_zip_with, inner_join, invoke, invoke_method, invoke_new, invoke_static, is_ml_estimator, is_ml_transformer, j_invoke, j_invoke_method, j_invoke_new, j_invoke_static, jarray, java_context, jfloat, jfloat_array, jobj_class, jobj_set_param, left_join, livy_available_versions, livy_config, livy_home_dir, livy_install, livy_install_dir, livy_installed_versions, livy_service_start, livy_service_stop, ml_add_stage, ml_aft_survival_regression, ml_als, ml_approx_nearest_neighbors, ml_approx_similarity_join, ml_association_rules, ml_binary_classification_eval, ml_binary_classification_evaluator, ml_bisecting_kmeans, ml_call_constructor, ml_chisquare_test, ml_classification_eval, ml_clustering_evaluator, ml_clustering_pipeline, ml_compute_cost, ml_compute_silhouette_measure, ml_construct_model_clustering, ml_construct_model_supervised, ml_corr, ml_cross_validator, ml_decision_tree, ml_decision_tree_classifier, ml_decision_tree_regressor, ml_default_stop_words, ml_describe_topics, ml_evaluate, ml_feature_importances, ml_find_synonyms, ml_fit, ml_fit_and_transform, ml_fpgrowth, ml_freq_itemsets, ml_freq_seq_patterns, ml_gaussian_mixture, ml_gbt_classifier, ml_gbt_regressor, ml_generalized_linear_regression, ml_gradient_boosted_trees, ml_is_set, ml_isotonic_regression, ml_kmeans, ml_labels, ml_lda, ml_linear_regression, ml_linear_svc, ml_load, ml_log_likelihood, ml_log_perplexity, ml_logistic_regression, ml_metrics_binary, ml_metrics_multiclass, ml_metrics_regression, ml_model_data, ml_multiclass_classification_evaluator, ml_multilayer_perceptron, ml_multilayer_perceptron_classifier, ml_naive_bayes, ml_one_vs_rest, ml_param, ml_param_map, ml_params, ml_pca, ml_pipeline, ml_power_iteration, ml_predict, ml_prefixspan, ml_random_forest, ml_random_forest_classifier, ml_random_forest_regressor, ml_recommend, ml_regression_evaluator, ml_save, ml_stage, ml_stages, ml_standardize_formula, ml_sub_models, ml_summary, ml_supervised_pipeline, ml_survival_regression, ml_topics_matrix, ml_train_validation_split, ml_transform, ml_tree_feature_importance, ml_uid, ml_validation_metrics, ml_vocabulary, mutate, na.replace, nest, new_ml_classification_model, new_ml_classifier, new_ml_clustering_model, new_ml_estimator, new_ml_model, new_ml_model_classification, new_ml_model_clustering, new_ml_model_prediction, new_ml_model_regression, new_ml_prediction_model, new_ml_predictor, new_ml_probabilistic_classification_model, new_ml_probabilistic_classifier, new_ml_transformer, pivot_longer, pivot_wider, print_jobj, quote_sql_name, random_string, reactiveSpark, register_extension, registerDoSpark, registered_extensions, replace_na, right_join, sdf_along, sdf_bind_cols, sdf_bind_rows, sdf_broadcast, sdf_checkpoint, sdf_coalesce, sdf_collect, sdf_copy_to, sdf_crosstab, sdf_debug_string, sdf_describe, sdf_dim, sdf_distinct, sdf_drop_duplicates, sdf_expand_grid, sdf_fit, sdf_fit_and_transform, sdf_from_avro, sdf_import, sdf_is_streaming, sdf_last_index, sdf_len, sdf_load_parquet, sdf_load_table, sdf_ncol, sdf_nrow, sdf_num_partitions, sdf_partition, sdf_partition_sizes, sdf_persist, sdf_pivot, sdf_predict, sdf_project, sdf_quantile, sdf_random_split, sdf_rbeta, sdf_rbinom, sdf_rcauchy, sdf_rchisq, sdf_read_column, sdf_register, sdf_repartition, sdf_residuals, sdf_rexp, sdf_rgamma, sdf_rgeom, sdf_rhyper, sdf_rlnorm, sdf_rnorm, sdf_rpois, sdf_rt, sdf_runif, sdf_rweibull, sdf_sample, sdf_save_parquet, sdf_save_table, sdf_schema, sdf_separate_column, sdf_seq, sdf_sort, sdf_sql, sdf_to_avro, sdf_transform, sdf_unnest_longer, sdf_unnest_wider, sdf_weighted_sample, sdf_with_sequential_id, sdf_with_unique_id, select, separate, spark_adaptive_query_execution, spark_advisory_shuffle_partition_size, spark_apply, spark_apply_bundle, spark_apply_log, spark_auto_broadcast_join_threshold, spark_available_versions, spark_coalesce_initial_num_partitions, spark_coalesce_min_num_partitions, spark_coalesce_shuffle_partitions, spark_compilation_spec, spark_compile, spark_config, spark_config_exists, spark_config_kubernetes, spark_config_packages, spark_config_settings, spark_config_value, spark_connect, spark_connect_method, spark_connection, spark_connection_find, spark_connection_is_open, spark_context, spark_context_config, spark_dataframe, spark_default_compilation_spec, spark_default_version, spark_dependency, spark_dependency_fallback, spark_disconnect, spark_disconnect_all, spark_extension, spark_get_checkpoint_dir, spark_get_java, spark_home_dir, spark_home_set, spark_ide_columns, spark_ide_connection_actions, spark_ide_connection_closed, spark_ide_connection_open, spark_ide_connection_updated, spark_ide_objects, spark_ide_preview, spark_insert_table, spark_install, spark_install_dir, spark_install_find, spark_install_tar, spark_installed_versions, spark_integ_test_skip, spark_jobj, spark_last_error, spark_load_table, spark_log, spark_pipeline_stage, spark_read, spark_read_avro, spark_read_binary, spark_read_csv, spark_read_delta, spark_read_image, spark_read_jdbc, spark_read_json, spark_read_libsvm, spark_read_orc, spark_read_parquet, spark_read_source, spark_read_table, spark_read_text, spark_save_table, spark_session, spark_session_config, spark_set_checkpoint_dir, spark_submit, spark_table_name, spark_uninstall, spark_version, spark_version_from_home, spark_versions, spark_web, spark_write, spark_write_avro, spark_write_csv, spark_write_delta, spark_write_jdbc, spark_write_json, spark_write_orc, spark_write_parquet, spark_write_rds, spark_write_source, spark_write_table, spark_write_text, sparklyr_get_backend_port, src_databases, stream_find, stream_generate_test, stream_id, stream_lag, stream_name, stream_read_cloudfiles, stream_read_csv, stream_read_delta, stream_read_json, stream_read_kafka, stream_read_orc, stream_read_parquet, stream_read_socket, stream_read_table, stream_read_text, stream_render, stream_stats, stream_stop, stream_trigger_continuous, stream_trigger_interval, stream_view, stream_watermark, stream_write_console, stream_write_csv, stream_write_delta, stream_write_json, stream_write_kafka, stream_write_memory, stream_write_orc, stream_write_parquet, stream_write_table, stream_write_text, tbl_cache, tbl_change_db, tbl_uncache, tidy, unite, unnest, worker_spark_apply_unbundle

Dependencies: askpass, blob, cli, codetools, config, cpp11, curl, DBI, dbplyr, dplyr, fansi, generics, globals, glue, httr, jsonlite, lifecycle, magrittr, mime, openssl, pillar, pkgconfig, purrr, R6, rlang, rstudioapi, stringi, stringr, sys, tibble, tidyr, tidyselect, utf8, uuid, vctrs, withr, xml2, yaml

Readme and manuals

Help Manual

Help page | Topics
Subsetting operator for Spark dataframe | [.tbl_spark
Infix operator for composing a lambda expression | %->%
Set/Get Spark checkpoint directory | checkpoint_directory spark_get_checkpoint_dir spark_set_checkpoint_dir
Collect Spark data serialized in RDS format into R | collect_from_rds
Compile Scala sources into a Java Archive (jar) | compile_package_jars
Read configuration values for a connection | connection_config
Copy an R Data Frame to Spark | copy_to.spark_connection
Distinct | distinct
Downloads default Scala Compilers | download_scalac
dplyr wrappers for Apache Spark higher order functions | dplyr_hof
Enforce Specific Structure for R Objects | ensure
Fill | fill
Filter | filter
Discover the Scala Compiler | find_scalac
Feature Transformation - Binarizer (Transformer) | ft_binarizer
Feature Transformation - Bucketizer (Transformer) | ft_bucketizer
Feature Transformation - ChiSqSelector (Estimator) | ft_chisq_selector
Feature Transformation - CountVectorizer (Estimator) | ft_count_vectorizer ml_vocabulary
Feature Transformation - Discrete Cosine Transform (DCT) (Transformer) | ft_dct ft_discrete_cosine_transform
Feature Transformation - ElementwiseProduct (Transformer) | ft_elementwise_product
Feature Transformation - FeatureHasher (Transformer) | ft_feature_hasher
Feature Transformation - HashingTF (Transformer) | ft_hashing_tf
Feature Transformation - IDF (Estimator) | ft_idf
Feature Transformation - Imputer (Estimator) | ft_imputer
Feature Transformation - IndexToString (Transformer) | ft_index_to_string
Feature Transformation - Interaction (Transformer) | ft_interaction
Feature Transformation - LSH (Estimator) | ft_bucketed_random_projection_lsh ft_lsh ft_minhash_lsh
Utility functions for LSH models | ft_lsh_utils ml_approx_nearest_neighbors ml_approx_similarity_join
Feature Transformation - MaxAbsScaler (Estimator) | ft_max_abs_scaler
Feature Transformation - MinMaxScaler (Estimator) | ft_min_max_scaler
Feature Transformation - NGram (Transformer) | ft_ngram
Feature Transformation - Normalizer (Transformer) | ft_normalizer
Feature Transformation - OneHotEncoder (Transformer) | ft_one_hot_encoder
Feature Transformation - OneHotEncoderEstimator (Estimator) | ft_one_hot_encoder_estimator
Feature Transformation - PCA (Estimator) | ft_pca ml_pca
Feature Transformation - PolynomialExpansion (Transformer) | ft_polynomial_expansion
Feature Transformation - QuantileDiscretizer (Estimator) | ft_quantile_discretizer
Feature Transformation - RFormula (Estimator) | ft_r_formula
Feature Transformation - RegexTokenizer (Transformer) | ft_regex_tokenizer
Feature Transformation - RobustScaler (Estimator) | ft_robust_scaler
Feature Transformation - SQLTransformer | ft_dplyr_transformer ft_sql_transformer
Feature Transformation - StandardScaler (Estimator) | ft_standard_scaler
Feature Transformation - StopWordsRemover (Transformer) | ft_stop_words_remover
Feature Transformation - StringIndexer (Estimator) | ft_string_indexer ft_string_indexer_model ml_labels
Feature Transformation - Tokenizer (Transformer) | ft_tokenizer
Feature Transformation - VectorAssembler (Transformer) | ft_vector_assembler
Feature Transformation - VectorIndexer (Estimator) | ft_vector_indexer
Feature Transformation - VectorSlicer (Transformer) | ft_vector_slicer
Feature Transformation - Word2Vec (Estimator) | ft_word2vec ml_find_synonyms
Full join | full_join
Generic Call Interface | generic_call_interface
Retrieve the Spark connection's SQL catalog implementation property | get_spark_sql_catalog_implementation
Runtime configuration interface for Hive | hive_context_config
Apply Aggregate Function to Array Column | hof_aggregate
Sorts array using a custom comparator | hof_array_sort
Determine Whether Some Element Exists in an Array Column | hof_exists
Filter Array Column | hof_filter
Checks whether all elements in an array satisfy a predicate | hof_forall
Filters a map | hof_map_filter
Merges two maps into one | hof_map_zip_with
Transform Array Column | hof_transform
Transforms keys of a map | hof_transform_keys
Transforms values of a map | hof_transform_values
Combines 2 Array Columns | hof_zip_with
Inner join | inner_join
Invoke a Method on a JVM Object | invoke invoke_new invoke_static
Invoke a Java function. | j_invoke j_invoke_new j_invoke_static
Instantiate a Java array with a specific element type. | jarray
Instantiate a Java float type. | jfloat
Instantiate an Array[Float]. | jfloat_array
Join Spark tbls. | full_join.tbl_spark inner_join.tbl_spark join.tbl_spark left_join.tbl_spark right_join.tbl_spark
Left join | left_join
List all sparklyr-*.jar files that have been built | list_sparklyr_jars
Create a Spark Configuration for Livy | livy_config
Start Livy | livy_service_start livy_service_stop
Spark ML - Survival Regression | ml_aft_survival_regression ml_survival_regression
Spark ML - ALS | ml_als ml_recommend
Tidying methods for Spark ML ALS | augment.ml_model_als glance.ml_model_als ml_als_tidiers tidy.ml_model_als
Spark ML - Bisecting K-Means Clustering | ml_bisecting_kmeans
Chi-square hypothesis testing for categorical data. | ml_chisquare_test
Spark ML - Clustering Evaluator | ml_clustering_evaluator
Compute correlation matrix | ml_corr
Spark ML - Decision Trees | ml_decision_tree ml_decision_tree_classifier ml_decision_tree_regressor
Default stop words | ml_default_stop_words
Evaluate the Model on a Validation Set | ml_evaluate ml_evaluate.ml_evaluator ml_evaluate.ml_generalized_linear_regression_model ml_evaluate.ml_linear_regression_model ml_evaluate.ml_logistic_regression_model ml_evaluate.ml_model_classification ml_evaluate.ml_model_clustering ml_evaluate.ml_model_generalized_linear_regression ml_evaluate.ml_model_linear_regression ml_evaluate.ml_model_logistic_regression
Spark ML - Evaluators | ml_binary_classification_eval ml_binary_classification_evaluator ml_classification_eval ml_evaluator ml_multiclass_classification_evaluator ml_regression_evaluator
Spark ML - Feature Importance for Tree Models | ml_feature_importances ml_tree_feature_importance
Frequent Pattern Mining - FPGrowth | ml_association_rules ml_fpgrowth ml_freq_itemsets
Spark ML - Gaussian Mixture clustering. | ml_gaussian_mixture
Spark ML - Gradient Boosted Trees | ml_gbt_classifier ml_gbt_regressor ml_gradient_boosted_trees
Spark ML - Generalized Linear Regression | ml_generalized_linear_regression
Tidying methods for Spark ML linear models | augment.ml_model_generalized_linear_regression augment.ml_model_linear_regression augment._ml_model_linear_regression glance.ml_model_generalized_linear_regression glance.ml_model_linear_regression ml_glm_tidiers tidy.ml_model_generalized_linear_regression tidy.ml_model_linear_regression
Spark ML - Isotonic Regression | ml_isotonic_regression
Tidying methods for Spark ML Isotonic Regression | augment.ml_model_isotonic_regression glance.ml_model_isotonic_regression ml_isotonic_regression_tidiers tidy.ml_model_isotonic_regression
Spark ML - K-Means Clustering | ml_compute_cost ml_compute_silhouette_measure ml_kmeans
Evaluate a K-means clustering | ml_kmeans_cluster_eval
Spark ML - Latent Dirichlet Allocation | ml_describe_topics ml_lda ml_log_likelihood ml_log_perplexity ml_topics_matrix
Tidying methods for Spark ML LDA models | augment.ml_model_lda glance.ml_model_lda ml_lda_tidiers tidy.ml_model_lda
Spark ML - Linear Regression | ml_linear_regression
Spark ML - LinearSVC | ml_linear_svc
Tidying methods for Spark ML linear svc | augment.ml_model_linear_svc glance.ml_model_linear_svc ml_linear_svc_tidiers tidy.ml_model_linear_svc
Spark ML - Logistic Regression | ml_logistic_regression
Tidying methods for Spark ML Logistic Regression | augment.ml_model_logistic_regression augment._ml_model_logistic_regression glance.ml_model_logistic_regression ml_logistic_regression_tidiers tidy.ml_model_logistic_regression
Extracts metrics from a fitted table | ml_metrics_binary
Extracts metrics from a fitted table | ml_metrics_multiclass
Extracts metrics from a fitted table | ml_metrics_regression
Extracts data associated with a Spark ML model | ml_model_data
Spark ML - Multilayer Perceptron | ml_multilayer_perceptron ml_multilayer_perceptron_classifier
Tidying methods for Spark ML MLP | augment.ml_model_multilayer_perceptron_classification glance.ml_model_multilayer_perceptron_classification ml_multilayer_perceptron_tidiers tidy.ml_model_multilayer_perceptron_classification
Spark ML - Naive-Bayes | ml_naive_bayes
Tidying methods for Spark ML Naive Bayes | augment.ml_model_naive_bayes glance.ml_model_naive_bayes ml_naive_bayes_tidiers tidy.ml_model_naive_bayes
Spark ML - OneVsRest | ml_one_vs_rest
Tidying methods for Spark ML Principal Component Analysis | augment.ml_model_pca glance.ml_model_pca ml_pca_tidiers tidy.ml_model_pca
Spark ML - Pipelines | ml_pipeline
Spark ML - Power Iteration Clustering | ml_power_iteration
Frequent Pattern Mining - PrefixSpan | ml_freq_seq_patterns ml_prefixspan
Spark ML - Random Forest | ml_random_forest ml_random_forest_classifier ml_random_forest_regressor
Spark ML - Pipeline stage extraction | ml_stage ml_stages
Spark ML - Extraction of summary metrics | ml_summary
Tidying methods for Spark ML Survival Regression | augment.ml_model_aft_survival_regression glance.ml_model_aft_survival_regression ml_survival_regression_tidiers tidy.ml_model_aft_survival_regression
Tidying methods for Spark ML tree models | augment.ml_model_decision_tree_classification augment.ml_model_decision_tree_regression augment.ml_model_gbt_classification augment.ml_model_gbt_regression augment.ml_model_random_forest_classification augment.ml_model_random_forest_regression augment._ml_model_decision_tree_classification augment._ml_model_decision_tree_regression augment._ml_model_gbt_classification augment._ml_model_gbt_regression augment._ml_model_random_forest_classification augment._ml_model_random_forest_regression glance.ml_model_decision_tree_classification glance.ml_model_decision_tree_regression glance.ml_model_gbt_classification glance.ml_model_gbt_regression glance.ml_model_random_forest_classification glance.ml_model_random_forest_regression ml_tree_tidiers tidy.ml_model_decision_tree_classification tidy.ml_model_decision_tree_regression tidy.ml_model_gbt_classification tidy.ml_model_gbt_regression tidy.ml_model_random_forest_classification tidy.ml_model_random_forest_regression
Spark ML - UID | ml_uid
Tidying methods for Spark ML unsupervised models | augment.ml_model_bisecting_kmeans augment.ml_model_gaussian_mixture augment.ml_model_kmeans glance.ml_model_bisecting_kmeans glance.ml_model_gaussian_mixture glance.ml_model_kmeans ml_unsupervised_tidiers tidy.ml_model_bisecting_kmeans tidy.ml_model_gaussian_mixture tidy.ml_model_kmeans
Spark ML - ML Params | ml-params ml_is_set ml_param ml_params ml_param_map
Spark ML - Model Persistence | ml-persistence ml_load ml_save ml_save.ml_model
Spark ML - Transform, fit, and predict methods (ml_ interface) | is_ml_estimator is_ml_transformer ml-transform-methods ml_fit ml_fit.default ml_fit_and_transform ml_predict ml_predict.ml_model_classification ml_transform
Spark ML - Tuning | ml-tuning ml_cross_validator ml_sub_models ml_train_validation_split ml_validation_metrics
Mutate | mutate
Replace Missing Values in Objects | na.replace
Nest | nest
Pivot longer | pivot_longer
Pivot wider | pivot_wider
Random string generation | random_string
Reactive spark reader | reactiveSpark
Register a Package that Implements a Spark Extension | registered_extensions register_extension
Register a Parallel Backend | registerDoSpark
Replace NA | replace_na
Right join | right_join
Create DataFrame for along Object | sdf_along
Bind multiple Spark DataFrames by row and column | sdf_bind sdf_bind_cols sdf_bind_rows
Broadcast hint | sdf_broadcast
Checkpoint a Spark DataFrame | sdf_checkpoint
Coalesces a Spark DataFrame | sdf_coalesce
Collect a Spark DataFrame into R. | sdf_collect
Copy an Object into Spark | sdf_copy_to sdf_import
Cross Tabulation | sdf_crosstab
Debug Info for Spark DataFrame | sdf_debug_string
Compute summary statistics for columns of a data frame | sdf_describe
Support for Dimension Operations | sdf_dim sdf_ncol sdf_nrow
Invoke distinct on a Spark DataFrame | sdf_distinct
Remove duplicates from a Spark DataFrame | sdf_drop_duplicates
Create a Spark dataframe containing all combinations of inputs | sdf_expand_grid
Convert column(s) from avro format | sdf_from_avro
Spark DataFrame is Streaming | sdf_is_streaming
Returns the last index of a Spark DataFrame | sdf_last_index
Create DataFrame for Length | sdf_len
Gets number of partitions of a Spark DataFrame | sdf_num_partitions
Compute the number of records within each partition of a Spark DataFrame | sdf_partition_sizes
Persist a Spark DataFrame | sdf_persist
Pivot a Spark DataFrame | sdf_pivot
Project features onto principal components | sdf_project
Compute (Approximate) Quantiles with a Spark DataFrame | sdf_quantile
Partition a Spark Dataframe | sdf_partition sdf_random_split
Generate random samples from a Beta distribution | sdf_rbeta
Generate random samples from a binomial distribution | sdf_rbinom
Generate random samples from a Cauchy distribution | sdf_rcauchy
Generate random samples from a chi-squared distribution | sdf_rchisq
Read a Column from a Spark DataFrame | sdf_read_column
Register a Spark DataFrame | sdf_register
Repartition a Spark DataFrame | sdf_repartition
Model Residuals | sdf_residuals sdf_residuals.ml_model_generalized_linear_regression sdf_residuals.ml_model_linear_regression
Generate random samples from an exponential distribution | sdf_rexp
Generate random samples from a Gamma distribution | sdf_rgamma
Generate random samples from a geometric distribution | sdf_rgeom
Generate random samples from a hypergeometric distribution | sdf_rhyper
Generate random samples from a log normal distribution | sdf_rlnorm
Generate random samples from the standard normal distribution | sdf_rnorm
Generate random samples from a Poisson distribution | sdf_rpois
Generate random samples from a t-distribution | sdf_rt
Generate random samples from the uniform distribution U(0, 1). | sdf_runif
Generate random samples from a Weibull distribution. | sdf_rweibull
Randomly Sample Rows from a Spark DataFrame | sdf_sample
Read the Schema of a Spark DataFrame | sdf_schema
Separate a Vector Column into Scalar Columns | sdf_separate_column
Create DataFrame for Range | sdf_seq
Sort a Spark DataFrame | sdf_sort
Spark DataFrame from SQL | sdf_sql
Convert column(s) to avro format | sdf_to_avro
Unnest longer | sdf_unnest_longer
Unnest wider | sdf_unnest_wider
Perform Weighted Random Sampling on a Spark DataFrame | sdf_weighted_sample
Add a Sequential ID Column to a Spark DataFrame | sdf_with_sequential_id
Add a Unique ID Column to a Spark DataFrame | sdf_with_unique_id
Save / Load a Spark DataFrame | sdf-saveload sdf_load_parquet sdf_load_table sdf_save_parquet sdf_save_table
Spark ML - Transform, fit, and predict methods (sdf_ interface) | sdf-transform-methods sdf_fit sdf_fit_and_transform sdf_predict sdf_transform
Select | select
Separate | separate
Retrieves or sets status of Spark AQE | spark_adaptive_query_execution
Retrieves or sets advisory size of the shuffle partition | spark_advisory_shuffle_partition_size
Apply an R Function in Spark | spark_apply
Create Bundle for Spark Apply | spark_apply_bundle
Log Writer for Spark Apply | spark_apply_log
Retrieves or sets the auto broadcast join threshold | spark_auto_broadcast_join_threshold
Retrieves or sets initial number of shuffle partitions before coalescing | spark_coalesce_initial_num_partitions
Retrieves or sets the minimum number of shuffle partitions after coalescing | spark_coalesce_min_num_partitions
Retrieves or sets whether coalescing contiguous shuffle partitions is enabled | spark_coalesce_shuffle_partitions
Define a Spark Compilation Specification | spark_compilation_spec
Read Spark Configuration | spark_config
Kubernetes Configuration | spark_config_kubernetes
Retrieve Available Settings | spark_config_settings
Function that negotiates the connection with the Spark back-end | spark_connect_method
Retrieve the Spark Connection Associated with an R Object | spark_connection
Find Spark Connection | spark_connection_find
spark_connection class | spark_connection-class
Runtime configuration interface for the Spark Context. | spark_context_config
Retrieve a Spark DataFrame | spark_dataframe
Default Compilation Specification for Spark Extensions | spark_default_compilation_spec
Define a Spark dependency | spark_dependency
Fallback to Spark Dependency | spark_dependency_fallback
Create Spark Extension | spark_extension
Set the SPARK_HOME environment variable | spark_home_set
Set of functions to provide integration with the RStudio IDE | spark_ide_columns spark_ide_connection_actions spark_ide_connection_closed spark_ide_connection_open spark_ide_connection_updated spark_ide_objects spark_ide_preview
Inserts a Spark DataFrame into a Spark table | spark_insert_table
Download and install various versions of Spark | spark_available_versions spark_install spark_installed_versions spark_install_dir spark_install_tar spark_uninstall
It lets the package know if it should test a particular functionality or not | spark_integ_test_skip
Retrieve a Spark JVM Object Reference | spark_jobj
spark_jobj class | spark_jobj-class
Surfaces the last error from Spark captured by internal `spark_error` function | spark_last_error
Reads from a Spark Table into a Spark DataFrame. | spark_load_table
View Entries in the Spark Log | spark_log
Read file(s) into a Spark DataFrame using a custom reader | spark_read
Read Apache Avro data into a Spark DataFrame. | spark_read_avro
Read binary data into a Spark DataFrame. | spark_read_binary
Read a CSV file into a Spark DataFrame | spark_read_csv
Read from Delta Lake into a Spark DataFrame. | spark_read_delta
Read image data into a Spark DataFrame. | spark_read_image
Read from JDBC connection into a Spark DataFrame. | spark_read_jdbc
Read a JSON file into a Spark DataFrame | spark_read_json
Read libsvm file into a Spark DataFrame. | spark_read_libsvm
Read an ORC file into a Spark DataFrame | spark_read_orc
Read a Parquet file into a Spark DataFrame | spark_read_parquet
Read from a generic source into a Spark DataFrame. | spark_read_source
Reads from a Spark Table into a Spark DataFrame. | spark_read_table
Read a Text file into a Spark DataFrame | spark_read_text
Saves a Spark DataFrame as a Spark table | spark_save_table
Runtime configuration interface for the Spark Session | spark_session_config
Generate random samples from some distribution | spark_statistical_routines
Generate a Table Name from Expression | spark_table_name
Get the Spark Version Associated with a Spark Connection | spark_version
Get the Spark Version Associated with a Spark Installation | spark_version_from_home
Open the Spark web interface | spark_web
Write Spark DataFrame to file using a custom writer | spark_write
Serialize a Spark DataFrame into Apache Avro format | spark_write_avro
Write a Spark DataFrame to a CSV | spark_write_csv
Writes a Spark DataFrame into Delta Lake | spark_write_delta
Writes a Spark DataFrame into a JDBC table | spark_write_jdbc
Write a Spark DataFrame to a JSON file | spark_write_json
Write a Spark DataFrame to an ORC file | spark_write_orc
Write a Spark DataFrame to a Parquet file | spark_write_parquet
Write Spark DataFrame to RDS files | spark_write_rds
Writes a Spark DataFrame into a generic source | spark_write_source
Writes a Spark DataFrame into a Spark table | spark_write_table
Write a Spark DataFrame to a Text file | spark_write_text
Access the Spark API | hive_context java_context spark-api spark_context spark_session
Manage Spark Connections | spark-connections spark_connect spark_connection_is_open spark_disconnect spark_disconnect_all spark_submit
Return the port number of a `sparklyr` backend. | sparklyr_get_backend_port
Show database list | src_databases
Find Stream | stream_find
Generate Test Stream | stream_generate_test
Spark Stream's Identifier | stream_id
Apply lag function to columns of a Spark Streaming DataFrame | stream_lag
Spark Stream's Name | stream_name
Read files created by the stream | stream_read_cloudfiles stream_read_csv stream_read_delta stream_read_json stream_read_kafka stream_read_orc stream_read_parquet stream_read_socket stream_read_table stream_read_text
Render Stream | stream_render
Stream Statistics | stream_stats
Stops a Spark Stream | stream_stop
Spark Stream Continuous Trigger | stream_trigger_continuous
Spark Stream Interval Trigger | stream_trigger_interval
View Stream | stream_view
Watermark Stream | stream_watermark
Write files to the stream | stream_write_console stream_write_csv stream_write_delta stream_write_json stream_write_kafka stream_write_orc stream_write_parquet stream_write_text
Write Memory Stream | stream_write_memory
Write Stream to Table | stream_write_table
Cache a Spark Table | tbl_cache
Use specific database | tbl_change_db
Uncache a Spark Table | tbl_uncache
Transform a subset of column(s) in a Spark Dataframe | transform_sdf
Unite | unite
Unnest | unnest
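Several topics indexed above (`sdf_random_split`, `ml_linear_regression`, the `tidy` methods and `ml_predict`) fit together into a typical modelling workflow. The following is a minimal sketch, assuming Java and a local Spark installation are available; the formula `mpg ~ wt + cyl`, the 70/30 split weights and the seed are arbitrary illustrative choices, not recommendations from the package.

```r
library(sparklyr)
library(dplyr)

# Assumes Java and a local Spark installation are present (e.g. via spark_install())
sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# Randomly split the rows inside Spark; weights and seed are illustrative
splits <- sdf_random_split(mtcars_tbl, training = 0.7, test = 0.3, seed = 1099)

# Fit a Spark MLlib linear regression through the formula interface
fit <- ml_linear_regression(splits$training, mpg ~ wt + cyl)

# Coefficient table from the broom-style tidy() method exported by sparklyr
coefs <- tidy(fit)
print(coefs)

# Score the held-out rows; ml_predict() appends a "prediction" column
preds <- ml_predict(fit, splits$test) %>%
  select(mpg, prediction) %>%
  collect()

spark_disconnect(sc)
```

Because `splits$training` and `splits$test` are Spark tables, both fitting and scoring run in Spark; only the selected prediction columns are collected into R.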