Subsetting operator for Spark DataFrame | [.tbl_spark |
Infix operator for composing a lambda expression | %->% |
Set/Get Spark checkpoint directory | checkpoint_directory spark_get_checkpoint_dir spark_set_checkpoint_dir |
Collect Spark data serialized in RDS format into R | collect_from_rds |
Compile Scala sources into a Java Archive (jar) | compile_package_jars |
Read configuration values for a connection | connection_config |
Copy an R Data Frame to Spark | copy_to.spark_connection |
Distinct | distinct |
Downloads default Scala Compilers | download_scalac |
dplyr wrappers for Apache Spark higher order functions | dplyr_hof |
Enforce Specific Structure for R Objects | ensure |
Fill | fill |
Filter | filter |
Discover the Scala Compiler | find_scalac |
Feature Transformation - Binarizer (Transformer) | ft_binarizer |
Feature Transformation - Bucketizer (Transformer) | ft_bucketizer |
Feature Transformation - ChiSqSelector (Estimator) | ft_chisq_selector |
Feature Transformation - CountVectorizer (Estimator) | ft_count_vectorizer ml_vocabulary |
Feature Transformation - Discrete Cosine Transform (DCT) (Transformer) | ft_dct ft_discrete_cosine_transform |
Feature Transformation - ElementwiseProduct (Transformer) | ft_elementwise_product |
Feature Transformation - FeatureHasher (Transformer) | ft_feature_hasher |
Feature Transformation - HashingTF (Transformer) | ft_hashing_tf |
Feature Transformation - IDF (Estimator) | ft_idf |
Feature Transformation - Imputer (Estimator) | ft_imputer |
Feature Transformation - IndexToString (Transformer) | ft_index_to_string |
Feature Transformation - Interaction (Transformer) | ft_interaction |
Feature Transformation - LSH (Estimator) | ft_bucketed_random_projection_lsh ft_lsh ft_minhash_lsh |
Utility functions for LSH models | ft_lsh_utils ml_approx_nearest_neighbors ml_approx_similarity_join |
Feature Transformation - MaxAbsScaler (Estimator) | ft_max_abs_scaler |
Feature Transformation - MinMaxScaler (Estimator) | ft_min_max_scaler |
Feature Transformation - NGram (Transformer) | ft_ngram |
Feature Transformation - Normalizer (Transformer) | ft_normalizer |
Feature Transformation - OneHotEncoder (Transformer) | ft_one_hot_encoder |
Feature Transformation - OneHotEncoderEstimator (Estimator) | ft_one_hot_encoder_estimator |
Feature Transformation - PCA (Estimator) | ft_pca ml_pca |
Feature Transformation - PolynomialExpansion (Transformer) | ft_polynomial_expansion |
Feature Transformation - QuantileDiscretizer (Estimator) | ft_quantile_discretizer |
Feature Transformation - RFormula (Estimator) | ft_r_formula |
Feature Transformation - RegexTokenizer (Transformer) | ft_regex_tokenizer |
Feature Transformation - RobustScaler (Estimator) | ft_robust_scaler |
Feature Transformation - SQLTransformer | ft_dplyr_transformer ft_sql_transformer |
Feature Transformation - StandardScaler (Estimator) | ft_standard_scaler |
Feature Transformation - StopWordsRemover (Transformer) | ft_stop_words_remover |
Feature Transformation - StringIndexer (Estimator) | ft_string_indexer ft_string_indexer_model ml_labels |
Feature Transformation - Tokenizer (Transformer) | ft_tokenizer |
Feature Transformation - VectorAssembler (Transformer) | ft_vector_assembler |
Feature Transformation - VectorIndexer (Estimator) | ft_vector_indexer |
Feature Transformation - VectorSlicer (Transformer) | ft_vector_slicer |
Feature Transformation - Word2Vec (Estimator) | ft_word2vec ml_find_synonyms |
Full join | full_join |
Generic Call Interface | generic_call_interface |
Retrieve the Spark connection's SQL catalog implementation property | get_spark_sql_catalog_implementation |
Runtime configuration interface for Hive | hive_context_config |
Apply Aggregate Function to Array Column | hof_aggregate |
Sorts array using a custom comparator | hof_array_sort |
Determine Whether Some Element Exists in an Array Column | hof_exists |
Filter Array Column | hof_filter |
Checks whether all elements in an array satisfy a predicate | hof_forall |
Filters a map | hof_map_filter |
Merges two maps into one | hof_map_zip_with |
Transform Array Column | hof_transform |
Transforms keys of a map | hof_transform_keys |
Transforms values of a map | hof_transform_values |
Combines 2 Array Columns | hof_zip_with |
Inner join | inner_join |
Invoke a Method on a JVM Object | invoke invoke_new invoke_static |
Invoke a Java function. | j_invoke j_invoke_new j_invoke_static |
Instantiate a Java array with a specific element type. | jarray |
Instantiate a Java float type. | jfloat |
Instantiate an Array[Float]. | jfloat_array |
Join Spark tbls. | full_join.tbl_spark inner_join.tbl_spark join.tbl_spark left_join.tbl_spark right_join.tbl_spark |
Left join | left_join |
List all sparklyr-*.jar files that have been built | list_sparklyr_jars |
Create a Spark Configuration for Livy | livy_config |
Start Livy | livy_service_start livy_service_stop |
Spark ML - Survival Regression | ml_aft_survival_regression ml_survival_regression |
Spark ML - ALS | ml_als ml_recommend |
Tidying methods for Spark ML ALS | augment.ml_model_als glance.ml_model_als ml_als_tidiers tidy.ml_model_als |
Spark ML - Bisecting K-Means Clustering | ml_bisecting_kmeans |
Chi-square hypothesis testing for categorical data. | ml_chisquare_test |
Spark ML - Clustering Evaluator | ml_clustering_evaluator |
Compute correlation matrix | ml_corr |
Spark ML - Decision Trees | ml_decision_tree ml_decision_tree_classifier ml_decision_tree_regressor |
Default stop words | ml_default_stop_words |
Evaluate the Model on a Validation Set | ml_evaluate ml_evaluate.ml_evaluator ml_evaluate.ml_generalized_linear_regression_model ml_evaluate.ml_linear_regression_model ml_evaluate.ml_logistic_regression_model ml_evaluate.ml_model_classification ml_evaluate.ml_model_clustering ml_evaluate.ml_model_generalized_linear_regression ml_evaluate.ml_model_linear_regression ml_evaluate.ml_model_logistic_regression |
Spark ML - Evaluators | ml_binary_classification_eval ml_binary_classification_evaluator ml_classification_eval ml_evaluator ml_multiclass_classification_evaluator ml_regression_evaluator |
Spark ML - Feature Importance for Tree Models | ml_feature_importances ml_tree_feature_importance |
Frequent Pattern Mining - FPGrowth | ml_association_rules ml_fpgrowth ml_freq_itemsets |
Spark ML - Gaussian Mixture clustering. | ml_gaussian_mixture |
Spark ML - Gradient Boosted Trees | ml_gbt_classifier ml_gbt_regressor ml_gradient_boosted_trees |
Spark ML - Generalized Linear Regression | ml_generalized_linear_regression |
Tidying methods for Spark ML linear models | augment.ml_model_generalized_linear_regression augment.ml_model_linear_regression augment._ml_model_linear_regression glance.ml_model_generalized_linear_regression glance.ml_model_linear_regression ml_glm_tidiers tidy.ml_model_generalized_linear_regression tidy.ml_model_linear_regression |
Spark ML - Isotonic Regression | ml_isotonic_regression |
Tidying methods for Spark ML Isotonic Regression | augment.ml_model_isotonic_regression glance.ml_model_isotonic_regression ml_isotonic_regression_tidiers tidy.ml_model_isotonic_regression |
Spark ML - K-Means Clustering | ml_compute_cost ml_compute_silhouette_measure ml_kmeans |
Evaluate a K-means clustering | ml_kmeans_cluster_eval |
Spark ML - Latent Dirichlet Allocation | ml_describe_topics ml_lda ml_log_likelihood ml_log_perplexity ml_topics_matrix |
Tidying methods for Spark ML LDA models | augment.ml_model_lda glance.ml_model_lda ml_lda_tidiers tidy.ml_model_lda |
Spark ML - Linear Regression | ml_linear_regression |
Spark ML - LinearSVC | ml_linear_svc |
Tidying methods for Spark ML linear svc | augment.ml_model_linear_svc glance.ml_model_linear_svc ml_linear_svc_tidiers tidy.ml_model_linear_svc |
Spark ML - Logistic Regression | ml_logistic_regression |
Tidying methods for Spark ML Logistic Regression | augment.ml_model_logistic_regression augment._ml_model_logistic_regression glance.ml_model_logistic_regression ml_logistic_regression_tidiers tidy.ml_model_logistic_regression |
Extracts metrics from a fitted table | ml_metrics_binary |
Extracts metrics from a fitted table | ml_metrics_multiclass |
Extracts metrics from a fitted table | ml_metrics_regression |
Extracts data associated with a Spark ML model | ml_model_data |
Spark ML - Multilayer Perceptron | ml_multilayer_perceptron ml_multilayer_perceptron_classifier |
Tidying methods for Spark ML MLP | augment.ml_model_multilayer_perceptron_classification glance.ml_model_multilayer_perceptron_classification ml_multilayer_perceptron_tidiers tidy.ml_model_multilayer_perceptron_classification |
Spark ML - Naive-Bayes | ml_naive_bayes |
Tidying methods for Spark ML Naive Bayes | augment.ml_model_naive_bayes glance.ml_model_naive_bayes ml_naive_bayes_tidiers tidy.ml_model_naive_bayes |
Spark ML - OneVsRest | ml_one_vs_rest |
Tidying methods for Spark ML Principal Component Analysis | augment.ml_model_pca glance.ml_model_pca ml_pca_tidiers tidy.ml_model_pca |
Spark ML - Pipelines | ml_pipeline |
Spark ML - Power Iteration Clustering | ml_power_iteration |
Frequent Pattern Mining - PrefixSpan | ml_freq_seq_patterns ml_prefixspan |
Spark ML - Random Forest | ml_random_forest ml_random_forest_classifier ml_random_forest_regressor |
Spark ML - Pipeline stage extraction | ml_stage ml_stages |
Spark ML - Extraction of summary metrics | ml_summary |
Tidying methods for Spark ML Survival Regression | augment.ml_model_aft_survival_regression glance.ml_model_aft_survival_regression ml_survival_regression_tidiers tidy.ml_model_aft_survival_regression |
Tidying methods for Spark ML tree models | augment.ml_model_decision_tree_classification augment.ml_model_decision_tree_regression augment.ml_model_gbt_classification augment.ml_model_gbt_regression augment.ml_model_random_forest_classification augment.ml_model_random_forest_regression augment._ml_model_decision_tree_classification augment._ml_model_decision_tree_regression augment._ml_model_gbt_classification augment._ml_model_gbt_regression augment._ml_model_random_forest_classification augment._ml_model_random_forest_regression glance.ml_model_decision_tree_classification glance.ml_model_decision_tree_regression glance.ml_model_gbt_classification glance.ml_model_gbt_regression glance.ml_model_random_forest_classification glance.ml_model_random_forest_regression ml_tree_tidiers tidy.ml_model_decision_tree_classification tidy.ml_model_decision_tree_regression tidy.ml_model_gbt_classification tidy.ml_model_gbt_regression tidy.ml_model_random_forest_classification tidy.ml_model_random_forest_regression |
Spark ML - UID | ml_uid |
Tidying methods for Spark ML unsupervised models | augment.ml_model_bisecting_kmeans augment.ml_model_gaussian_mixture augment.ml_model_kmeans glance.ml_model_bisecting_kmeans glance.ml_model_gaussian_mixture glance.ml_model_kmeans ml_unsupervised_tidiers tidy.ml_model_bisecting_kmeans tidy.ml_model_gaussian_mixture tidy.ml_model_kmeans |
Spark ML - ML Params | ml-params ml_is_set ml_param ml_params ml_param_map |
Spark ML - Model Persistence | ml-persistence ml_load ml_save ml_save.ml_model |
Spark ML - Transform, fit, and predict methods (ml_ interface) | is_ml_estimator is_ml_transformer ml-transform-methods ml_fit ml_fit.default ml_fit_and_transform ml_predict ml_predict.ml_model_classification ml_transform |
Spark ML - Tuning | ml-tuning ml_cross_validator ml_sub_models ml_train_validation_split ml_validation_metrics |
Mutate | mutate |
Replace Missing Values in Objects | na.replace |
Nest | nest |
Pivot longer | pivot_longer |
Pivot wider | pivot_wider |
Random string generation | random_string |
Reactive spark reader | reactiveSpark |
Register a Package that Implements a Spark Extension | registered_extensions register_extension |
Register a Parallel Backend | registerDoSpark |
Replace NA | replace_na |
Right join | right_join |
Create DataFrame for along Object | sdf_along |
Bind multiple Spark DataFrames by row and column | sdf_bind sdf_bind_cols sdf_bind_rows |
Broadcast hint | sdf_broadcast |
Checkpoint a Spark DataFrame | sdf_checkpoint |
Coalesces a Spark DataFrame | sdf_coalesce |
Collect a Spark DataFrame into R. | sdf_collect |
Copy an Object into Spark | sdf_copy_to sdf_import |
Cross Tabulation | sdf_crosstab |
Debug Info for Spark DataFrame | sdf_debug_string |
Compute summary statistics for columns of a data frame | sdf_describe |
Support for Dimension Operations | sdf_dim sdf_ncol sdf_nrow |
Invoke distinct on a Spark DataFrame | sdf_distinct |
Remove duplicates from a Spark DataFrame | sdf_drop_duplicates |
Create a Spark DataFrame containing all combinations of inputs | sdf_expand_grid |
Convert column(s) from avro format | sdf_from_avro |
Spark DataFrame is Streaming | sdf_is_streaming |
Returns the last index of a Spark DataFrame | sdf_last_index |
Create DataFrame for Length | sdf_len |
Gets number of partitions of a Spark DataFrame | sdf_num_partitions |
Compute the number of records within each partition of a Spark DataFrame | sdf_partition_sizes |
Persist a Spark DataFrame | sdf_persist |
Pivot a Spark DataFrame | sdf_pivot |
Project features onto principal components | sdf_project |
Compute (Approximate) Quantiles with a Spark DataFrame | sdf_quantile |
Partition a Spark DataFrame | sdf_partition sdf_random_split |
Generate random samples from a Beta distribution | sdf_rbeta |
Generate random samples from a binomial distribution | sdf_rbinom |
Generate random samples from a Cauchy distribution | sdf_rcauchy |
Generate random samples from a chi-squared distribution | sdf_rchisq |
Read a Column from a Spark DataFrame | sdf_read_column |
Register a Spark DataFrame | sdf_register |
Repartition a Spark DataFrame | sdf_repartition |
Model Residuals | sdf_residuals sdf_residuals.ml_model_generalized_linear_regression sdf_residuals.ml_model_linear_regression |
Generate random samples from an exponential distribution | sdf_rexp |
Generate random samples from a Gamma distribution | sdf_rgamma |
Generate random samples from a geometric distribution | sdf_rgeom |
Generate random samples from a hypergeometric distribution | sdf_rhyper |
Generate random samples from a log normal distribution | sdf_rlnorm |
Generate random samples from the standard normal distribution | sdf_rnorm |
Generate random samples from a Poisson distribution | sdf_rpois |
Generate random samples from a t-distribution | sdf_rt |
Generate random samples from the uniform distribution U(0, 1). | sdf_runif |
Generate random samples from a Weibull distribution. | sdf_rweibull |
Randomly Sample Rows from a Spark DataFrame | sdf_sample |
Read the Schema of a Spark DataFrame | sdf_schema |
Separate a Vector Column into Scalar Columns | sdf_separate_column |
Create DataFrame for Range | sdf_seq |
Sort a Spark DataFrame | sdf_sort |
Spark DataFrame from SQL | sdf_sql |
Convert column(s) to avro format | sdf_to_avro |
Unnest longer | sdf_unnest_longer |
Unnest wider | sdf_unnest_wider |
Perform Weighted Random Sampling on a Spark DataFrame | sdf_weighted_sample |
Add a Sequential ID Column to a Spark DataFrame | sdf_with_sequential_id |
Add a Unique ID Column to a Spark DataFrame | sdf_with_unique_id |
Save / Load a Spark DataFrame | sdf-saveload sdf_load_parquet sdf_load_table sdf_save_parquet sdf_save_table |
Spark ML - Transform, fit, and predict methods (sdf_ interface) | sdf-transform-methods sdf_fit sdf_fit_and_transform sdf_predict sdf_transform |
Select | select |
Separate | separate |
Retrieves or sets status of Spark AQE | spark_adaptive_query_execution |
Retrieves or sets advisory size of the shuffle partition | spark_advisory_shuffle_partition_size |
Apply an R Function in Spark | spark_apply |
Create Bundle for Spark Apply | spark_apply_bundle |
Log Writer for Spark Apply | spark_apply_log |
Retrieves or sets the auto broadcast join threshold | spark_auto_broadcast_join_threshold |
Retrieves or sets initial number of shuffle partitions before coalescing | spark_coalesce_initial_num_partitions |
Retrieves or sets the minimum number of shuffle partitions after coalescing | spark_coalesce_min_num_partitions |
Retrieves or sets whether coalescing contiguous shuffle partitions is enabled | spark_coalesce_shuffle_partitions |
Define a Spark Compilation Specification | spark_compilation_spec |
Read Spark Configuration | spark_config |
Kubernetes Configuration | spark_config_kubernetes |
Retrieve Available Settings | spark_config_settings |
Function that negotiates the connection with the Spark back-end | spark_connect_method |
Retrieve the Spark Connection Associated with an R Object | spark_connection |
Find Spark Connection | spark_connection_find |
spark_connection class | spark_connection-class |
Runtime configuration interface for the Spark Context. | spark_context_config |
Retrieve a Spark DataFrame | spark_dataframe |
Default Compilation Specification for Spark Extensions | spark_default_compilation_spec |
Define a Spark dependency | spark_dependency |
Fallback to Spark Dependency | spark_dependency_fallback |
Create Spark Extension | spark_extension |
Set the SPARK_HOME environment variable | spark_home_set |
Set of functions to provide integration with the RStudio IDE | spark_ide_columns spark_ide_connection_actions spark_ide_connection_closed spark_ide_connection_open spark_ide_connection_updated spark_ide_objects spark_ide_preview |
Inserts a Spark DataFrame into a Spark table | spark_insert_table |
Download and install various versions of Spark | spark_available_versions spark_install spark_installed_versions spark_install_dir spark_install_tar spark_uninstall |
Tells the package whether it should test a particular functionality | spark_integ_test_skip |
Retrieve a Spark JVM Object Reference | spark_jobj |
spark_jobj class | spark_jobj-class |
Surfaces the last error from Spark captured by internal `spark_error` function | spark_last_error |
Reads from a Spark Table into a Spark DataFrame. | spark_load_table |
View Entries in the Spark Log | spark_log |
Read file(s) into a Spark DataFrame using a custom reader | spark_read |
Read Apache Avro data into a Spark DataFrame. | spark_read_avro |
Read binary data into a Spark DataFrame. | spark_read_binary |
Read a CSV file into a Spark DataFrame | spark_read_csv |
Read from Delta Lake into a Spark DataFrame. | spark_read_delta |
Read image data into a Spark DataFrame. | spark_read_image |
Read from JDBC connection into a Spark DataFrame. | spark_read_jdbc |
Read a JSON file into a Spark DataFrame | spark_read_json |
Read libsvm file into a Spark DataFrame. | spark_read_libsvm |
Read an ORC file into a Spark DataFrame | spark_read_orc |
Read a Parquet file into a Spark DataFrame | spark_read_parquet |
Read from a generic source into a Spark DataFrame. | spark_read_source |
Reads from a Spark Table into a Spark DataFrame. | spark_read_table |
Read a Text file into a Spark DataFrame | spark_read_text |
Saves a Spark DataFrame as a Spark table | spark_save_table |
Runtime configuration interface for the Spark Session | spark_session_config |
Generate random samples from some distribution | spark_statistical_routines |
Generate a Table Name from Expression | spark_table_name |
Get the Spark Version Associated with a Spark Connection | spark_version |
Get the Spark Version Associated with a Spark Installation | spark_version_from_home |
Open the Spark web interface | spark_web |
Write Spark DataFrame to file using a custom writer | spark_write |
Serialize a Spark DataFrame into Apache Avro format | spark_write_avro |
Write a Spark DataFrame to a CSV | spark_write_csv |
Writes a Spark DataFrame into Delta Lake | spark_write_delta |
Writes a Spark DataFrame into a JDBC table | spark_write_jdbc |
Write a Spark DataFrame to a JSON file | spark_write_json |
Write a Spark DataFrame to an ORC file | spark_write_orc |
Write a Spark DataFrame to a Parquet file | spark_write_parquet |
Write Spark DataFrame to RDS files | spark_write_rds |
Writes a Spark DataFrame into a generic source | spark_write_source |
Writes a Spark DataFrame into a Spark table | spark_write_table |
Write a Spark DataFrame to a Text file | spark_write_text |
Access the Spark API | hive_context java_context spark-api spark_context spark_session |
Manage Spark Connections | spark-connections spark_connect spark_connection_is_open spark_disconnect spark_disconnect_all spark_submit |
Return the port number of a `sparklyr` backend. | sparklyr_get_backend_port |
Show database list | src_databases |
Find Stream | stream_find |
Generate Test Stream | stream_generate_test |
Spark Stream's Identifier | stream_id |
Apply lag function to columns of a Spark Streaming DataFrame | stream_lag |
Spark Stream's Name | stream_name |
Read files created by the stream | stream_read_cloudfiles stream_read_csv stream_read_delta stream_read_json stream_read_kafka stream_read_orc stream_read_parquet stream_read_socket stream_read_table stream_read_text |
Render Stream | stream_render |
Stream Statistics | stream_stats |
Stops a Spark Stream | stream_stop |
Spark Stream Continuous Trigger | stream_trigger_continuous |
Spark Stream Interval Trigger | stream_trigger_interval |
View Stream | stream_view |
Watermark Stream | stream_watermark |
Write files to the stream | stream_write_console stream_write_csv stream_write_delta stream_write_json stream_write_kafka stream_write_orc stream_write_parquet stream_write_text |
Write Memory Stream | stream_write_memory |
Write Stream to Table | stream_write_table |
Cache a Spark Table | tbl_cache |
Use specific database | tbl_change_db |
Uncache a Spark Table | tbl_uncache |
Transform a subset of column(s) in a Spark DataFrame | transform_sdf |
Unite | unite |
Unnest | unnest |