Revolutions by David Smith - 4d ago

For almost five years, the entire CRAN repository of R packages has been archived on a daily basis at MRAN. If you use CRAN snapshots from MRAN, we'd love to hear how you use them in this survey. If you're not familiar with the concept, or just want to learn more, read on.

Every day since September 17, 2014, we (Microsoft and, before the acquisition, Revolution Analytics) have archived a snapshot of the entire CRAN repository as a service to the R community. These daily snapshots have several uses:

  • As a longer-term archive of binary R packages. (CRAN archives package source versions indefinitely, but keeps package binaries only for the current R version and the prior major version, and only for the latest version of each package.)
  • As a static CRAN repository you can use like the standard CRAN repository, but frozen in time. This means changes to CRAN packages won't affect the behavior of R scripts in the future (for better or worse). options(repos="https://cran.microsoft.com/snapshot/2017-03-15/") provides a CRAN repository that works with R 3.3.3, for example — and you can choose any date since September 17, 2014.
  • The checkpoint package on CRAN provides a simple interface to these CRAN snapshots, allowing you to use a specific CRAN snapshot by specifying a date, and making it easy to manage multiple R projects, each using a different snapshot (see the sketch after this list).
  • Microsoft R Open, Microsoft R Client, Microsoft ML Server and SQL Server ML Services all use fixed CRAN repository snapshots from MRAN by default.
  • The rocker project provides container instances for historical versions of R, tied to an appropriate CRAN snapshot from MRAN suitable for the corresponding R version.
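
For example, here is a minimal sketch of pinning an R session to a snapshot, using the same illustrative date as above:

# point this session at the CRAN snapshot taken on 2017-03-15
options(repos = "https://cran.microsoft.com/snapshot/2017-03-15/")
install.packages("dplyr")   # installs the package version current on that date

# the same idea via the checkpoint package
library(checkpoint)
checkpoint("2017-03-15")
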
Browse the MRAN time machine to find specific CRAN snapshots by date. (Tip: click the R logo to open the snapshot URL in its own new window.)

MRAN and the CRAN snapshot system were created at a time when reproducibility was an emerging concept in the R ecosystem. Now, there are several methods available to ensure that your R code works consistently, even as R and CRAN change. Beyond virtualization and containers, you have packages like packrat and miniCRAN, RStudio's package manager, and the full suite of tools for reproducible research.

As CRAN has grown and changes to packages have become more frequent, maintaining MRAN is an increasingly resource-intensive process. We're contemplating changes, like changing the frequency of snapshots, or thinning the archive of snapshots that haven't been used. But before we do that, we'd like to hear from the community first. Have you used MRAN snapshots? If so, how are you using them? How many different snapshots have you used, and how often do you change that up? Please leave your feedback at the survey link below by June 14, and we'll use the feedback we gather in our decision-making process. Responses are anonymous, and we'll summarize the responses in a future blog post. Thanks in advance!

Take the MRAN survey here.


By Hong Ooi and Alex Kyllo

This post is to announce the availability of AzureKusto, the R interface to Azure Data Explorer (internally codenamed “Kusto”), a fast, fully managed data analytics service from Microsoft. It is available from CRAN, or you can install the development version from GitHub via devtools::install_github("cloudyr/AzureKusto").

AzureKusto provides an interface (including DBI-compliant methods) for connecting to Kusto clusters and submitting Kusto Query Language (KQL) statements, as well as a dbplyr-style backend that translates dplyr queries into KQL statements. On the administrator side, it extends the AzureRMR framework to allow for creating clusters and managing database principals.

Connecting to a cluster

To connect to a Data Explorer cluster, call the kusto_database_endpoint() function. Once you are connected, call run_query() to execute queries and command statements.

library(AzureKusto)

## Connect to a Data Explorer cluster with (default) device code authentication
Samples <- kusto_database_endpoint(
    server="https://help.kusto.windows.net",
    database="Samples")

res <- run_query(Samples,
    "StormEvents | summarize EventCount = count() by State | order by State asc")
head(res)
##            State EventCount
## 1        ALABAMA       1315
## 2         ALASKA        257
## 3 AMERICAN SAMOA         16
## 4        ARIZONA        340
## 5       ARKANSAS       1028
## 6 ATLANTIC NORTH        188

# run_query can also handle command statements, which begin with a '.' character
res <- run_query(Samples, ".show tables | count")
res[[1]]
##   Count
## 1     5

dplyr Interface

The package also implements a dplyr-style interface for building a query upon a tbl_kusto object and then running it on the remote Kusto database and returning the result as a regular tibble object with collect(). All the standard verbs are supported.

library(dplyr)
StormEvents <- tbl_kusto(Samples, "StormEvents")
q <- StormEvents %>%
    group_by(State) %>%
    summarize(EventCount=n()) %>%
    arrange(State)
show_query(q)
## <KQL> database('Samples').['StormEvents']
## | summarize ['EventCount'] = count() by ['State']
## | order by ['State'] asc

collect(q)
## # A tibble: 67 x 2
##    State          EventCount
##    <chr>               <dbl>
##  1 ALABAMA              1315
##  2 ALASKA                257
##  3 AMERICAN SAMOA         16
## ...

DBI interface

AzureKusto implements a subset of the DBI specification for interfacing with databases in R.

The following methods are supported:

  • Connections: dbConnect, dbDisconnect, dbCanConnect
  • Table management: dbExistsTable, dbCreateTable, dbRemoveTable, dbReadTable, dbWriteTable
  • Querying: dbGetQuery, dbSendQuery, dbFetch, dbSendStatement, dbExecute, dbListFields, dbColumnInfo

It should be noted, though, that Data Explorer is quite different to the SQL databases that DBI targets. This affects the behaviour of certain DBI methods and renders others moot.

library(DBI)

Samples <- dbConnect(AzureKusto(),
                     server="https://help.kusto.windows.net",
                     database="Samples")

dbListTables(Samples)
## [1] "StormEvents"       "demo_make_series1" "demo_series2"     
## [4] "demo_series3"      "demo_many_series1"

dbExistsTable(Samples, "StormEvents")
## [1] TRUE

dbGetQuery(Samples, "StormEvents | summarize ct = count()")
##      ct
## 1 59066
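
The table-management methods listed above follow the usual DBI pattern. As a rough sketch (assuming you have write access to the database; uploading the built-in iris dataset is purely illustrative):

dbWriteTable(Samples, "iris", iris)
dbReadTable(Samples, "iris")
dbRemoveTable(Samples, "iris")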

If you have any questions, comments or other feedback, please feel free to open an issue on the GitHub repo.

And one more thing...

As of Build 2019, Data Explorer can also run R (and Python) scripts in-database. For more information on this feature, currently in public preview, see the Azure blog and the documentation article.

 

 

Revolutions by David Smith - 2w ago

A major update to the open-source R language, R 3.6.0, was released on April 26 and is now available for download for Windows, Mac and Linux. As a major update, it has many new features, user-visible changes and bug fixes. You can read the details in the release announcement, and in this blog post I'll highlight the most significant ones.

Changes to random number generation. R 3.6.0 changes the method used to generate random integers in the sample function. In prior versions, the probability of generating each integer could vary from equal by up to 0.04% (or possibly more if generating more than a million different integers). This change will mainly be relevant to people who do large-scale simulations, and it also means that scripts using the sample function will generate different results in R 3.6.0 than they did in prior versions of R. If you need to keep the results the same (for reproducibility or for automated testing), you can revert to the old behavior by adding RNGkind(sample.kind="Rounding") to the top of your script.
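
A minimal illustration (the seed value here is arbitrary):

# R 3.6.0 default ("Rejection") sampling
set.seed(42)
sample(10, 3)

# revert to the pre-3.6.0 method; R will warn that this sampler is biased
RNGkind(sample.kind = "Rounding")
set.seed(42)
sample(10, 3)   # now matches the result from R 3.5.x and earlier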

Changes to R's serialization format. The format used to save R data to disk has been updated to version 3 for this release. If you want to save data with R 3.6.0 and read it back with R version 3.4.4 or earlier, you'll need to write it in the old format, for example with saveRDS(mydata, "mydata.rds", version=2). (The same applies to the functions save, serialize, and byte-compiled R code.) The R 3.5 series was designed with forward compatibility in mind, and can already read data serialized in the version-3 format.
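
For example (mydata is a placeholder for your own object):

# write files that R 3.4.4 and earlier can still read
saveRDS(mydata, "mydata.rds", version = 2)
save(mydata, file = "mydata.RData", version = 2)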

Improvements to base graphics. You now have more options for the appearance of axis labels (and perpendicular labels no longer overlap), better control over text positioning, a formula interface for barplots, and color palettes with better visual perception.
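
Two of these can be sketched in a couple of lines (assumes R 3.6.0 or later; the datasets used are just the built-in examples):

barplot(GNP ~ Year, data = longley)                        # new formula interface
image(volcano, col = hcl.colors(64, palette = "viridis"))  # new HCL color palettes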

Improvements to package installation and loading, which should eliminate problems with partially-installed packages and reduce the space required for installed packages.

More functions now support vectors with more than 2 billion elements, including which and pmin/pmax.

Various speed improvements to functions including outer, substring, stopifnot, and the $ operator for data frames.

Improvements to statistical functions, including standard errors for t-tests and better influence measures for multivariate models.

R now uses less memory, thanks to improvements to functions like drop, unclass and seq to use the ALTREP system to avoid duplicating data.

More control over R's memory usage, including the ability to limit the amount of memory R will use for data. (Attempting to exceed this limit will generate a "cannot allocate memory" error.) This is particularly useful when R is being used in production, to limit the impact of a wayward R process on other applications in a shared system.

Better consistency between platforms. This has been an ongoing process, but R now has fewer instances of functions (or function arguments) that are only available on limited platforms (e.g. on Windows but not on Linux). The documentation is now more consistent between platforms, too. This should mean fewer instances of finding R code that doesn't run on your machine because it was written on a different platform.

There are many other specialized changes as well, which you can find in the release notes. Other than the issues raised above, most code from prior versions of R should run fine in R 3.6.0, but you will need to re-install any R packages you use, as they won't carry over from your R 3.5.x installation. Now that a couple of weeks have passed since the release, most packages should be readily available on CRAN for R 3.6.0.

As R enters its 20th year of continuous stable releases, please do take a moment to appreciate the ongoing commitment of the R Core Team and their diligence in improving the R engine multiple times a year. Thank you to everyone who has contributed. If you'd like to support the R project, here's some information on contributing to the R Foundation.

Final note: the code name for R 3.6.0 is "Planting of a Tree". The R code-names generally refer to Peanuts comics: if anyone can identify what the R 3.6.0 codename is referring to, please let us know in the comments!

R-announce mailing list: R 3.6.0 is released


It's taken a little bit longer than usual, but Microsoft R Open 3.5.2 (MRO) is now available for download for Windows and Linux. This update is based on R 3.5.2, and accordingly fixes a few minor bugs compared to MRO 3.5.1. The main change you will note is that new CRAN packages released since R 3.5.1 can now be used with this version of MRO.

Microsoft R Open 3.5.3, based on R 3.5.3, will be available next week, on May 10. Microsoft R Open 3.6.0, based on the recently-released R 3.6.0, is currently under development, and we'll make an announcement here when it's available too.

One thing to note: as of version 3.5.2, Microsoft R Open is no longer distributed for MacOS systems. If you want to continue to take advantage of the Accelerate framework on MacOS for multi-threaded computing, it's not too difficult to tweak the CRAN binary to enable it, and the additional bundled open-source packages (like checkpoint and iterators) are available to install from CRAN or GitHub.

As always, we hope you find Microsoft R Open useful, and if you have any comments or questions please visit the Microsoft R Open forum. You can follow the development of Microsoft R Open at the MRO Github repository. To download Microsoft R Open, simply follow the link below.

MRAN: Download Microsoft R Open


Microsoft Graph is a comprehensive framework for accessing data in various online Microsoft services, including Azure Active Directory (AAD), Office 365, OneDrive, Teams, and more. AzureGraph is an R package that provides a simple R6-based interface to the Graph REST API, and is the companion package to AzureRMR and AzureAuth.

Currently, AzureGraph aims to provide an R interface only to the AAD part, with a view to supporting R interoperability with Azure: registered apps and service principals, users and groups. Like AzureRMR, it could potentially be extended to support other services.

AzureGraph is on CRAN, so you can install it via install.packages("AzureGraph"). Alternatively, you can install the development version from GitHub via devtools::install_github("cloudyr/AzureGraph").

Authentication
AzureGraph uses a similar authentication procedure to AzureRMR and the Azure CLI. The first time you authenticate with a given Azure Active Directory tenant, you call create_graph_login() and supply your credentials. AzureGraph will prompt you for permission to create a special data directory in which to cache the obtained authentication token and AD Graph login. Once this information is saved on your machine, it can be retrieved in subsequent R sessions with get_graph_login(). Your credentials will be automatically refreshed so you don’t have to reauthenticate.
library(AzureGraph)

# authenticate with AAD
# - on first login, call create_graph_login()
# - on subsequent logins, call get_graph_login()
gr <- create_graph_login()

Linux DSVM note: If you are using a Linux Data Science Virtual Machine in Azure, you may have problems running create_graph_login() (i.e., without arguments). In this case, try create_graph_login(auth_type="device_code").

Users and groups

The basic classes for interacting with user accounts and groups are az_user and az_group. To instantiate these, call the get_user and get_group methods of the login client object.

# account of the logged-in user (if you authenticated via the default method)
me <- gr$get_user()

# alternative: supply an email address or GUID
me2 <- gr$get_user("hongooi@microsoft.com")

# IDs of my groups
head(me$list_group_memberships())
#> [1] "98326d14-365a-4257-b0f1-5c3ce3104f75" "b21e5600-8ac5-407b-8774-396168150210"
#> [3] "be42ef66-5c13-48cb-be5c-21e563e333ed" "dd58be5a-1eac-47bd-ab78-08a452a08ea0"
#> [5] "4c2bfcfe-5012-4136-ab33-f10389f2075c" "a45fbdbe-c365-4478-9366-f6f517027a22"

# a specific group
(grp <- gr$get_group("82d27e38-026b-4e5d-ba1a-a0f5a21a2e85"))
#> <Graph group 'AIlyCATs'>
#>   directory id: 82d27e38-026b-4e5d-ba1a-a0f5a21a2e85
#>   description: ADS AP on Microsoft Teams.
#> - Instant communication.
#> - Share files/links/codes/...
#> - Have fun. :)

The actual properties of an object are stored as a list in the properties field:

# properties of a user account
names(me$properties)
#>  [1] "@odata.context"                 "id"                             "deletedDateTime"
#>  [4] "accountEnabled"                 "ageGroup"                       "businessPhones"
#>  [7] "city"                           "createdDateTime"                "companyName"
#> [10] "consentProvidedForMinor"        "country"                        "department"
#> [13] "displayName"                    "employeeId"                     "faxNumber"
#> ...

me$properties$companyName
#> [1] "MICROSOFT PTY LIMITED"

# properties of a group
names(grp$properties)
#>  [1] "@odata.context"                "id"                            "deletedDateTime"
#>  [4] "classification"                "createdDateTime"               "description"
#>  [7] "displayName"                   "expirationDateTime"            "groupTypes"
#> [10] "mail"                          "mailEnabled"                   "mailNickname"
#> [13] "membershipRule"                "membershipRuleProcessingState" "onPremisesLastSyncDateTime"
#> ...

You can also view any directory objects that you own and/or created, via the list_owned_objects and list_registered_objects methods of the user object. These accept a type argument to filter the list of objects by the specified type(s).

me$list_owned_objects(type="application")
#> [[1]]
#> <Graph registered app 'AzureRapp'>
#>   app id: 5af7bc65-8834-4ee6-90df-e7271a12cc62
#>   directory id: 132ce21b-ebb9-4e75-aa04-ad9155bb921f
#>   domain: microsoft.onmicrosoft.com

me$list_owned_objects(type="group")
#> [[1]]
#> <Graph group 'AIlyCATs'>
#>   directory id: 82d27e38-026b-4e5d-ba1a-a0f5a21a2e85
#>   description: ADS AP on Microsoft Teams.
#> - Instant communication.
#> - Share files/links/codes/...
#> - Have fun. :)
#>
#> [[2]] 
#> <Graph group 'ANZ Data Science and AI V-Team'>
#>   directory id: 4e237eed-5f9b-4abd-830b-9322cb472b66
#>   description: ANZ Data Science V-Team
#>
#> ...
Registered apps and service principals

To get the details for a registered app, use the get_app or create_app methods of the login client object. These return an object of class az_app. The first method retrieves an existing app, while the second creates a new app.

# an existing app
gr$get_app("5af7bc65-8834-4ee6-90df-e7271a12cc62")
#> <Graph registered app 'AzureRapp'>
#>   app id: 5af7bc65-8834-4ee6-90df-e7271a12cc62
#>   directory id: 132ce21b-ebb9-4e75-aa04-ad9155bb921f
#>   domain: microsoft.onmicrosoft.com

# create a new app
(appnew <- gr$create_app("AzureRnewapp"))
#> <Graph registered app 'AzureRnewapp'>
#>   app id: 1751d755-71b1-40e7-9f81-526d636c1029
#>   directory id: be11df41-d9f1-45a0-b460-58a30daaf8a9
#>   domain: microsoft.onmicrosoft.com

By default, creating a new app will also generate a strong password with a duration of one year, and create a corresponding service principal in your AAD tenant. You can retrieve this with the get_service_principal method, which returns an object of class az_service_principal.

appnew$get_service_principal()
#> <Graph service principal 'AzureRnewapp'>
#>   app id: 1751d755-71b1-40e7-9f81-526d636c1029
#>   directory id: 7dcc9602-2325-4912-a32e-03e262ffd240
#>   app tenant: 72f988bf-86f1-41af-91ab-2d7cd011db47

# or directly from the login client (supply the app ID in this case)
gr$get_service_principal("1751d755-71b1-40e7-9f81-526d636c1029")
#> <Graph service principal 'AzureRnewapp'>
#>   app id: 1751d755-71b1-40e7-9f81-526d636c1029
#>   directory id: 7dcc9602-2325-4912-a32e-03e262ffd240
#>   app tenant: 72f988bf-86f1-41af-91ab-2d7cd011db47

To update an app, call its update method. For example, use this to set a redirect URL or change its permissions. Consult the Microsoft Graph documentation for what properties you can update. To update its password specifically, call the update_password method.

# set a public redirect URL
appnew$update(publicClient=list(redirectUris=I("http://localhost:1410")))

# change the password
appnew$update_password()
Common methods

The classes described above inherit from a base az_object class, which represents an arbitrary object in Azure Active Directory. This has the following methods:

  • delete(confirm=TRUE): Delete an object. By default, ask for confirmation first.
  • update(...): Update the object information in Azure Active Directory (mentioned above when updating an app).
  • do_operation(...): Carry out an arbitrary operation on the object.
  • sync_fields(): Synchronise the R object with the data in Azure Active Directory.
  • list_group_memberships(): Return the IDs of all groups this object is a member of.
  • list_object_memberships(): Return the IDs of all groups, administrative units and directory roles this object is a member of.

For efficiency, the list_group_memberships and list_object_memberships methods return only the IDs of the groups/objects, since these lists can be rather long.
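
If you need the full objects, a quick sketch is to resolve the IDs using the login client's get_group method shown earlier:

# resolve group membership IDs into az_group objects (illustrative only)
ids <- me$list_group_memberships()
grps <- lapply(head(ids, 3), gr$get_group)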

# get my OneDrive
me$do_operation("drive")
See also

See the following links on Microsoft Docs for more information.


A monthly roundup of news about Artificial Intelligence, Machine Learning and Data Science. This is an eclectic collection of interesting blog posts, software announcements and data applications from Microsoft and elsewhere that I've noted over the past month or so.

Open Source AI, ML & Data Science News

Facebook open-sources PyTorch-BigGraph for producing embeddings for graphs where the model is too large to fit in memory. Also published: an embeddings graph for 50 million Wikipedia concepts.

Visual Studio Code expands Python support, including a new variable explorer and data viewer, improved debugging capabilities, and real-time collaboration via Live Share.

Pyright, a static type-checker for Python, available as a command-line tool and a VS Code extension.

Frequently Asked Questions about Tensorflow 2.0, now in alpha testing.

VoTT, an open source annotation and labeling tool for image and video assets, has been updated to version 2.

Industry News

Google announces AI Platform (beta), an integrated platform for developers to create AI capabilities and deploy them to GCP or on premises.

New enterprise AI solutions from Google (in beta): Document Understanding AI, Contact Center AI, and Recommendations AI for retail.

Google made a number of other AI-related announcements at Google Next, including new ML capabilities in BigQuery, AutoML Tables to generate an ML model predicting a data column with one click, updated pre-trained models for vision and natural language services, general availability of Cloud TPU v3, and more.

NVIDIA Tesla T4 GPUs are now available in Google Colab for faster training with larger models.

Microsoft News

Microsoft commits to host the world's leading environmental data sets on Azure as part of a broader sustainability initiative.

Microsoft's research in Machine Teaching: the process of augmenting models with human input rather than using data alone.

Anomaly Detector, a new Azure Cognitive Service to detect unusual values in time series data, is now in preview.

Azure Custom Vision, the transfer learning service for image classification with user-provided labels, is now generally available.

Azure Video Indexer can now be trained to recognize specific people in video from user-provided photographs.

Azure Cognitive Services are now accessible in Apache Spark workloads, via Spark ML pipelines.

You can now deploy Tensorflow models to Azure Data Box Edge with Azure ML Service and ONNX, for local inference on high-performance FPGA chips.

Azure HDInsight now supports Apache Hadoop 3.0, and updated versions of Hive, HBase, and Phoenix.

Managed MLflow is now generally available in Azure Databricks, to track experiments and manage models and projects. Microsoft has also joined the open-source MLflow project.

PowerBI now offers AutoML, to enable business analysts to build machine learning models from data without coding.

Learning resources

Advanced Natural Language Processing with spaCy, a free online course from Ines Montani, a core developer of the Python package. These tutorial notebooks on sentiment analysis by Ben Trevett also make use of spaCy.

The Economist reworks some of their worst published charts, providing a lesson in better data visualization.

Discriminating Systems: Gender, Race and Power in AI: Results from a year-long study on diversity in the AI sector from the AI Now Institute.

A tutorial on creating a serverless HTTP endpoint with Python and Azure Functions using Visual Studio Code.

Tutorial: incorporating predictions from an R model in a Power BI report.

AWS Machine Learning University: free access to the curriculum used to teach Amazon staff about ML on AWS.

An introduction to reinforcement learning, with AWS RoboMaker.

Applications

Ganvatar: an impressive demonstration of synthesizing faces along semantic vector axes (age, gender, happiness).

Using the time series and outlier detection features of the Kusto Query Language to detect cyber threats with Azure Sentinel.

Forecasting orange juice sales in a grocery chain (Jupyter Notebook), using automated machine learning in Azure ML Service.

Seek, a smartphone app that uses computer vision to identify plant and animal species in real time.

Find previous editions of the monthly AI roundup here.


A monthly roundup of news about Artificial Intelligence, Machine Learning and Data Science. This is an eclectic collection of interesting blog posts, software announcements and data applications from Microsoft and elsewhere that I've noted over the past month or so.

Open Source AI, ML & Data Science News

TensorFlow Privacy: a Python library for training machine learning models with differential privacy, for use with sensitive data to generate models that don't learn details about specific people.

Tensorflow Federated, an open-source library for Federated Learning, enabling many participating clients to train shared ML models while keeping their data local.

R 3.5.3 has been released.

NNI, an open source AutoML toolkit for neural architecture search and hyper-parameter tuning, from Microsoft Research.

Industry News

Open AI has published a paper describing GPT-2, an unsupervised language model that can generate paragraphs of coherent text that could be mistaken for human writing. Only a scaled-down version has been released, for fear of abuse.

Mozilla releases Common Voice, a public domain dataset of 1,400 hours of recorded voice data in 18 languages.

GCP's Deep Learning Virtual Machine now incorporates RAPIDS, the open-source GPU-acceleration library for Python machine learning.

Google introduces GPipe, an open-source library for training large-scale deep neural networks on devices with otherwise insufficient resources.

Determined AI, platform-agnostic software for automated machine learning and infrastructure management, has been announced.

Microsoft News

Azure Machine Learning Service now incorporates RAPIDS, the open source NVIDIA library that provides accelerated GPU support for Pandas dataframes and scikit-learn machine learning algorithms.

ONNX Runtime adds the NVIDIA TensorRT execution provider, for improved inferencing support on NVIDIA GPU devices.

The Intel-optimized Data Science Virtual Machine, providing up to a 10x performance increase for CPU-based deep learning workloads, is now available on Azure Marketplace.

New Python features in VS Code include validation of breakpoint targets, a new test explorer, and the ability to run selected code without the need to define code cells.

New enhancements for Cognitive Services Computer Vision: bounding boxes for detected objects, and more types of objects detected (including thousands of brand logos).

Azure Cognitive Services containers for Text Analytics and Language Understanding are now supported on edge devices with Azure IoT Edge.

MMLSpark 0.16 is released, with improvements to machine learning methods for Apache Spark including Azure Search integration and the Smart Adaptive Recommender.

Microsoft Research releases MT-DNN as a PyTorch package, an architecture for natural language understanding that combines elements of MTL and BERT.

Learning resources

Microsoft launches AI Business School, a collection of learning modules aimed at business leaders on AI strategy, technology and responsible impact.

A high-level article on how and why transfer learning works, and the connection with embeddings.

The Quartz AI Studio, a task-oriented resource for data journalists, provides useful guides for text and image analysis for general practitioners as well.

Causal Inference: The Mixtape, a freely-licensed book by Scott Cunningham that comes with its own playlist.

An architecture for using R at scale on GCP, by Mark Edmondson.

The Python Data Science Handbook (by Jake VanderPlas) in Jupyter Notebook form. Also available in Azure Notebooks and Google Colab.

Datasets for Machine Learning: A list of the biggest datasets from across the web.

Applications

SPADE, a method for synthesizing photo-realistic images from a simple sketch.

Seeing AI now includes a feature to help vision-impaired users explore photographs by touch, and is now supported on iPad devices.

Google deploys a compact RNN transducer to mobile phones that can transcribe speech on-device and stream output letter-by-letter, along with a quasi-recurrent neural network for handwriting transcription.

NVIDIA introduces the Kaldi ASR Framework for high-speed speech transcription.

Find previous editions of the monthly AI roundup here.

Revolutions by David Smith - 2M ago

The R Core Team announced yesterday the release of R 3.5.3, and updated binaries for Windows and Linux are now available (with Mac sure to follow soon). This update fixes three minor bugs (to the functions writeLines, setClassUnion, and stopifnot), but you might want to upgrade just to avoid the "package built under R 3.5.3" warnings you might get for new CRAN packages in the future.

R releases typically reference the Peanuts cartoon, but the code-name for this release, "Great Truth", is somewhat of a mystery. There may be a clue in the release date, March 11. Anyone got any ideas?

For more details on this latest update to the R language, check out the announcement below. And as always, thanks to the members of the R Core Team for their continued contributions to the R project and to all R users.

R-announce mailing list: R 3.5.3 is released

 


Which US city has the worst weather? To answer that question, data analyst Taras Kaduk counted the number of pleasant days in each city and ranked them accordingly. For this analysis, a "pleasant" day is one where the average temperature was in the range 55°F-75°F, the maximum was in the range 60°-90°, the minimum was in the range 40°-70°, and there was no significant rain or snow. (The weather data was provided by the NOAA Global Surface Summary of the Day dataset and downloaded to R with the rnoaa package.)
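
A rough dplyr sketch of that filter might look like this (the data frame and column names are assumptions for illustration, not the original code):

library(dplyr)
pleasant <- daily_weather %>%
  filter(between(temp_avg, 55, 75),
         between(temp_max, 60, 90),
         between(temp_min, 40, 70),
         rain == 0, snow == 0) %>%
  count(city, name = "pleasant_days")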

With those criteria and focusing just on the metro regions with more than 1 million people, the cities with the fewest pleasant days are:

  1. San Juan / Carolina / Caguas, Puerto Rico (hot year-round)
  2. Rochester, NY (cold in the winter, rain in the summer)
  3. Detroit / Warren / Dearborn, MI (cold in the winter, rain in the summer)

You can see the top (bottom?) 25 cities in this list in the elegant chart below (also by Taras Kaduk), which shows each city as a polar bar chart and with one ring for each of the 6 years of data analyzed. 

And if you're wondering which cities have the best weather, here's the corresponding chart for the 25 cities with the most pleasant days. San Diego / Carlsbad (CA) tops that list.

You can find the R code behind the analysis and charts in this Github repository. (The polar charts above required a surprisingly small amount of code: each is a regular ggplot2 bar chart wrapped into a circle with coord_polar — quite appropriate given the annual cycle of weather data.) And for the full description of the analysis, including some other nice graphical representations of the data, check out the blog post linked below.
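
The core of that transformation is roughly this (an illustrative sketch, not the original code; the data frame and aesthetics are assumptions):

library(ggplot2)
ggplot(city_days, aes(x = month, y = n_days, fill = day_type)) +
  geom_col() +
  coord_polar()   # wrap the stacked bar chart around a circle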

Taras Kaduk: Where are the places with the best (and the worst) weather in the United States


Let's say you've developed a predictive model in R, and you want to embed predictions (scores) from that model into another application (like a mobile or Web app, or some automated service). If you expect a heavy load of requests, R running on a single server isn't going to cut it: you'll need some kind of distributed architecture with enough servers to handle the volume of requests in real time.

This reference architecture for real-time scoring with R, published in Microsoft Docs, describes a Kubernetes-based system to distribute the load to R sessions running in containers. This diagram from the article provides a good overview:

You can find detailed instructions for deploying this architecture on Github. This architecture uses Azure-specific components, but you could also use their open source equivalents if you wanted to host them yourself.

For more details on this architecture, take a look at the Microsoft Docs article linked below.

Microsoft Docs: Real-time scoring of R machine learning models
