In-database XGBoost and other new features in Oracle Database 21c
Oracle » Oracle Machine Learning Blog
by Mark Hornick
3y ago
We are pleased to announce the availability of new algorithms and features for OML4SQL – the SQL interface to Oracle Machine Learning – on Oracle Database 21c (21.3) for Linux. In Oracle Database 21c, OML4SQL includes two new algorithms: XGBoost and MSET-SPRT. The highly popular and powerful extreme gradient boosting trees, or XGBoost, extends the OML classification and regression suite of algorithms, while also introducing ranking. You can use XGBoost as a stand-alone predictor or incorporate it into real-world production pipelines for a wide range of problems such as ad click-through rate pr ..read more
Metrics for Regression Using OML4Py
Oracle » Oracle Machine Learning Blog
by Jie Liu
3y ago
The goal of a regression task is to build models based on features to predict a target quantity, that is, a numeric value. After a regression model is applied to a test set, the next step is to evaluate model performance by checking the error between the regression output and the true value. Regression models are commonly evaluated with metrics such as mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and R squared. What are the differences among those metrics? How does one choose a metric correctly? How does t ..read more
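As a rough illustration of the four metrics named above, the following sketch computes them in plain Python. It is not taken from the post and does not use OML4Py; the function name `regression_metrics` and the sample values are assumptions made for this example.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R squared for two equal-length sequences.

    Hypothetical helper for illustration only; not an OML4Py function.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                     # undefined if y_true is constant
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}

# Made-up sample data: two of the four predictions are off by one unit.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(metrics)
```

MSE and RMSE penalize large errors more heavily than MAE because the errors are squared, while R squared reports the share of the target's variance that the predictions account for.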
Oracle Machine Learning for Python now available on Linux with Oracle Database 21c
Oracle » Oracle Machine Learning Blog
by Mark Hornick
3y ago
We are pleased to announce the availability of OML4Py on Oracle Database 21c (21.3). Earlier this year, we announced OML4Py on Oracle Autonomous Database. Now, Python users can extend the power of Python when analyzing data in Oracle Database. Oracle Machine Learning for Python (OML4Py) makes the open source Python scripting language and environment ready for the enterprise and big data. Designed for problems involving both large and small data volumes, Oracle Machine Learning for Python integrates Python with Oracle Database, allowing users to run Python commands and scripts for data explorat ..read more
A Simple Guide to Parameter Passing Using the OML4Py REST API
Oracle » Oracle Machine Learning Blog
by Sherry LaMonica
3y ago
One of the advantages of Oracle Machine Learning for Python (OML4Py) is embedded Python execution, which allows users to invoke user-defined Python functions from a REST interface using Python engines spawned and controlled by the Oracle Autonomous Database environment. In addition, those functions can be invoked in a data-parallel and task-parallel manner with multiple Python engines. In this blog, we’ll focus on some tips when passing parameters to your user-defined Python functions via the REST interface.  Passing parameters to an OML4Py user-defined function in a REST request dif ..read more
Wiki ESA Model Available for Database 19c
Oracle » Oracle Machine Learning Blog
by Sherry LaMonica
3y ago
A new Explicit Semantic Analysis (ESA) Wikipedia model is available for Oracle Database 19c. The new ESA model was built under database version 19.0 using the July 1, 2021 Wikipedia dump. This pre-built model is based on millions of Wikipedia articles reduced to about 161,000 topics. The Explicit Semantic Analysis algorithm is an unsupervised algorithm used for feature extraction. ESA operates at the level of concepts and meaning rather than just the surface form vocabulary of a word or document. The resulting calculations represent the meaning of a piece of ..read more
Cross Validation Using OML4Py Part II
Oracle » Oracle Machine Learning Blog
by Jie Liu
3y ago
Cross validation is a widely used model validation approach. As we discussed in Part I, it has the benefit of providing a more complete picture of model performance. However, cross validation can be costly with large datasets, since K-fold cross validation repeats training and testing K times. By leveraging embedded Python execution in OML4Py, we can parallelize the K train/test processes, which suits the case where the user has an open source model to evaluate through cross validation. In this blog, we will show our approach to speed up cross val ..read more
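The idea of running the K train/test rounds side by side can be sketched with the Python standard library. This is only a structural illustration under stated assumptions: a thread pool stands in for the multiple Python engines of embedded execution, the "model" is a trivial mean predictor, and the names `k_fold_indices`, `evaluate_fold`, and `cross_validate` are hypothetical. It does not show the OML4Py API or the post's actual approach.

```python
from concurrent.futures import ThreadPoolExecutor

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def evaluate_fold(y, test_idx):
    """Fit a mean predictor on the training rows and return the test MSE."""
    held_out = set(test_idx)
    train = [v for i, v in enumerate(y) if i not in held_out]
    prediction = sum(train) / len(train)
    return sum((y[i] - prediction) ** 2 for i in test_idx) / len(test_idx)

def cross_validate(y, k):
    """Evaluate all k folds concurrently; each fold is an independent task."""
    folds = k_fold_indices(len(y), k)
    with ThreadPoolExecutor(max_workers=k) as pool:
        # pool.map preserves fold order in the returned scores.
        return list(pool.map(lambda idx: evaluate_fold(y, idx), folds))

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # made-up target values
scores = cross_validate(y, 3)
print(scores)
```

Because each fold depends only on its own train/test split, the K evaluations share no state and can be dispatched to separate workers. Threads in CPython do not speed up CPU-bound work; they are used here only to keep the sketch self-contained.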
Selecting the best number of clusters at scale using OML4Py – Elbow Method
Oracle » Oracle Machine Learning Blog
by Jie Liu
3y ago
Among clustering algorithms, K-means is the most popular. It is fast, scalable and easy to interpret. Therefore, it is almost the default first choice when data scientists want to cluster data and get insight into the inner structure of a dataset. A good introduction to this method can be found in the Oracle datascience blog. However, there is one key parameter for K-means clustering that needs to be selected appropriately. That is K, the number of clusters for the algorithm to generate, which is left for the user to choose. For a given dataset without any prior knowledge, we cannot know for ..read more
Deploy an XGBoost Model using OML Services
Oracle » Oracle Machine Learning Blog
by Sherry LaMonica
3y ago
Until recently, data scientists had only a handful of tools to work with, but today there is a robust ecosystem of frameworks and hardware runtimes. While this growing toolbox is extremely useful, each framework has the potential to become a silo, lacking interoperability. Supporting interoperability requires customization, and reimplementing models for movement between frameworks can slow development by weeks or months. The Open Neural Network Exchange (ONNX) format was created to ease the process of model porting between frameworks, some of which may be more desirable for specific phases of ..read more
Metrics for Classification Using OML4Py Part II
Oracle » Oracle Machine Learning Blog
by Jie Liu
3y ago
In Part I, we discussed popular metrics such as accuracy, confusion matrix, precision, recall and the F1 score. The common characteristic of those metrics is that they rely on a given threshold for producing the ultimate prediction. In most cases, a classification model originally produces a probability score. In order to arrive at a prediction, one needs to come up with a threshold: a case is predicted as positive when the probability score is greater than the threshold and negative otherwise. Therefore, if we increase the threshold, we will have a lower recall and, typically, a higher precision. How are precision and ..read more
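The effect of the threshold on precision and recall can be checked on a small made-up example. The sketch below is plain Python rather than OML4Py, and the helper name `precision_recall`, the labels, and the scores are all assumptions chosen for illustration.

```python
def precision_recall(y_true, scores, threshold):
    """Predict positive when score > threshold, then return (precision, recall)."""
    preds = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up labels and probability scores, sorted by score for readability.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]

low = precision_recall(y_true, scores, 0.35)   # permissive threshold
high = precision_recall(y_true, scores, 0.7)   # strict threshold
print("threshold 0.35:", low)
print("threshold 0.70:", high)
```

On this data, the permissive threshold captures every positive case at the cost of one false positive, while the strict threshold makes no false positive predictions but misses half of the positives. Recall can never rise when the threshold is raised; precision usually rises but is not guaranteed to.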
Metrics for Classification Using OML4Py Part I
Oracle » Oracle Machine Learning Blog
by Jie Liu
3y ago
Think about the following scenario. As a seasoned data scientist, you spent a lot of time and effort tackling a challenging dataset. Finally, you built a good model, with creative feature engineering and smart data cleaning. The screen displays a great AUC (0.9, unbelievable!) score. How exciting! Then the stakeholder comes in and throws out a question: How does such a high AUC score help me in targeting customers? Sometimes there is a gap between data science terminology and business needs. AUC can be a perfect example. How to explain the AUC to people unfamiliar wi ..read more