Data Science Stack Exchange on Feedspot

How to identify outliers on a box and whisker plot that seems to be compressed?

Data Science Stack Exchange

by Susmitha Acharya

2d ago

I have plotted box plots for the features of an ML problem to identify outliers. I scaled the data with a MinMaxScaler so that it lies in the range [0, 1]. For some columns, the two quartiles are clearly visible, but for some features the first and third quartiles are exactly the same. How can I identify outliers for such features? (Image: box and whisker plot, https://i.stack.imgur.com/hkMc8.jpg)
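A minimal sketch of why the box collapses, assuming the plot uses Tukey's 1.5 × IQR whisker rule (the matplotlib default, whis=1.5). When Q1 equals Q3, the interquartile range is zero, both fences sit on that single value, and every other point is flagged. The feature values are made up and iqr_fences is a hypothetical helper. Because min-max scaling is an increasing linear transform, the same points are flagged before and after scaling.

```python
import numpy as np

def iqr_fences(x, k=1.5):
    """Return Tukey's lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical min-max-scaled feature: mostly zeros plus two nonzero values
x = np.array([0.0] * 18 + [0.4, 1.0])

low, high = iqr_fences(x)
outliers = x[(x < low) | (x > high)]
print(low, high)   # IQR is 0, so both fences are 0.0
print(outliers)    # every value that is not 0.0 is flagged
```

With a zero IQR the rule degenerates, so for such near-constant features it may be more informative to inspect the value counts directly.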

How does dependency information impact binary classification in multi-label prediction models?

by malisokan

2d ago

TL;DR: I don't understand the dependency issue with binary classification (binary relevance) compared to multi-label prediction models. I often read in papers that some kind of "dependency information" is missing when binary classification is used for a multi-label prediction problem, that is, when a multi-label problem is solved with separate binary classifiers. Let's take a simple data set with multiple target variables:

Age  Gender  TargetA  TargetB  TargetC
34   m       1        1        1
22   f       0        0        1
45   m       1        0        0

If the goal is to predict future TargetA, TargetB and TargetC, then I see these possibilities ...
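To make the setup concrete, here is a minimal sketch in plain Python using the three example rows. It contrasts binary relevance, where each target becomes its own independent binary problem, with a label-powerset encoding, where the whole label vector becomes one multiclass target. The variable names are illustrative only. In the binary relevance datasets no model ever sees the other targets, which is one way to read the "missing dependency information" remark.

```python
# Example rows from the question
rows = [
    {"Age": 34, "Gender": "m", "TargetA": 1, "TargetB": 1, "TargetC": 1},
    {"Age": 22, "Gender": "f", "TargetA": 0, "TargetB": 0, "TargetC": 1},
    {"Age": 45, "Gender": "m", "TargetA": 1, "TargetB": 0, "TargetC": 0},
]
targets = ["TargetA", "TargetB", "TargetC"]

# Shared feature matrix: Age plus Gender encoded as m=1, f=0
X = [(r["Age"], 1 if r["Gender"] == "m" else 0) for r in rows]

# Binary relevance: one independent label vector (and classifier) per target.
# A model trained on binary_problems["TargetA"] never sees TargetB or TargetC.
binary_problems = {t: [r[t] for r in rows] for t in targets}

# Label powerset: the joint label vector becomes a single multiclass label,
# so co-occurrence patterns such as "TargetA and TargetB together" are kept.
powerset_labels = ["".join(str(r[t]) for t in targets) for r in rows]

print(binary_problems)
print(powerset_labels)
```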

TensorFlow SegNet architecture

by D .Stark

2d ago

I was unable to find a complete description of the SegNet architecture for image segmentation (specifically, the decoder layers). Therefore, I would like to verify the correctness of my implementation (schematically):

Input(x, x, 3)
Conv2d(64)+BatchNormalization+ReLU
Conv2d(64)+BatchNormalization+ReLU
MaxPoolWithArgMax
Conv2d(128)+BatchNormalization+ReLU
Conv2d(128)+BatchNormalization+ReLU
MaxPoolWithArgMax
Conv2d(256)+BatchNormalization+ReLU
Conv2d(256)+BatchNormalization+ReLU
Conv2d(256)+BatchNormalization+ReLU
MaxPoolWithArgMax
Conv2d(512)+BatchNormalization+ReLU
Conv2d(512)+BatchNorm ...
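In the SegNet paper, the decoder upsamples each feature map using the max-pooling indices stored by the matching encoder stage, which is why the encoder uses a pooling op that also returns argmax indices. The NumPy sketch below shows that pool/unpool pair on a single-channel 2D map with a 2×2 window and stride 2, assuming even height and width. In TensorFlow, tf.nn.max_pool_with_argmax provides the pooling half; the max_unpool function here is a hand-written illustration, not a library API.

```python
import numpy as np

def max_pool_with_argmax(x):
    """2x2, stride-2 max pooling that also returns the argmax inside each window."""
    h, w = x.shape
    # blocks[i, j] holds the 4 values of window (i, j) in row-major order
    blocks = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(h // 2, w // 2, 4)
    return blocks.max(axis=-1), blocks.argmax(axis=-1)

def max_unpool(pooled, idx):
    """Place each pooled value back at its recorded position; all others are 0."""
    ph, pw = pooled.shape
    out = np.zeros((ph, pw, 4), dtype=pooled.dtype)
    np.put_along_axis(out, idx[..., None], pooled[..., None], axis=-1)
    # Undo the window layout to get a (2*ph, 2*pw) sparse map
    return out.reshape(ph, pw, 2, 2).transpose(0, 2, 1, 3).reshape(ph * 2, pw * 2)

x = np.array([[1, 2, 0, 0],
              [3, 4, 0, 5],
              [0, 0, 6, 0],
              [7, 0, 0, 8]], dtype=float)
pooled, idx = max_pool_with_argmax(x)
upsampled = max_unpool(pooled, idx)
print(pooled)
print(upsampled)
```

After each unpooling step the decoder applies its convolution, batch normalization and ReLU layers, which turn the sparse upsampled map back into a dense one.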

Trying to read items from an invoice extractor bot with OpenAI, LangChain and an LLM

by Mcore8x

5d ago

I am trying out this tutorial on invoice extraction using an OpenAI LLM, but the code seems to work well only for invoices with a single item: when an invoice contains more than one item, items from the second row onward are not picked up. The source code can pick up the first item from each of multiple invoices, but how can I modify it to pick up all items from a multi-item invoice? Tutorial: https://www.analyticsvidhya.com/blog/2023/10/building-invoice-extraction-bot-using-langchain-and-llm/ Source code: https://mega.nz/file/9ERV1IZI#iMNm_bzFMnssaIv2rAprYD9qhYILLP6R4J7r7rOq ...

Saving ML models with pickle to be deployed using Flask

by Kehinde Olatunji

5d ago

I trained some ensemble ML models for prediction and need to save them with pickle so that I can deploy them using Flask. I have tried several methods and read several articles but could not figure it out; when trying to use Linear Regression in Flask, I got an error that LR is not defined. How do I save the entire ensemble of models with pickle, and what is the right path to deploy it in Flask to make predictions? Thank you for your help ...
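One common cause of "X is not defined" errors when loading a pickle is that pickle stores objects by reference to their class (module path plus name), not the class code itself. The Flask process must therefore be able to import every class the pickled object uses, whether that is scikit-learn's LinearRegression or your own ensemble wrapper. The standard-library sketch below pickles a whole ensemble as one object. ConstantModel and AveragingEnsemble are hypothetical stand-ins for fitted estimators; in practice they would live in a module that both the training script and the Flask app import.

```python
import pickle

class ConstantModel:
    """Hypothetical stand-in for a fitted regressor: always predicts one value."""

    def __init__(self, value):
        self.value = value

    def predict(self, rows):
        return [self.value for _ in rows]


class AveragingEnsemble:
    """Hypothetical ensemble wrapper: averages its members' predictions."""

    def __init__(self, models):
        self.models = models

    def predict(self, rows):
        per_model = [m.predict(rows) for m in self.models]
        # zip(*per_model) groups the members' predictions row by row
        return [sum(preds) / len(preds) for preds in zip(*per_model)]


def save_model(model, path):
    # One pickle file holds the ensemble and every member model inside it
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    # The classes above must be importable here, or unpickling fails
    with open(path, "rb") as f:
        return pickle.load(f)
```

In the Flask app, a reasonable pattern is to call load_model once at startup and reuse the loaded object inside the route, rather than reloading it on every request.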

Multiclass matrix loss function in scikit-learn / xgboost / lightgbm

by Avi T

5d ago

I have data with 4 classes: $c_1, c_2, c_3, c_4$. I'd like to create a classifier which has different scaling for the loss function per class combination: $$ \begin{bmatrix} 0 & l \left( \hat{c}_{1}, {c}_{2} \right) & l \left( \hat{c}_{1}, {c}_{3} \right) & l \left( \hat{c}_{1}, {c}_{4} \right) \\ l \left( \hat{c}_{2}, {c}_{1} \right) & 0 & l \left( \hat{c}_{2}, {c}_{3} \right) & l \left( \hat{c}_{2}, {c}_{4} \right) \\ l \left( \hat{c}_{3}, {c}_{1} \right) & l \left( \hat{c}_{3}, {c}_{2} \right) & 0 & l \left( \hat{c}_{3}, {c}_{4} \right) \\ l \left( \hat{c ...
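One way to use such a matrix, sketched below under the assumption that the model outputs calibrated class probabilities, is to apply it at decision time rather than inside the training loss: for each sample, predict the class $\hat{c}_i$ that minimizes the expected cost $\sum_j l(\hat{c}_i, c_j)\, p(c_j \mid x)$. This is a swapped-in decision-rule technique, not a custom loss for scikit-learn, XGBoost or LightGBM. Following the matrix above, cost[i, j] is the cost of predicting class i when the true class is j, with zeros on the diagonal; the cost values and probabilities in the example are made up.

```python
import numpy as np

def min_expected_cost(proba, cost):
    """Pick, per sample, the class whose expected cost is lowest.

    proba: (n_samples, n_classes) predicted probabilities, rows summing to 1.
    cost:  (n_classes, n_classes), cost[i, j] = cost of predicting i when truth is j.
    """
    # expected[n, i] = sum_j cost[i, j] * proba[n, j]
    expected = proba @ cost.T
    return expected.argmin(axis=1)

# Hypothetical costs: missing c_1 (index 0) is 10x worse than other mistakes
cost = np.ones((4, 4)) - np.eye(4)
cost[1:, 0] = 10.0

proba = np.array([[0.3, 0.4, 0.2, 0.1]])
print(proba.argmax(axis=1))             # plain argmax picks index 1
print(min_expected_cost(proba, cost))   # cost-aware rule picks index 0
```

With all off-diagonal costs equal, this rule reduces to the ordinary argmax.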

How do weights learn in a CNN for multiclass classification?

by Jai

5d ago

How can the weights of the filters in a CNN learn meaningful features in multiclass classification if they keep changing as different images are passed through the network during training? Say we are doing multiclass classification with a CNN, there are 5 classes, and there are 10 kernels/filters. My first image is a pen: we pass it through the model and the kernel weights change, right? Then we pass an image of a book and the weights change again, right? So if the filter weights keep changing, how will the network learn anything ...
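A minimal sketch of the mechanism, with a linear softmax classifier standing in for a CNN (convolutional filters are updated by the same gradient rule, just through more layers). Every example moves the same shared weights by a small gradient step, and because the steps for different classes pull in consistent directions, the weights drift toward values that reduce the loss averaged over all classes. The toy data, learning rate and epoch count are arbitrary choices for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

def sgd_train(X, y, n_classes, lr=0.5, epochs=50):
    W = np.zeros((n_classes, X.shape[1]))
    for _ in range(epochs):
        # One example at a time, with classes interleaved (pen, book, ...)
        for x, label in zip(X, y):
            grad_logits = softmax(W @ x)
            grad_logits[label] -= 1.0            # d(cross-entropy)/d(logits)
            W -= lr * np.outer(grad_logits, x)   # every example nudges the same W
    return W

# Toy 3-class data: each class is strongest in one input feature
X = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
              [0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
y = [0, 1, 2, 0, 1, 2]

W = sgd_train(X, y, n_classes=3)
preds = (X @ W.T).argmax(axis=1)
print(preds)   # matches y once training has converged
```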

Recreating results from Research Paper

by Panos_42

5d ago

I have been trying to recreate the results from this particular paper (Neural Collaborative Filtering). The dataset I use closely resembles this one. I understand that I should split my data into training and test sets. My question is whether I should create the test.negative file myself or whether it is handled automatically by the negative sampling inside the code (which essentially derives negative feedback from the absence of data). I would really appreciate your feedback! Thanks in advance. Here is the official implementation of this paper on GitHub ...
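For context, a minimal sketch of how per-user test negatives can be generated under a leave-one-out protocol, where each held-out positive item is ranked against items the user never interacted with. The function name, the dictionary input format and the default num_negatives=99 are assumptions for illustration; check the official repository for the exact file format and counts it expects. Training negatives, by contrast, are usually sampled on the fly from the same "not interacted" pool.

```python
import random

def sample_test_negatives(user_items, n_items, num_negatives=99, seed=0):
    """Sample, for each user, items the user has never interacted with.

    user_items: dict mapping user id -> all item ids the user interacted with
                (training and test), so no true positive can be drawn.
    """
    rng = random.Random(seed)   # fixed seed so the test set is reproducible
    negatives = {}
    for user, items in user_items.items():
        seen = set(items)
        candidates = [i for i in range(n_items) if i not in seen]
        k = min(num_negatives, len(candidates))
        negatives[user] = rng.sample(candidates, k)
    return negatives

# Hypothetical interactions for two users over 10 items
user_items = {0: [1, 2], 1: [0]}
print(sample_test_negatives(user_items, n_items=10, num_negatives=5))
```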

Elastic Net alpha value using GLMNET 4.1-8

by user162172

5d ago

Is it valid to “brute force” the alpha value for an elastic net? That is, trying alpha = .1, .2, .3, .4 and so on up to 1.0, comparing the R-squared value of each, and choosing the alpha with the highest R-squared for the model going forward? If so, is R² the best metric for determining the best alpha ...
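Trying a grid of alpha values is a common approach; the main caveat is to score each candidate on held-out data, because training-set R² rewards fitting the training data as closely as possible and says little about generalization. The NumPy sketch below mirrors glmnet's penalty, (1/2n)·||y - Xb||² + λ(α||b||₁ + (1-α)/2·||b||²), fits it by coordinate descent, and compares alphas by k-fold cross-validated mean squared error on the same folds. For brevity λ is held fixed, whereas cv.glmnet in R would also choose λ for each α; all function names and the data are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||^2).

    Assumes X and y are already centred, so no intercept is fitted.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()          # residual y - X @ b with b = 0
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]         # partial residual excluding feature j
            z = X[:, j] @ r / n
            b[j] = soft_threshold(z, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            r -= X[:, j] * b[j]
    return b

def cv_mse(X, y, lam, alpha, k=5, seed=0):
    """Mean squared error on held-out folds; the seed fixes the folds across alphas."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errors = []
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)
        x_mean, y_mean = X[train_idx].mean(axis=0), y[train_idx].mean()
        b = elastic_net(X[train_idx] - x_mean, y[train_idx] - y_mean, lam, alpha)
        pred = (X[test_idx] - x_mean) @ b + y_mean
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))

def pick_alpha(X, y, lam, alphas, k=5):
    scores = {a: cv_mse(X, y, lam, a, k) for a in alphas}
    return min(scores, key=scores.get), scores

# Hypothetical data: 2 of 5 features matter
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
y = X @ np.array([3.0, 0.0, 0.0, 1.5, 0.0]) + rng.normal(scale=0.1, size=60)
best_alpha, scores = pick_alpha(X, y, lam=0.1, alphas=[0.1, 0.3, 0.5, 0.7, 0.9, 1.0])
print(best_alpha, scores)
```

R² computed on the held-out folds would serve equally well as the score; what matters is that it comes from data the model was not fitted on.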

Improve my f1_score for classification - pandas/sklearn

by user162343

5d ago

I would like advice on how to improve the f1_score of my classifier. It is currently around 0.57. Dataset:

lotWaferDie - lot, board and chip on which defects were measured (string values like W02-D12_11, ...)
XRel - relative position of the defect along the X axis
YRel - relative position of the defect along the Y axis
XSize - size of the defect along the X axis
YSize - size of the defect along the Y axis
DefArea - defect area
DefSize - defect size
dieRow - row of the defect on the board
dieCol - column of the defect on the board
xidx - index of the defect line on the board
yidx ...
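One generic lever for F1 in a binary setting (an assumption here, since the excerpt does not say how many classes there are) is the decision threshold: the default cut-off of 0.5 on predicted probabilities is not necessarily the F1-optimal one. The NumPy sketch below scans candidate thresholds; the labels and scores are made up, and the threshold should be chosen on a validation split, not on the test data used to report the final score.

```python
import numpy as np

def f1_at_threshold(y_true, scores, t):
    """F1 = 2TP / (2TP + FP + FN) when predicting positive for scores >= t."""
    pred = scores >= t
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_threshold(y_true, scores):
    """Try every distinct score as a threshold and keep the one with the highest F1."""
    candidates = np.unique(scores)
    f1s = [f1_at_threshold(y_true, scores, t) for t in candidates]
    i = int(np.argmax(f1s))
    return float(candidates[i]), float(f1s[i])

# Hypothetical validation labels and predicted probabilities
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

print(f1_at_threshold(y_true, scores, 0.5))   # F1 at the default cut-off
print(best_threshold(y_true, scores))         # (threshold, F1) with the best F1
```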
