The basic idea behind SVM is to find a hyperplane, i.e. a decision boundary, that separates the different classes of data as accurately as possible. Among all hyperplanes that separate the classes, SVM chooses the one that maximizes the margin: the distance between the decision boundary and the closest data points from each class. These closest points are called the support vectors, and they are what give the algorithm its name.

The SVM algorithm handles data that cannot be separated in its original space by implicitly mapping it into a higher-dimensional feature space where a separating hyperplane may exist. Crucially, this mapping is never computed explicitly: a kernel function directly returns the inner products between points in the new space, a shortcut known as the kernel trick, which keeps the computation tractable.

There are several different types of kernel functions that can be used with SVM, including linear, polynomial, and radial basis function (RBF) kernels. Each kernel function has its own advantages and disadvantages, and the choice of kernel function depends on the type of data and the problem that needs to be solved.

One of the main advantages of SVM is its ability to handle non-linearly separable data. Unlike other algorithms such as logistic regression, SVM can handle data that is not linearly separable by using a non-linear kernel function. This makes SVM a powerful tool for solving complex classification problems.

Another advantage of SVM is its ability to handle high-dimensional data: because the decision boundary depends only on the support vectors, SVM remains effective even when the number of features is large relative to the number of samples.

SVM has a few disadvantages as well. One of the main ones is that training can be computationally expensive on large datasets, since kernel methods scale poorly with the number of samples. Choosing an appropriate kernel function and tuning its hyperparameters also requires a good understanding of the data and of the problem being solved.

In conclusion, Support Vector Machine is a powerful and versatile machine learning algorithm that can be used for both classification and regression tasks. Its ability to handle non-linearly separable data and high-dimensional data makes it a useful tool for solving complex problems. However, it can be computationally expensive and require a good understanding of the data and the problem to be solved.
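As a quick illustration of these points, here is a hedged sketch (assuming scikit-learn is installed; the dataset and parameter choices are my own, for demonstration only) comparing a linear kernel with an RBF kernel on data that is not linearly separable:

```python
# Concentric circles cannot be split by a straight line, so a linear-kernel
# SVM struggles while an RBF-kernel SVM separates them well.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings of points, one ring per class.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))
print("RBF kernel accuracy:   ", rbf_svm.score(X, y))
```

On data like this, the linear kernel should perform close to chance, while the RBF kernel should separate the two rings almost perfectly.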

One of the main advantages of Chatterjee’s Correlation Coefficient over the Pearson Correlation Coefficient is its robustness to outliers and skewness. Because it is computed from ranks rather than raw values, it does not rely on the data being normally distributed and is unaffected by monotone transformations of either variable. This makes it a more robust measure of dependence and a better choice for data sets that do not meet the normality assumptions behind Pearson-based inference.

To calculate Chatterjee’s Correlation Coefficient for n pairs (X_i, Y_i), we first rearrange the pairs so that the X values are in increasing order. Next, for each Y_i we compute its rank r_i, i.e. the number of Y values less than or equal to Y_i (assuming no ties). Finally, the coefficient is ξ = 1 − 3·Σ|r_(i+1) − r_i| / (n² − 1), where the sum runs over consecutive pairs of ranks.

The resulting value lies approximately between 0 and 1 (for small samples it can dip slightly below 0): a value near 0 indicates that Y is independent of X, while a value near 1 indicates that Y is close to being a deterministic function of X.
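As a from-scratch sketch, here is a small numpy implementation of the rank-based statistic ξ = 1 − 3·Σ|r_(i+1) − r_i|/(n² − 1) (assuming no ties in the data; the function name and the toy data are my own choices):

```python
import numpy as np

def chatterjee_xi(x, y):
    """Chatterjee's xi: near 0 under independence, near 1 when y is a
    function of x. Assumes no ties in y."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    order = np.argsort(x)                         # rearrange pairs so x is increasing
    ranks = np.argsort(np.argsort(y[order])) + 1  # r_i = rank of y_i among all y
    return 1 - 3 * np.abs(np.diff(ranks)).sum() / (n ** 2 - 1)

rng = np.random.default_rng(0)
x = rng.uniform(0, 2 * np.pi, 500)
y_functional = np.cos(4 * x)           # strongly dependent, but highly non-linear
y_independent = rng.normal(size=500)   # unrelated to x

print(chatterjee_xi(x, y_functional))       # should be high
print(chatterjee_xi(x, y_independent))      # should be near 0
print(np.corrcoef(x, y_functional)[0, 1])   # Pearson misses the dependence
```

Note how Pearson’s coefficient stays near zero for the oscillating relationship while ξ flags it clearly.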

One limitation of Chatterjee’s Correlation Coefficient is that it is asymmetric: it measures how well Y can be expressed as a function of X, so swapping the two variables can change the result. It also tends to have lower statistical power than Pearson, Spearman, or Kendall’s Tau when the relationship really is linear or monotonic, so those classical measures remain preferable in that setting.

In conclusion, Chatterjee’s Correlation Coefficient is a useful statistical measure for detecting whether one variable is a function of another, even when that relationship is highly non-linear. It is particularly useful when dealing with data sets that have outliers and skewness, and it can complement the commonly used Pearson Correlation Coefficient. However, it is important to keep in mind that it is asymmetric and can be less powerful on purely linear or monotonic relationships, where the classical measures may be more appropriate.

In this video, I’ll show you how you can estimate Non-linear Correlation using Chatterjee’s Correlation Coefficient.

I hope you all like it 🙂


There is a common misconception that R-squared cannot be negative.

In this video, I’ll explain that when the predictions of a linear regression model are worse than simply predicting the mean of the target variable, the R-squared value is negative.
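The claim is easy to verify by hand. Below is a small numpy sketch (the numbers are made up for illustration) computing R² = 1 − SS_res/SS_tot for a "model" whose predictions are worse than the mean:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_bad = np.array([5.0, 1.0, 4.0, 0.0, 2.0])   # a deliberately bad "model"

ss_res = np.sum((y_true - y_bad) ** 2)           # residual sum of squares: 43
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # sum of squares around the mean: 10
r2 = 1 - ss_res / ss_tot
print(r2)  # → -3.3
```

Predicting the mean everywhere gives R² = 0, so any model with a larger residual sum of squares than the mean baseline lands below zero.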

I hope you all like it 🙂

Text vectorization means transforming text into a meaningful vector (or array) of numbers. The standard way of doing this is the bag-of-words approach.

CountVectorizer & TfidfVectorizer (both from scikit-learn) are two common ways in which text can be converted to numbers!

In this video, I’ll try to explain the impact of changing Term Frequency and Inverse Document Frequency on the overall vector generated.
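As a minimal sketch (assuming scikit-learn is installed; the toy corpus is my own), here is how the two vectorizers turn the same documents into different numeric matrices:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

counts = CountVectorizer().fit_transform(docs)   # raw term counts
tfidf = TfidfVectorizer().fit_transform(docs)    # counts reweighted by idf

print(counts.toarray())
print(tfidf.toarray().round(2))
```

CountVectorizer stores raw term counts, while TfidfVectorizer rescales them so that terms appearing in many documents, like "the", carry less weight than rarer, more informative terms.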

I hope you all like it 🙂

While playing around with a big text dataset, I discovered an amazing library called FlashText.

FlashText is a library that is much faster than regular expressions for keyword search-and-replace on large datasets, reducing replacement jobs that would take days with regex down to minutes. If you deal with this kind of text cleaning & modification problem, then I would really suggest you try the library!

In this video, I’ll show you how you can leverage the power of FlashText.
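To make the idea concrete, here is a pure-Python sketch of the principle FlashText exploits (this is not FlashText’s actual API, just an illustration): each word is looked up once in a keyword dictionary, so a single pass over the text costs the same no matter how many replacement rules there are, unlike chaining one regex per keyword:

```python
import re

# Hypothetical replacement rules for the sketch (keys are lowercase).
replacements = {"ny": "New York", "nyc": "New York", "sf": "San Francisco"}

def replace_keywords(text, mapping):
    # Tokenize once; each word is an O(1) dictionary lookup,
    # independent of the number of keywords in `mapping`.
    return re.sub(
        r"\w+",
        lambda m: mapping.get(m.group(0).lower(), m.group(0)),
        text,
    )

print(replace_keywords("I moved from sf to NYC last year", replacements))
# → I moved from San Francisco to New York last year
```

FlashText itself builds a trie over the keywords so that multi-word phrases can be matched in the same single pass; this sketch handles only single-word keywords.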

I hope you all like it 🙂

Well if you know a bit of Python then you don’t have to rely on any external source!

PyTube is a very serious, lightweight, dependency-free Python library (and command-line utility) for downloading YouTube Videos.

In this video, I’ll show you how you can easily download YouTube videos using PyTube & also view the video in Google Colab!

I hope you all like it 🙂

So, is my new video only going to explain how the constraint surface for Lasso regression is pointy while the one for Ridge regression is round?

In this video, I’ll show you that if you have high multicollinearity in your features, then by applying Lasso Regression you can shrink the coefficients of some of the redundant features exactly to 0, thus eliminating the multicollinearity.
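Here is a hedged sketch of that effect (assuming scikit-learn; the data, seed, and alpha values are arbitrary illustrative choices): two nearly identical features, with Lasso zeroing one of them out while Ridge keeps both:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(42)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)   # almost a copy of x1: high multicollinearity
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso coefficients:", lasso.coef_)   # one coefficient driven to (essentially) 0
print("Ridge coefficients:", ridge.coef_)   # both stay non-zero, weight split between them
```

The L1 penalty’s pointy constraint surface is exactly what makes landing on a corner (a zero coefficient) likely, whereas Ridge’s round L2 surface only shrinks coefficients toward zero without reaching it.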

I hope you all like it 🙂

AR(1) is a first-order autoregressive model, meaning that the current value depends linearly on the immediately preceding value: x_t = c + φ·x_(t−1) + ε_t, where ε_t is white noise.

So, in this video, I’ll implement the autoregressive AR(1) model from scratch in Python.
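A from-scratch implementation is short. This sketch (the parameter values and variable names are my own choices) simulates an AR(1) series x_t = c + φ·x_(t−1) + ε_t and then recovers c and φ by ordinary least squares of each value on its predecessor:

```python
import numpy as np

rng = np.random.default_rng(0)
phi_true, c_true, n = 0.7, 0.5, 2000

# Simulate the AR(1) process with standard-normal innovations.
x = np.zeros(n)
for t in range(1, n):
    x[t] = c_true + phi_true * x[t - 1] + rng.normal(scale=1.0)

# Regress x_t on x_{t-1}: design matrix is [1, x_{t-1}].
A = np.column_stack([np.ones(n - 1), x[:-1]])
c_hat, phi_hat = np.linalg.lstsq(A, x[1:], rcond=None)[0]
print(f"estimated c = {c_hat:.2f}, phi = {phi_hat:.2f}")
```

With a couple of thousand observations the least-squares estimates should land close to the true c = 0.5 and φ = 0.7.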