From my research, I found three conflicting results:

Can anyone explain when to use LinearSVC
vs. SVC(kernel="linear")
?

It seems that LinearSVC is marginally better than SVC, but also more finicky. If scikit-learn decided to spend time implementing a specific case for linear classification, why wouldn't LinearSVC
outperform SVC
?
Mathematically, optimizing an SVM is a convex optimization problem with a unique minimizer. That means there is only one solution to the mathematical optimization problem.
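For reference, the standard soft-margin SVM primal that both solvers target (this is the textbook formulation, not quoted from the original answer) is:

$$\min_{w,\,b}\; \frac{1}{2}\lVert w \rVert^2 \;+\; C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i (w^\top x_i + b)\bigr)$$

liblinear effectively appends the intercept $b$ to $w$, so its regularization term becomes $\frac{1}{2}(\lVert w \rVert^2 + b^2)$, which is a slightly different problem than the one above.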
The differences in results come from several aspects: SVC
and LinearSVC
are supposed to optimize the same problem, but in fact all liblinear
estimators penalize the intercept, whereas the libsvm
ones don't (IIRC). This leads to a different mathematical optimization problem and thus different results. There may also be other subtle differences such as scaling and the default loss function (edit: make sure you set loss='hinge'
in LinearSVC
). Next, in multiclass classification, liblinear
does one-vs-rest by default, whereas libsvm
does one-vs-one.
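As a minimal sketch (synthetic data and parameter choices are my own, not from the original answer), here is how you would configure the two estimators to solve as close to the same problem as possible; note loss="hinge", since LinearSVC defaults to the squared hinge loss:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Toy binary classification problem
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# libsvm-based solver: hinge loss, intercept NOT penalized
svc = SVC(kernel="linear", C=1.0).fit(X, y)

# liblinear-based solver: set loss="hinge" to match SVC's loss;
# the intercept is still penalized, so coefficients may differ slightly
lsvc = LinearSVC(loss="hinge", C=1.0, dual=True, max_iter=10000).fit(X, y)

print(svc.coef_.shape, lsvc.coef_.shape)   # same shape: one weight vector
print(svc.score(X, y), lsvc.score(X, y))   # very similar training accuracy
```

Even with matched losses, the penalized intercept means the two solutions are close but not identical.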
SGDClassifier(loss='hinge')
is different from the other two in the sense that it uses stochastic gradient descent rather than exact gradient descent, and it may not converge to the same solution. However, the obtained solution may generalize better.
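A quick sketch of the SGD variant (my own example; scaling the features first matters for SGD, as stochastic updates are sensitive to feature magnitudes):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# SGD optimizes the same hinge-loss objective, but by stochastic updates,
# so the final weights depend on the random order of samples
clf = make_pipeline(
    StandardScaler(),
    SGDClassifier(loss="hinge", random_state=0),
).fit(X, y)

print(clf.score(X, y))
```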
Between SVC
and LinearSVC
, one important decision criterion is that LinearSVC
tends to converge faster the larger the number of samples is. This is due to the fact that the linear kernel is a special case, which is optimized for in liblinear, but not in libsvm.
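You can check the scaling behaviour yourself with a rough timing sketch like this (sizes and timings are illustrative assumptions; absolute numbers depend on your machine):

```python
import time

from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

t0 = time.perf_counter()
SVC(kernel="linear").fit(X, y)           # libsvm: roughly quadratic in n_samples
t_svc = time.perf_counter() - t0

t0 = time.perf_counter()
LinearSVC(dual=True).fit(X, y)           # liblinear: scales much better with n_samples
t_lsvc = time.perf_counter() - t0

print(f"SVC: {t_svc:.2f}s  LinearSVC: {t_lsvc:.2f}s")
```

On data sets with tens of thousands of samples the gap typically grows dramatically in LinearSVC's favour.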