IBM physicists have proposed and experimentally implemented two algorithms on a superconducting quantum processor involving the detection of patterns in large amounts of data (doi.org/10.1038/s41586-019-0980-2). The results demonstrated that very high success rates were achievable even in the presence of noise. Researchers from the Massachusetts Institute of Technology and Oxford University also participated in the study.

Quantum computing has drawn considerable attention in recent years as a means of implementing machine learning beyond the capabilities of classical computers. A number of algorithms have been proposed to run on quantum devices, with some addressing a common problem in machine learning: analysing large amounts of data using kernel methods to determine similarities between patterns.

A classical approach to kernel methods involves the construction of an approximate labeling function using so-called support vector machines (SVMs). However, this method encounters limitations when the feature space becomes large and the kernel functions become computationally expensive to estimate, obstacles that a quantum processor could potentially overcome.

## Two Qubits of Five-Qubit Quantum Computer Used for Demonstration

IBM research staff member Antonio Córcoles, who participated in the study, explained why the research team used only two out of the five superconducting qubits of IBM’s smaller public quantum computer: “The feature space in our demonstration was reached by using two qubits and has 15 dimensions. This is a critical distinction for our method, as we want to use a space of density matrices, or qubit states, for our feature maps.

“This space has a dimensionality of 4^n - 1, where n is the number of qubits. We wanted to demonstrate the algorithm on real quantum hardware with good classification success while, at the same time, showcasing the current capabilities of quantum processors. The algorithm presented in the paper does not become more complicated in terms of execution resources as the dimensions become larger, but the amount of noise does grow in such a way that powerful error correction or mitigation techniques become essential to the success of the algorithm.”
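The dimension count quoted above is easy to check. The following is a minimal sketch, where the function name `feature_space_dimension` is only an illustrative choice:

```python
def feature_space_dimension(n_qubits: int) -> int:
    """Dimension of the space of n-qubit density matrices, 4**n - 1."""
    if n_qubits < 1:
        raise ValueError("n_qubits must be a positive integer")
    return 4 ** n_qubits - 1


# Two qubits (as used in the demonstration) and five qubits (the full chip).
dimensions = {n: feature_space_dimension(n) for n in (2, 5)}
```

For two qubits this gives the 15 dimensions mentioned in the quote. Using all five qubits would give 1023.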

## Two Algorithms Addressed Classifier Construction

The two algorithms run by the researchers solve a supervised learning problem: the construction of a classifier. Both exploit the exponentially large quantum state space as feature space, a key element of quantum advantage, using controllable entanglement and interference.

One method, a quantum variational classifier, used a variational quantum circuit to classify the data in a way similar to conventional SVMs. The second, a quantum kernel estimator, estimated the kernel function on the quantum computer and optimized a classical SVM. The experiment was split into two phases: training and classification.
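The hybrid loop behind a variational classifier can be illustrated with a toy example simulated classically. The sketch below is not the circuit used in the study and rests on several assumptions: a single qubit, a hypothetical `RY` data encoding, the exact expectation value in place of finite shots, a made-up data set, and plain finite-difference gradient descent as the classical optimizer.

```python
import math

# Toy training set (made up for illustration): angles x with labels +1 / -1.
XS = [0.2, 0.6, 1.0, 2.2, 2.6, 3.0]
YS = [1, 1, 1, -1, -1, -1]


def expectation_z(x, theta):
    """<Z> after RY(theta) RY(x) |0>, which equals cos(x + theta)."""
    return math.cos(x + theta)


def cost(theta):
    """Mean squared error between <Z> and the labels (stand-in cost function)."""
    return sum((expectation_z(x, theta) - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def train(theta=1.0, learning_rate=0.2, steps=300, eps=1e-4):
    """Classical optimizer: finite-difference gradient descent on theta."""
    for _ in range(steps):
        grad = (cost(theta + eps) - cost(theta - eps)) / (2 * eps)
        theta -= learning_rate * grad
    return theta


def predict(x, theta):
    """Assign the label from the sign of the expectation value."""
    return 1 if expectation_z(x, theta) >= 0 else -1


theta_opt = train()
predictions = [predict(x, theta_opt) for x in XS]
```

In this picture, `expectation_z` plays the role of the quantum processor and `train` the role of the classical machine that proposes new circuit parameters until convergence.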

“Both methods use both the quantum processor and a classical processor,” Córcoles said. “In the first case, the quantum processor executes the parameterized circuit and yields the value of the cost function, based on which, a classical machine determines the new set of parameters for the quantum circuit until reaching convergence. Once the training has finished and an optimal set of parameters has been obtained, the quantum processor is used to run the parameterized circuit for each data point to classify with such ideal parameters.

“In the second method, the quantum processor is also used twice. First, it is used during the training phase, to estimate the kernel matrix, or overlap, between each pair of data points in the feature space. This would be the hard part to calculate when inner products in the feature space become classically intractable.”
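The training step of the kernel method can be sketched classically for a small case. The example below is an illustration under stated assumptions, not the study's implementation: `feature_state` is a hypothetical product feature map without entanglement, the kernel is taken to be the squared overlap between two feature states, and finite sampling is mimicked by drawing random numbers rather than by running a circuit.

```python
import cmath
import math
import random


def feature_state(x):
    """Toy product feature map, one qubit per feature (hypothetical choice)."""
    state = [1 + 0j]
    for xi in x:
        qubit = [math.cos(xi / 2), cmath.exp(1j * xi) * math.sin(xi / 2)]
        state = [a * b for a in state for b in qubit]
    return state


def exact_kernel(x, z):
    """Squared overlap |<phi(x)|phi(z)>|^2 between two feature states."""
    overlap = sum(a.conjugate() * b
                  for a, b in zip(feature_state(x), feature_state(z)))
    return abs(overlap) ** 2


def estimated_kernel(x, z, shots=50_000, rng=None):
    """Mimic a shot-based estimate: each shot succeeds with probability K(x, z)."""
    rng = rng or random.Random(0)
    p = exact_kernel(x, z)
    hits = sum(rng.random() < p for _ in range(shots))
    return hits / shots


def kernel_matrix(points, kernel=exact_kernel):
    """Kernel matrix over all pairs; its size depends only on len(points)."""
    return [[kernel(x, z) for z in points] for x in points]


points = [(0.1, 0.2), (1.0, 2.0), (2.5, 0.7)]
K = kernel_matrix(points)
```

The resulting matrix `K` is what a classical SVM solver would then consume. On real hardware, each entry would come from repeated circuit executions rather than from `exact_kernel`.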

## Classification Success Almost 100% with Both Algorithms

Córcoles continued: “Once one has the kernel matrix, obtaining the support vectors can be easily done with a classical computer. This is a convex problem and it’s easy to solve by classical means. The dimensionality of the kernel matrix is independent of the dimensionality of the feature space and depends only on the number of data points in the classification set.

“The second time the quantum processor is used is for the classification phase, by estimating the inner product of each data point to classify with each of the previously obtained support vectors in the feature space. This operation directly provides the label for each data point.”
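The classification step described above corresponds to the standard SVM decision rule, sketched below. The coefficients `alphas`, the `bias` and the kernel values are hypothetical numbers chosen for illustration. In practice the coefficients would come from the classical SVM optimization and the kernel values from the quantum processor.

```python
def classify(kernel_values, alphas, labels, bias=0.0):
    """Standard SVM decision rule: sign(sum_i alpha_i * y_i * K(x_i, s) + b)."""
    score = sum(a * y * k for a, y, k in zip(alphas, labels, kernel_values)) + bias
    return 1 if score >= 0 else -1


# Hypothetical numbers: two support vectors with labels +1 and -1.
alphas = [1.0, 1.0]
labels = [1, -1]

# Kernel values between a new point and each support vector
# (on hardware these would be the estimated overlaps).
label_near_first = classify([0.9, 0.1], alphas, labels)
label_near_second = classify([0.2, 0.8], alphas, labels)
```

A point that overlaps mostly with the first support vector inherits its label, and likewise for the second.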

For the variational algorithm, the training consisted of 2,000 shots followed by 20,000 shots for the classification phase. For the quantum kernel algorithm, the team used 50,000 shots for training and testing. They noted an increase in classification success with increasing circuit depth, reaching values very close to 100%, with no substantial difference in performance between the two algorithms.

## Algorithms Made Available on Qiskit Codebase

The researchers used different types of error mitigation techniques, one of which is the subject of a second recent Nature paper. These methods apply to expectation values of quantum observables and cannot be used to correct outputs of a quantum processor on a shot-by-shot basis.

“We have demonstrated that quantum computers can boost machine learning by exploiting their computational spaces,” Córcoles noted. “We have done that with a very specific example for a feature map. The interesting question following this work is what other feature maps could be used for this purpose and how to find them.

“For that opportunity, we made the code to run these algorithms available for everyone, as part of our Qiskit open source software codebase (https://github.com/Qiskit/qiskit-tutorials-community/tree/master/artificial_intelligence). This code can be used by Qiskit users not only to run similar classification problems either in simulation or in real quantum hardware, but also to explore new paths towards novel feature maps and ways to exploit quantum for supervised learning.”