Random feature extraction

Randomization is a powerful tool for handling optimization problems that would otherwise be considered intractable (e.g. in the context of Monte Carlo simulations [1]). In recent years, there has been renewed interest in the machine learning community in the idea of 'random feature extraction', i.e. the possibility of projecting the original input of the problem to a highly dimensional feature space, with the parameters of this mapping assigned stochastically.

In many respects, this can be considered suboptimal, since it does not take into account any information on the problem at hand. However, it allows one to formulate the resulting learning problem as a standard linear regression. This, in turn, brings a series of advantages that make it suitable in many real-world contexts:

1. The resulting learning algorithm is extremely easy to implement, even in hardware.
2. Standard linear algebra routines can solve linear problems with several million elements.
3. It is easy to scale the resulting algorithm.
4. A huge number of algorithms (related to linear regression) that extend the basic framework can be applied straightforwardly.

Example 1: Random Vector Functional-Link (RVFL) Networks

An RVFL network with a single output is described by a linear combination of $B$ non-linear transformations of its input [2,3]:

$$ f(\boldsymbol{x}) = \sum_{i=1}^{B} \beta_i \, h\left( \boldsymbol{x}; \boldsymbol{\theta}_i \right), $$

where the parameters $\boldsymbol{\theta}_i$ are assigned at the beginning of the learning process by drawing them from a predefined probability distribution. A schematic depiction of this is provided below, where the trainable connections are shown with solid lines:

Depiction of an RVFL network with two inputs, three hidden functions, and one output.

Clearly, the model is linear in the set of free parameters $\boldsymbol{\beta} = \left[ \beta_1, \ldots, \beta_B \right]$. Hence, the overall learning problem can be formulated as a least-squares linear regression. Recently, this class of networks has been popularized under the name of Extreme Learning Machine [4,5].
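To make the idea concrete, here is a minimal sketch of an RVFL network in numpy. The function names, the choice of $\tanh$ as the non-linearity, and the Gaussian/uniform distributions for $\boldsymbol{\theta}_i$ are illustrative assumptions, not prescriptions from the cited papers; the key point is that only the output weights $\boldsymbol{\beta}$ are learned, via regularized least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

def rvfl_fit(X, y, B=50, reg=1e-6):
    """Fit a single-output RVFL network with B random hidden functions."""
    d = X.shape[1]
    W = rng.normal(size=(d, B))      # random input weights (part of theta, fixed)
    b = rng.uniform(-1, 1, size=B)   # random biases (part of theta, fixed)
    H = np.tanh(X @ W + b)           # hidden representation of the inputs
    # Only beta is trained: regularized least-squares readout
    beta = np.linalg.solve(H.T @ H + reg * np.eye(B), H.T @ y)
    return W, b, beta

def rvfl_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Usage: learn y = sin(x) from 200 samples on [-3, 3]
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X).ravel()
W, b, beta = rvfl_fit(X, y)
y_hat = rvfl_predict(X, W, b, beta)
```

Note that the expensive step is a single $B \times B$ linear solve, which is what makes the approach easy to implement and to scale.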

Example 2: Echo State Networks (ESN)

Nothing prevents the previous hidden layer from having recurrent connections, i.e. connections between the hidden neurons, or between the output and the hidden neurons, and so on. This gives rise to a recurrent network, which is able to efficiently process dynamic patterns. In fact, the idea of randomly generating the recurrent part of a network has been independently proposed in at least three different forms:

1. The Echo State Network [6].
2. The Liquid State Machine [7].
3. The Backpropagation-Decorrelation algorithm [8].

The general idea is shown in the image below, where, as before, trainable connections are shown using solid lines.

Schema of an ESN.

It is easy to show that, also in this case, the final learning problem can be formulated in terms of a linear regression. This is a striking advantage compared to classical approaches that also adapt the recurrent portion of the network, e.g. backpropagation through time.

Research at the ISPAMM Lab

Research on random feature extraction at the ISPAMM laboratory currently revolves around the following themes:

1. Distributed learning using RVFL and ESN networks [9].
2. Semi-supervised approaches using RVFL networks.
3. Music (or audio) classification [10].

References

[1] Andrieu, C., De Freitas, N., Doucet, A., & Jordan, M. I. (2003). An introduction to MCMC for machine learning. Machine learning, 50(1-2), 5-43.
[2] Pao, Y. H., Park, G. H., & Sobajic, D. J. (1994). Learning and generalization characteristics of the random vector functional-link net. Neurocomputing, 6(2), 163-180.
[3] Igelnik, B., & Pao, Y. H. (1995). Stochastic choice of basis functions in adaptive function approximation and the functional-link net. IEEE Transactions on Neural Networks, 6(6), 1320-1329.
[4] Huang, G. B., Zhu, Q. Y., & Siew, C. K. (2006). Extreme learning machine: theory and applications. Neurocomputing, 70(1), 489-501.
[5] Scardapane, S., Comminiello, D., Scarpiniti, M., & Uncini, A. (2015). Online Sequential Extreme Learning Machine with Kernels. IEEE Transactions on Neural Networks and Learning Systems (in press).
[6] Jaeger, H. (2001). The “echo state” approach to analysing and training recurrent neural networks-with an erratum note. Bonn, Germany: German National Research Center for Information Technology GMD Technical Report, 148, 34.
[7] Maass, W., Natschläger, T., & Markram, H. (2002). Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural computation, 14(11), 2531-2560.
[8] Steil, J. J. (2006). Online stability of backpropagation–decorrelation recurrent learning. Neurocomputing, 69(7), 642-650.
[9] Scardapane, S., Wang, D., Panella, M., & Uncini, A. (2015). Distributed Learning for Random Vector Functional-Link Networks. Information Sciences. 301, 271-284.
[10] Scardapane, S., Comminiello, D., Scarpiniti, M., & Uncini, A. (2013, September). Music classification using extreme learning machines. In 2013 8th International Symposium on Image and Signal Processing and Analysis (ISPA), (pp. 377-381). IEEE.