----
**BoxSERS** is a complete, ready-to-use Python library for applying data augmentation, dimensionality reduction, spectral correction, machine learning and other methods specially designed and adapted for vibrational spectra (Raman, FTIR, SERS, etc.).
## Table of contents
* [BoxSERS Installation](#boxsers-installation)
* [Requirements](#requirements)
* [Included Features](#included-features)
* [Module misc_tools](#module-boxsersmisc_tools)
* [Module visual_tools](#module-boxsersvisual_tools)
* [Module preprocessing](#module-boxserspreprocessing)
* [Module data_augmentation](#module-boxsersdata_augmentation)
* [Dimensional Reduction](#dimensional-reduction)
* [Unsupervised Machine Learning](#unsupervised-machine-learning)
* [Supervised Machine Learning](#supervised-machine-learning)
* [Module validation_metrics](#module-validation_metrics)
* [License](#license)
## BoxSERS Installation
From PyPI
```bash
pip install boxsers
```
From GitHub
```bash
pip install git+https://github.com/ALebrun-108/BoxSERS.git
```
### Requirements
The main packages required to run the code are listed below:
* scikit-learn
* SciPy
* NumPy
* pandas
* Matplotlib
* TensorFlow (GPU or CPU)

Labels associated with the spectra can take one of the following three forms:

| Label Type | Examples |
| ------------- | ------------------------------------ |
| Text | Cholic, Deoxycholic, Lithocholic, ...|
| Integer | 0, 3, 1 , ... |
| Binary | [1 0 0 0], [0 0 0 1], [0 1 0 0], ... |
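
The three label forms carry the same information, so one form can be derived from another. The sketch below uses only NumPy and is not part of BoxSERS; the variable names are illustrative assumptions.

```python
import numpy as np

# Text labels, one per spectrum (example class names from the table above)
lab_text = np.array(['Cholic', 'Lithocholic', 'Deoxycholic', 'Cholic'])

# Text -> integer: np.unique sorts the class names and returns, for each
# label, the index of its class in that sorted list
class_names, lab_int = np.unique(lab_text, return_inverse=True)

# Integer -> binary (one-hot): row i of the identity matrix encodes class i
lab_bin = np.eye(len(class_names), dtype=int)[lab_int]

# Binary -> integer -> text: argmax recovers the class index of each row
lab_back = class_names[lab_bin.argmax(axis=1)]

print(class_names)
print(lab_int)
print(lab_bin)
```

Because the class names are sorted alphabetically here, the integer assigned to each class depends on that ordering rather than on the order of appearance in the data.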
## Included Features
___
### Module ``boxsers.misc_tools``
This module provides functions for a variety of utilities.
* **data_split :** Randomly splits an initial set of spectra into two new subsets, referred to in this
function as subset A and subset B.
* **load_rruff :** Loads a subset of Raman spectra from the RRUFF database as three related lists
containing Raman shifts, intensities and mineral names.
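
The idea behind a random split into subsets A and B can be sketched in plain NumPy. This is a minimal illustration, not the BoxSERS implementation; `split_sketch` and its `seed` parameter are hypothetical names.

```python
import numpy as np

def split_sketch(sp, lab, b_size=0.3, seed=None):
    """Randomly splits spectra and labels into subsets A and B (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = sp.shape[0]
    idx = rng.permutation(n)          # random order of the spectrum indices
    n_b = int(round(b_size * n))      # number of spectra sent to subset B
    idx_b, idx_a = idx[:n_b], idx[n_b:]
    return sp[idx_a], sp[idx_b], lab[idx_a], lab[idx_b]

sp = np.random.default_rng(0).random((10, 50))  # 10 synthetic spectra, 50 points each
lab = np.arange(10)                             # one label per spectrum
sp_a, sp_b, lab_a, lab_b = split_sketch(sp, lab, b_size=0.4, seed=1)
print(sp_a.shape, sp_b.shape)
```

Spectra and labels are indexed with the same permutation, so each spectrum keeps its label after the split.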
### Module ``boxsers.visual_tools``
This module provides different tools to visualize vibrational spectra quickly.
* **spectro_plot :** Returns a plot of the selected spectrum or spectra.
* **random_plot :** Plots a number of randomly selected spectra from a set of spectra.
* **distribution_plot :** Returns a bar plot that represents the distribution of spectra across the classes in
a given set of spectra.
```python
# Code example:
import numpy as np
from boxsers.misc_tools import data_split
from boxsers.visual_tools import spectro_plot, random_plot, distribution_plot

# placeholder data for illustration only; replace with your own measurements
wn = np.linspace(500, 3000, 1000)   # Raman shift axis (1000 points)
spec = np.random.rand(20, 1000)     # 20 spectra, one per row
lab = np.random.randint(0, 3, 20)   # integer labels, one per spectrum

# randomly splits the spectra (spec) and the labels (lab) into training and test subsets
(spec_train, spec_test, lab_train, lab_test) = data_split(spec, lab, b_size=0.4)
# resulting train|test set proportions = 0.6|0.4

# plots the class distribution within the training set
distribution_plot(lab_train, title='Train set distribution')

# spectra array = spec, Raman shift column = wn
random_plot(wn, spec, random_spectra=4)  # plots 4 randomly selected spectra
spectro_plot(wn, spec[0], spec[2])  # plots first and third spectra
```
### Module ``boxsers.preprocessing``
This module provides multiple functions to preprocess vibrational spectra. These features
improve spectrum quality and can improve performance for machine learning applications.
* **baseline_subtraction** : Subtracts the baseline signal from the spectrum or spectra using
Asymmetric Least Squares estimation.
* **intensity_normalization** : Normalizes the spectrum or spectra using one of the norms available in this function.
* **savgol_smoothing** : Smooths the spectrum or spectra using a Savitzky-Golay polynomial filter.
* **spectral_cut** : Removes or sets to zero a delimited spectral region of the spectrum or spectra.
* **spline_interpolation** : Performs a one-dimensional spline interpolation on the spectra to reproduce
them with a new x-axis.
```python
# Code example:
import numpy as np
from boxsers.preprocessing import baseline_subtraction, spectral_cut, intensity_normalization, spline_interpolation

# spec (spectra array) and wn (Raman shift axis) are assumed to be already loaded

# interpolates the spectra with splines and converts them to a new Raman shift range (new_wn)
new_wn = np.linspace(500, 3000, 1000)
spec_cor = spline_interpolation(spec, wn, new_wn)

# removes the baseline signal estimated with the ALS method
(spec_cor, baseline) = baseline_subtraction(spec_cor, lam=1e4, p=0.001, niter=10)

# normalizes each spectrum individually so that the maximum value equals one and the minimum value zero
# (norm='minmax' is assumed here to select this min-max scaling)
spec_cor = intensity_normalization(spec_cor, norm='minmax')

# removes the part of the spectra delimited by the Raman shift values wn_start and wn_end
wn_start, wn_end = 1000, 1200  # example bounds, adjust to the region to remove
spec_cor, wn_cor = spectral_cut(spec_cor, new_wn, wn_start, wn_end)
```
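
For readers curious about what Asymmetric Least Squares baseline estimation does, here is a minimal sketch of the commonly used formulation with a second-difference smoothness penalty. It is an assumption about the general technique, not a copy of the BoxSERS code; `als_baseline_sketch` is a hypothetical name, and the parameters `lam`, `p` and `niter` simply mirror the names used in the example above.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def als_baseline_sketch(y, lam=1e4, p=0.001, niter=10):
    """Estimates a smooth baseline under a single spectrum y (illustrative only)."""
    n = len(y)
    # second-order difference operator, shape (n - 2, n)
    d = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (d.T @ d)          # smoothness penalty, weighted by lam
    w = np.ones(n)                     # start with equal weights
    for _ in range(niter):
        w_mat = sparse.diags(w, 0)
        # weighted least squares fit with the smoothness penalty
        z = spsolve((w_mat + penalty).tocsc(), w * y)
        # points above the baseline (peaks) get the small weight p
        w = np.where(y > z, p, 1.0 - p)
    return z

# synthetic spectrum: linear baseline + one Gaussian peak + a little noise
rng = np.random.default_rng(0)
x = np.arange(1000)
y = 0.001 * x + 5.0 * np.exp(-0.5 * ((x - 500) / 10.0) ** 2) + rng.normal(0.0, 0.01, 1000)
baseline = als_baseline_sketch(y)
corrected = y - baseline
```

A larger `lam` gives a stiffer baseline, while a smaller `p` pushes the baseline further below the peaks.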
### Module ``boxsers.data_augmentation``
This module provides several data augmentation methods that generate new spectra by adding
different variations to existing spectra.
* **aug_mixup** : Randomly generates new spectra by mixing together several spectra with a Dirichlet
probability distribution.
* **aug_noise** : Randomly generates new spectra with Gaussian noise added.
* **aug_multiplier** : Randomly generates new spectra with multiplicative factors applied.
* **aug_ioffset** : Randomly generates new spectra shifted in intensity.
* **aug_xshift** : Randomly generates new spectra shifted in wavenumber.
* **aug_linslope** : Randomly generates new spectra with additional linear slopes.
```python
# Code example:
import numpy as np
from sklearn.utils import shuffle
from boxsers.data_augmentation import aug_noise, aug_multiplier
from boxsers.visual_tools import spectro_plot

# spec, lab and wn are assumed to be already loaded

# generates new spectra with Gaussian noise added (signal-to-noise ratio of 10)
spec_nse, lab_nse = aug_noise(spec, lab, snr=10)
# generates new spectra with multiplicative factors applied (limit of 0.15)
spec_mul, lab_mul = aug_multiplier(spec, lab, 0.15)

# plots an original spectrum alongside augmented ones
spectro_plot(wn, spec[0], spec_nse[0], spec_mul[0])

# stacks all generated spectra and originals in a single array
spec_aug = np.vstack((spec, spec_nse, spec_mul))
lab_aug = np.concatenate((lab, lab_nse, lab_mul), axis=0)

# spectra and labels are randomly mixed
spec_aug, lab_aug = shuffle(spec_aug, lab_aug)
```
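
The principles behind noise and mixup augmentation can be sketched in plain NumPy. The functions below are hypothetical illustrations, not the BoxSERS implementations: `noise_sketch` assumes the noise standard deviation is the mean intensity of each spectrum divided by `snr`, and `mixup_sketch` assumes binary (one-hot) labels that are mixed with the same Dirichlet weights as the spectra.

```python
import numpy as np

def noise_sketch(sp, snr, rng):
    """Adds zero-mean Gaussian noise; std = mean intensity of each spectrum / snr."""
    sigma = sp.mean(axis=1, keepdims=True) / snr
    return sp + rng.normal(size=sp.shape) * sigma

def mixup_sketch(sp, lab_bin, n_new, n_mixed, alpha, rng):
    """Builds n_new spectra as Dirichlet-weighted averages of n_mixed spectra."""
    new_sp = np.empty((n_new, sp.shape[1]))
    new_lab = np.empty((n_new, lab_bin.shape[1]))
    for i in range(n_new):
        # picks n_mixed distinct spectra to blend
        idx = rng.choice(sp.shape[0], size=n_mixed, replace=False)
        weights = rng.dirichlet(alpha * np.ones(n_mixed))  # weights sum to 1
        new_sp[i] = weights @ sp[idx]
        new_lab[i] = weights @ lab_bin[idx]  # soft labels, same weights
    return new_sp, new_lab

rng = np.random.default_rng(0)
sp = rng.random((6, 100)) + 1.0            # 6 synthetic spectra
lab_bin = np.eye(3)[[0, 0, 1, 1, 2, 2]]    # one-hot labels for 3 classes
sp_nse = noise_sketch(sp, snr=10, rng=rng)
sp_mix, lab_mix = mixup_sketch(sp, lab_bin, n_new=4, n_mixed=2, alpha=0.5, rng=rng)
```

Each row of `lab_mix` sums to one, so a mixed spectrum carries a fractional membership in every class it was built from.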
### Module ``boxsers.dimension_reduction``
This module provides different techniques to perform dimensionality reduction of
vibrational spectra.
* **SpectroPCA** : Principal Component Analysis (PCA) model object.
* **SpectroFA** : Factor Analysis (FA) model object.
* **SpectroICA** : Independent Component Analysis (ICA) model object.
### Dimensional Reduction
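```python
# Code example:
from boxsers.dimension_reduction import SpectroPCA, SpectroFA, SpectroICA

# spec_train, spec_test, lab_test, wn and classnames are assumed to be already defined

# fits a PCA model with 50 components on the training spectra
pca_model = SpectroPCA(n_comp=50)
pca_model.fit_model(spec_train)
# scatter plot of the test spectra projected on the first and second components
pca_model.scatter_plot(spec_test, lab_test, targets=classnames, component_x=1, component_y=2)
# plots the second principal component as a function of the Raman shift
pca_model.component_plot(wn, component=2)
# projects the test spectra onto the principal components
spec_pca = pca_model.transform_spectra(spec_test)
```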
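
As a rough illustration of what a PCA model computes, the sketch below performs PCA with a singular value decomposition in NumPy. It is not the BoxSERS implementation; `pca_sketch` is a hypothetical helper and the data are synthetic.

```python
import numpy as np

def pca_sketch(sp, n_comp):
    """Returns scores, components and explained variance ratios (illustrative only)."""
    centered = sp - sp.mean(axis=0)          # PCA works on mean-centered spectra
    # rows of vt are the principal components along the spectral axis
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_comp]
    scores = centered @ components.T         # projection of each spectrum
    explained = (s ** 2) / np.sum(s ** 2)    # fraction of variance per component
    return scores, components, explained[:n_comp]

rng = np.random.default_rng(0)
sp = rng.random((30, 200))                   # 30 synthetic spectra, 200 points each
scores, components, explained = pca_sketch(sp, n_comp=2)
print(scores.shape, components.shape)
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the kind of two-component scatter plot used to inspect class separation, and each row of `components` can be plotted against the Raman shift axis.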
### Unsupervised Machine Learning
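```python
# Code example:
from boxsers.machine_learning import SpectroGmixture, SpectroKmeans

# spec_train and spec_test are assumed to be already defined

# fits a K-means model with 5 clusters on the training spectra
kmeans_model = SpectroKmeans(n_cluster=5)
kmeans_model.fit_model(spec_train)
# visualizes the test spectra with a scatter plot
kmeans_model.scatter_plot(spec_test)
```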
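
To make the clustering step more concrete, here is a minimal sketch of the standard K-means iteration (Lloyd's algorithm) in NumPy. It only illustrates the general technique and makes no claim about how BoxSERS implements `SpectroKmeans`; `kmeans_sketch` and its parameters are hypothetical.

```python
import numpy as np

def kmeans_sketch(sp, n_cluster, n_iter=50, seed=None):
    """Assigns each spectrum to one of n_cluster groups (illustrative only)."""
    rng = np.random.default_rng(seed)
    # initial centers: n_cluster distinct spectra picked at random
    centers = sp[rng.choice(sp.shape[0], size=n_cluster, replace=False)].copy()
    for _ in range(n_iter):
        # squared Euclidean distance from every spectrum to every center
        dist = ((sp[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(n_cluster):
            if np.any(labels == k):   # keep the old center if a cluster is empty
                centers[k] = sp[labels == k].mean(axis=0)
    return labels, centers

# two well-separated synthetic groups of 20 spectra each
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.1, size=(20, 30))
group_b = rng.normal(10.0, 0.1, size=(20, 30))
sp = np.vstack((group_a, group_b))
labels, centers = kmeans_sketch(sp, n_cluster=2, seed=0)
```

The cluster indices are arbitrary: the algorithm groups similar spectra together but does not know which class each group corresponds to.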
### Supervised Machine Learning
* Convolutional Neural Network (3 x 1D convolutional layers, 2 x dense layers)
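```python
# Code example: dimensionality reduction with ICA
from boxsers.dimension_reduction import SpectroPCA, SpectroFA, SpectroICA

# x_train, x_test, y_test, wn and classnames are assumed to be already defined
ica_model = SpectroICA(n_comp=50)
ica_model.fit_model(x_train)
ica_model.scatter_plot(x_test, y_test, targets=classnames, component_x=1, component_y=2)
ica_model.component_plot(wn, component=2)
x_ica = ica_model.transform_spectra(x_train)
```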
### Module ``validation_metrics``
This module provides different tools to evaluate the quality of a model’s predictions.
* **cf_matrix** : Returns a confusion matrix (built with scikit-learn) generated on a given set of spectra.
* **clf_report** : Returns a classification report generated from a given set of spectra.
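
What a confusion matrix contains can be shown with a few lines of NumPy. This is a sketch with made-up labels and a hypothetical helper `confusion_sketch`; scikit-learn's `sklearn.metrics.confusion_matrix` offers the same row/column convention (rows are true classes, columns are predicted classes).

```python
import numpy as np

def confusion_sketch(y_true, y_pred, n_classes):
    """Counts how often true class i was predicted as class j (illustrative only)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # increments cm[i, j] for each (true, pred) pair
    return cm

# made-up integer labels for six spectra and three classes
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_sketch(y_true, y_pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()        # fraction of correct predictions
recall = np.diag(cm) / cm.sum(axis=1)     # per-class recall
print(cm)
```

The diagonal holds the correctly classified spectra, so per-class metrics such as those in a classification report can be derived directly from the rows and columns.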