# DeepNeighbor
<br />
<p align="center">
<a href="https://github.com/othneildrew/Best-README-Template">
<img src="deepneighbor_logo.png" alt="Logo" width="120" height="120">
</a>
<p align="center">
Embedding-based Retrieval for ANN Search and Recommendations!
<br />
<a href="https://colab.research.google.com/drive/1j6uWt_YYyHBQDK7EN3f5GTTZTmNn2Xc5?usp=sharing">View Demo</a>
·
<a href="https://github.com/Lou1sWang/deepneighbor/issues">Report Bug</a>
·
<a href="https://github.com/Lou1sWang/deepneighbor/issues">Request Feature</a>
</p>
</p>
---
DeepNeighbor is a **high-level**, **flexible**, and **extendable** package for embedding-based information retrieval from user-item interaction logs. As the name suggests, **'deep'** refers to the deep learning models used to learn user/item embeddings, while **'neighbor'** refers to approximate nearest neighbor search in the embedding space.<br>
It has two main steps, Embed and Search:<br>
<br>`model = Embed(data_path); model.train()`, which generates embeddings for users and items (Deep);
<br>`model.search()`, which looks up approximate nearest neighbors for a seed user/item (Neighbor).
<br>
### Install
```shell
pip install deepneighbor
```
### How To Use
```python
from deepneighbor import Embed
model = Embed(data,model='gat')
model.train()
model.search(seed = 'Louis', k=10)
```
### Input format
The input data for **Embed()** should be the path to a (*.csv or *.txt) file (e.g. 'data/data.csv') with two columns, in order: 'user' and 'item'. For each user, the items should ideally be ordered by interaction time.
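A minimal sketch of producing a compatible log with the standard library's `csv` module. The rows and filename here are hypothetical, and whether a header row is expected is an assumption, not something the package documents:

```python
import csv

# Hypothetical interaction log: one (user, item) pair per row,
# with each user's items listed in the order they were interacted with.
rows = [
    ("Louis", "item_a"),
    ("Louis", "item_b"),
    ("Alice", "item_b"),
    ("Alice", "item_c"),
]

with open("data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["user", "item"])  # two columns, 'user' first
    writer.writerows(rows)
```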
### Models & parameters in Embed()
- [x] Word2Vec `w2v`
- [ ] Factorization Machines `fm`
- [ ] Deep Semantic Similarity Model
- [ ] Siamese Network with triple loss
- [ ] Deepwalk
- [ ] Graph convolutional network
- [x] Neural Graph Collaborative Filtering algorithm `ngcf`
- [ ] Matrix factorization `mf`
- [x] Graph attention network `gat`
### Model Parameters
#### deepwalk
```python
model = Embed(data, model = 'deepwalk')
model.train(window_size=5,
            workers=1,
            iter=1,
            dimensions=128)
```
- ```window_size``` Skip-gram window size.
- ```workers``` Number of worker threads used to train the model (faster training on multicore machines).
- ```iter``` Number of iterations (epochs) over the corpus.
- ```dimensions``` Dimensions of the node embeddings.
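To illustrate what ```window_size``` controls: in skip-gram training, each node in a walk is paired with every node within the window on either side. The helper below is not part of DeepNeighbor, just a self-contained sketch of that pairing:

```python
def skipgram_pairs(walk, window_size):
    """Return (center, context) pairs within window_size of each position."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window_size)
        hi = min(len(walk), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:  # a node is not its own context
                pairs.append((center, walk[j]))
    return pairs

# A toy walk alternating users and items; window_size=1 pairs
# each node only with its immediate neighbors.
walk = ["u1", "i3", "u2", "i7"]
pairs = skipgram_pairs(walk, window_size=1)
```

A larger ```window_size``` generates more (center, context) pairs per walk, treating more distant nodes as related at the cost of extra training time.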
#### graph attention network
```python
model = Embed(data, model = 'gat')
model.train(window_size=5,
            learning_rate=0.01,
            epochs=10,
            dimensions=128,
            num_of_walks=80,
            beta=0.5,
            gamma=0.5)
```
- ```window_size``` Skip-gram window size.
- ```learning_rate``` Learning rate for optimizing the graph attention network.
- ```epochs``` Number of gradient descent iterations.
- ```dimensions``` Dimensions of the embedding for each node (user/item).
- ```num_of_walks``` Number of random walks.
- ```beta``` and ```gamma``` Regularization parameters.
### How To Search
#### ```model.search(seed, k)```
- ```seed``` The seed user/item whose neighbors are retrieved.
- ```k``` Number of Nearest Neighbors.
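Conceptually, the Search step ranks every other node by its similarity to the seed's embedding and returns the top ```k```. The sketch below shows exact cosine-similarity kNN over a toy embedding table; DeepNeighbor itself uses *approximate* nearest neighbor search for speed, so this is an illustration of the idea, not the package's implementation:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(embeddings, seed, k):
    """Return the k nearest neighbors of `seed` by cosine similarity."""
    scores = [(name, cosine(embeddings[seed], vec))
              for name, vec in embeddings.items() if name != seed]
    scores.sort(key=lambda t: t[1], reverse=True)
    return [name for name, _ in scores[:k]]

# Toy 2-D embeddings; real embeddings would come from the Embed step.
embeddings = {
    "Louis": [1.0, 0.0],
    "Alice": [0.9, 0.1],
    "Bob":   [0.0, 1.0],
}
neighbors = search(embeddings, "Louis", k=1)
```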
### Examples
Open [Colab](https://colab.research.google.com/drive/1j6uWt_YYyHBQDK7EN3f5GTTZTmNn2Xc5?usp=sharing) to run the example with facebook data.
### Contact
Please contact louiswang524@gmail.com for collaboration or to provide feedback.
### License
This project is under MIT License, please see [here](LICENSE) for details.