

cf-step-0.2.3



Description

Incremental collaborative filtering algorithms for recommender systems
| Attribute        | Value                                    |
|------------------|------------------------------------------|
| Operating system | -                                        |
| File name        | cf-step-0.2.3                            |
| Name             | cf-step                                  |
| Library version  | 0.2.3                                    |
| Maintainer       | []                                       |
| Maintainer email | []                                       |
| Author           | Dimitris Poulopoulos                     |
| Author email     | dimitris.a.poulopoulos@gmail.com         |
| Home page        | https://github.com/dpoulopoulos/cf_step  |
| Package URL      | https://pypi.org/project/cf-step/        |
| License          | Apache Software License 2.0              |
# CF STEP - Incremental Collaborative Filtering

> Incremental learning for recommender systems

CF STEP is an open-source library, written in Python, that enables fast implementation of incremental learning recommender systems. The library is a by-product of the research project [CloudDBAppliance](https://clouddb.eu/).

## Install

Run `pip install cf-step` to install the library in your environment.

## How to use

For this example, we will use the popular [MovieLens](https://grouplens.org/datasets/movielens/) dataset. The GroupLens project has collected and made available rating data sets from the [MovieLens](http://movielens.org) web site. The data sets were collected over various periods of time, depending on the size of the set.

First, let us load the data into a pandas `DataFrame`. We assume that the reader has downloaded the MovieLens 1M dataset and unzipped it in the `/tmp` folder.

> To avoid creating user and movie vocabularies, we turn each user and movie into a categorical feature and use pandas' convenient `cat` attribute to get the codes.

```python
# local
import pandas as pd

# load the data
col_names = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings_df = pd.read_csv('/tmp/ratings.dat', delimiter='::',
                         names=col_names, engine='python')

# transform users and movies to categorical features
ratings_df['user_id'] = ratings_df['user_id'].astype('category')
ratings_df['movie_id'] = ratings_df['movie_id'].astype('category')

# use the codes to avoid creating separate vocabularies
ratings_df['user_code'] = ratings_df['user_id'].cat.codes.astype(int)
ratings_df['movie_code'] = ratings_df['movie_id'].cat.codes.astype(int)

ratings_df.head()
```

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>user_id</th>
      <th>movie_id</th>
      <th>rating</th>
      <th>timestamp</th>
      <th>user_code</th>
      <th>movie_code</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>1193</td>
      <td>5</td>
      <td>978300760</td>
      <td>0</td>
      <td>1104</td>
    </tr>
    <tr>
      <th>1</th>
      <td>1</td>
      <td>661</td>
      <td>3</td>
      <td>978302109</td>
      <td>0</td>
      <td>639</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1</td>
      <td>914</td>
      <td>3</td>
      <td>978301968</td>
      <td>0</td>
      <td>853</td>
    </tr>
    <tr>
      <th>3</th>
      <td>1</td>
      <td>3408</td>
      <td>4</td>
      <td>978300275</td>
      <td>0</td>
      <td>3177</td>
    </tr>
    <tr>
      <th>4</th>
      <td>1</td>
      <td>2355</td>
      <td>5</td>
      <td>978824291</td>
      <td>0</td>
      <td>2162</td>
    </tr>
  </tbody>
</table>
</div>

Using the codes, we can see how many users and movies are in the dataset.

```python
# local
n_users = ratings_df['user_code'].max() + 1
n_movies = ratings_df['movie_code'].max() + 1

print(f'There are {n_users} unique users and {n_movies} unique movies in the movielens dataset.')
```

    There are 6040 unique users and 3706 unique movies in the movielens dataset.

We will sort the data by `timestamp` to simulate streaming events.

```python
# local
data_df = ratings_df.sort_values(by='timestamp')
```

The `Step` model supports only positive feedback. Thus, we will consider a rating of 5 as positive feedback and discard any other. We want to identify likes with `1` and dislikes with `0`.
```python
# local
import numpy as np

# ratings of 5 -> 1, everything else -> 0
data_df['preference'] = np.where(data_df['rating'] > 4, 1, 0)

# keep only the positive events and discard the rest
data_df_cleaned = data_df.loc[data_df['preference'] == 1]

data_df_cleaned.head()
```

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>user_id</th>
      <th>movie_id</th>
      <th>rating</th>
      <th>timestamp</th>
      <th>user_code</th>
      <th>movie_code</th>
      <th>preference</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>999873</th>
      <td>6040</td>
      <td>593</td>
      <td>5</td>
      <td>956703954</td>
      <td>6039</td>
      <td>579</td>
      <td>1</td>
    </tr>
    <tr>
      <th>1000192</th>
      <td>6040</td>
      <td>2019</td>
      <td>5</td>
      <td>956703977</td>
      <td>6039</td>
      <td>1839</td>
      <td>1</td>
    </tr>
    <tr>
      <th>999920</th>
      <td>6040</td>
      <td>213</td>
      <td>5</td>
      <td>956704056</td>
      <td>6039</td>
      <td>207</td>
      <td>1</td>
    </tr>
    <tr>
      <th>999967</th>
      <td>6040</td>
      <td>3111</td>
      <td>5</td>
      <td>956704056</td>
      <td>6039</td>
      <td>2895</td>
      <td>1</td>
    </tr>
    <tr>
      <th>999971</th>
      <td>6040</td>
      <td>2503</td>
      <td>5</td>
      <td>956704191</td>
      <td>6039</td>
      <td>2309</td>
      <td>1</td>
    </tr>
  </tbody>
</table>
</div>

Next, let us initialize our model.

```python
# local
import torch
from torch.optim import SGD

# SimpleCF and Step are imported from the cf-step library
net = SimpleCF(n_users, n_movies, factors=128, mean=0., std=.1)
objective = lambda pred, targ: targ - pred
optimizer = SGD(net.parameters(), lr=0.06)
device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = Step(net, objective, optimizer, device=device)
```

Finally, let us take 20% of the data to fit the model for bootstrapping, and create the PyTorch `Dataset` that we will use.

```python
# local
pct = int(data_df_cleaned.shape[0] * .2)
bootstrapping_data = data_df_cleaned[:pct]
```

We will create a dataset from our `DataFrame`.
We extract four elements:

* The user code
* The movie code
* The rating
* The preference

```python
# local
from torch.utils.data import TensorDataset

features = ['user_code', 'movie_code', 'rating']
target = ['preference']

data_set = TensorDataset(torch.tensor(bootstrapping_data[features].values),
                         torch.tensor(bootstrapping_data[target].values))
```

Create the PyTorch `DataLoader` that we will use. The batch size should always be `1` for online training; for the bootstrapping phase we can use a larger batch.

```python
# local
from torch.utils.data import DataLoader

data_loader = DataLoader(data_set, batch_size=512, shuffle=False)
```

Let us now use the `batch_fit()` method of the `Step` trainer to bootstrap our model.

```python
# local
model.batch_fit(data_loader)
```

    100%|██████████| 89/89 [00:01<00:00, 81.00it/s]

Then, to simulate streaming, we take the remaining data and create a different data set.

```python
# local
data_df_step = data_df_cleaned.drop(bootstrapping_data.index)
data_df_step = data_df_step.reset_index(drop=True)
data_df_step.head()

# create the DataLoader
stream_data_set = TensorDataset(torch.tensor(data_df_step[features].values),
                                torch.tensor(data_df_step[target].values))
stream_data_loader = DataLoader(stream_data_set, batch_size=1, shuffle=False)
```

Simulate the stream...

```python
# local
from tqdm import tqdm

k = 10  # we keep only the top 10 recommendations
recalls = []
known_users = []

with tqdm(total=len(stream_data_loader)) as pbar:
    for idx, (features, preferences) in enumerate(stream_data_loader):
        itr = idx + 1

        user = features[:, 0]
        item = features[:, 1]
        rtng = features[:, 2]
        pref = preferences

        if user.item() in known_users:
            predictions = model.predict(user, k)
            recall = recall_at_k(predictions.tolist(), item.tolist(), k)
            recalls.append(recall)
            model.step(user, item, rtng, pref)
        else:
            model.step(user, item, rtng, pref)
            known_users.append(user.item())

        pbar.update(1)
```

    100%|██████████| 181048/181048 [15:23<00:00, 195.94it/s]

Last but not least, we visualize the results of the recall@10 metric, using a moving average window of 5,000 elements.
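The streaming loop relies on `recall_at_k`, and the visualization relies on `moving_avg`; neither definition is shown in the README. A minimal sketch of what they compute — assumed behavior for illustration, not the cf-step library's actual code:

```python
import numpy as np

def recall_at_k(predictions, targets, k=10):
    """Fraction of the target items that appear in the top-k predictions."""
    top_k = predictions[:k]
    hits = sum(1 for t in targets if t in top_k)
    return hits / len(targets)

def moving_avg(values, window):
    """Trailing moving average over a fixed window, dropping the warm-up period."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode='valid')

# e.g. a single target item that appears in the top-10 scores a recall of 1.0
single_hit = recall_at_k([42, 7, 13], [7], k=10)
smoothed = moving_avg([0.0, 1.0, 1.0, 0.0], window=2)
```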
```python
# local
import os
import matplotlib.pyplot as plt

# moving_avg is a helper from the cf-step library
avgs = moving_avg(recalls, 5000)

plt.title('Recall@10')
plt.xlabel('Iterations')
plt.ylabel('Metric')
plt.ylim(0., .1)
plt.plot(avgs)
plt.show()
```

![png](docs/images/output_27_0.png)

Finally, save the model's weights.

```python
# local
model.save(os.path.join('artefacts', 'positive_step.pt'))
```

## References

1. Vinagre, J., Jorge, A. M., & Gama, J. (2014). Fast incremental matrix factorization for recommendation with positive-only feedback. In *International Conference on User Modeling, Adaptation, and Personalization* (pp. 459-470). Springer, Cham.
2. Hu, Y., Koren, Y., & Volinsky, C. (2008). Collaborative filtering for implicit feedback datasets. In *2008 Eighth IEEE International Conference on Data Mining* (pp. 263-272). IEEE.
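For intuition, reference [1] describes the kind of incremental matrix-factorization update for positive-only feedback that motivates this library. A self-contained sketch of one such SGD step, independent of cf-step and not its actual implementation:

```python
import numpy as np

def isgd_step(P, Q, user, item, lr=0.1):
    """One incremental SGD update for an observed positive (user, item) event.

    P: user factor matrix (n_users x k); Q: item factor matrix (n_items x k).
    The implicit target for an observed positive event is 1.
    """
    err = 1.0 - P[user] @ Q[item]      # prediction error against target 1
    p_u = P[user].copy()               # update both factors using the old values
    P[user] += lr * err * Q[item]
    Q[item] += lr * err * p_u
    return err

# toy demo: repeating the update on one observed event shrinks its error
rng = np.random.default_rng(0)
P = rng.normal(0.0, 0.1, (5, 8))
Q = rng.normal(0.0, 0.1, (7, 8))
errors = [abs(isgd_step(P, Q, user=0, item=3)) for _ in range(300)]
```

In a real stream, each event is seen once and the model moves on, which is exactly what `model.step(user, item, rtng, pref)` does above.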


Requirements

| Name       | Value   |
|------------|---------|
| torch      | >=1.3.0 |
| tqdm       | -       |
| pandas     | -       |
| matplotlib | -       |


Required language

| Name   | Value |
|--------|-------|
| Python | >=3.6 |


Installation


Install the cf-step-0.2.3 whl package:

    pip install cf-step-0.2.3.whl


Install the cf-step-0.2.3 tar.gz package:

    pip install cf-step-0.2.3.tar.gz