Experiments on the NYC dataset (updated 3 Aug)

These experiments use the NYC Foursquare check-in dataset, which is available here: https://sites.google.com/site/yangdingqi/home/foursquare-dataset

Forgive me for being lazy and uploading a photo of my handwritten notes on the data preprocessing:

[Photo: handwritten notes on the data preprocessing]

The code is available on GitHub as two scripts:
Binary Tests

Take into account each user's check-in times
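
The difference between the two variants can be sketched roughly as follows. This is only an illustration, not the code from the repository: the record layout (one `(user_id, venue_id)` pair per check-in) and the helper name `build_interactions` are my assumptions here. The binary variant marks each visited user and venue pair with 1.0, while the count variant keeps the number of check-ins for that pair.

```python
from collections import Counter


def build_interactions(checkins, binary=True):
    """Map (user_id, venue_id) -> interaction weight.

    checkins: iterable of (user_id, venue_id) pairs, one per check-in
              (assumed layout, not taken from the original scripts).
    binary=True  -> every visited pair gets weight 1.0
    binary=False -> weight is the number of check-ins for that pair
    """
    counts = Counter(checkins)
    if binary:
        return {pair: 1.0 for pair in counts}
    return dict(counts)


# Toy records made up for illustration; they are not from the NYC dataset.
toy = [("u1", "v1"), ("u1", "v1"), ("u1", "v2"), ("u2", "v1")]

binary_matrix = build_interactions(toy, binary=True)
count_matrix = build_interactions(toy, binary=False)

# Number of unique pairs, largest binary weight, largest count weight.
print(len(binary_matrix), max(binary_matrix.values()), max(count_matrix.values()))
# prints: 3 1.0 2
```

The printed values play the same role as the "unique user&venue checkin combination" and "max num in matrix" lines in the logs further down.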

 

These are the results from running the code on the cluster:

unique user&venue checkin combination in test 18205
unique user&venue checkin combination in test 72819
max num in matrix 1.0
max num in train 1.0
I am beginning to model
model has been fitted
this is the binary model
Time used: 4.789567
Train_auc is 0.999504
Test_aus is 0.654491
/home/s2013258/.local/lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)




unique user&venue checkin combination in test 18205
unique user&venue checkin combination in test 72819
max num in matrix 257
max num in train 205
I am beginning to model
model has been fitted
this is the model that consider the checkin times
Time used: 4.782983
Train_auc is 0.999508
Test_aus is 0.655189
/home/s2013258/.local/lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)

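For reference, the AUC values reported in the logs above can be read as the probability that a venue the user actually visited is scored higher than one the user did not visit. The logs do not show which routine computed them, so the snippet below is only the textbook pairwise definition in plain Python. The function name `auc` and the toy scores are assumptions for illustration.

```python
def auc(pos_scores, neg_scores):
    """Pairwise AUC: the fraction of (positive, negative) score pairs
    where the positive is ranked higher. Ties count as half a win."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores made up for illustration: two visited venues, three unvisited.
visited = [0.9, 0.4]
unvisited = [0.5, 0.3, 0.1]
print(round(auc(visited, unvisited), 4))
# prints: 0.8333
```

A value of 0.5 means the ranking is no better than random, and 1.0 means every visited venue is ranked above every unvisited one.
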
As for the hybrid model, I have not tried it yet. To be continued.
