Summary
-------

Cube for news storage and analysis.

This cube provides an implementation of Semnews:

- store news articles and tweets;
- extract and synthesize information;
- provide useful and original semantic visualisations;
- provide analytics tools and data mining/machine learning processing.

Installation
------------

Creation of the instance:

* Create an instance using::

    cubicweb-ctl create semnews <name-of-instance>

* Create the instance's database using::

    cubicweb-ctl db-create <name-of-instance>

Add article sources
-------------------

Article sources can be created using:

* Blogs/RSS feeds::

    session.create_entity('CWSource', name=<name of the source>, type=u'datafeed',
                          parser=u'rss-parser', lang=<lang of the source>,
                          url=<url of the blog/rss feed>,
                          config=u'synchronization-interval=120min')

* Tweets::

    session.create_entity('CWSource', name=<name of the source>, type=u'datafeed',
                          parser=u'tweet-parser', lang=<lang of the source>,
                          url=<url of the blog/rss feed>,
                          config=u'synchronization-interval=120min')

The synchronization interval can be set to a more specific value, or set to
"no" for manual synchronization only.

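
As a rough illustration of how the ``config`` value relates to the chosen
interval, the sketch below assembles the keyword arguments used in the calls
above. ``datafeed_source_kwargs`` is a hypothetical helper and is not part of
Semnews or CubicWeb. It only builds a dictionary. The minutes-based formatting
simply mirrors the ``120min`` example, and the source name and URL are
placeholders.

```python
def datafeed_source_kwargs(name, url, lang, parser=u'rss-parser',
                           interval_minutes=120):
    """Build keyword arguments for session.create_entity('CWSource', ...).

    Hypothetical helper: interval_minutes=None maps to "no", i.e. manual
    synchronization only.
    """
    if interval_minutes is None:
        interval = u'no'
    else:
        if interval_minutes <= 0:
            raise ValueError('interval_minutes must be positive')
        interval = u'%dmin' % interval_minutes
    return {
        'name': name,
        'type': u'datafeed',
        'parser': parser,
        'lang': lang,
        'url': url,
        'config': u'synchronization-interval=%s' % interval,
    }


# Placeholder values, for illustration only.
kwargs = datafeed_source_kwargs(u'example-blog', u'http://example.org/feed',
                                u'en', interval_minutes=30)
print(kwargs['config'])
```

In a ``cubicweb-ctl shell`` session, the resulting dictionary would then be
passed as ``session.create_entity('CWSource', **kwargs)``.
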
Semnews comes with some predefined blogs, tweets and RSS feeds:

* Some French political blogs. You can add them using::

    cubicweb-ctl shell <name-of-instance> <path-to-cube-code-source>/migration/examples_blogs_fr.py

* Some international English-language newspapers. You can add them using::

    cubicweb-ctl shell <name-of-instance> <path-to-cube-code-source>/migration/examples_newspapers.py

* Some French newspapers. You can add them using::

    cubicweb-ctl shell <name-of-instance> <path-to-cube-code-source>/migration/examples_newspapers_fr.py

* Some French politicians' tweets. You can add them using::

    cubicweb-ctl shell <name-of-instance> <path-to-cube-code-source>/migration/examples_twitters_fr.py

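
To load all four example sets in one go, the commands above can be driven from
a small script. The following is only a sketch: ``example_commands`` and
``load_examples`` are hypothetical helper names, and the instance name and
cube path must be supplied by the caller.

```python
import subprocess

# Script names taken from the list of predefined sources.
EXAMPLE_SCRIPTS = (
    'examples_blogs_fr.py',
    'examples_newspapers.py',
    'examples_newspapers_fr.py',
    'examples_twitters_fr.py',
)


def example_commands(instance, cube_path):
    """Return one cubicweb-ctl shell command (as an argv list) per script."""
    base = cube_path.rstrip('/')
    return [
        ['cubicweb-ctl', 'shell', instance,
         '%s/migration/%s' % (base, script)]
        for script in EXAMPLE_SCRIPTS
    ]


def load_examples(instance, cube_path):
    """Run the commands in order, stopping at the first failure."""
    for argv in example_commands(instance, cube_path):
        subprocess.check_call(argv)
```

Building the argument lists separately from running them makes it possible to
print and review the commands before anything is executed.
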
Add Named Entities sources
--------------------------

Semnews relies on a named entity recognition (NER) process, which you have to
define::

    session.create_entity('NerProcess', name=<name of process>, host=<appid or sparql endpoint url>,
                          type=<rql or sparql>, lang=<optional lang of the ner source>,
                          request=<request to be performed>)

See the documentation of the NER cube for more details.

Example of a source::

    session.create_entity('NerProcess', name=u'dbpedia38-en', host=u'ner',
                          type=u'rql', lang=u'en',
                          request=u'Any U WHERE X label %(token)s, X cwuri U, '
                                  'X ner_source NS, NS name "dbpedia38-en"')

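
When several NER sources are declared, the ``request`` string of the example
differs only by the source name. The sketch below builds that string for a
given name. ``ner_rql_request`` is a hypothetical helper, not part of Semnews
or the NER cube, and it assumes that ``%(token)s`` must stay in the request
verbatim so that the token can be substituted later.

```python
def ner_rql_request(source_name):
    """Return an RQL request string for a NerProcess bound to one NER source.

    The %(token)s placeholder is left untouched on purpose: this helper only
    fills in the source name.
    """
    if '"' in source_name:
        raise ValueError('source_name must not contain double quotes')
    # Plain concatenation is used so that %(token)s is not interpreted here.
    return (u'Any U WHERE X label %(token)s, X cwuri U, '
            u'X ner_source NS, NS name "' + source_name + u'"')


print(ner_rql_request(u'dbpedia38-en'))
```

The returned value can then be passed as the ``request`` argument of
``session.create_entity('NerProcess', ...)``.
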
Commands
--------

Semnews provides two commands:

* A command to extract named entities from articles::

    cubicweb-ctl process-ner <name-of-instance>

* A command to clean up recognized entities according to some DBpedia
  categories (see entities/external_resources.py)::

    cubicweb-ctl cleanup-ner <name-of-instance>