activity-classifier-extractor
=============================
Generates activity classifications from low-level feature inputs in support
of analytic workflows within the `ContentAI Platform <https://www.contentai.io>`__,
published as the extractor
``dsai_activity_classifier``.
1. `Getting Started <#getting-started>`__
2. `Execution <#execution-and-deployment>`__
3. `Adding New Models <#adding-new-models>`__
4. `Testing <#testing>`__
5. `Future Development <#future-development>`__
6. `Changes <CHANGES.md>`__
Getting Started
===============
This library is used as a `single-run executable <#contentai-standalone>`__.
Runtime parameters can be passed to configure the returned results and can
be examined in more detail in the `main <main.py>`__ script.
- ``verbose`` - *(bool)* - verbose input/output configuration printing (*default=false*)
- ``path_content`` - *(str)* - input video path for files to label (*default=video.mp4*)
- ``path_result`` - *(str)* - output path for samples (*default=.*)
- ``path_models`` - *(str)* - manifest path for model information (*default=data/models/manifest.json*)
- ``time_interval`` - *(float)* - time interval for predictions from models (*default=3.0*)
- ``average_predictions`` - *(bool)* - flatten predictions across time and class (*default=false*)
- ``round_decimals`` - *(int)* - rounding decimals for predictions (*default=5*)
- ``score_min`` - *(float)* - apply a minimum score threshold for classes (*default=0.1*)
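For orientation, below is a minimal sketch (not the library's actual
implementation in `main <main.py>`__) of how these defaults might be merged
with JSON overrides from the ``EXTRACTOR_METADATA`` variable described in
`execution <#execution-and-deployment>`__; the helper name is hypothetical.

.. code:: python

    import json
    import os

    # defaults mirror the documented runtime parameters above
    DEFAULTS = {"verbose": False, "path_content": "video.mp4", "path_result": ".",
                "path_models": "data/models/manifest.json", "time_interval": 3.0,
                "average_predictions": False, "round_decimals": 5, "score_min": 0.1}

    def runtime_config():
        """Merge defaults with any JSON overrides in EXTRACTOR_METADATA."""
        config = dict(DEFAULTS)
        config.update(json.loads(os.environ.get("EXTRACTOR_METADATA", "{}")))
        return config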
dependencies
------------
To install package dependencies in a fresh system, the recommended technique
is a set of vanilla pip packages. The latest requirements should be validated
from the ``requirements.txt`` file, but at the time of writing they were the
following.
.. code:: shell
pip install --no-cache-dir -r requirements.txt
Execution and Deployment
========================
This package is meant to be run as a one-off processing tool that
aggregates the insights of other extractors.
command-line standalone
-----------------------
Run the code as if it is an extractor. In this mode, configure a few
environment variables to let the code know where to look for content.
One can also run the command-line with a single argument as input and
optionally add runtime configuration (see `runtime
variables <#getting-started>`__) as part of the ``EXTRACTOR_METADATA``
variable as JSON.
.. code:: shell
EXTRACTOR_METADATA='{"compressed":true}'
Locally Run Classifier on Results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For convenience, the above invocation has been wrapped in the bash script
``run_local.sh``.
.. code:: shell
./run_local.sh <docker_image> [<source_directory> <output_data_dir> [<json_args>]] [<all_args>]
  - run activity classification on source with prior processing
  <docker_image> = 0 IF local command-line based (args using arg parse)
                 = 1 IF local docker emulation
                 = IMAGE_NAME IF docker image name to run
./run_local.sh 0 --path_content features/ --path_result results/ --verbose
./run_local.sh 1 features/ results/ '{"verbose":true}'
Through all of the above examples, the underlying command-line execution is
similar to this execution on the testing data.
.. code:: shell
python -u activity_classifier/main.py --path_content testing/data/launch/video.mp4 \
    --path_result testing/class --path_models activity_classifier/data/models/manifest.json --verbose
Feature-Based Similarity
~~~~~~~~~~~~~~~~~~~~~~~~
A helper script is also available to compute the similarity of clips in
one or more feature files. *(v1.1.0)*
.. code:: shell
python -u activity_classifier/features.py --path_content testing/data/dummy.txt \
    --feature_type dsai_videocnn dsai_vggish --path_result testing/dist
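The exact distance computation is not documented here, but based on the
changelog notes about averaging in time, l2-normalization, and difference
computation among segments, a rough sketch (the function name and array
shapes are assumptions) might look like this:

.. code:: python

    import numpy as np

    def segment_similarity(feat_a, feat_b):
        """Cosine similarity between two clips of shape (frames, dims):
        average over time, l2-normalize, then take the dot product."""
        vec_a = feat_a.mean(axis=0)
        vec_b = feat_b.mean(axis=0)
        vec_a = vec_a / np.linalg.norm(vec_a)
        vec_b = vec_b / np.linalg.norm(vec_b)
        return float(np.dot(vec_a, vec_b))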
ContentAI
---------
Deployment
~~~~~~~~~~
Deployment is easy and follows standard ContentAI steps.
.. code:: shell
contentai deploy dsai_activity_classifier
Deploying...
writing workflow.dot
done
Alternatively, you can pass an image name to reduce rebuilding a docker
instance.
.. code:: shell
docker build -t dsai_activity_classifier .
contentai deploy dsai_activity_classifier dsai_activity_classifier
Locally Downloading Results
~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can locally download data from a specific job for this extractor to
directly analyze.
.. code:: shell
contentai data wHaT3ver1t1s --dir data
Run as an Extractor
~~~~~~~~~~~~~~~~~~~
.. code:: shell
contentai run https://bucket/video.mp4 -w 'digraph { dsai_videocnn -> dsai_activity_classifier; dsai_vggish -> dsai_activity_classifier }'
JOB ID: 1Tfb1vPPqTQ0lVD1JDPUilB8QNr
CONTENT: s3://bucket/video.mp4
STATE: complete
START: Fri Feb 15 04:38:05 PM (6 minutes ago)
UPDATED: 1 minute ago
END: Fri Feb 15 04:43:04 PM (1 minute ago)
DURATION: 4 minutes
EXTRACTORS
my_extractor
TASK STATE START DURATION
724a493 complete 5 minutes ago 1 minute
Or run it via the docker image. Please review the ``run_local.sh`` file for more information.
View Extractor Logs (stdout)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: shell
contentai logs -f <my_extractor>
my_extractor Fri Nov 15 04:39:22 PM writing some data
Job complete in 4m58.265737799s
Adding New Models
=================
There are two steps to adding new models.
1. First, train the models and serialize them into a well-known structure (this can be
   done exhaustively across a number of model types). See `MODELS.rst <MODELS.rst>`__
   for more details.
2. Update the manifest according to the instructions below to indicate how the activity
   classifier should load the model (e.g. the ``framework``), the required features, and
   a few descriptive fields (e.g. the ``name`` and the ``id``).
Updating The Manifest
---------------------
Adding models to the pre-determined set of models is as easy as editing a manifest file and
adding a model into git LFS.
1. Archive the new model into a serialized fileset. At the time of writing, this meant
   serializing models from `sklearn <https://scikit-learn.org>`__ with simple
   `pickle load/save serialization <https://scikit-learn.org/stable/modules/model_persistence.html>`__.
2. Gather all of the relevant output files and compress them if you can. Currently, the
   library understands gzip compression extensions (e.g. ".gz"); see the sketch below.
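A minimal sketch of these two steps, assuming a fitted sklearn classifier
(the estimator, stand-in data, and file name are illustrative only):

.. code:: python

    import gzip
    import pickle

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # illustrative stand-in data; use your real feature matrix and labels
    X_train = np.random.rand(100, 512)
    y_train = np.random.randint(0, 2, 100)
    model = LogisticRegression().fit(X_train, y_train)

    # serialize and gzip-compress so the library can load it (".gz" extension)
    with gzip.open("lr-Running.pkl.gz", "wb") as f:
        pickle.dump(model, f)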
3. Choose the appropriate sub-directory that corresponds to the upstream feature extractor.
   For example, models built on ``3dcnn`` features require new videos to be processed (via
   `extractor chaining <https://www.contentai.io/docs/extractor-chaining>`__) by the
   extractor ``dsai_3dcnn``. If one doesn't exist yet, please create a new directory, but
   remember what combination of audio and video features is required.
4. Modify the manifest file in ``activity_classifier/data/models/manifest.json`` for your new entry.
   Specifically, the input video and audio features must be defined as well as the serialization
   library. Below is an example block that indicates ``3dcnn`` video and ``vggish`` audio features for
   a model created with ``sklearn`` where prediction results will be nested with the name ``Running``.
.. code:: json
[ ...
{
"path": "3dcnn-vggish/lr-Running.pkl.gz",
"name": "Running",
"id": "ugc",
"framework": "sklearn",
"video": "dsai_videocnn",
"audio": "dsai_vggish"
},
... ]
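For illustration, loading such an entry might look roughly like the sketch
below, under the assumptions above (gzip-compressed pickles with a ``path``
relative to the manifest); this is not the library's actual loader.

.. code:: python

    import gzip
    import json
    import os
    import pickle

    MANIFEST = "activity_classifier/data/models/manifest.json"

    with open(MANIFEST) as f:
        entries = json.load(f)

    for entry in entries:
        model_path = os.path.join(os.path.dirname(MANIFEST), entry["path"])
        if entry["framework"] == "sklearn":
            # gzip extensions (".gz") are understood, per step 2 above
            with gzip.open(model_path, "rb") as f:
                model = pickle.load(f)
            print(f"loaded '{entry['name']}' ({entry['id']}) from {entry['path']}")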
5. Prepare to add your model files to the repo. **NOTE:** This repo uses
   `git-lfs <https://git-lfs.github.com/>`__ to store all binary files like models. If your model
   is added with regular git tools alone, you will get a sternly worded email (and friendly advice
   on how to re-add correctly).
.. code:: shell
# from the base directory only
git lfs track activity_classifier/data/models/3dcnn/moonwalk_model.pkl.gz
git add activity_classifier/data/models/3dcnn/moonwalk_model.pkl.gz
git add activity_classifier/data/models/manifest.json
6. Test your model with the data in the ``testing`` directory. The CI/CD process should do this too
but it's always easier to find and fix problems here than with a vague email. The features in this
directory came from processing of the `HBO Max Launch Video <https://www.youtube.com/watch?v=9yLNhhHs3-k>`__,
which is publicly available as a reference.
.. code:: shell
# from the base directory
./run_local.sh 0 --path_content testing/data/test.mp4 --time_interval 1.5
# check for predictions from your new model in data.json
Testing
=======
Testing is included via tox. To launch testing for the entire package, just run ``tox`` at the
command line. Testing can also be run for a specific file within the package by setting the
environment variable ``TOX_ARGS``.

.. code:: shell

TOX_ARGS=test_basic.py tox
Future Development
==================
- additional training hooks?
Changes
=======
1.3
---
1.3.7
~~~~~
- fix run_local typos
- more verbosity checks
1.3.6
~~~~~
- modeling.py separators
- docs reorg
1.3.5
~~~~~
- contentai key request fix
1.3.3
~~~~~
- docs update
- multiclass write
1.3.2
~~~~~
- docker build update, run example update
1.3.1
~~~~~
- docs fix for example of using package
- bug fix for default location, change inputs to classify function
1.3.0
~~~~~
- move models out of the primary package
- *breaking change*, rename input param ``path_models`` to ``path_manifest``
1.2
---
1.2.2
~~~~~
- bump version for model migration to LFS
1.2.1
~~~~~
- fix docker/deployed image run command
1.2.0
~~~~~
- switch to package representation, push to pypi
- several updates for MANIFEST definition (id)
- inclusion of multi-parameter training and testing framework
- safety for model loading, catch exceptions, return gracefully
- update documents to split for binary models
1.1
---
1.1.1
~~~~~
- cosmetic change for reuse in other libraries
1.1.0
~~~~~
- refactor feature code, add utility for difference computation among segments
- min value thresholding to avoid low scoring results in output (default=0.1)
- refactor caching information for feature load (allow flatten, remove cache, allow multi-asset)
- allow recursive feature load for distance compute
1.0
---
1.0.2
~~~~~
- fixes for output, modify to require other extractors as dependencies
- fix order of parameters for local runs
1.0.1
~~~~~
- updates for integration of other models, fixes for prediction output
- add l2norm after average/merge in time of source features
1.0.0
~~~~~
- initial project merge from other sources
- generates json prediction dict
- callable as package
- includes some testing routines with windowing comparison