Bops is a numpy-based analysis module focusing on the manipulation, grouping and
filtering of data from various sources. Bops also has map-reduce functionality.
Some datasets need distributed map-reduce jobs, but in the author's view most
do not. Bops offers powerful data grouping without sacrificing speed or
simplicity.
Bops is tightly integrated with numpy to produce a very fast analysis package.
The module has one main class for data manipulation, called a 'bop'. Bops was
initially named for 'boolean operations'. The module has since been expanded
to include map-reduce and data grouping on top of the initial filtering
capability.
Bops expects a two-dimensional data structure for initialization along with the
attributes of the data (i.e., column names). Once the data is contained in
a 'bop', it can be filtered ('select' function), grouped on
multiple columns ('groupby' function) and sorted ('orderby' function).
The 'select' function allows you to filter on multiple aspects of the data
by manipulating numpy boolean arrays. The 'groupby' function groups rows that
share attribute values. However, unlike the 'GROUP BY' clause in SQL,
bops' 'groupby' function returns the data found in each group along with the
group's unique identifiers. On top of these functions, bops also contains a
sort function, called 'orderby', which allows the programmer to order the data
on multiple columns.
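The ideas behind these three functions can be sketched in plain numpy. This sketch does not use the bops API, whose exact signatures are not shown here. The sample rows, the `data` dictionary and all variable names are hypothetical and only illustrate boolean-array filtering, grouping that keeps each group's rows, and sorting on multiple columns.

```python
import numpy as np

# Hypothetical sample data: rows of (name, gender, age); not from the bops docs.
columns = ["name", "gender", "age"]
rows = [
    ("ann", "F", 34),
    ("bob", "M", 41),
    ("cat", "F", 47),
    ("dan", "M", 38),
    ("eve", "F", 31),
]

# Store each column as its own numpy array, keyed by column name.
data = {name: np.array(col) for name, col in zip(columns, zip(*rows))}

# Filtering: combine boolean arrays with & (and), | (or), ~ (not).
mask = (data["gender"] == "F") & (data["age"] < 40)
selected_names = data["name"][mask].tolist()

# Grouping: keep the rows of every group together with the group's key,
# rather than collapsing each group to a single aggregate as SQL would.
groups = {}
for key in np.unique(data["gender"]):
    member = data["gender"] == key
    groups[str(key)] = data["name"][member].tolist()

# Sorting on multiple columns: np.lexsort treats the LAST key as primary,
# so this orders by gender first and by age within each gender.
order = np.lexsort((data["age"], data["gender"]))
ordered_names = data["name"][order].tolist()
```

Here `selected_names` holds the names that pass both conditions, `groups` maps each gender to the names in that group, and `ordered_names` lists the names sorted by gender and then age.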
These functions provide enormous power in data analysis, primarily by grouping
data on multiple attributes then returning the results to be manipulated. This
strength is magnified by added map-reduce functionality. The map function
allows a programmer to aggregate data based on custom logic. A simple example
would be grouping by gender and decade of age (30's, 40's, ...) for every
row in a dataset. A reduce function would then be run on each group found by
the map function. Using the same example, one could use Python's built-in len
or sum functions as the reduce function to generate histograms of gender
and age groups.
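The gender-and-decade example can be sketched in plain Python. This is a minimal illustration of the map-then-reduce idea, not the module's own 'mapreduce' function. The names `map_func` and `map_reduce` and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows of (gender, age); not from the bops docs.
rows = [("F", 34), ("M", 41), ("F", 38), ("M", 45), ("F", 52), ("M", 47)]


def map_func(row):
    """Map step: build the group key (gender, decade of age) for one row."""
    gender, age = row
    return (gender, (age // 10) * 10)


def map_reduce(rows, map_func, reduce_func):
    """Group rows by the key from map_func, then reduce each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[map_func(row)].append(row)
    return {key: reduce_func(group) for key, group in groups.items()}


# Using len as the reduce function counts the rows in each group,
# which gives a histogram of gender and decade.
histogram = map_reduce(rows, map_func, len)

# A custom reduce function can aggregate a column instead, here total age.
total_age = map_reduce(rows, map_func, lambda group: sum(age for _, age in group))
```

Any callable that accepts a list of rows can serve as the reduce step, so the same grouping can feed counts, sums or other aggregates.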
The goal of Bops is to aid data analysis by giving the programmer capability
and removing limitations.
**Changes v0.4.1 - 0.5**
* Added aliases:
  * **float**: `np.float_`
  * **int**: `np.int_`
  * **bool**: `np.bool_`
  * **str**: `np.str_`
  * **unicode**: `np.unicode_`
  * **complex**: `np.complex_`
* Changed the default *expand* option to True for the 'mapreduce' and 'mapreducebatch' functions. This matches the 'groupby' function, so the *expand* option now has the same default across the module.
**Mailing List**
A mailing list has been created to support the use of this module. You can join
and follow the discussion on `Google groups <http://groups.google.com/group/python-bops>`_. Any errors, issues and enhancements can be discussed here.
Bops aims to be a top-notch data analysis module, but it can only become great with your help. Please chime in on the discussion. Your input is welcome, as are suggested features, patches and fixes.
**Google Code**
The module is now on Google Code: http://code.google.com/p/bops/
Issues, bugs and suggested enhancements can be submitted there.