Welcome to Mochila’s documentation!

Collections are vital for implementing algorithms: they are the bread and butter of data aggregation and processing. For example, a collection can be used to represent the list of ingredients of a recipe. Then, you might want your algorithm to scale the ingredients to accommodate a different number of portions than the original recipe. The Mochila package provides a powerful API to process data in collections in a declarative way.

A very famous declarative language is SQL. The SQL query “SELECT id, quantity*factor from ingredients” only expresses what we expect to have after processing. There is no information in the query that describes how to get it. The basic idea behind declarative statements is that we only need to specify what we want and let the API determine the best way to do it.
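
To make the contrast concrete, here is a small plain-Python sketch that does not use Mochila at all. The ingredient data and the scaling factor are made up for illustration. The loop spells out how the result is built, while the comprehension only states what each result looks like.

```python
# Hypothetical recipe data: (ingredient, quantity) pairs for 4 portions.
ingredients = [("flour", 200), ("sugar", 50), ("eggs", 2)]
factor = 1.5  # scale from 4 to 6 portions

# Imperative: describe *how* to build the result, step by step.
scaled_loop = []
for name, quantity in ingredients:
    scaled_loop.append((name, quantity * factor))

# Declarative: describe *what* each result looks like.
scaled_decl = [(name, quantity * factor) for name, quantity in ingredients]

print(scaled_decl)
```

Both forms produce the same list; the declarative one leaves the iteration strategy to the language.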

Declarative Operations API

In all the operations below, every function passed as an argument (function, key, value) must accept exactly one argument: an item of the Mochila.

mochila.aggregate(key, value)

Returns a dictionary containing key: value pairs produced by evaluating the key and value functions on each item of the Mochila.

mochila.closure(function)

Returns a Bag containing the results of evaluating the transitive closure of the results produced by the function on each item of the Mochila.

mochila.collect(function)

Returns a Bag containing the results of evaluating the function on each item of the Mochila.

mochila.excludes_all(iterable)

Returns True if the Mochila excludes all the items in the iterable. For single items use x not in M.

mochila.exists(function)

Returns True if there exists at least one item in the Mochila for which function returns True.

mochila.for_all(function)

Returns True if for all items in the Mochila the function returns True.

mochila.includes_all(iterable)

Returns True if the Mochila includes all items in the iterable. For single items use x in M.

mochila.one(function)

Returns True if there exists exactly one item in the Mochila for which function returns True.

mochila.reject(function)

Returns a sub-collection containing only items of the Mochila that do not satisfy the condition defined by the function, that is, reject the items for which the function returns True.

mochila.select(function)

Returns a sub-collection containing only items of the Mochila that do satisfy the condition defined by the function, that is, select the items for which the function returns True.

mochila.select_one(function, default=None)

Returns the first element in the Mochila for which function returns True, else default. If default is not given, it defaults to None.

mochila.sort_by(function, reverse=None, inplace=False)

Returns a copy of the Mochila sorted by the results of evaluating the function on each item of the Mochila. The value corresponding to each item is calculated once and then used for the entire sorting process. Sorting is done using only < comparisons between comparison values.

sort_by() accepts two arguments that can only be passed by keyword:

reverse is a boolean value. If set to True, then the Mochila items are sorted as if each comparison were reversed.

inplace is a boolean value. If set to True, then the Mochila is sorted in place and the operation returns the Mochila in which it was invoked (as opposed to a copy).
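
The behaviour described for sort_by can be approximated on a plain Python list. This is only a sketch of the semantics, not Mochila's implementation: the free function below is hypothetical, and it uses reverse=False as the default because Python's own sorted() does not accept None for that parameter.

```python
def sort_by(items, function, *, reverse=False, inplace=False):
    """Plain-list sketch of the sort_by semantics described above."""
    if inplace:
        # list.sort evaluates the key function once per item.
        items.sort(key=function, reverse=reverse)
        return items  # the same object, now sorted
    # sorted() leaves the input untouched and returns a new list.
    return sorted(items, key=function, reverse=reverse)


words = ["pear", "fig", "banana"]
by_length = sort_by(words, len)                   # new list, words unchanged
longest_first = sort_by(words, len, reverse=True)
same_list = sort_by(words, len, inplace=True)     # words itself is reordered
```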

Using the Declarative Operations

Next we introduce an example data model (represented as a class) and show how the declarative operations can be used. You can find a larger data set in the persons.csv file in the test folder in the source code.

The data model contains information about (secret) agents:

>>> class Agent:
...     def __init__(self, code, first_name, last_name, rank, active, peer, *args):
...         """
...         An agent that is used for declarative operations
...         :param code: The agent id
...         :param first_name: first name
...         :param last_name: last name
...         :param rank: rank in the system
...         :param active: is the agent active
...         :param peer: peer agents
...         :param args: countries where the agent can be active
...         """
...         self.code = code
...         self.first_name = first_name.strip()
...         self.last_name = last_name.strip()
...         self.rank = int(rank)
...         self.active = True if active.strip() == 'TRUE' else False
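...         peer = peer.strip()  # blank peer fields contain only spaces
...         if peer:
...             self.peers = peer.split('~')
...         else:
...             self.peers = []
...         self.visited = list(args)  # set for every agent, with or without peers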
...
...     def __str__(self):
...         return self.code
...
...     def __repr__(self):
...         return self.code
...
>>> people_csv = [
...     "730-46-0957,Cass,      Lamba,       91,  TRUE,         ,Sweden,Switzerland,China,Georgia,",
...     "186-01-5810,Barbara,   Rosewell,   487,  TRUE,         ,Indonesia,Poland,Albania,Portugal,",
...     "424-16-0664,Elnore,    Dillestone,  95,  TRUE,730-46-0957,Ireland,Russia,Philippines,Japan,",
...     "694-68-6118,Brig,      Derham,     367, FALSE,186-01-5810,Pakistan,Russia,China,Papua New Guinea,",
...     "212-70-6483,Adelbert,  Michelet,   166, FALSE,186-01-5810~424-16-0664,Honduras,Spain,Brazil,Sierra Leone,",
...     "824-34-4142,Aldon,     Craske,     437, FALSE,,Italy,Poland,Philippines,Argentina,",
...     "539-35-1184,Valentine, Woolvin,     13, FALSE,694-68-6118,Spain,Bulgaria,Indonesia,,",
...     "861-26-2185,Godard,    Gadie,       99, FALSE,212-70-6483,Portugal,Brazil,Thailand,China,",
...     "368-95-4835,Etan,      Bumphries,  291, FALSE,         ,Tunisia,France,Mexico,Armenia,",
...     "859-05-6244,Sutherlan, McElwee,    434,  TRUE,824-34-4142,China,Russia,Bosnia and Herzegovina,China,"]

>>> import csv
>>> agents = dict()
>>> agentreader = csv.reader(people_csv, delimiter=',')
>>> for row in agentreader:     # Load the agent data
...     p = Agent(*row)
...     agents[p.code] = p
>>> agents_get = agents.get
>>> for k, v in agents.items(): # Replace the string references for object references (peers)
...     v.peers = [agents_get(p) for p in v.peers]

We will use the aggregate operation to create a dict to hold the name and last name information of the agents. The key of the dict will be the agent’s code:

>>> import Mochila as m
>>> def get_name(p):
...     return "{} {}".format(p.first_name, p.last_name)
>>> def get_code(p):
...     return p.code
>>> mochila = m.Sequence(agents.values())
>>> d = mochila.aggregate(get_code, get_name)
>>> import pprint
>>> pprint.pprint(d)
{'186-01-5810': 'Barbara Rosewell',
 '212-70-6483': 'Adelbert Michelet',
 '368-95-4835': 'Etan Bumphries',
 '424-16-0664': 'Elnore Dillestone',
 '539-35-1184': 'Valentine Woolvin',
 '694-68-6118': 'Brig Derham',
 '730-46-0957': 'Cass Lamba',
 '824-34-4142': 'Aldon Craske',
 '859-05-6244': 'Sutherlan McElwee',
 '861-26-2185': 'Godard Gadie'}
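
For comparison, the same idea can be sketched in plain Python as a dict comprehension. The aggregate function and the tuple records below are hypothetical stand-ins, and how Mochila treats duplicate keys is not stated here; in this sketch the last value wins.

```python
def aggregate(items, key, value):
    """Build a dict by evaluating key(item) and value(item) on every item."""
    return {key(item): value(item) for item in items}


# Hypothetical (code, name) records standing in for Agent objects.
records = [("730-46-0957", "Cass Lamba"), ("186-01-5810", "Barbara Rosewell")]
names_by_code = aggregate(records, lambda r: r[0], lambda r: r[1])
```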

A closure is a very powerful way to, for example, collect information recursively. Since the function is applied to all results recursively, you need to make sure that all results can quack to its tune ;). We can use a closure to get all the peers of an agent recursively up the peer hierarchy. The closure has a collection of peers for each agent. We can use the flatten operation (see Core operations) to get the peer hierarchy for each agent (note that Bag is an unordered collection, hence the order of the results):

>>> def get_peer(p):
...     return p.peers
...
>>> cl = mochila.closure(get_peer)
>>> for peerh in cl:
...     agent_peers = peerh.flatten()
...     pprint.pprint(agent_peers)
Bag([186-01-5810])
Bag()
Bag([824-34-4142])
Bag([186-01-5810, 424-16-0664, 730-46-0957])
Bag([186-01-5810, 694-68-6118])
Bag([730-46-0957])
Bag()
Bag()
Bag()
Bag([212-70-6483, 424-16-0664, 186-01-5810, 730-46-0957])
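
To see what a transitive closure computes for a single item, the following plain-Python sketch follows the function repeatedly until no new results appear. It is an illustration under assumptions, not Mochila's algorithm: the transitive_closure helper and the small peer graph are hypothetical.

```python
def transitive_closure(item, function):
    """Return every object reachable from item by applying function repeatedly."""
    reached = []
    pending = list(function(item))
    while pending:
        current = pending.pop()
        if current in reached:  # already seen; this also guards against cycles
            continue
        reached.append(current)
        pending.extend(function(current))
    return reached


# Hypothetical peer graph keyed by short names instead of Agent objects.
peers = {"d": ["b", "c"], "c": ["a"], "b": [], "a": []}
reachable = {
    name: sorted(transitive_closure(name, peers.__getitem__)) for name in peers
}
```

Here "d" reaches "a" only through "c", which mirrors how an agent's peer hierarchy includes the peers of its peers.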

The collect operation is useful for collecting information from the data. In the next example we collect information about all the countries the agents have visited, but the collect function could be more complex:

>>> def get_countries(agent):
...     return [c for c in agent.visited if len(c) > 0]  # Ignore empty
>>> a_countries = mochila.collect(get_countries).flatten().asOrderedSet()
>>> a_countries
OrderedSet(['Albania', 'Argentina', 'Armenia', 'Bosnia and Herzegovina', 'Brazil', 'Bulgaria', 'China', 'France', 'Georgia', 'Honduras', 'Indonesia', 'Ireland', 'Italy', 'Japan', 'Mexico', 'Pakistan', 'Papua New Guinea', 'Philippines', 'Poland', 'Portugal', 'Russia', 'Sierra Leone', 'Spain', 'Sweden', 'Switzerland', 'Thailand', 'Tunisia'])
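
The collect-then-flatten pipeline can be sketched in plain Python. The flatten generator below is a hypothetical helper, and treating strings as leaves rather than as iterables of characters is an assumption made for this sketch.

```python
def flatten(items):
    """Recursively yield the non-iterable leaves; strings count as leaves."""
    for item in items:
        if isinstance(item, (str, bytes)):
            yield item
        else:
            try:
                nested = iter(item)
            except TypeError:  # not iterable, so it is a leaf
                yield item
            else:
                yield from flatten(nested)


# Hypothetical "visited" lists, one per agent, with one extra nesting level.
visited = [["Spain", "Brazil"], ["Brazil", ["China"]], []]
flat = list(flatten(visited))
unique_sorted = sorted(set(flat))  # drop duplicates, then order alphabetically
```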

The exists, for_all and one operations provide a quick way to validate the Mochila. The first two are akin to Python’s built-in any and all operations; the last one is a more restrictive any:

>>> def visited_russia(agent):
...     return "Russia" in agent.visited
>>> def visited_colombia(agent):
...     return "Colombia" in agent.visited
>>> to_russia = mochila.exists(visited_russia)
>>> pprint.pprint(to_russia)
True
>>> to_colombia = mochila.exists(visited_colombia)
>>> pprint.pprint(to_colombia)
False
>>> def above_average(agent):
...     return agent.rank < 500
>>> def active_agent(agent):
...     return agent.active
>>> average = mochila.for_all(above_average)
>>> pprint.pprint(average)
True
>>> active = mochila.for_all(active_agent)
>>> pprint.pprint(active)
False
>>> def visited_france(agent):
...     return "France" in agent.visited
>>> to_russia = mochila.one(visited_russia)
>>> pprint.pprint(to_russia)
False
>>> to_france = mochila.one(visited_france)
>>> pprint.pprint(to_france)
True
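
The relationship to any and all can be written out directly. The three functions below are a plain-Python sketch of the described semantics (hypothetical helpers, not Mochila code); note that one has to count matches rather than stop at the first.

```python
def exists(items, function):
    """True if at least one item satisfies function."""
    return any(function(item) for item in items)


def for_all(items, function):
    """True if every item satisfies function."""
    return all(function(item) for item in items)


def one(items, function):
    """True if exactly one item satisfies function."""
    return sum(1 for item in items if function(item)) == 1


ranks = [91, 487, 95]
some_high = exists(ranks, lambda r: r > 400)     # 487 qualifies
all_below = for_all(ranks, lambda r: r < 500)    # every rank is below 500
single_high = one(ranks, lambda r: r > 400)      # only 487
single_low = one(ranks, lambda r: r < 100)       # 91 and 95 both match
```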

The reject and select operations let you filter the Mochila; they are akin to Python’s built-in filter operation. The select_one operation returns only the first matching item:

>>> to_russia = mochila.reject(visited_russia)
>>> print("{} > {}".format(len(mochila), len(to_russia)))
10 > 7
>>> check_russia = to_russia.select(visited_russia)
>>> print(len(check_russia))
0
>>> active_agents = mochila.select(active_agent).collect(get_name)
>>> pprint.pprint(active_agents)
Bag(['Cass Lamba', 'Barbara Rosewell', 'Elnore Dillestone', 'Sutherlan McElwee'])
>>> one_russia = mochila.select_one(visited_russia).code
>>> pprint.pprint(one_russia)
'424-16-0664'
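
The filtering operations map onto standard-library tools. The sketch below uses hypothetical helper functions on plain lists to show the described semantics; it does not reflect how Mochila implements them.

```python
from itertools import filterfalse


def select(items, function):
    """Keep the items for which function returns True."""
    return list(filter(function, items))


def reject(items, function):
    """Keep the items for which function returns False."""
    return list(filterfalse(function, items))


def select_one(items, function, default=None):
    """Return the first matching item, or default when nothing matches."""
    return next((item for item in items if function(item)), default)


def is_even(n):
    return n % 2 == 0


numbers = [3, 8, 5, 12]
evens = select(numbers, is_even)
odds = reject(numbers, is_even)
first_even = select_one(numbers, is_even)
no_match = select_one(numbers, lambda n: n > 100, default=-1)
```

select and reject always partition the input: every item ends up in exactly one of the two results.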

The Mochilas

The declarative operations provided can be invoked on any Mochila. In Colombia, a Mochila is a type of knitted bag crafted by the indigenous people, mainly in the northern part of the country. Mochilas come in different colors and sizes and can satisfy all your carrying needs. This package provides five Mochilas (collection implementations).

A Mochila can be any of:

Susu (Sequence)
A Sequence is an enumerated collection of objects. The order in which the objects appear in the collection matters and the same object can appear multiple times at different positions.
Kapatera (Set)
A Set is a collection of objects. The order in which they appear in the collection is irrelevant and the same object can appear only once in the collection.
Maikisia (Ordered Set)
An OrderedSet is an enumerated collection of objects. The order in which the objects appear in the collection matters and the same object can appear only once in the collection.
Kattowi (Bag)
A Bag is a collection of objects. The order in which they appear in the collection is irrelevant and the same object can appear multiple times.
Uttiakajamatu (MultiSet)
A MultiSet is a collection of objects. The order in which they appear in the collection is irrelevant and the same object can appear multiple times. The main difference with a Bag is that, implementation wise, a MultiSet can only contain hashable objects (a Bag has no such restriction).
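
The two properties that distinguish these collections (ordered or not, unique or not) can be seen with standard-library analogues. These are rough stand-ins chosen for illustration, not the types Mochila provides or builds on.

```python
from collections import Counter

items = ["b", "a", "b", "c"]

as_sequence = list(items)                     # ordered, duplicates kept
as_set = set(items)                           # unordered, unique
as_ordered_set = list(dict.fromkeys(items))   # ordered, unique (first occurrence wins)
as_multiset = Counter(items)                  # unordered, duplicates counted

# Like the MultiSet described above, Counter only accepts hashable items.
try:
    Counter([["unhashable"]])
    counter_accepts_lists = True
except TypeError:
    counter_accepts_lists = False
```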

The collections support all Python operations on collections of the equivalent type. That is, Sequence supports all List operations, Set supports Set, etc. Please refer to the specific collection documentation for detailed information and limitations/caveats.

For set-like Mochilas, operator expressions are only supported against other set-type Mochilas or collections that inherit from the Set type in the Collection Abstract Base Classes. Further, the named operator functions are not supported.
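
Python's built-in set shows the same distinction between operators and named methods. The sketch below uses only built-in types: the operator form requires a set-like operand, while the named method (which the text above says set-like Mochilas do not offer) accepts any iterable.

```python
from collections.abc import Set

left = {1, 2}

# Named method: any iterable is accepted by the built-in set.
union_via_method = left.union([2, 3])

# Operator: the right-hand operand must itself be set-like.
try:
    left | [2, 3]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False

# frozenset is registered as a collections.abc.Set, so the operator works.
other = frozenset([2, 3])
union_via_operator = left | other
```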

Next, we provide a general overview of the Mochila API.

Mochila’s Supported Operations

All Mochilas are Python collections and as such can be iterated and indexed. Because they are iterable, they can also be used in any construct that uses the iterable pattern and in methods that accept iterable arguments.

Core operations

mochila.add(x)

Add an item to the Mochila. For ordered Mochilas the item is added at the end of the collection; for unordered Mochilas the item is added at a random position.

mochila.add_all(iterable)

Add all items in the iterable to the Mochila. For ordered Mochilas the items are added at the end of the collection; for unordered Mochilas they are added at random positions.

mochila.clear()

Remove all items from the Mochila.

mochila.count(x)

Return the number of times x appears in the Mochila.

mochila.copy()

Return a shallow copy of the Mochila.

mochila.discard(x, n_copies=1)

Remove an item from the Mochila n_copies number of times. The n_copies argument only applies to non-unique Mochilas (Sequence, Bag, MultiSet).

mochila.discard_all(iterable)

Remove all items in the iterable from the Mochila. If an item appears multiple times in the Mochila, it will be removed once for each time it appears in the iterable.

mochila.excluding(x)

Returns a new Mochila that excludes x. The type of the returned Mochila is the same as the type of the Mochila on which the operation is invoked.

mochila.excluding_all(iterable)

Returns a new Mochila that excludes all the items in the iterable. The type of the returned Mochila is the same as the type of the Mochila on which the operation is invoked.

mochila.flatten()

Returns a new Mochila where no item is an iterable itself. All items in the Mochila are recursively flattened.

mochila.including(x)

Returns a new Mochila that includes x. The type of the returned Mochila is the same as the type of the Mochila on which the operation is invoked.

mochila.including_all(iterable)

Returns a new Mochila that includes all items in the iterable. The type of the returned Mochila is the same as the type of the Mochila on which the operation is invoked.

mochila.remove(x)

Remove an item from the Mochila. Raises ValueError if the item is not in the Mochila.

mochila.remove_all(iterable)

Remove all items in the iterable from the Mochila. If an item appears multiple times in the Mochila, it will be removed once for each time it appears in the iterable. Raises ValueError if an item is not in the Mochila.
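
The difference between the discard and remove families can be sketched on a plain list. The two helpers are hypothetical, and the assumption that discard_all silently ignores missing items is borrowed from Python's set.discard; the descriptions above only state that the remove variants raise ValueError.

```python
def discard_all(items, to_remove):
    """Remove one occurrence per entry in to_remove; ignore missing items."""
    for x in to_remove:
        if x in items:
            items.remove(x)


def remove_all(items, to_remove):
    """Like discard_all, but a missing item raises ValueError."""
    for x in to_remove:
        items.remove(x)  # list.remove raises ValueError when x is absent


bag = ["a", "b", "a", "c"]
discard_all(bag, ["a", "z"])  # one "a" removed, "z" ignored
after_discard = list(bag)

try:
    remove_all(bag, ["z"])
    raised = False
except ValueError:
    raised = True
```

Because "a" appears once in the iterable, only one of the two copies is removed.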

Metamorphosis Operations

These operations change one Mochila into another Mochila. When going from ordered to unordered, the ordering information is lost. When going from unordered to ordered, an arbitrary order is used, unless an ordering function is provided. When going from non-unique to unique some elements may be dropped (i.e. no duplicates).

mochila.asBag()

Returns a shallow copy of the items of the Mochila as a Bag.

mochila.asSet()

Returns a shallow copy of the items of the Mochila as a Set.

mochila.asSequence(key=None, reverse=None)

Returns a shallow copy of the items of the Mochila as a Sequence.

asSequence() accepts two arguments that can only be passed by keyword:

key specifies a function of one argument that is used to extract a comparison key from each item (for example, key=str.lower). The key corresponding to each item in the list is calculated once and then used for the entire Sequence creation process. Ordering of the Sequence is done using only < comparisons between comparison keys.

reverse is a boolean value. If set to True, then the Sequence items are sorted as if each comparison were reversed.

mochila.asOrderedSet(key=None, reverse=None)

Returns a shallow copy of the items of the Mochila as an OrderedSet.

asOrderedSet() accepts two arguments that can only be passed by keyword:

key specifies a function of one argument that is used to extract a comparison key from each item (for example, key=str.lower). The key corresponding to each item is calculated once and then used for the entire OrderedSet creation process. Ordering of the OrderedSet is done using only < comparisons between comparison keys.

reverse is a boolean value. If set to True, then the OrderedSet items are sorted as if each comparison were reversed.
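
The effect of key and reverse when converting to an ordered, unique collection can be sketched with plain lists. The as_ordered_set function below is a hypothetical stand-in, and keeping first-occurrence order when no key is given is an assumption of the sketch; the text above only says the order is arbitrary in that case.

```python
def as_ordered_set(items, key=None, reverse=False):
    """Return the unique items as a list, sorted only when a key is given."""
    unique = list(dict.fromkeys(items))  # drop duplicates, keep first occurrence
    if key is not None:
        unique.sort(key=key, reverse=reverse)
    return unique


names = ["bob", "Alice", "bob", "carol"]
unordered_input = as_ordered_set(names)
case_insensitive = as_ordered_set(names, key=str.lower)
descending = as_ordered_set(names, key=str.lower, reverse=True)
```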

Build Information

https://circleci.com/bb/arcanefoam/mochila/tree/master.svg?style=shield https://coveralls.io/repos/bitbucket/arcanefoam/mochila/badge.svg

Source and Contributions

The source code is hosted at Bitbucket/Mochila and information on how to contribute can be found in the CONTRIB file.

Thanks to
