Demo of Python "Map Reduce Filter"

Here I share a demo, written by me (Xiaoqiang), of functional programming in Python with map, reduce and filter.

Assume there are two DB tables: `file_logs`, and `expanded_attrs`, which stores extra columns that extend `file_logs`. For this demonstration, assume there is more than one file log for the same (platform_id, client_id) tuple. We need to figure out which one was updated most recently for the tuple (platform_id=1, client_id=1).

Here is the approach:

1. Filter the file logs for the tuple (platform_id=1, client_id=1) out of the original file logs.
2. Merge the attributes from the expansion table into the file_logs records in memory, similar to joining the two tables in a query.
3. Reduce the merged file logs to find the one that was updated most recently.
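The three steps above can also be sketched as small named functions, which may be easier to follow than nested lambdas. This is only an illustrative sketch: the helper names `select_logs`, `merge_attrs` and `latest`, and the sample rows, are assumptions made for the example and are not part of the demo. Unlike the demo, it builds merged copies instead of updating the records in place.

```python
from functools import reduce  # a builtin on Python 2; must be imported on Python 3


def select_logs(logs, platform_id, client_id):
    """Step 1: keep the logs that match the (platform_id, client_id) tuple."""
    return list(filter(
        lambda log: log['platform_id'] == platform_id and log['client_id'] == client_id,
        logs))


def merge_attrs(log, attrs):
    """Step 2: return a copy of `log` extended with its own attribute rows."""
    own = filter(lambda a: a['file_log_id'] == log['file_log_id'], attrs)
    return reduce(
        lambda acc, a: dict(acc, **{a['attr_name']: a['attr_value']}),
        own,
        dict(log))


def latest(logs):
    """Step 3: reduce to the log with the greatest 'last_updated' string."""
    return reduce(
        lambda x, y: x if x['last_updated'] > y['last_updated'] else y,
        logs)


# Hypothetical sample rows, shaped like the two tables described above.
sample_logs = [
    {'file_log_id': 'a', 'platform_id': '1', 'client_id': '1'},
    {'file_log_id': 'b', 'platform_id': '1', 'client_id': '1'},
    {'file_log_id': 'c', 'platform_id': '2', 'client_id': '3'},
]
sample_attrs = [
    {'file_log_id': 'a', 'attr_name': 'last_updated', 'attr_value': '2014-07-14'},
    {'file_log_id': 'b', 'attr_name': 'last_updated', 'attr_value': '2014-07-15'},
    {'file_log_id': 'c', 'attr_name': 'last_updated', 'attr_value': '2014-07-16'},
]

selected = select_logs(sample_logs, '1', '1')
merged = list(map(lambda log: merge_attrs(log, sample_attrs), selected))
newest = latest(merged)
print(newest['file_log_id'])  # prints: b
```

Log 'c' has the latest date overall but belongs to another tuple, so step 1 removes it before the comparison in step 3.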

BTW, if you figure out a more efficient way of doing this, or find any issues, you are welcome to share them. Thanks. :)

The demo code follows (written for Python 2.6 and 2.7):

#!/usr/bin/env python

"""
Requirement:
    Given platform_id=1 and client_id=1 (pid and cid below).
    file_logs and expanded_attrs are lists of dicts; expanded_attrs holds
    extra columns that extend the rows of file_logs.
    file_logs contains more than one entry for pid=1, cid=1, so we need to
    find the one that was updated most recently.
"""

file_logs = [
    { 'file_log_id': '1', 'platform_id': '1', 'client_id': '1', 'file': 'path/to/platform/client/j-1/stdout' },
    { 'file_log_id': '2', 'platform_id': '1', 'client_id': '1', 'file': 'path/to/platform/client/j-2/stdout' },
    { 'file_log_id': '3', 'platform_id': '2', 'client_id': '3', 'file': 'path/to/platform/client/j-3/stdout' },
]

expanded_attrs = [
    { 'file_log_id': '1', 'attr_name': 'CLICK', 'attr_value': '100' },
    { 'file_log_id': '1', 'attr_name': 'SUPPRESSION', 'attr_value': '100' },
    { 'file_log_id': '1', 'attr_name': 'last_updated', 'attr_value': '2014-07-14' },
    { 'file_log_id': '2', 'attr_name': 'CLICK', 'attr_value': '200' },
    { 'file_log_id': '2', 'attr_name': 'SUPPRESSION', 'attr_value': '200' },
    { 'file_log_id': '2', 'attr_name': 'last_updated', 'attr_value': '2014-07-15' },
    { 'file_log_id': '3', 'attr_name': 'CLICK', 'attr_value': '300' },
    { 'file_log_id': '3', 'attr_name': 'SUPPRESSION', 'attr_value': '300' },
    { 'file_log_id': '3', 'attr_name': 'last_updated', 'attr_value': '2014-07-15' },
]

platform_id = '1'
client_id = '1'

# Step 1: keep only the file logs for the requested (platform_id, client_id).
target_scope_filelogs = filter(lambda x: x['platform_id'] == platform_id and x['client_id'] == client_id, file_logs)

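# Step 2: merge each log's expanded attributes into the log dict, in place.
# The inner reduce folds the matching attribute rows into a single dict.
# dict.update() returns None, so "... is None and xx" hands the accumulator back.
# map() is used only for its side effect; the list of None values it returns is discarded.
map(
    lambda x:
        x.update(reduce(
            lambda xx, xy: xx.update({ xy['attr_name']: xy['attr_value'] }) is None and xx,
            filter(lambda xx: xx['file_log_id'] == x['file_log_id'], expanded_attrs),
            dict()
        )),
    target_scope_filelogs
)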

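# Step 3: keep whichever log has the later 'last_updated' value.
# Dates in YYYY-MM-DD form compare chronologically as plain strings.
print reduce(lambda x, y: x if x['last_updated'] > y['last_updated'] else y, target_scope_filelogs)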
#> {'file_log_id': '2', 'platform_id': '1', 'last_updated': '2014-07-15', 'SUPPRESSION': '200', 'file': 'path/to/platform/client/j-2/stdout', 'client_id': '1', 'CLICK': '200'}
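
As one possible answer to the invitation above for a more efficient way, here is a hedged sketch rather than a definitive implementation. The demo rescans `expanded_attrs` once for every file log; indexing the attribute rows by `file_log_id` in a single pass avoids that, and `max()` with a key function can stand in for the final reduce. The names `latest_log` and `attrs_by_log` are hypothetical, and the data is a shortened copy of the rows used in the demo.

```python
from collections import defaultdict


def latest_log(file_logs, expanded_attrs, platform_id, client_id):
    """Return the most recently updated log for the tuple, merged with its attributes."""
    # One pass over the attribute rows: file_log_id -> {attr_name: attr_value}.
    attrs_by_log = defaultdict(dict)
    for row in expanded_attrs:
        attrs_by_log[row['file_log_id']][row['attr_name']] = row['attr_value']

    # Build merged copies so the input rows are left unmodified.
    candidates = [
        dict(log, **attrs_by_log[log['file_log_id']])
        for log in file_logs
        if log['platform_id'] == platform_id and log['client_id'] == client_id
    ]
    # Assumes at least one candidate exists and that each one carries a
    # 'last_updated' attribute in YYYY-MM-DD form, which sorts chronologically
    # when compared as a string. max() raises ValueError on an empty list.
    return max(candidates, key=lambda log: log['last_updated'])


# Shortened copy of the demo data.
file_logs = [
    {'file_log_id': '1', 'platform_id': '1', 'client_id': '1'},
    {'file_log_id': '2', 'platform_id': '1', 'client_id': '1'},
    {'file_log_id': '3', 'platform_id': '2', 'client_id': '3'},
]
expanded_attrs = [
    {'file_log_id': '1', 'attr_name': 'CLICK', 'attr_value': '100'},
    {'file_log_id': '1', 'attr_name': 'last_updated', 'attr_value': '2014-07-14'},
    {'file_log_id': '2', 'attr_name': 'CLICK', 'attr_value': '200'},
    {'file_log_id': '2', 'attr_name': 'last_updated', 'attr_value': '2014-07-15'},
    {'file_log_id': '3', 'attr_name': 'last_updated', 'attr_value': '2014-07-15'},
]

result = latest_log(file_logs, expanded_attrs, '1', '1')
print(result['file_log_id'])  # prints: 2
```

This sketch runs on both Python 2.7 and Python 3. When two logs share the same date, `max()` keeps the first one it sees, whereas the reduce in the demo keeps the last. To run the demo itself on Python 3, it would additionally need `from functools import reduce`, `list(...)` around the `filter` and `map` calls (both return lazy iterators there), and `print(...)` called as a function.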