[pythonvis] Re: pandas groupby and max problem

From: "Jeffrey Thompson" <dmarc-noreply@xxxxxxxxxxxxx> (Redacted sender "jthomp" for DMARC)
To: <pythonvis@xxxxxxxxxxxxx>
Date: Fri, 24 Jul 2020 17:26:58 -0400

Hi William,

I don't know pandas, so I'm not sure exactly what you are doing with it,
but I think you could probably accomplish something similar with the Python
itertools module.
I recognized "groupby" as being in itertools,
and I expect it works in a similar manner to what it is doing in your code,
except that, as the excerpt below notes, it only groups consecutive items, so
the data has to be sorted on the key first.
I have included excerpts from Python.org's documentation on itertools as it
applies to groupby,
and another function, islice(), which is less likely to be of value,
but I decided to include it on the
off chance that you could find it useful.
---

The code supplied in Python.org's documentation below only builds the groups;
it does not produce the final result.
You would need to supply the key function for the 'Types of disability' column.
Assuming each row of your data is a dict keyed by column name,
I think the key function would be:
key_func = lambda x: x['Types of disability']
then do the steps described in the groupby code example, passing key_func as
keyfunc.
When that is done,
you can do:
sums = [sum(x['Marks'] for x in a_group) for a_group in groups]
final_list = list(zip(uniquekeys, sums))
final_list = sorted(final_list, key=lambda x: x[1], reverse=True)
maximum_key_value = final_list[0]
max_key, max_value = maximum_key_value
# max_value will hold the maximum total, and max_key will be the type of
# disability
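
Put together, the steps above might look like the following minimal sketch. The sample rows and names are made up for illustration, and it assumes each row is a plain dict rather than a pandas object.

```python
from itertools import groupby

# Hypothetical sample rows; the real data would come from the dataframe.
rows = [
    {"Names": "Ann", "Types of disability": "visual", "Marks": 70},
    {"Names": "Bob", "Types of disability": "hearing", "Marks": 85},
    {"Names": "Cid", "Types of disability": "visual", "Marks": 60},
    {"Names": "Dee", "Types of disability": "mobility", "Marks": 90},
]


def key_func(row):
    return row["Types of disability"]


# groupby() only merges consecutive rows, so sort on the same key first.
rows_sorted = sorted(rows, key=key_func)

# Total the marks of each group while its iterator is still valid.
totals = {
    k: sum(r["Marks"] for r in g)
    for k, g in groupby(rows_sorted, key=key_func)
}

# The key with the largest total, and that total.
max_key = max(totals, key=totals.get)
max_value = totals[max_key]
print(max_key, max_value)
```

Using a dict of totals avoids keeping the separate groups and uniquekeys lists in step with each other.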
===

itertools.groupby(iterable, key=None)
Make an iterator that returns consecutive keys and groups from the iterable.
The key is a function computing a key value for each element. If not specified
or is None, key defaults to an identity function and returns the element
unchanged. Generally, the iterable needs to already be sorted on the same key
function.

The operation of groupby() is similar to the uniq filter in Unix. It generates
a break or new group every time the value of the key function changes (which is
why it is usually necessary to have sorted the data using the same key
function). That behavior differs from SQL’s GROUP BY which aggregates common
elements regardless of their input order.
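
As a small illustration of why sorting matters, this sketch runs groupby() on the same made-up string before and after sorting:

```python
from itertools import groupby

letters = "AABBA"

# Unsorted input: the second run of "A" starts a new group.
unsorted_groups = [(k, len(list(g))) for k, g in groupby(letters)]

# Sorted input: equal items are adjacent, so each key appears once.
sorted_groups = [(k, len(list(g))) for k, g in groupby(sorted(letters))]

print(unsorted_groups)  # [('A', 2), ('B', 2), ('A', 1)]
print(sorted_groups)    # [('A', 3), ('B', 2)]
```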

The returned group is itself an iterator that shares the underlying iterable
with groupby(). Because the source is shared, when the groupby() object is
advanced, the previous group is no longer visible. So, if that data is needed
later, it should be stored as a list:

groups = []
uniquekeys = []
data = sorted(data, key=keyfunc)
for k, g in groupby(data, keyfunc):
    groups.append(list(g))      # Store group iterator as a list
    uniquekeys.append(k)
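
To see what goes wrong when the groups are not copied, here is a short sketch with made-up data. It keeps the group iterators around without converting them, then compares that with copying each group to a list right away.

```python
from itertools import groupby


def first_letter(word):
    return word[0]


data = sorted(["apple", "avocado", "banana", "blueberry", "cherry"],
              key=first_letter)

# Keeping the group iterators without copying them: by the time they are
# read, groupby() has already moved past them.
lazy = [(k, g) for k, g in groupby(data, key=first_letter)]
lost = [list(g) for _, g in lazy]

# Copying each group to a list before advancing preserves its contents.
kept = [(k, list(g)) for k, g in groupby(data, key=first_letter)]

print(lost[0])  # [] because the first group was already skipped over
print(kept[0])  # ('a', ['apple', 'avocado'])
```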

groupby() is roughly equivalent to:

class groupby:
    # [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
    # [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
    def __init__(self, iterable, key=None):
        if key is None:
            key = lambda x: x
        self.keyfunc = key
        self.it = iter(iterable)
        self.tgtkey = self.currkey = self.currvalue = object()
    def __iter__(self):
        return self
    def __next__(self):
        self.id = object()
        while self.currkey == self.tgtkey:
            self.currvalue = next(self.it)    # Exit on StopIteration
            self.currkey = self.keyfunc(self.currvalue)
        self.tgtkey = self.currkey
        return (self.currkey, self._grouper(self.tgtkey, self.id))
    def _grouper(self, tgtkey, id):
        while self.id is id and self.currkey == tgtkey:
            yield self.currvalue
            try:
                self.currvalue = next(self.it)
            except StopIteration:
                return
            self.currkey = self.keyfunc(self.currvalue)
========

itertools.islice(iterable, stop)
itertools.islice(iterable, start, stop[, step])
Make an iterator that returns selected elements from the iterable. If start is
non-zero, then elements from the iterable are skipped until start is reached.
Afterward, elements are returned consecutively unless step is set higher than
one which results in items being skipped. If stop is None, then iteration
continues until the iterator is exhausted, if at all; otherwise, it stops at
the specified position. Unlike regular slicing, islice() does not support
negative values for start, stop, or step. Can be used to extract related fields
from data where the internal structure has been flattened (for example, a
multi-line report may list a name field on every third line). Roughly
equivalent to:

import sys  # needed for sys.maxsize below

def islice(iterable, *args):
    # islice('ABCDEFG', 2) --> A B
    # islice('ABCDEFG', 2, 4) --> C D
    # islice('ABCDEFG', 2, None) --> C D E F G
    # islice('ABCDEFG', 0, None, 2) --> A C E G
    s = slice(*args)
    start, stop, step = s.start or 0, s.stop or sys.maxsize, s.step or 1
    it = iter(range(start, stop, step))
    try:
        nexti = next(it)
    except StopIteration:
        # Consume *iterable* up to the *start* position.
        for i, element in zip(range(start), iterable):
            pass
        return
    try:
        for i, element in enumerate(iterable):
            if i == nexti:
                yield element
                nexti = next(it)
    except StopIteration:
        # Consume to *stop*.
        for i, element in zip(range(i + 1, stop), iterable):
            pass

If start is None, then iteration starts at zero. If step is None, then the step
defaults to one.
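
For the flattened-report case mentioned above, a minimal sketch could look like this. The report contents are invented for illustration.

```python
from itertools import islice

# Hypothetical flattened report: name, age, city repeated for each person.
report_lines = [
    "Ann", "34", "Boston",
    "Bob", "41", "Denver",
    "Cid", "29", "Austin",
]

# Every third line starting at index 0 is a name.
names = list(islice(report_lines, 0, None, 3))

# Every third line starting at index 2 is a city.
cities = list(islice(report_lines, 2, None, 3))

# With a single argument, islice() stops after that many items.
first_two = list(islice(report_lines, 2))

print(names)      # ['Ann', 'Bob', 'Cid']
print(cities)     # ['Boston', 'Denver', 'Austin']
print(first_two)  # ['Ann', '34']
```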

-----Original Message-----
From: pythonvis-bounce@xxxxxxxxxxxxx <pythonvis-bounce@xxxxxxxxxxxxx> On Behalf
Of William Wong
Sent: Friday, July 24, 2020 8:26 AM
To: pythonvis@xxxxxxxxxxxxx
Subject: [pythonvis] pandas groupby and max problem

Hello,

Sorry to seek help again.

I have a pandas dataframe containing the columns: Names, Types of disability,
Marks

I then apply pd.groupby(['Types of disability'])['Marks'].sum() to find the
total marks for each type of disability.

I want to know which type of disability has the highest mark. I was able
to use pd.groupby(['Types of disability'])['Marks'].sum().max()
to find the highest mark after summation, but I don't know how to find the
corresponding entry in the Types of disability column for this max mark.

Thanks,

William

List web page is
//www.freelists.org/webpage/pythonvis

To unsubscribe, send email to
pythonvis-request@xxxxxxxxxxxxx with "unsubscribe" in the Subject field.

