"cannot reindex from a duplicate axis" on minute level data #2731

Open
dnola opened this issue Jun 30, 2020 · 0 comments

@dnola dnola commented Jun 30, 2020

Dear Zipline Maintainers,

Before I tell you about my issue, let me describe my environment:

Environment:

Google Colab - default environment (colab.research.google.com)

  • Operating System: (Windows Version or $ uname --all)

  • Python Version: $ python --version
    Python 3.6.9

  • Python Bitness: $ python -c 'import math, sys;print(int(math.log(sys.maxsize + 1, 2) + 1))'
    64

  • How did you install Zipline: (pip, conda, or other (please explain))
    pip

  • Python packages: $ pip freeze or $ conda list
    (see attached, but basically default Google Colab environment)
    requirements.txt

Now that you know a little about me, let me tell you about the issue I am
having:

Description of Issue

I am working with Alpha Vantage minute-level data in the following format:

date                 open       high      low       close      volume    dividend  split
2020-06-23 09:31:00  998.88     999.48    996.03    997.0      158984.0  0.0       0.0
2020-06-23 09:32:00  997.105    997.3099  994.0101  996.485    55373.0   0.0       0.0
2020-06-23 09:33:00  996.7799   996.78    996.24    996.77     4836.0    0.0       0.0
2020-06-23 09:34:00  997.06     997.33    995.01    996.4025   61141.0   0.0       0.0
2020-06-23 09:35:00  996.05     1001.0    996.05    999.8751   64530.0   0.0       0.0
2020-06-23 09:36:00  999.82     1002.88   998.29    1000.1736  52888.0   0.0       0.0
2020-06-23 09:37:00  1000.3922  1002.74   999.85    1001.27    39002.0   0.0       0.0
2020-06-23 09:38:00  1000.665   1001.5    999.105   1001.135   25644.0   0.0       0.0
2020-06-23 09:39:00  1001.465   1001.465  999.6089  999.99     19731.0   0.0       0.0
  • What did you expect to happen?

I'm using the CSV data ingest process (csvdir). The data ingests fine. However, when I run a minimal algorithm such as:

%%zipline --start 2020-06-23 --end 2020-06-26 --bundle custom-csvdir-bundle --data-frequency minute

from zipline.api import symbol, order, record

def initialize(context):
    pass

def handle_data(context, data):
    # order(symbol('TSLA'), 10)
    # record(AAPL=data[symbol('TSLA')].price)
    pass

I get an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-379-1f7b7e62a9ea> in <module>()
----> 1 get_ipython().run_cell_magic('zipline', '--start 2020-06-23 --end 2020-06-26 --bundle custom-csvdir-bundle --data-frequency minute', "\nfrom zipline.api import symbol, order, record\n\ndef initialize(context):\n    pass\n\ndef handle_data(context, data):\n    # order(symbol('TSLA'), 10)\n    # record(AAPL=data[symbol('TSLA')].price)\n    pass")

18 frames
/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2115             magic_arg_s = self.var_expand(line, stack_depth)
   2116             with self.builtin_trap:
-> 2117                 result = fn(magic_arg_s, cell)
   2118             return result
   2119 

/usr/local/lib/python3.6/dist-packages/zipline/__main__.py in zipline_magic(line, cell)
    309             '%s%%zipline' % ((cell or '') and '%'),
    310             # don't use system exit and propogate errors to the caller
--> 311             standalone_mode=False,
    312         )
    313     except SystemExit as e:

/usr/local/lib/python3.6/dist-packages/click/core.py in main(self, args, prog_name, complete_var, standalone_mode, **extra)
    780             try:
    781                 with self.make_context(prog_name, args, **extra) as ctx:
--> 782                     rv = self.invoke(ctx)
    783                     if not standalone_mode:
    784                         return rv

/usr/local/lib/python3.6/dist-packages/click/core.py in invoke(self, ctx)
   1064         _maybe_show_deprecated_notice(self)
   1065         if self.callback is not None:
-> 1066             return ctx.invoke(self.callback, **ctx.params)
   1067 
   1068 

/usr/local/lib/python3.6/dist-packages/click/core.py in invoke(*args, **kwargs)
    608         with augment_usage_errors(self):
    609             with ctx:
--> 610                 return callback(*args, **kwargs)
    611 
    612     def forward(*args, **kwargs):  # noqa: B902

/usr/local/lib/python3.6/dist-packages/click/decorators.py in new_func(*args, **kwargs)
     19 
     20     def new_func(*args, **kwargs):
---> 21         return f(get_current_context(), *args, **kwargs)
     22 
     23     return update_wrapper(new_func, f)

/usr/local/lib/python3.6/dist-packages/zipline/__main__.py in run(ctx, algofile, algotext, define, data_frequency, capital_base, bundle, bundle_timestamp, start, end, output, trading_calendar, print_algo, metrics_set, local_namespace, blotter)
    274         local_namespace=local_namespace,
    275         environ=os.environ,
--> 276         blotter=blotter,
    277     )
    278 

/usr/local/lib/python3.6/dist-packages/zipline/utils/run_algo.py in _run(handle_data, initialize, before_trading_start, analyze, algofile, algotext, defines, data_frequency, capital_base, data, bundle, bundle_timestamp, start, end, output, trading_calendar, print_algo, metrics_set, local_namespace, environ, blotter)
    227     ).run(
    228         data,
--> 229         overwrite_sim_params=False,
    230     )
    231 

/usr/local/lib/python3.6/dist-packages/zipline/algorithm.py in run(self, data, overwrite_sim_params)
    754         try:
    755             perfs = []
--> 756             for perf in self.get_generator():
    757                 perfs.append(perf)
    758 

/usr/local/lib/python3.6/dist-packages/zipline/algorithm.py in get_generator(self)
    627         method to get a standard construction generator.
    628         """
--> 629         return self._create_generator(self.sim_params)
    630 
    631     def run(self, data=None, overwrite_sim_params=True):

/usr/local/lib/python3.6/dist-packages/zipline/algorithm.py in _create_generator(self, sim_params)
    590             self.initialized = True
    591 
--> 592         benchmark_source = self._create_benchmark_source()
    593 
    594         self.trading_client = AlgorithmSimulator(

/usr/local/lib/python3.6/dist-packages/zipline/algorithm.py in _create_benchmark_source(self)
    562             data_portal=self.data_portal,
    563             emission_rate=self.sim_params.emission_rate,
--> 564             benchmark_returns=benchmark_returns,
    565         )
    566 

/usr/local/lib/python3.6/dist-packages/zipline/sources/benchmark_source.py in __init__(self, benchmark_asset, trading_calendar, sessions, data_portal, emission_rate, benchmark_returns)
     49         elif benchmark_returns is not None:
     50             self._daily_returns = daily_series = benchmark_returns.reindex(
---> 51                 sessions,
     52             ).fillna(0)
     53 

/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in reindex(self, index, **kwargs)
   2679     @Appender(generic._shared_docs['reindex'] % _shared_doc_kwargs)
   2680     def reindex(self, index=None, **kwargs):
-> 2681         return super(Series, self).reindex(index=index, **kwargs)
   2682 
   2683     @Appender(generic._shared_docs['fillna'] % _shared_doc_kwargs)

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in reindex(self, *args, **kwargs)
   3021         # perform the reindex on the axes
   3022         return self._reindex_axes(axes, level, limit, tolerance, method,
-> 3023                                   fill_value, copy).__finalize__(self)
   3024 
   3025     def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   3039             obj = obj._reindex_with_indexers({axis: [new_index, indexer]},
   3040                                              fill_value=fill_value,
-> 3041                                              copy=copy, allow_dups=False)
   3042 
   3043         return obj

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   3143                                                 fill_value=fill_value,
   3144                                                 allow_dups=allow_dups,
-> 3145                                                 copy=copy)
   3146 
   3147         if copy and new_data is self._data:

/usr/local/lib/python3.6/dist-packages/pandas/core/internals.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy)
   4137         # some axes don't allow reindexing with dups
   4138         if not allow_dups:
-> 4139             self.axes[axis]._can_reindex(indexer)
   4140 
   4141         if axis >= self.ndim:

/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in _can_reindex(self, indexer)
   2942         # trying to reindex on an axis with duplicates
   2943         if not self.is_unique and len(indexer):
-> 2944             raise ValueError("cannot reindex from a duplicate axis")
   2945 
   2946     def reindex(self, target, method=None, level=None, limit=None,

ValueError: cannot reindex from a duplicate axis
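
For context, the final frames can be reproduced outside Zipline. The sketch below is an illustration, not Zipline's code: it assumes the benchmark returns series contains a repeated date (the issue does not confirm this), and the `returns` values, the `sessions` range and the `reindex_error` helper are all hypothetical. It shows that `Series.reindex` raises a `ValueError` when the source index has duplicate labels, and that dropping the duplicates with `Index.duplicated` lets the same reindex succeed.

```python
import pandas as pd

# Hypothetical benchmark returns with one session repeated (illustrative values).
returns = pd.Series(
    [0.001, 0.002, 0.002, -0.003],
    index=pd.to_datetime(
        ["2020-06-23", "2020-06-24", "2020-06-24", "2020-06-25"]
    ),
)

# Hypothetical stand-in for the sessions the simulation asks for.
sessions = pd.date_range("2020-06-23", "2020-06-26", freq="B")


def reindex_error(series, target):
    """Return the ValueError message raised by reindexing, or None on success."""
    try:
        series.reindex(target)
    except ValueError as exc:
        return str(exc)
    return None


# Reindexing from an index with duplicate labels fails.
error_message = reindex_error(returns, sessions)

# Keep the last entry for each repeated label; the same reindex then succeeds.
deduplicated = returns[~returns.index.duplicated(keep="last")]
daily_returns = deduplicated.reindex(sessions).fillna(0)
```

The exact wording of the message differs between pandas versions, so the sketch only captures it rather than comparing it against a fixed string.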

Here is how you can reproduce this issue on your machine:

See the attached dummy CSV and save it as /content/stonks/minute/TSLA.csv:
TSLA.csv.zip

Here is the Bundle I used for it:

zipline_bundle = """
import pandas as pd

from zipline.data.bundles import register
from zipline.data.bundles.csvdir import csvdir_equities

start_session = pd.Timestamp('2020-06-23', tz='utc')
end_session = pd.Timestamp('2020-06-29', tz='utc')

register(
    'custom-csvdir-bundle',
    csvdir_equities(
        ['minute'],
        '/content/stonks/',
    ),
    calendar_name='NYSE', # US equities
    start_session=start_session,
    end_session=end_session
)
"""

# Write the bundle registration to Zipline's extension file so it is loaded on startup.
with open("/root/.zipline/extension.py", "w") as f:
    f.write(zipline_bundle)

Reproduction Steps

Note that this works:

%%zipline --start 2020-06-23 --end 2020-06-25 --bundle custom-csvdir-bundle --data-frequency minute

from zipline.api import symbol, order, record

def initialize(context):
    pass

def handle_data(context, data):
    # order(symbol('TSLA'), 10)
    # record(AAPL=data[symbol('TSLA')].price)
    pass

  1. Change the top line to the following:
    %%zipline --start 2020-06-23 --end 2020-06-26 --bundle custom-csvdir-bundle --data-frequency minute
    Running the cell now raises the ValueError shown above.

What steps have you taken to resolve this already?

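I have checked that the series is well formatted, regularly spaced, and free of gaps and duplicate index values. The error seems to come from the benchmark step: the last Zipline frame in the traceback is the benchmark_returns.reindex(sessions) call in zipline/sources/benchmark_source.py.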
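
A minimal sketch of the kind of check described above, assuming the CSV's `date` column has been parsed into a pandas `DatetimeIndex`. The helper name `check_minute_index` and the sample timestamps are hypothetical; the sample deliberately contains one duplicate and one gap so that each check has something to report.

```python
import pandas as pd


def check_minute_index(index):
    """Summarise basic health checks for a minute-bar DatetimeIndex."""
    index = pd.DatetimeIndex(index)
    times = pd.Series(index)  # positional index, safe with duplicate labels

    # Time elapsed since the previous bar.
    step = times.diff()
    # True where a bar falls on the same calendar day as the previous bar.
    same_day = times.dt.normalize() == times.shift().dt.normalize()
    # Bars that arrive more than one minute after the previous bar that day.
    gaps = times[(step > pd.Timedelta(minutes=1)) & same_day]

    return {
        "is_unique": index.is_unique,
        "is_sorted": index.is_monotonic_increasing,
        "duplicates": index[index.duplicated()].tolist(),
        "intraday_gaps": gaps.tolist(),
    }


# Hypothetical sample: one repeated minute and one missing stretch.
sample = pd.to_datetime(
    [
        "2020-06-23 09:31:00",
        "2020-06-23 09:32:00",
        "2020-06-23 09:32:00",  # duplicate label
        "2020-06-23 09:35:00",  # 09:33 and 09:34 are missing
    ]
)
report = check_minute_index(sample)
```

For the file in this issue, the index could be obtained with something like `pd.read_csv('/content/stonks/minute/TSLA.csv', index_col='date', parse_dates=True).index`, assuming the CSV header uses the column name `date` as in the sample. A clean result here would point away from the ingested data, which is consistent with the traceback's last Zipline frame being in the benchmark code.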

Anything else?

Thank you!

Sincerely,
$ whoami
