
[Python] pandas.read_sas() fails when bad timestamps exist

Discussion in 'Python' started by Stack, October 3, 2024 at 19:02.

    Stack (Participating Member)

    I have a SAS file that contains some bad timestamps, and the pandas read_sas function fails on it. There seems to be no option to skip or coerce them. The same file reads fine in R with the haven package, and the bad timestamps are identifiable there.

    import pandas as pd
    df = pd.read_sas('my_sas_file.sas7bdat')

    Traceback (most recent call last):
    File "/opt/anaconda3/lib/python3.10/site-packages/pandas/io/sas/sas7bdat.py", line 83, in _convert_datetimes
    return pd.to_datetime(sas_datetimes, unit=unit, origin="1960-01-01")
    File "/opt/anaconda3/lib/python3.10/site-packages/pandas/core/tools/datetimes.py", line 1068, in to_datetime
    values = convert_listlike(arg._values, format)
    File "/opt/anaconda3/lib/python3.10/site-packages/pandas/core/tools/datetimes.py", line 393, in _convert_listlike_datetimes
    return _to_datetime_with_unit(arg, unit, name, tz, errors)
    File "/opt/anaconda3/lib/python3.10/site-packages/pandas/core/tools/datetimes.py", line 557, in _to_datetime_with_unit
    arr, tz_parsed = tslib.array_with_unit_to_datetime(arg, unit, errors=errors)
    File "pandas/_libs/tslib.pyx", line 312, in pandas._libs.tslib.array_with_unit_to_datetime
    pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: cannot convert input with unit 's'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/opt/anaconda3/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
    File "/opt/anaconda3/lib/python3.10/site-packages/pandas/io/sas/sasreader.py", line 175, in read_sas
    return reader.read()
    File "/opt/anaconda3/lib/python3.10/site-packages/pandas/io/sas/sas7bdat.py", line 742, in read
    rslt = self._chunk_to_dataframe()
    File "/opt/anaconda3/lib/python3.10/site-packages/pandas/io/sas/sas7bdat.py", line 792, in _chunk_to_dataframe
    rslt[name] = _convert_datetimes(rslt[name], "s")
    File "/opt/anaconda3/lib/python3.10/site-packages/pandas/io/sas/sas7bdat.py", line 85, in _convert_datetimes
    s_series = sas_datetimes.apply(_parse_datetime, unit=unit)
    File "/opt/anaconda3/lib/python3.10/site-packages/pandas/core/series.py", line 4771, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
    File "/opt/anaconda3/lib/python3.10/site-packages/pandas/core/apply.py", line 1123, in apply
    return self.apply_standard()
    File "/opt/anaconda3/lib/python3.10/site-packages/pandas/core/apply.py", line 1174, in apply_standard
    mapped = lib.map_infer(
    File "pandas/_libs/lib.pyx", line 2924, in pandas._libs.lib.map_infer
    File "/opt/anaconda3/lib/python3.10/site-packages/pandas/core/apply.py", line 142, in f
    return func(x, *args, **kwargs)
    File "/opt/anaconda3/lib/python3.10/site-packages/pandas/io/sas/sas7bdat.py", line 55, in _parse_datetime
    return datetime(1960, 1, 1) + timedelta(seconds=sas_datetime)
    OverflowError: days=-1176508800; must have magnitude <= 999999999
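
For context, the final OverflowError can be reproduced without the file. The sketch below assumes the offending value is exactly the day count reported in the message, expressed as seconds since the SAS epoch of 1960-01-01; the variable names are illustrative only.

```python
from datetime import datetime, timedelta

# Assumption: the bad cell holds the day count from the error message,
# stored as seconds since the SAS epoch (1960-01-01).
bad_seconds = -1176508800 * 86400

try:
    datetime(1960, 1, 1) + timedelta(seconds=bad_seconds)
    overflowed = False
    message = ""
except OverflowError as exc:
    # timedelta cannot hold more than 999999999 days in either direction
    overflowed = True
    message = str(exc)

print(overflowed, message)
```

That offset is roughly 3.2 million years before 1960, which is consistent with the minimum that R reports for the column, and it is far outside what either Python's datetime or a nanosecond-resolution pandas timestamp can represent.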


    In R, the file reads without any error:

    > library(haven)
    > df <- read_sas('my_sas_file.sas7bdat')
    > summary(df$DtObgnOrig)
                              Min.                     1st Qu.
    "-3219212-04-24 00:00:00.0000"  "2004-01-27 00:00:00.0000"
                            Median                        Mean
        "2008-08-11 00:00:00.0000"  "2009-03-08 10:15:16.7828"
                           3rd Qu.                        Max.
        "2014-12-09 00:00:00.0000"  "2027-06-12 00:00:00.0000"
                              NA's
                        "93215826"


    Does anyone know a way to make read_sas work while nulling out the bad timestamps?
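
To make the desired outcome concrete, here is a minimal sketch of the "null out the bad timestamps" step on its own. It assumes the raw SAS datetime values (seconds since 1960-01-01) are already available as plain numbers, for example exported from the R session above; it does not change how read_sas itself behaves. The helper name `sas_seconds_to_datetime` and the `earliest`/`latest` cutoffs are hypothetical, not part of pandas.

```python
from datetime import datetime

import numpy as np
import pandas as pd

SAS_EPOCH = datetime(1960, 1, 1)  # SAS datetimes count seconds from here


def sas_seconds_to_datetime(seconds,
                            earliest=datetime(1678, 1, 1),
                            latest=datetime(2262, 1, 1)):
    """Convert raw SAS seconds to timestamps, replacing out-of-range values with NaT.

    Hypothetical helper: the default cutoffs are chosen to sit inside the
    range of nanosecond-resolution pandas timestamps.
    """
    s = pd.Series(seconds, dtype="float64")
    lo = (earliest - SAS_EPOCH).total_seconds()
    hi = (latest - SAS_EPOCH).total_seconds()
    in_range = s.between(lo, hi)  # NaN compares False, so missing values stay missing
    # Out-of-range values become NaN first, so the conversion never sees them.
    return pd.to_datetime(s.where(in_range), unit="s", origin="1960-01-01")


# 1390780800 seconds after 1960-01-01 is 2004-01-27; the second value is the
# overflowing one from the traceback; the third is an ordinary missing value.
raw = [1_390_780_800.0, -1.0165036032e14, np.nan]
converted = sas_seconds_to_datetime(raw)
print(converted)
```

Masking before converting, rather than catching the exception afterwards, keeps the good rows intact and leaves the bad ones as NaT, which is the behaviour being asked for.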

