1. Anuncie Aqui ! Entre em contato fdantas@4each.com.br

[Python] How do I sort "SELECT INTO" queries that are built on top of each other by their...

Discussão em 'Python' iniciado por Stack, Setembro 10, 2024.

  1. Stack

    Stack Membro Participativo

    I need to migrate a dozen MS Access databases with 500+ queries to SSIS and therefore, I changed the code to TSQL, see How can I get TSQL from easy MS Access SQL with little to no handiwork?.

    The queries in these projects are built on top of each other with materialised tables that you SELECT INTO MS Access so that you can fetch it in the next "FROM" block. There have been remarks that this should not be needed since MS Access does allow queries from scattered servers. And that the easy queries that are run here should not need the materialisation for performance. Yet, even if that is true, I cannot change the projects afterwards. They are as they are, and perhaps I am not the only one running into such a setting. Perhaps there are other good reasons for this setting. There are forms that allow you to filter some column and afterwards give out the filtered tables as a download with all the subquery steps that were needed. Therefore, the setting may still make sense even if it were not for performance or scattered servers.

    How can I sort the TSQL queries one-dimensionally in an Excel file by their ancestry levels?

    I need blocks that put together the query families inside a database, and on top of that, I would like to know the sort order for the queries inside these families.

    The aim is to see at one sight how I should go on in SSIS to mirror the TSQL workflow. This is just a puzzle of dependencies, and it likely can be run on anything that has INTO and FROM in its SQL, with any tools and languages you can think of. I still flag this with MS Access, MS Excel, TSQL and Python, to narrow it down to my setting.

    From the MS Excel input file that you can get from the link above, this can be done with just three columns as I know from self-answering the question, and getting the INTO block from the TSQL is such an easy Regex that you can quickly calculate it yourself without the link above:

    • Datenbank = MS Access database
    • Into = The "INTO" table cut out with Regex (see the link above)
    • TSQL = TSQL query that was made from the MS Access SQL

    Example rows in the Excel file (first line for the column names):


    Datenbank, INTO, TSQL
    database1.accdb, tbl_INTO1, select 1 as test INTO tbl_INTO1
    database1.accdb, tbl_INTO2, select test INTO tbl_INTO2 FROM tbl_INTO1
    database1.accdb, tbl_INTO3, select test INTO tbl_INTO3 FROM tbl_INTO2
    database1.accdb, tbl_INTO4, select test INTO tbl_INTO4 FROM tbl_INTO1
    database1.accdb, tbl_INTO8, select 1 as test2 INTO tbl_INTO8
    database1.accdb, tbl_INTO9, select test2 INTO tbl_INTO9 FROM tbl_INTO8

    Clearly, tbl_INTO2 follows tbl_INTO1 and should therefore get a higher level in the tree of dependencies. But also tbl_INTO4 should be higher than tbl_INTO1. And tbl_INTO8 and tbl_INTO9 have nothing to do with the rest, they should be in their own family block.

    This is about 500+ filled TSQL cells spread across a dozen MS access databases in that Excel file. How do I sort "SELECT INTO" queries that are built on top of each other by their "INTO" / "FROM" table links?

    Continue reading...

Compartilhe esta Página