Searching for keywords in text

Searching for keywords in source text is one of the most common tasks when working with data. Let’s look at several ways to solve it, using the following example:

Suppose we have a list of keywords (the names of car brands) and a large table of all kinds of spare parts, where a description can contain one brand or several at once if the part fits more than one make of car. Our task is to find all the keywords that occur in each description and display them in the neighboring cell, separated by a given character (for example, a comma).

Method 1. Power Query

First, we convert both tables into dynamic (“smart”) tables with the keyboard shortcut Ctrl+T or the command Home – Format as Table, give them names (for example, Brands and Parts), and load them one by one into the Power Query editor via Data – From Table/Range. In older versions of Excel (2010-2013), where Power Query is installed as a separate add-in, the button is on the Power Query tab. In the latest builds of Excel 365, the From Table/Range button is now called From Sheet.

After loading each table into Power Query, we return to Excel with the command Home – Close & Load – Close & Load To… – Only Create Connection.
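For reference, a connection-only query created from a smart table boils down to a single M expression like the sketch below. The table name Brands follows the example name suggested above. The editor usually also appends an automatic "Changed Type" step, which is omitted here.

```powerquery
let
    // Read the workbook table named "Brands" (the name given to the smart table)
    Source = Excel.CurrentWorkbook(){[Name = "Brands"]}[Content]
in
    Source
```

The Parts query looks the same, with "Parts" as the table name.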

Now let’s duplicate the Parts query by right-clicking it and choosing Duplicate, rename the copy to Results, and continue working with it.

The steps are as follows:

  1. On the Add Column tab, choose Custom Column and enter the formula = Brands. After clicking OK we get a new column in which every cell holds a nested table with our list of keywords, the car brands.

  2. Use the double-arrow button in the header of the new column to expand all the nested tables. Each row with a part description is repeated once per brand, so we get every possible “part-brand” pair.

  3. On the Add Column tab, choose Conditional Column and set a condition that checks whether the keyword (brand) occurs in the source text (part description), returning 1 when it does and 0 otherwise.

  4. To make the search case-insensitive, manually add a third argument, Comparer.OrdinalIgnoreCase, to the Text.Contains function in the formula bar (if the formula bar is not visible, enable it on the View tab).

  5. Filter the resulting table, keeping only the ones in the last column (i.e. the matches), and remove the Occurrence column, which is no longer needed.
  6. Group identical descriptions with the Group By command on the Transform tab, choosing All Rows as the aggregation. The result is a column of tables (here named Details), each holding all the rows for one part, including the car brands we need.

  7. To extract the brands for each part, add another calculated column via Add Column – Custom Column and use a formula made of the table reference (the tables sit in our Details column) followed by the name of the column to extract.

  8. Click the double-arrow button in the header of the resulting column and choose Extract Values to output the brands joined with any delimiter you like.

  9. Remove the Details column, which is no longer needed.
  10. The parts with no brands in their descriptions have dropped out of the table. To bring them back, merge the Results query with the original Parts query using the Merge Queries button on the Home tab. Join kind: Right Outer.

  11. All that remains is to remove the extra columns and rename and reorder the remaining ones, and our task is solved.
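Taken together, steps 1-11 correspond roughly to the M code below. This is a sketch under stated assumptions, not the author's exact query. The column names Description, Brand, Found and Part and all step names are hypothetical, and the code the editor generates from the buttons will differ in detail.

```powerquery
let
    // Assumed: Parts has a text column "Description", Brands has a single column "Brand"
    Source = Parts,
    // Step 1: put the whole Brands table into every row
    AddedBrands = Table.AddColumn(Source, "Brands", each Brands),
    // Step 2: expand the nested tables to get every part-brand pair
    Expanded = Table.ExpandTableColumn(AddedBrands, "Brands", {"Brand"}),
    // Steps 3-4: 1 if the brand occurs in the description (case-insensitive), else 0
    Flagged = Table.AddColumn(Expanded, "Occurrence",
        each if Text.Contains([Description], [Brand], Comparer.OrdinalIgnoreCase) then 1 else 0),
    // Step 5: keep only the matches and drop the helper column
    Matches = Table.SelectRows(Flagged, each [Occurrence] = 1),
    RemovedFlag = Table.RemoveColumns(Matches, {"Occurrence"}),
    // Step 6: group by description, keeping all rows as a nested table
    Grouped = Table.Group(RemovedFlag, {"Description"}, {{"Details", each _, type table}}),
    // Step 7: pull the Brand column out of each nested table as a list
    BrandLists = Table.AddColumn(Grouped, "Found", each [Details][Brand]),
    // Step 8: join each list into one string with a delimiter
    Joined = Table.TransformColumns(BrandLists,
        {"Found", each Text.Combine(List.Transform(_, Text.From), ", "), type text}),
    // Step 9: the nested tables are no longer needed
    RemovedDetails = Table.RemoveColumns(Joined, {"Details"}),
    // Step 10: right outer join brings back the parts with no brand found
    Merged = Table.NestedJoin(RemovedDetails, {"Description"}, Parts, {"Description"},
        "Parts", JoinKind.RightOuter),
    // Step 11: take the description from the Parts side and tidy up the columns
    ExpandedParts = Table.ExpandTableColumn(Merged, "Parts", {"Description"}, {"Part"}),
    Result = Table.SelectColumns(ExpandedParts, {"Part", "Found"})
in
    Result
```

Parts without a match end up with null in the Found column, which shows as an empty cell once the query is loaded to a sheet.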

Method 2. Formulas

If you have Excel 2019 or later, or a Microsoft 365 subscription (which is how TEXTJOIN first reached Excel 2016), our problem can be solved very compactly and elegantly with a single formula that combines the new TEXTJOIN function with IF, ISERROR and SEARCH.

The logic behind this formula is simple:

  • The SEARCH function looks for each brand in turn in the current part description and returns either the position of the character where the brand starts or the #VALUE! error if the brand is not in the description.
  • Then, using the IF and ISERROR functions, we replace the errors with an empty text string "" and the character positions with the brand names themselves.
  • The resulting array of empty strings and found brands is assembled into a single string, separated by the given character, using the TEXTJOIN function.
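A formula matching this description might look like the sketch below. The structured references are assumptions: Brands[Brand] stands for the column of keywords in the Brands table, and [@Description] for the description cell in the current row of the Parts table.

```excel
=TEXTJOIN(", ", TRUE, IF(ISERROR(SEARCH(Brands[Brand], [@Description])), "", Brands[Brand]))
```

The second argument, TRUE, makes TEXTJOIN skip the empty strings, so only the brands that were found end up in the result. In versions without dynamic arrays, the formula may need to be entered with Ctrl+Shift+Enter as an array formula.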

Performance comparison and speeding up the Power Query query with buffering

For performance testing, let’s take a table of 100 spare part descriptions as the source data. It gives the following results:

  • Recalculation time with formulas (Method 2): 9 seconds when the formula is first copied down the whole column and 2 seconds on repeat runs (probably thanks to caching).
  • Refresh time of the Power Query query (Method 1) is much worse: 110 seconds.

Of course, a lot depends on the hardware of a particular PC and the installed version of Office and updates, but the overall picture, I think, is clear.

To speed up the Power Query query, let’s buffer the lookup table Brands: it does not change while the query runs, so there is no need to recalculate it over and over (which is what Power Query in fact does). For this we use the Table.Buffer function from M, Power Query’s built-in language.

To do this, open the Results query and click Advanced Editor on the View tab. In the window that opens, add a line with a new variable, Brands2, which will be a buffered version of our brand list, and use this new variable in the query step that follows it.
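A minimal sketch of that change, showing only the beginning of the query, could look like this. The step names AddedBrands and Expanded and the column name Brand are assumptions. The rest of the query stays as it was.

```powerquery
let
    Source = Parts,
    // New line: load the brand list into memory once
    Brands2 = Table.Buffer(Brands),
    // Use the buffered copy instead of the Brands query itself
    AddedBrands = Table.AddColumn(Source, "Brands", each Brands2),
    Expanded = Table.ExpandTableColumn(AddedBrands, "Brands", {"Brand"})
in
    Expanded
```

Buffering suits this case because the brand list is small and does not change while the query runs. For very large tables, holding a full copy in memory can cost more than it saves.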

After this tweak, the refresh becomes roughly seven times faster: the time drops from 110 to 15 seconds. Quite a different story 🙂

  • Fuzzy text search in Power Query
  • Bulk text replacement with formulas
  • Bulk text replacement in Power Query with the List.Accumulate function
