I have a SQL update query that runs with my test data, but doesn't complete (2 hours or more) with my production data.
The purpose of the query
I have an ADDRESSES table that uses code strings instead of IDs. So for example ADDRESSES.COUNTRY_CODE = "USA" instead of 3152. For referential integrity, I am changing these code strings to code IDs.
Schema
ADDRESSES
- ADDR_ID (PK)
- COUNTRY_CODE (varchar)
- Address line 1 (varchar)
- etc.
COUNTRY_CODES
- CODE_ID (PK)
- CODE_STRING (varchar)
- etc.
Steps
First, I create a temporary table to store the address records with the appropriate code ID:
CREATE TABLE ADDRESS_TEMP
AS
SELECT ADDR_ID, CODE_ID
FROM ADDRESSES
LEFT JOIN
COUNTRY_CODES
ON ADDRESSES.COUNTRY_CODE = CODE_STRING
Second, I null the COUNTRY_CODE column and change its type to NUMBER.
Third I set the COUNTRY_CODE column to the code IDs:
UPDATE ADDRESSES
SET COUNTRY_CODE =
(SELECT ADDRESS_TEMP.CODE_ID
FROM ADDRESS_TEMP
WHERE ADDRESS_TEMP.ADDR_ID = ADDRESSES.ADDR_ID)
It is this third step that is taking hours to complete (2 hours and counting). The ADDRESSES table has ~356,000 records. There is no error; it is still running.
Question
Why isn't this update query completing? Is it dramatically inefficient? I think I can see how the sub-query might be an N2 algorithm, but I'm inexperienced with SQL.
via Chebli Mohamed
Aucun commentaire:
Enregistrer un commentaire