by Pat Bremkamp
| Duplicate records accumulate in databases faster than dust in my PC's power supply. We've presented a few articles on detecting duplicates as the records are entered, but what do you do if you have an 80,000 record table that already has hundreds of duplicates? Pat gives us one solution. |
I enjoyed your articles on preventing duplicate records (see Warning the User of Duplicate Entries and Check for Duplicate Table Entries, but my problem was dealing with duplicate records that were already in my table. My table has about 80,000 records, each for a different product serial number. Over the years, duplicates had creeped into the database, but they were not identical: some records were complete and others were not, and the missing info might change from record to record. I realized the best way to delete the duplicates was to view the records and choose which ones to delete.
I found that it was much easier than I had thought. I built a form with an embedded browse that showed the fields I wanted to see, then put one button on the form called "Find Dupes":
Figure 1. This browse shows the fields I need to see to decide if 2 records are duplicates.
Figure 2. Here my script has found a potential duplicate.
Figure 3. And now, of course, I can delete one of the records.
Figure 4. Eventually the entire table has been searched for duplicates.
Thd "Find dupe" button in Figure 1 runs the following script:
dim tbl as p
dim shared sn1 as c 'first serial number
dim shared sn2 as C 'second serial number
dim shared x as n 'counter to follow progress
tbl=table.current()
tbl.index_primary_put("serialno") 'serial number index
if x>1 'for second and later pushes
tbl.fetch_find(sn1) 'start where you left off
else
x=1 'first time push
tbl.fetch_first()
end if
sn1=tbl.serialno 'get the first number
while .t.
tbl.fetch_next()
if tbl.fetch_EOF()
exit while
end if
x=x+1
statusbar.set_text("Processing record number "+str(x,6,0))
sn2=tbl.serialno 'get the next number
if sn1=sn2 'if a dupe
goto handle_dupe
else
sn1=sn2
end if
end while
ui_msg_box("COMPLETE","Entire table has been searched")
end
handle_dupe:
tbl.fetch_prev()'move the browse back one to show the two records
parentform:dupebrowse.resynch()
ui_msg_box("DUPLICATE","Duplicate serial number found")
end 'stop so record can be picked and deleted
I have used this little script successfully, and thought it might be useful for your readers.
| Pat Bremkamp writes: Currently, I am Senior Vice President for Toshiba Ceramics America in Hillsboro, Oregon. We manufacture quartz crucibles used for growing single crystal silicon for the semiconductor industry. I am basically the Plant Manager and I handle the computer hardware and software chores in my spare time (kind of like a Doctor I know). I have used A5 since A4 V2, primarily for quality data collection-measurements and defects- and customer reports. I have been playing with computers since 1965 when I learned the newfangled Basic language from Dr. Kemmeny back at Dartmouth, then the TRS-80 and the IBMPC (before the XT!). You can reach Pat at PatB@TCAOregon.com |
9/28/99
Don't forget, we need your feedback to make this site better!