Reading a Scanned Form

by Dr. Peter Wayne

Sometimes you just have to read a file prepared by an incompatible program. The most common of these file formats is the comma-delimited format, or .csv file.

Recently I was given the task of reading a file prepared by a bubble sheet scanner. You're all familiar with those sheets: it's the kind of multiple choice sheet that you fill in with No. 2 pencils. Scanners for those sheets come with software that prepares a comma-segmented variable file, or .csv file, containing the output of each scanned line.

Patient ID
 .          
         
           
           .
           
       .    
           
           
         .  
     .      

This shows how the Patient ID is filled out: the form has a box labeled Patient ID. The user fills out, by hand, the 6-digit Patient ID number, and then blocks in the appropriate fields in the Patient ID block.

The scanner software then reads an entire sheet and outputs a comma-delimited file which contains a value for each block that is present on the form. My task was to read the output of the .csv file and automatically prepare and print a bill detailing the services provided on the sheet. Conceptually, it's simple.

A peek at the .csv file

Let's look at a section of the actual .csv file:

"012004","08/31/98","995 ","3","5"," . ","P"," ","1","1"," ","1",...,"147"," "," "

The ellispses (...) indicate that there are many more variables in the .csv file. In fact, in this bubble form there are 146 fields that can be filled in. Each field is output in the .csv file, whether or not the bubble is filled in.

We need, of course, some sort of dictionary to .csv file, and the bubble form comes with a mapping of each field. In this case, the bubble form's dictionary tells us that the first segment is the Patient ID, the second segment is the date of service, the 3rd segment is the office id, the 5th segment is the doctor id, and subsequent segments continue to define the diagnoses and services that were provided.

First step - open the .csv file

The first task is to locate and open the .csv file. Our script to read the file will begin like this:

dim shared line as c
dim pat_id as c
dim datec as c
dim providID as c
dim doctor as c
dim APcode as c
dim StdDiagC as c

filename=ui_get_file("Open bubble file","","omsdsc.csv","X")
if filename="" then
	end 
end if

fp=file.open(filename,file_ro_exclusive)

' open the tables needed for data entry
patient=table.open("patient",file_ro_shared)
patient.index_primary_put("id")
billing=table.get("billing")
provider=table.open("provider",file_ro_shared)
provider.index_primary_put("id")
codes=table.open("codes")
codes.index_primary_put("id")
proced=table.open("proced")


line=fp.read_line()
fp.close()

' *****************************************
' break 'line' into its segments and then
' make entries into the appropriate tables
' This has to be coded!
' finally, close all tables
' ***************************************** 
patient.close()
provider.close()
codes.close()
proced.close()
end

Script 1. Open the .csv file and read a line

Time for a new function

Wouldn't it be convenient if Alpha Five provided a function to read the individual segment from the .csv file? Well, it may be too much to ask Alpha's design team to come up with a tool for every kind of file out there, but fortunately Xbasic allows us to create our own functions. We'll create a segment(n) function that will obtain the nth comma and quote-delimited segment from the line read from the file. The function will draw on Alpha Five's built-in functions, occurs and at:

function segment as c(piece as n)
dim shared line as c 
dim comma as c
dim maxseg as n
comma=","
maxseg=146
if piece=maxseg then
	segment=substr(line,at(comma,line,piece-1)+2,len(line)-at(comma,line,piece-1)-2) 
else
	segment=substr(line,at(comma,line,piece-1)+2,at(comma,line,piece)-at(comma,line,piece-1)-3)
end if
end function

Script 2. The segment function.

In this example I created a maxseg variable and set it to 146. That makes it easy for me to change the function if the bubble form is later redesigned for more or fewer fields.

I can then put the function into the script and use it to obtain each segment of the .csv file in order:

dim pat_id as c
dim datec as c
dim providID as c
dim doctor as c
dim APcode as c
dim StdDiagC as c
dim NSDiag as c
dim diag as c
dim filename as c
dim fp as p
dim shared line as c
dim billing as p
dim audit as p
dim patient as p
dim provider as p
dim proced as p
dim rec as n
dim provID as c

function segment as c(piece as n)
dim shared line as c 
dim comma as c
dim maxpiece as n
comma=","
maxseg=146
if piece=maxseg then
	segment=substr(line,at(comma,line,piece-1)+2,len(line)-at(comma,line,piece-1)-2) 
else
	segment=substr(line,at(comma,line,piece-1)+2,at(comma,line,piece)-at(comma,line,piece-1)-3)
end if
end function

filename=ui_get_file("Open bubble file","","omsdsc.csv","X")
if filename="" then
	end 
end if

fp=file.open(filename,file_ro_exclusive)


patient=table.open("patient",file_ro_shared)
patient.index_primary_put("id")
billing=table.get("billing")

provider=table.open("provider",file_ro_shared)
provider.index_primary_put("id")
codes=table.open("codes")
codes.index_primary_put("id")
proced=table.open("proced")
audit=table.open("procedure audit")

for i=1 to 60
	proc_code[i]=""
	proc_teeth[i]="" 
next 
line=fp.read_line()
pat_id=segment(1)
datec=segment(2)
providID=segment(3)
doctor=segment(4)
StdDiagC=segment(5)
NSDiag=segment(6)
APCode=segment(7)	
Walk=segment(8)
Ref=segment(9)
ADA=segment(10)
HCFA=segment(11)
ToPost=segment(12)
j=0
for i=1 to 36
	code=segment(11+2*i)
	teeth=segment(12+2*i)
next
' **********************
' more fields read in
' but script shortened for readability
' ***********************

rec=patient.fetch_find(pat_id)
if rec<0 then
	' error - this patient not found
	ui_msg_box("Error in patient id","Cannot find patient matching id of "+pat_id)
	goto abort 
end if
rec=provider.fetch_find(doctor)
if rec<0 then
	ui_msg_box("Error in doctor id","Cannot find doctor matching id of "+doctor)
	goto abort
end if
	
if StdDiagC>"" then
	select
		case StdDiagC="1"
			diag="213.00"
		case StdDiagC="2"
			diag="520.60"
		case StdDiagC="3"
			diag="522.80"
		case StdDiagC="4"
			diag="524.30"
		case StdDiagC="5"
			diag="873.63"
		case else
			ui_msg_box("Error in reading Diagnosis Code","Skipping patient "+pat_id)
			goto abort
	end select
else
	diag=NSDiag
end if

' enter a bill
bmode=billing.mode_get()
select
	case bmode=0
 		billing.enter_begin()
	case bmode=1
		parentform.commit()
		billing.enter_begin()
end select
		valid=.t.
		on error goto bad_bill
		billing.id=pat_id
		billing.first=patient.first
		billing.last=patient.last	
		billing.provider=provider.prov_name
		billing.prov_numb=provider.prov_numb
		billing.prov_id=provider.id
		billing.icd1=diag
		billing.place_serv="O"
		billing.treatment=APCode
		billing.date=ctod(DateC)
		billing.time=time()
		billing.bill_sche="1"
		billing.bill_memo="Created by bubble reader"
billing.enter_end(valid)
	
' entries into other tables come here but are deleted for readability

abort:
fp.close()
patient.close()
provider.close()
codes.close()
proced.close()
audit.close()
end

bad_bill:
billing.enter_end(.f.)
resume abort

Script 3. The final Read bubble file script.

You should observe that line was declared both in the script and in the function as a shared variable. If it is not declared as shared, then line would be local to the function, and it would not carry the value into the function that it contains in the larger script. I could have passed the value of line as a parameter to the function, but that would slow down the script unnecessarily.

Notice, too, that all the hard work is done by the segment() function. To get the value of the individual fields, all the script has to do is to call segment() with the approriate segment number, as in

pat_id=segment(1)
datec=segment(2)
providID=segment(3)
doctor=segment(4)

Including more fields then merely becomes a matter of mapping each new field to its segment number. What could be simpler?

In this article, we saw how the use of one well-designed function makes it possible for Alpha Five to do things its designers never dreamed about. CSV files are often output by banks and brokerage houses for on-line portfolio and banking, so the techniques used here are germane to a wide variety of applications.

12/13/98

Don't forget, we need your feedback to make this site better!

Return to home