要先解釋一下架構, 地址的組成, 最低階的是文字(漢字/數字等), 往上會依行政需要而區分. 難的就是行政上分 鄉鎮市區 區里鄰 路街大道 巷弄衖號 等, 另外還會有些例外狀況如: 異體字, 有些漢字在Big5編碼未收錄,
以較正式的做法, 是找 民政單位 的 鄉鎮市區 等單位的清單, 或 中華郵政 的清單來比對.
以比較簡易的做法, 則是使用民政單位的階層來判斷.
而民政的地址階層又是另一個問題, 例如:
- 常見的地址是 xx區xx路nn巷 , 但有些地方有 xx區xx巷 , 而沒有 路/街/大道 , 而這些巷名又是漢字.
- 有些地址會有 村/里/鄰 , 有些沒有.
- 有些地址直接以 xx村nn號 標示, 但有些是 xx村xxnn(號) .
- 縣/市
- 鄉/鎮/市/區
- 村/里
- 鄰
- 路/街/大道/巷
- 巷
- 弄
- 衖
- (之)號(之)
- 樓(之)
依原問題需要, 只需要從地址中抓出路名, 所以從 路/街/大道/巷 更小單位的部份可以捨去, 但更大單位可能有:
- 高雄市xx路
- 高雄市xx區xx路
- 高雄市xx區xx里xx路
- 高雄市xx區xx里xx鄰xx路
- 找出 路/街/大道/巷 , 剔除小單位.
- 把更大單位剔除.
在 MS SQL 有幾個指令可以用: charindex 從字串找出特定字的位置, substring 從特定位置切除字串, case 篩選不同條件. 排列後就如下:
- 如果地址有 路/街/大道/巷 的字串, 就繼續下一階動作.
- 如果地址有 鄰/里/村/區/鄉/鎮/市/縣 的字串, 才進行切除.
寫出來的程式如下(少處理了巷):
create table #temp_add (Address nvarchar(500));insert into #temp_add values (N'台南市安定區保定路333號');select address, CASEWHEN charindex(N'路',address) >0 THENcasewhen charindex(N'鄰',address) >0 then ''+substring(address,charindex(N'鄰',address)+1,charindex(N'路',address)-charindex(N'鄰',address))when charindex(N'里',address) >0 then ''+substring(address,charindex(N'里',address)+1,charindex(N'路',address)-charindex(N'里',address))when charindex(N'村',address) >0 then ''+substring(address,charindex(N'村',address)+1,charindex(N'路',address)-charindex(N'村',address))when charindex(N'區',address) >0 then ''+substring(address,charindex(N'區',address)+1,charindex(N'路',address)-charindex(N'區',address))when charindex(N'鄉',address) >0 then ''+substring(address,charindex(N'鄉',address)+1,charindex(N'路',address)-charindex(N'鄉',address))when charindex(N'鎮',address) >0 then ''+substring(address,charindex(N'鎮',address)+1,charindex(N'路',address)-charindex(N'鎮',address))when charindex(N'市',address) >0 then ''+substring(address,charindex(N'市',address)+1,charindex(N'路',address)-charindex(N'市',address))when charindex(N'縣',address) >0 then ''+substring(address,charindex(N'縣',address)+1,charindex(N'路',address)-charindex(N'縣',address))endWHEN charindex(N'街',address) >0 THENcasewhen charindex(N'鄰',address) >0 then ''+substring(address,charindex(N'鄰',address)+1,charindex(N'街',address)-charindex(N'鄰',address))when charindex(N'里',address) >0 then ''+substring(address,charindex(N'里',address)+1,charindex(N'街',address)-charindex(N'里',address))when charindex(N'村',address) >0 then ''+substring(address,charindex(N'村',address)+1,charindex(N'街',address)-charindex(N'村',address))when charindex(N'區',address) >0 then ''+substring(address,charindex(N'區',address)+1,charindex(N'街',address)-charindex(N'區',address))when charindex(N'鄉',address) >0 then ''+substring(address,charindex(N'鄉',address)+1,charindex(N'街',address)-charindex(N'鄉',address))when charindex(N'鎮',address) >0 then ''+substring(address,charindex(N'鎮',address)+1,charindex(N'街',address)-charindex(N'鎮',address))when charindex(N'市',address) >0 then ''+substring(address,charindex(N'市',address)+1,charindex(N'街',address)-charindex(N'市',address))when charindex(N'縣',address) >0 then ''+substring(address,charindex(N'縣',address)+1,charindex(N'街',address)-charindex(N'縣',address))endWHEN charindex(N'大道',address) > 0 THENcasewhen charindex(N'鄰',address) >0 then ''+substring(address,charindex(N'鄰',address)+1,charindex(N'大道',address)-charindex(N'鄰',address)+1)when charindex(N'里',address) >0 then ''+substring(address,charindex(N'里',address)+1,charindex(N'大道',address)-charindex(N'里',address)+1)when charindex(N'村',address) >0 then ''+substring(address,charindex(N'村',address)+1,charindex(N'大道',address)-charindex(N'村',address)+1)when charindex(N'區',address) >0 then ''+substring(address,charindex(N'區',address)+1,charindex(N'大道',address)-charindex(N'區',address)+1)when charindex(N'鄉',address) >0 then ''+substring(address,charindex(N'鄉',address)+1,charindex(N'大道',address)-charindex(N'鄉',address)+1)when charindex(N'鎮',address) >0 then ''+substring(address,charindex(N'鎮',address)+1,charindex(N'大道',address)-charindex(N'鎮',address)+1)when charindex(N'市',address) >0 then ''+substring(address,charindex(N'市',address)+1,charindex(N'大道',address)-charindex(N'市',address)+1)when charindex(N'縣',address) >0 then ''+substring(address,charindex(N'縣',address)+1,charindex(N'大道',address)-charindex(N'縣',address)+1)endENDfrom #temp_adddrop table #temp_add