Apache Impala
Driver Options
Hadoop vendor - Download and install the driver made available by the Hadoop cluster provider (Cloudera, MapR, etc.). To locate the driver please consult the vendor’s website.
Posit Professional Drivers - Workbench, RStudio Desktop Pro, Connect, or Shiny Server Pro users can download and use Posit Professional Drivers at no additional charge. These drivers include an ODBC connector for Apache Impala. Posit delivers standards-based, supported, professional ODBC drivers. Use Posit Professional Drivers when you run R or Shiny with your production systems. See the Posit Professional Drivers documentation for more information.
Package Options
The odbc package, in combination with a driver, provides DBI support and an ODBC connection.
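Before connecting, it can help to confirm that an Impala driver is registered on the machine and to find the exact name to pass as the Driver setting. The following is a minimal sketch using `odbc::odbcListDrivers()`; the search pattern "impala" is an assumption about how the driver is named on your system, and the registered name may differ.

```r
# List the ODBC drivers registered with the driver manager on this machine.
drivers <- odbc::odbcListDrivers()

# Registered driver names are in the `name` column. Matching on "impala"
# is only a guess at the naming; inspect `drivers` directly if nothing matches.
impala_drivers <- unique(
  drivers$name[grepl("impala", drivers$name, ignore.case = TRUE)]
)
impala_drivers
```

If the result is empty, the driver is either not installed or registered under a name that does not contain "impala".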
Connection Settings
There are six settings needed to make a connection:
- Driver - See the Drivers section for setup information
- Host - A network path to the database server
- Schema - The name of the schema
- UID - The user’s network ID or server local account
- PWD - The account’s password
- Port - Should be set to 21050
con <- DBI::dbConnect(odbc::odbc(),
                      Driver = "[your driver's name]",
                      Host   = "[your server's path]",
                      Schema = "[your schema's name]",
                      UID    = rstudioapi::askForPassword("Database user"),
                      PWD    = rstudioapi::askForPassword("Database password"),
                      Port   = 21050)
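`rstudioapi::askForPassword()` needs an interactive RStudio session, so scripts that run unattended usually read the settings from somewhere else. The sketch below collects the six settings from environment variables and checks that none is empty before a connection is attempted. The variable names (`IMPALA_DRIVER`, `IMPALA_HOST`, and so on) and the helpers `impala_settings()` and `check_settings()` are hypothetical, chosen only for illustration.

```r
# Collect the six connection settings in one list.
# The environment variable names are hypothetical; use whatever your site defines.
impala_settings <- function() {
  list(
    Driver = Sys.getenv("IMPALA_DRIVER"),
    Host   = Sys.getenv("IMPALA_HOST"),
    Schema = Sys.getenv("IMPALA_SCHEMA"),
    UID    = Sys.getenv("IMPALA_UID"),
    PWD    = Sys.getenv("IMPALA_PWD"),
    Port   = 21050
  )
}

# Stop early with a readable message if any setting is empty.
# Sys.getenv() returns "" for a variable that is not set.
check_settings <- function(settings) {
  is_empty <- vapply(
    settings,
    function(x) !nzchar(as.character(x)),
    logical(1)
  )
  if (any(is_empty)) {
    stop("Missing connection settings: ",
         paste(names(settings)[is_empty], collapse = ", "))
  }
  invisible(settings)
}

# Usage (requires an installed driver and a reachable Impala server):
# settings <- check_settings(impala_settings())
# con <- do.call(DBI::dbConnect, c(list(odbc::odbc()), settings))
```

Keeping the settings in a named list means the same values can be validated, logged without the password, or passed to `DBI::dbConnect()` in a single call.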
Known issues
Switching from Impala to Hive
If you create a table in Impala and then drop the Hive metadata, you will need to invalidate the Impala metadata.
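library(DBI)

# "Impala" and "Hive" are the names of configured ODBC data sources (DSNs)
impala_con <- dbConnect(odbc::odbc(), "Impala")
dbWriteTable(impala_con, "mtcars", mtcars)

# Drop the table through the Hive connection
hive_con <- dbConnect(odbc::odbc(), "Hive")
dbRemoveTable(hive_con, "mtcars")

dbReadTable(impala_con, "mtcars") # succeeds
dbExistsTable(impala_con, "mtcars") # fails

# Invalidate the Impala metadata for the table
dbGetQuery(impala_con, "INVALIDATE METADATA mtcars")
dbExistsTable(impala_con, "mtcars") # succeeds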
This happens because dropping the Hive metadata does not drop the Impala metadata. More information can be found in the Cloudera documentation.
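If tables are changed through Hive regularly, the invalidation step can be wrapped in a small helper. This is a sketch only: `invalidate_sql()` and `invalidate_metadata()` are hypothetical names, and the strict check on the table name (letters, digits, and underscores, with an optional database prefix) is an assumption made to avoid pasting arbitrary text into a SQL statement.

```r
# Build the INVALIDATE METADATA statement for one table.
# Only simple names such as "mtcars" or "mydb.mtcars" are accepted
# (a deliberately strict assumption for this sketch).
invalidate_sql <- function(table) {
  name_pattern <- "^[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)?$"
  stopifnot(
    is.character(table),
    length(table) == 1,
    grepl(name_pattern, table)
  )
  paste("INVALIDATE METADATA", table)
}

# Run the statement on an open Impala connection.
# The statement returns no rows, so DBI::dbExecute() is used here.
invalidate_metadata <- function(con, table) {
  DBI::dbExecute(con, invalidate_sql(table))
}

# Usage (requires an open Impala connection such as impala_con):
# invalidate_metadata(impala_con, "mtcars")
```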