Solutions for BigData Engineers: HBASE: Basic commands

In this post, I will explain some basic HBase commands. These are very basic commands that help in administering HBase.

Command 1:
This command is used to log in to the HBase shell.

You can type "help" at the HBase shell prompt to get help on the available commands.

Command 2: To create a table.

Create an HBase table named "reviews" with 3 column families: summary, reviewer, and details.
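To make the idea of a schema concrete, here is a minimal Python sketch of what table creation declares. It is a hypothetical toy model, not the HBase API: the class and function names are made up. Only the column families are fixed up front, and each family carries its own settings (such as VERSIONS and IN_MEMORY) with default values.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnFamily:
    """Hypothetical per-family settings (a toy, not an HBase class)."""
    name: str
    versions: int = 1         # recent HBase releases keep 1 version by default
    in_memory: bool = False   # IN_MEMORY stays off unless you alter it


@dataclass
class TableSchema:
    """Hypothetical table schema: a name plus a fixed set of column families."""
    name: str
    families: dict = field(default_factory=dict)
    enabled: bool = True      # a newly created table starts out enabled


def create_table(name, *family_names):
    """Mimic table creation: only the column families are declared up front."""
    if not family_names:
        raise ValueError("a table needs at least one column family")
    return TableSchema(name, {f: ColumnFamily(f) for f in family_names})


reviews = create_table("reviews", "summary", "reviewer", "details")
print(sorted(reviews.families))  # ['details', 'reviewer', 'summary']
```

Note that no columns are declared here. In HBase, columns inside a family are created on demand when data is written.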

Command 3: List tables present on your system.

Command 4: To describe the properties of the 'reviews' table.

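As highlighted in the snapshot above, the table status is "enabled". Also, the IN_MEMORY property of this table's column families is set to "false". This means that the table's data is not given in-memory priority in the block cache; its data blocks can still be cached, but they are evicted like any other data.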

Command 5: To check whether a table is "enabled" or "disabled".

The output of the first command, "is_disabled", is false. This means that the table is not disabled.

Command 6: Disable a table.

In the above snapshot, the first command disables the table and the second command checks whether its status is disabled. The output "true" states that the table "reviews" is disabled.

Command 7: Alter table property.

Alter the IN_MEMORY property to true.

Set the number of versions for the summary and reviewer column families to 2. HBase can store multiple versions of data for each column family. If your application does not require multiple versions, the VERSIONS property for each column family should be set to 1.
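The effect of VERSIONS can be sketched with a small, hypothetical Python model of a single cell's history. This is not HBase code, and the values "cap" and "beanie" are made-up sample data. Each write adds a (timestamp, value) pair, and only the newest `max_versions` pairs survive. Real HBase hides surplus versions on read and physically removes them during compaction; this toy simply discards them at write time.

```python
class VersionedCell:
    """Hypothetical toy: one cell that keeps at most max_versions values."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self.versions = []  # (timestamp, value) pairs, newest first

    def put(self, timestamp, value):
        self.versions.append((timestamp, value))
        # Newest timestamp first, then keep only the allowed number of versions.
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        del self.versions[self.max_versions:]

    def get(self, num_versions=1):
        """Return up to num_versions values, newest first."""
        return [value for _, value in self.versions[:num_versions]]


cell = VersionedCell(max_versions=2)
for ts, value in [(1, "hat"), (2, "cap"), (3, "beanie")]:
    cell.put(ts, value)
print(cell.get(num_versions=3))  # ['beanie', 'cap']: the oldest version is gone
```

With `max_versions=1` the same three writes would leave only "beanie", which is why keeping VERSIONS at 1 saves space when history is not needed.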

Command 8: Verify the property changes were captured correctly.

Now, enable the table as below:

Command 9: GET, PUT, COUNT and SCAN.

Insert some data into the HBase table. The PUT command writes data into a single cell of an HBase table.

Executing the above command caused HBase to add a row with a row key of "101" to the "reviews" table and to write the value of "hat" into the "product" column of the "summary" column family.

Note that this command dynamically created the summary:product column, and that no data type was specified for this column. If you have more data for this row, you need to issue additional PUT commands, one for each cell (i.e., each columnfamily:column) in the target row.
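A minimal Python sketch of this behaviour, assuming a hypothetical in-memory `ToyTable` class (not the HBase API): the column families are fixed when the table is created, a qualifier such as `product` comes into existence on its first write, each put writes exactly one cell, and values are stored as untyped bytes.

```python
class ToyTable:
    """Hypothetical in-memory table (not the HBase API)."""

    def __init__(self, *families):
        self.families = set(families)  # fixed when the table is created
        self.rows = {}                 # row key -> {"family:qualifier": bytes}

    def put(self, row, column, value):
        """Write exactly one cell; the qualifier is created on first use."""
        family, _, _qualifier = column.partition(":")
        if family not in self.families:
            # Writes to an undeclared column family are rejected.
            raise KeyError(f"unknown column family: {family}")
        # No data type is declared anywhere: the value is stored as raw bytes.
        self.rows.setdefault(row, {})[column] = value.encode("utf-8")


reviews = ToyTable("summary", "reviewer", "details")
reviews.put("101", "summary:product", "hat")
print(reviews.rows)  # {'101': {'summary:product': b'hat'}}
```

Adding more data to row 101 means calling `put` again once per cell, mirroring the one-PUT-per-cell rule above.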

Operations in background:

HBase wrote your data to a Write-Ahead Log (WAL) in your distributed file system to allow for recovery from a server failure. In addition, it cached your data in the MemStore of a specific region managed by a specific RegionServer. At some point, when the MemStore becomes full, your data is flushed to disk and stored in files (HFiles) in your distributed file system. Each HFile contains data related to a specific column family.
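The write path described above can be sketched as a hypothetical Python toy (not HBase code). Every put is appended to a WAL list first, then buffered in a per-family MemStore, and a "full" MemStore is flushed as a sorted, immutable snapshot standing in for an HFile. The `flush_threshold` of 2 cells and the `summary:rating` cell are illustrative assumptions. Real HBase triggers flushes by MemStore size, not cell count, and flushes at the region level.

```python
class ToyRegion:
    """Hypothetical toy write path: WAL -> MemStore -> flushed 'HFiles'."""

    def __init__(self, families, flush_threshold=2):
        self.flush_threshold = flush_threshold      # cells per MemStore (illustrative)
        self.wal = []                               # durable log, replayed after a crash
        self.memstores = {f: {} for f in families}  # one in-memory buffer per family
        self.hfiles = {f: [] for f in families}     # immutable snapshots per family

    def put(self, row, family, qualifier, value):
        self.wal.append((row, family, qualifier, value))  # 1. log for recovery
        store = self.memstores[family]
        store[(row, qualifier)] = value                   # 2. buffer in memory
        if len(store) >= self.flush_threshold:            # 3. flush when "full"
            self.flush(family)

    def flush(self, family):
        # Persist the MemStore sorted by key as a new read-only file, then reset it.
        snapshot = tuple(sorted(self.memstores[family].items()))
        self.hfiles[family].append(snapshot)
        self.memstores[family] = {}


region = ToyRegion(["summary", "reviewer", "details"])
region.put("101", "summary", "product", "hat")
region.put("101", "summary", "rating", "5")  # second summary cell triggers a flush
print(len(region.wal), len(region.hfiles["summary"]), region.memstores["summary"])  # 2 1 {}
```

Because each family has its own buffer and its own list of files, the sketch also shows why each HFile holds data for only one column family.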

Add more cells (columns and data values) to this row:

Conceptual view of the column family:

The table has one row with 3 column families. The summary column family for this row contains two columns, while the other two column families for this row each have one column. Physically, data in each column family is stored together in your distributed file system (in one or more HFiles).

Now, look at the output of the "get" command:

The output reports 4 rows. This row count refers to the number of lines (rows) displayed on the screen. Since information about each cell is displayed on a separate line and there are 4 cells in row 101, the GET command reports 4 rows.
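The relationship between the conceptual view and the GET line count can be sketched in Python. This is a hypothetical illustration, not HBase output. Only `summary:product = hat` comes from this post; the other qualifiers (`rating`, `name`, `comment`) and the `<value>` placeholders are assumptions. The nested row (family -> qualifier -> value) flattens into one line per cell, so one table row yields 4 displayed lines.

```python
# Conceptual view of row 101: column family -> column qualifier -> value.
row_101 = {
    "summary": {"product": "hat", "rating": "<value>"},
    "reviewer": {"name": "<value>"},
    "details": {"comment": "<value>"},
}


def flatten(row_key, families):
    """Yield one (row, 'family:qualifier', value) line per cell."""
    for family in sorted(families):
        for qualifier in sorted(families[family]):
            yield row_key, f"{family}:{qualifier}", families[family][qualifier]


lines = list(flatten("101", row_101))
for line in lines:
    print(*line)
print(f"{len(lines)} row(s)")  # 4 row(s): the count is of cells, not of table rows
```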

Count the number of rows in the entire table and verify that there is only 1 row:

Now, add 2 more rows to "reviews" table.

Note that review 112 lacks any detailed information (e.g., a comment), while review 133 contains a tip in its details. Note also that review 133 includes the reviewer’s location, which is not present in the other rows. Let’s explore how HBase captures this information.

Retrieve the entire contents of the table using this SCAN command:

Note that SCAN correctly reports that the table contains 3 rows. The display contains more than 3 lines, because each line includes information for a single cell in a row. Note also that each row in your table has a different schema and that missing information is simply omitted.

Furthermore, each displayed line includes not only the value of a particular cell in the table but also its associated row key (e.g., 101), column family name (e.g., details), column name (e.g., comment), and timestamp. As you learned earlier, HBase is a key-value store. Together, these four attributes (row key, column family name, column qualifier, and timestamp) form the key.
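A minimal Python sketch of that four-part key, using a hypothetical `CellKey` tuple (not HBase's own key class). The ordering rule follows HBase's documented behaviour: row, family, and qualifier sort ascending by bytes, and timestamps sort descending so the newest version of a cell comes first. The timestamps and `<…value>` placeholders are made-up sample data.

```python
from typing import NamedTuple


class CellKey(NamedTuple):
    """Hypothetical cell key: the four parts named in the text."""
    row: str
    family: str
    qualifier: str
    timestamp: int  # illustrative small numbers; HBase uses epoch milliseconds


def sort_key(key):
    # Row, family and qualifier sort ascending by raw bytes;
    # timestamps sort descending so the newest version comes first.
    return (key.row.encode(), key.family.encode(), key.qualifier.encode(), -key.timestamp)


cells = {
    CellKey("133", "summary", "product", 10): "<value>",
    CellKey("101", "summary", "product", 5): "hat",
    CellKey("101", "summary", "product", 7): "<newer value>",
    CellKey("101", "details", "comment", 5): "<value>",
}
ordered = sorted(cells, key=sort_key)
for key in ordered:
    print(key.row, f"{key.family}:{key.qualifier}", key.timestamp, cells[key])
```

All cells of row 101 come before row 133, and within `summary:product` the version at timestamp 7 is listed before the one at timestamp 5.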

Consider the implications of storing this key information with each cell value. Having a large number of columns with values for all rows (in other words, dense data) means that a lot of key information is repeated. Also, large row key values and long column family / column names increase the table’s storage requirements.
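The cost of repeating the key can be estimated with some simple arithmetic. This Python sketch assumes the KeyValue layout described in the Apache HBase reference guide: 4-byte key length, 4-byte value length, 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte key type, then the value. It ignores tags, compression, block encoding, and indexes, so treat the numbers as rough. The long family and column names are hypothetical.

```python
# Fixed per-cell bytes: key length + value length + row length
# + family length + timestamp + key type.
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1  # 20 bytes


def cell_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate stored size of one cell, before compression or encoding."""
    return FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + len(value)


short_names = cell_size(b"101", b"summary", b"product", b"hat")
long_names = cell_size(b"101", b"review_summary_information", b"product_name", b"hat")
print(short_names, long_names)  # 40 64
```

The 3-byte value "hat" costs about 40 bytes to store, and longer names raise that to 64 bytes for every single cell. This is why short row keys and short family and column names matter for dense tables.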

TIP:

Restrict the scan results to retrieve only the contents of the summary column family and the reviewer:name column for row keys starting at ‘120’ and ending before ‘150’ (the stop row itself is excluded from a scan).

Only row ‘133’ qualifies. Note that the reviewer’s location (reviewer:location) and all the review details (details:tip) were omitted from the results due to the scan parameters specified.
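A hypothetical Python toy of this filtered scan (not the HBase API). The start row is inclusive and the stop row is exclusive, row keys compare as strings (HBase compares raw bytes), and a column spec is either a whole family ("summary") or a single column ("reviewer:name"). Apart from Tina's name in row 112, the cell values are `<value>` placeholders.

```python
def scan(rows, start_row=None, stop_row=None, columns=None):
    """Yield (row, column, value) for rows in [start_row, stop_row)."""
    for row_key in sorted(rows):
        if start_row is not None and row_key < start_row:
            continue
        if stop_row is not None and row_key >= stop_row:
            break  # rows are sorted, so nothing later can qualify
        for column, value in sorted(rows[row_key].items()):
            family = column.split(":", 1)[0]
            # Keep the cell if its whole family or its exact column was requested.
            if columns is None or family in columns or column in columns:
                yield row_key, column, value


table = {
    "101": {"summary:product": "hat", "reviewer:name": "<value>"},
    "112": {"summary:product": "<value>", "reviewer:name": "Tina"},
    "133": {"summary:product": "<value>", "reviewer:name": "<value>",
            "reviewer:location": "<value>", "details:tip": "<value>"},
}
result = list(scan(table, start_row="120", stop_row="150",
                   columns=["summary", "reviewer:name"]))
for cell in result:
    print(cell)
```

As in the text, only row 133 is returned, and its reviewer:location and details:tip cells are filtered out.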

Deleting data:

Command 10: Delete Tina’s name from her review (row 112) and scan to see the result.

Command 11: Delete all cells associated with Tina’s review (i.e., all data for row 112) and scan to see the change.

DELETE doesn’t remove data from the table immediately. Instead, it marks the data for deletion, which prevents the data from being included in any subsequent data retrieval operations. Because the underlying files that form an HBase table (HFiles) are immutable, storage for deleted data is not recovered until a major compaction runs, either on its periodic schedule or when an administrator initiates one. This operation consolidates data and reconciles deletions by removing both the deleted data and the delete indicator.
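The delete-marker mechanism can be sketched as a hypothetical Python toy (not HBase internals). A delete appends a tombstone in a new file instead of editing existing files, reads hide masked cells, and only a major compaction rewrites everything into one file without the deleted cell or its marker. Real HBase orders deletes by timestamp and has several delete types; this toy relies on file order and column-level deletes only.

```python
class ToyStore:
    """Hypothetical store for one column family (not HBase internals)."""

    def __init__(self):
        self.files = []  # immutable tuples of (row, column, value); oldest first

    def write_file(self, cells):
        # Files are never edited after they are written.
        self.files.append(tuple(cells))

    def delete(self, row, column):
        # Deleting writes a new tombstone entry; None is the delete marker.
        self.write_file([(row, column, None)])

    def read(self):
        latest = {}
        for hfile in self.files:  # newer files override older ones
            for row, column, value in hfile:
                latest[(row, column)] = value
        # Cells masked by a tombstone are hidden from reads.
        return {key: value for key, value in latest.items() if value is not None}

    def major_compact(self):
        # Rewrite everything into one file, dropping deleted cells and markers.
        live = sorted(self.read().items())
        self.files = [tuple((row, column, value) for (row, column), value in live)]


store = ToyStore()
store.write_file([("112", "reviewer:name", "Tina"), ("112", "summary:product", "<value>")])
store.delete("112", "reviewer:name")
print(len(store.files), store.read())    # 2 {('112', 'summary:product'): '<value>'}
store.major_compact()
print(len(store.files), store.files[0])  # 1 (('112', 'summary:product', '<value>'),)
```

Before compaction, "Tina" is still physically present in the first file even though reads no longer return it. That is the storage the text says is only reclaimed by a major compaction.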

Dropping table:

Command 12: Create a sample table with 1 column family.

Command 13: Disable the table you just created. (Before you can drop a table, you must disable or deactivate it.)
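The rule that a table must be disabled before it is dropped can be sketched with a hypothetical Python catalog (not the HBase Admin API). The table name "sample" is an assumption, since the post does not name the table. In real HBase, dropping an enabled table fails in a similar way.

```python
class ToyCatalog:
    """Hypothetical table catalog enforcing disable-before-drop."""

    def __init__(self):
        self.tables = {}  # table name -> True if enabled

    def create(self, name):
        self.tables[name] = True  # new tables start enabled

    def disable(self, name):
        self.tables[name] = False

    def enable(self, name):
        self.tables[name] = True

    def is_disabled(self, name):
        return not self.tables[name]

    def drop(self, name):
        if self.tables[name]:
            raise RuntimeError(f"table {name} is enabled; disable it first")
        del self.tables[name]


catalog = ToyCatalog()
catalog.create("sample")
catalog.disable("sample")
print(catalog.is_disabled("sample"))  # True
catalog.drop("sample")
print("sample" in catalog.tables)  # False
```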

Keep reading :)

Ref: https://developer.ibm.com/hadoop/docs/getting-started/tutorials/hbase-intro-lab/hbase-intro-lab-2-issuing-basic-hbase-commands/


Tuesday, 24 November 2015
