{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Content Management: Validate item metadata\n",
    "\n",
    "> * 👟 Ready To Run!\n",
    "* 🔒 Requires Administrator Privileges\n",
    "* ⌨️ Administration\n",
    "* 👤 Content Management\n",
    "\n",
    "\n",
    "Some organizations require specific background and descriptive information on data items before they consider an item a valid data holding. This background and descriptive information is known as metadata. An item's metadata can record whatever information is important for the organization to know about that item. In addition to descriptive information, this might include information about how accurate and recent the item is, restrictions associated with using and sharing the item, and important processes in its life cycle.\n",
    "\n",
    "Each organization can define the metadata attributes an item needs to be considered valid. In addition, an organization may rely on specific [metadata standards and styles](http://enterprise.arcgis.com/en/portal/latest/use/metadata.htm#ESRI_SECTION2_9AB0CCA6A1C443C5A0AEA956D15C1E55) to help identify the information it needs to know about geospatial and relevant nonspatial resources and how to store and present that information. For more details and approaches for storing metadata, see the [Enterprise Metadata documentation](http://enterprise.arcgis.com/en/portal/latest/use/metadata.htm#ESRI_SECTION2_9AB0CCA6A1C443C5A0AEA956D15C1E55).\n",
    "\n",
    "This notebook demonstrates one way to inspect items to ensure they contain the Item Description metadata properties an organization has deemed necessary. It writes a csv file with a value of False for each required property an item is missing and True for each property it has, plus the item's id and url."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import the necessary libraries and connect to the GIS"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import datetime as dt\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "from arcgis.gis import GIS\n",
    "\n",
    "gis = GIS(\"home\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When you add an item to your Organization, certain metadata properties are required, including an item `title` and `tags`. The item `type` is also required, and based on that type a set of `typeKeywords` is automatically added to the item. No matter how you add items to the Organization, these metadata properties are present.\n",
    "\n",
    "Let's specify additional properties that our Organization will require to describe its items. We'll create a list of strings to make sure items have a description, a thumbnail (other than the default), and a snippet."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define the Organization's valid Metadata Profile"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "item_profile = ['description', 'thumbnail', 'snippet']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we'll define a function that loops through the item profile list and inspects the value of each property on a single item. For the thumbnail, we'll check whether the default thumbnail has been replaced.\n",
    "\n",
    "The function creates a list of True/False values for the item:\n",
    " * True if the item has a value for the property (for the thumbnail, a custom thumbnail)\n",
    " * False if the property is missing or the item uses the default thumbnail\n",
    "\n",
    "We'll then append the item id and url (`None` if the item has no url) to this True/False list so the output file can identify each item."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define a function to inspect the metadata of an item"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_missing_item_attrs(portal_item):\n",
    "    \"\"\"Return a list of True/False values, one per property in\n",
    "    item_profile, followed by the item's id and url (None if the\n",
    "    item has no url).\n",
    "    \"\"\"\n",
    "    status = []\n",
    "    for attr in item_profile:\n",
    "        value = getattr(portal_item, attr, None)\n",
    "        if attr == 'thumbnail':\n",
    "            # Treat a thumbnail path containing 'ago_downloaded' as the\n",
    "            # default thumbnail; only a custom thumbnail is compliant.\n",
    "            status.append(value is not None and 'ago_downloaded' not in value)\n",
    "        else:\n",
    "            # Treat None and empty strings as a missing property.\n",
    "            status.append(bool(value))\n",
    "    status.append(portal_item.id)\n",
    "    status.append(portal_item.url)\n",
    "    return status"
   ]
  },
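  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To sanity-check the True/False logic without querying the portal, the cell below mirrors the same checks in a stand-alone sketch. The `check_profile` helper and the `types.SimpleNamespace` stand-in items (with made-up ids, text, and thumbnail paths) are illustrative assumptions, not ArcGIS API objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from types import SimpleNamespace\n",
    "\n",
    "def check_profile(item, profile):\n",
    "    \"\"\"Hypothetical stand-alone version of the metadata checks above.\"\"\"\n",
    "    status = []\n",
    "    for attr in profile:\n",
    "        value = getattr(item, attr, None)\n",
    "        if attr == 'thumbnail':\n",
    "            # A thumbnail path containing 'ago_downloaded' is treated as the default\n",
    "            status.append(value is not None and 'ago_downloaded' not in value)\n",
    "        else:\n",
    "            # None or an empty string counts as a missing property\n",
    "            status.append(bool(value))\n",
    "    return status + [item.id, item.url]\n",
    "\n",
    "profile = ['description', 'thumbnail', 'snippet']\n",
    "\n",
    "# Made-up stand-in items (hypothetical values, not real portal items)\n",
    "good = SimpleNamespace(id='abc123', url=None, description='Parcel boundaries',\n",
    "                       thumbnail='thumbnail/parcels.png', snippet='County parcels')\n",
    "bad = SimpleNamespace(id='def456', url=None, description=None,\n",
    "                      thumbnail='thumbnail/ago_downloaded.png', snippet='')\n",
    "\n",
    "print(check_profile(good, profile))  # [True, True, True, 'abc123', None]\n",
    "print(check_profile(bad, profile))   # [False, False, False, 'def456', None]"
   ]
  },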
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a Data Structure for each item's metadata status"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we'll use a Python dictionary to record the metadata status of each item. We'll search for the users in the GIS and, for each user, inspect the items in the root folder and in every other folder the user owns, calling the function defined above on each item.\n",
    "\n",
    "Each dictionary key is a unique name for an item. Since item titles in an Organization can be identical, the key combines the first 50 characters of the title with the item's creation time in epoch seconds. Each value is the list of True/False metadata values followed by the item id and url.\n",
    "\n",
    "In addition to building this dictionary, the cell below prints each user, each folder the user owns, and the number of items in each folder."
   ]
  },
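  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The key format described above can be illustrated with a small helper, `make_item_key`, which is hypothetical and not part of the ArcGIS API. Like the loop in this notebook (which divides by 1000), it assumes `item.created` is reported in epoch milliseconds; the titles and timestamps are made up."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def make_item_key(title, created_ms):\n",
    "    \"\"\"Hypothetical helper: first 50 title characters plus creation time in epoch seconds.\"\"\"\n",
    "    return f\"{title[:50]}_{created_ms // 1000}\"\n",
    "\n",
    "# Two made-up items that share a title but were created at different times\n",
    "print(make_item_key('Parks', 1560000000000))  # Parks_1560000000\n",
    "print(make_item_key('Parks', 1565000123456))  # Parks_1565000123"
   ]
  },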
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ANDREW\n",
      "--------------------------------------------------\n",
      "\tRoot Folder: andrew\n",
      "\t=========================\n",
      "\t\t- 0 items\n",
      "\n",
      "\n",
      "ANIETO\n",
      "--------------------------------------------------\n",
      "\tRoot Folder: anieto\n",
      "\t=========================\n",
      "\t\t- 2 items\n",
      "\n",
      "\n",
      "PORTALADMIN\n",
      "--------------------------------------------------\n",
      "\tRoot Folder: portaladmin\n",
      "\t=========================\n",
      "\t\t- 100 items\n",
      "\n",
      "\n",
      "RJACKSON\n",
      "--------------------------------------------------\n",
      "\tRoot Folder: rjackson\n",
      "\t=========================\n",
      "\t\t- 17 items\n",
      "\n",
      "\n",
      "SAMPLES_QA122\n",
      "--------------------------------------------------\n",
      "\tRoot Folder: samples_qa122\n",
      "\t=========================\n",
      "\t\t- 43 items\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "item_profile_status = {}\n",
    "\n",
    "def record_items(items):\n",
    "    \"\"\"Inspect each item and store its metadata status under a unique key.\"\"\"\n",
    "    for item in items:\n",
    "        # Item titles are not unique, so append the creation time\n",
    "        # (converted from epoch milliseconds to seconds) to the title.\n",
    "        key = f\"{item.title[:50]}_{int(item.created // 1000)}\"\n",
    "        item_profile_status[key] = get_missing_item_attrs(item)\n",
    "\n",
    "# search() and items() return at most 100 results by default,\n",
    "# so raise the limits to inspect every user and item.\n",
    "for user in gis.users.search(max_users=10000):\n",
    "    print(f\"{user.username.upper()}\\n{'-'*50}\")\n",
    "    print(f\"\\tRoot Folder: {user.username.lower()}\\n\\t{'='*25}\")\n",
    "    root_items = user.items(max_items=10000)\n",
    "    print(f\"\\t\\t- {len(root_items)} items\")\n",
    "    record_items(root_items)\n",
    "    for folder in user.folders:\n",
    "        folder_items = user.items(folder=folder, max_items=10000)\n",
    "        print(f\"\\t{folder['title']}\\n\\t{'='*25}\")\n",
    "        print(f\"\\t\\t- {len(folder_items)} items\")\n",
    "        record_items(folder_items)\n",
    "    print(\"\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a pandas DataFrame to write out to a `csv` file"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's first inspect the dictionary of data items:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "item_profile_status"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we'll create a list based upon our original item profile list. We'll add two members to the list corresponding to the item id and url values we recorded for each item."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_item_profile = item_profile + ['itemID', 'url']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['description', 'thumbnail', 'snippet', 'itemID', 'url']"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_item_profile"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we'll create the DataFrame. Because each dictionary key becomes a column, we pass the new list as the `index` to label the rows, then transpose (`.T`) the result so that each item is a row and each property is a column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.set_option('display.max_colwidth', 175) # for display of lengthy text values\n",
    "\n",
    "item_profile_df = pd.DataFrame(data=item_profile_status, \n",
    "                               index=new_item_profile).T\n",
    "item_profile_df"
   ]
  },
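  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see why the transpose is needed, the cell below applies the same `pd.DataFrame(..., index=...).T` pattern to a made-up two-item dictionary. The item names, ids, and URL are hypothetical sample values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "columns = ['description', 'thumbnail', 'snippet', 'itemID', 'url']\n",
    "# Made-up status lists, keyed the same way as item_profile_status\n",
    "sample_status = {\n",
    "    'Parks_1560000000': [True, True, True, 'abc123', None],\n",
    "    'Roads_1565000123': [False, True, False, 'def456',\n",
    "                         'https://example.com/arcgis/rest/services/Roads/FeatureServer'],\n",
    "}\n",
    "\n",
    "# Each dictionary key becomes a column, so `columns` labels the rows\n",
    "wide = pd.DataFrame(data=sample_status, index=columns)\n",
    "print(wide.shape)  # (5, 2)\n",
    "\n",
    "# Transposing turns each item into a row and each property into a column\n",
    "sample_df = wide.T\n",
    "print(sample_df.shape)  # (2, 5)"
   ]
  },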
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Write the dataframe to a `csv` file and add it as an item\n",
    "\n",
    "We'll add a timestamp to the output file name so that each csv item added to the Organization has a unique name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "output_dir = \"/arcgis/home/\"\n",
    "out_file = \"org_item_profile_status_\" + \\\n",
    "            str(int(dt.datetime.now().timestamp())) + \\\n",
    "            \".csv\"\n",
    "\n",
    "item_profile_df.to_csv(os.path.join(output_dir, out_file), \n",
    "                       index_label='item_name')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "csv_item = gis.content.add({}, data=os.path.join(output_dir, out_file))\n",
    "csv_item"
   ]
  },
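  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before notifying item owners, you may want to narrow the results to items that fail at least one check. The sketch below assumes the csv layout written above; the `non_compliant_items` helper is hypothetical, and the in-memory csv text with made-up rows stands in for the real output file. To apply it to the actual results, you could read `os.path.join(output_dir, out_file)` instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import io\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "def non_compliant_items(df, profile):\n",
    "    \"\"\"Hypothetical helper: return rows that fail at least one profile check.\"\"\"\n",
    "    # eq(True) treats anything other than True (including NaN) as a failure\n",
    "    passed = df[profile].eq(True).all(axis=1)\n",
    "    return df[~passed]\n",
    "\n",
    "# Made-up rows in the same layout as the notebook's output file\n",
    "csv_text = \"\"\"item_name,description,thumbnail,snippet,itemID,url\n",
    "Parks_1560000000,True,True,True,abc123,\n",
    "Roads_1565000123,False,True,False,def456,\n",
    "\"\"\"\n",
    "\n",
    "status_df = pd.read_csv(io.StringIO(csv_text), index_col='item_name')\n",
    "print(non_compliant_items(status_df, ['description', 'thumbnail', 'snippet']))"
   ]
  },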
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Conclusion\n",
    "\n",
    "This notebook checked an organization's items against a predefined list of required metadata properties and recorded whether each item has a value for each property. It combined these values with the `id` of each item and the `url` of any service backing the item (if applicable), and then wrote the results to a `csv` file that was added to the Organization. This file can then be analyzed to identify non-compliant items and notify their owners to update the metadata to comply with organizational requirements."
   ]
  }
 ],
 "metadata": {
  "esriNotebookRuntime": {
   "notebookRuntimeName": "ArcGIS Notebook Python 3 Standard",
   "notebookRuntimeVersion": "10.7.1"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
