Finding and Identifying Cross Linked Course Content

Posted Jul 2, 2025 Updated Jul 7, 2025

By Terry Patterson

12 min read

Introduction

Course deletions have always been a difficultly in Blackboard. If you were self-hosted and didn’t have enough database or application server power, your system could go down with the simple deletion of some courses. (I know, I did it.). Then even in managed hosting or SaaS environments, deleting thousands of courses could delay course copies or imports for hours while the task workers processed these requests. So how can you delete courses without having either impact? This is why I wrote this script.

Note: This is a living document, some information maybe incorrect or incomplete. Please contact me if you have any updates and/or corrections.

So this was built to delete courses in small batches so it wouldn’t overtax the environment nor block the processing of other tasks such as course copies or imports. It also removed me from having to babysit the process so I could enjoy weekends and/or vacations.

Running the Query

To find cross-linked content within courses, we use the following query. Note that a major part of the query is commented out. This query is too big to run as a complete process on our database. So we are going to break each part out into sections.


select 
	content.course_pk1
	, content.course_id
	, case 
		when position('internal/' in xyf_files.full_path) > 0 then '/'||split_part(xyf_files.full_path, '/', 2)||'/'||split_part(xyf_files.full_path, '/', 3)||'/'||split_part(xyf_files.full_path, '/', 4)||'/' 
		when position('courses/' in xyf_files.full_path) > 0 or position('orgs/' in xyf_files.full_path) > 0 then '/'||split_part(xyf_files.full_path, '/', 2)||'/'||split_part(xyf_files.full_path, '/', 3)||'/' 
		else '/'||split_part(xyf_files.full_path, '/', 2)||'/' 
		end as course_home
	, count(*) as embed_count
	, content.cnthndlr_handle as embed_type
from (
	--select course_main.pk1 as course_pk1, course_contents.pk1 as content_pk1, 'Attached' as cnthndlr_handle, course_contents.title as content_title, course_main.course_id, files.file_name as xid
	--	from course_contents_files
	--	left outer join files on course_contents_files.files_pk1 = files.pk1
	--	left outer join course_contents on course_contents_files.course_contents_pk1 = course_contents.pk1
	--	left outer join course_main on course_contents.crsmain_pk1 = course_main.pk1 
	--union
	--	select course_main.pk1 as course_pk1, course_contents.pk1 as content_pk1, 'Embedded Content' as cnthndlr_handle, course_contents.title as content_title, course_main.course_id, array_to_string(regexp_matches(course_contents.main_data, 'bbcswebdav([^\s"''<\)]+)','g'), '') as xid
	--	from course_contents
	--	left outer join course_main on course_contents.crsmain_pk1 = course_main.pk1
	--	where course_contents.main_data like '%bbcswebdav%'
	--union
	--	select course_main.pk1 as course_pk1, announcements.pk1 as content_pk1, 'Announcement' as cnthndlr_handle, announcements.subject as content_title, course_main.course_id, array_to_string(regexp_matches(announcements.announcement, 'bbcswebdav([^\s"''<\)]+)','g'), '') as xid
	--	from announcements
	--	left outer join course_main on announcements.crsmain_pk1 = course_main.pk1
	--	where announcements.announcement like '%bbcswebdav%'
	--union
	--	select course_main.pk1 as course_pk1, qti_asi_data.pk1 as content_pk1, 'Assessment' as cnthndlr_handle, qti_asi_data.title as content_title, course_main.course_id, array_to_string(regexp_matches(encode(qti_asi_data.data::bytea, 'escape'), 'bbcswebdav([^\s"''<\)]+)','g'), '') as xid 
	--	from qti_asi_data 
	--	left outer join course_main on qti_asi_data.crsmain_pk1 = course_main.pk1  
	--	where encode(data::bytea, 'escape') like '%bbcswebdav%' 
	--      and course_main.data_src_pk1 IN (PK1) 
	--union
	--	select course_main.pk1 as course_pk1, forum_main.pk1 as content_pk1, 'Discussion Forum' as cnthndlr_handle, forum_main.name as content_title, course_main.course_id, array_to_string(regexp_matches(forum_main.description, 'bbcswebdav([^\s"''<\)]+)','g'), '') as xid
	--	from forum_main
	--	left outer join conference_main on forum_main.confmain_pk1 = conference_main.pk1 
	--	left outer join course_main on conference_main.crsmain_pk1 = course_main.pk1 
	--	where forum_main.description like '%bbcswebdav%'
	--union
	--	select course_main.pk1 as course_pk1, msg_main.pk1 as content_pk1, 'Discussion Post' as cnthndlr_handle, msg_main.subject as content_title, course_main.course_id, array_to_string(regexp_matches(msg_main.msg_text, 'bbcswebdav([^\s"''<\)]+)','g'), '') as xid
	--	from msg_main
	--	left outer join forum_main on msg_main.forummain_pk1 = forum_main.pk1 
	--	left outer join conference_main on forum_main.confmain_pk1 = conference_main.pk1 
	--	left outer join course_main on conference_main.crsmain_pk1 = course_main.pk1 
	--	where msg_main.msg_text like '%bbcswebdav%'
	--union
	--	select course_main.pk1 as course_pk1, blogs.pk1 as content_pk1, 'Blog or Journal' as cnthndlr_handle, blogs.title as content_title, course_main.course_id, array_to_string(regexp_matches(blogs.description, 'bbcswebdav([^\s"''<\)]+)','g'), '') as xid
	--	from blogs
	--	left outer join course_main on blogs.crsmain_pk1 = course_main.pk1 
	--	where blogs.description like '%bbcswebdav%'
	--union
	--	select course_main.pk1 as course_pk1, blog_entry.pk1 as content_pk1, 'Blog Entry' as cnthndlr_handle, blog_entry.title as content_title, course_main.course_id, array_to_string(regexp_matches(blog_entry.description, 'bbcswebdav([^\s"''<\)]+)','g'), '') as xid
	--	from blog_entry 
	--	left outer join blogs on blog_entry.blog_pk1 = blogs.pk1 
	--	left outer join course_main on blogs.crsmain_pk1 = course_main.pk1 
	--	where blog_entry.description like '%bbcswebdav%'
	--union
	--	select course_main.pk1 as course_pk1, '0' as content_pk1, 'Banner' as cnthndlr_handle, 'Banner' as content_title, course_main.course_id, case when substr(course_main.banner_url,1,1) = '/' then course_main.banner_url else '0' end as xid
	--	from course_main 
	--	where course_main.banner_url is not null
) as content
left outer join (
	select *
	from dblink('dbname=DATABASE_cms_doc host=localhost sslmode=require user=USERNAME password=PASSWORD', 'select xyf_files.entry_id, xyf_urls.full_path from xyf_files left outer join xyf_urls on xyf_files.file_id = xyf_urls.file_id')
	as xyf_files(entry_id int, full_path text)
) as xyf_files on content.xid = '/xid-'||cast(xyf_files.entry_id as text)||'_1'
group by content.course_pk1, content.course_id, case when position('internal/' in xyf_files.full_path) > 0 then '/'||split_part(xyf_files.full_path, '/', 2)||'/'||split_part(xyf_files.full_path, '/', 3)||'/'||split_part(xyf_files.full_path, '/', 4)||'/' when position('courses/' in xyf_files.full_path) > 0 or position('orgs/' in xyf_files.full_path) > 0 then '/'||split_part(xyf_files.full_path, '/', 2)||'/'||split_part(xyf_files.full_path, '/', 3)||'/' else '/'||split_part(xyf_files.full_path, '/', 2)||'/' end, content.cnthndlr_handle

Each run, we want to remove the – from in front of a query within the from ( ). In this example we want to only run the query that will help us find cross-linked content that is attached within courses. So we convert this section:


	--select course_main.pk1 as course_pk1, course_contents.pk1 as content_pk1, 'Attached' as cnthndlr_handle, course_contents.title as content_title, course_main.course_id, files.file_name as xid
	--	from course_contents_files
	--	left outer join files on course_contents_files.files_pk1 = files.pk1
	--	left outer join course_contents on course_contents_files.course_contents_pk1 = course_contents.pk1
	--	left outer join course_main on course_contents.crsmain_pk1 = course_main.pk1

to this


	select course_main.pk1 as course_pk1, course_contents.pk1 as content_pk1, 'Attached' as cnthndlr_handle, course_contents.title as content_title, course_main.course_id, files.file_name as xid
		from course_contents_files
		left outer join files on course_contents_files.files_pk1 = files.pk1
		left outer join course_contents on course_contents_files.course_contents_pk1 = course_contents.pk1
		left outer join course_main on course_contents.crsmain_pk1 = course_main.pk1

We only comment that section out of the query. Now run the query. Once complete, you should only get the values associated with the attached files in the course. will appear like this.

Course PK1	Course ID	Course Home Directory	Count of Items	Type of Content
937640	8251-000000-EMSP-2260-001	/courses/8251-000000-EMSP-2260-001/	11	Attached
912160	4567-000000-DFTG-1433-DLS-16wk	/courses/4567-000000-DFTG-1433-DLS-16wk/	7	Attached
879799	1237-000000-UXUI-1370-004	/courses/1237-000000-UXUI-1370-004/	15	Attached
933083	123R-000000-GERM-2311-001	/courses/123R-000000-GERM-2311-001/	8	Attached
920993	593F-000000-CHEM-2123-002	/courses/593F-000000-CHEM-2123-002/	9	Attached
888005	964S-000000-ITSC-2337-003	/courses/964S-000000-ITSC-2337-003/	37	Attached

So with these results, how can you tell what content is cross-linked. That’s a good question. If the Course ID does not match the Course Home Directory, the content is cross-linked. We will manipulate this content later in the steps, but make sure your output comes out like this.

When you run each section, save the output into a folder (such as 2024-cross-linked-content-data) as a CSV file.

Once you complete each process you should have the following files:

announcements-results.csv
discussion-forum-data.csv
discussion-post-data.csv
blog-journal-data.csv
blog-entry-data.csv
course-banner-data.csv
attached-content-data.csv
embedded-content-data.csv
assessments-data.csv

Note that the assessments portion of the query are much more taxing on the database, so you will need to break your query into data sources pk1 values. This can be found by running the following query.


select pk1, batch_uid
from data_source

Select your pk1 values for all the data_source_keys that contain courses.

Manipulating the Data

So now we have the data, but we need to make some slight changes to make it so that we can find files in courses that have a different course home directory.

What we want to do is convert this…

Course PK1	Course ID	Course Home Directory	Count of Items	Type of Content
937640	8251-000000-EMSP-2260-001	/courses/8251-000000-EMSP-2260-001/	11	Attached
912160	4567-000000-DFTG-1433-DLS-16wk	/courses/4567-000000-DFTG-1433-DLS-16wk/	7	Attached
879799	1237-000000-UXUI-1370-004	/courses/1237-000000-UXUI-1370-004/	15	Attached
933083	123R-000000-GERM-2311-001	/courses/123R-000000-GERM-2311-001/	8	Attached
920993	593F-000000-CHEM-2123-002	/courses/593F-000000-CHEM-2123-002/	9	Attached
888005	964S-000000-ITSC-2337-003	/courses/964S-000000-ITSC-2337-003/	37	Attached

into this…

Course PK1	Course ID	Course Content Folder	Course Home Directory	Count of Items	Type of Content
937640	8251-000000-EMSP-2260-001	8251-000000-EMSP-2260-001	/courses/8251-000000-EMSP-2260-001/	11	Attached
912160	4567-000000-DFTG-1433-DLS-16wk	4567-000000-DFTG-1433-DLS-16wk	/courses/4567-000000-DFTG-1433-DLS-16wk/	7	Attached
879799	1237-000000-UXUI-1370-004	1237-000000-UXUI-1370-004	/courses/1237-000000-UXUI-1370-004/	15	Attached
933083	123R-000000-GERM-2311-001	123R-000000-GERM-2311-001	/courses/123R-000000-GERM-2311-001/	8	Attached
920993	593F-000000-CHEM-2123-002	593F-000000-CHEM-2123-002	/courses/593F-000000-CHEM-2123-002/	9	Attached
888005	964S-000000-ITSC-2337-003	964S-000000-ITSC-2337-003	/courses/964S-000000-ITSC-2337-003/	37	Attached

To do this, we need to open each CSV file in Microsoft Excel. Once open in the program. Highlight the column that contains the Course Home Directory, right click and select Copy, then right click again and select “Insert Copied Cells” which will insert the column to the left of the Course Home Directory column.

Select this entire new column as we now need to remove the forward slashes and /courses/ from the cells so we have just the name of the folder. This will make it easier to compare and determine what content is not located within the home directory. Select Find & Select from the Home tab in the Excel program. Select the Replace tab. Input /courses/ into the Find what box and leave the Replace with box empty. Click options, set Within to Sheet and Search to By Columns. Then click on Replace All and this should only change the content within the column. Now follow the same steps, only use just a forward slash.

This will present you with a clean column that you should now title course_content_folder.

Save the file as a .csv file. You may wish to append the word -clean to the end of the file.

Note that some older courses might have /internal/courses/COURSE_ID/ so make sure to search for any /internal/ values that might be in the results.

Adding the Data to a Database

Now that you have this cleaned data, you want to be able to find out what course IDs might contain or rely on cross-linked content. The easiest way to do that is to turn each of these .csv files into a table in a SQLite database. An SQLite database is a simple file based database that can just reside in your local file system. Here is how to create a SQLite datbase or open one in DBeaver.

Setting Up DBeaver for SQLite

Step 1: Download and Install DBeaver

Visit the DBeaver official website.
Download the appropriate version for your operating system.
Install DBeaver following the on-screen instructions.

Step 2: Download SQLite Driver

Open DBeaver.
Click on Database > New Database Connection.
Search for SQLite and select it.
If the SQLite driver is not already installed, click Download to install it.

Step 3: Connect to an SQLite Database

In the New Database Connection wizard: Enter the path to your SQLite database file or create a new one. Provide a name for your connection.
Click Finish to complete the connection setup.

Now that you have the database created, open the database and expand it. Underneath the database will be several folders. Right-click on the Tables folder. Select Import Data from the menu. This will allow you to create a table from the .csv files you have generated. Step through the Data Transfer Wizard.

Select Import Source as CSV. Click Next.
Find your CSV file using the Select Input Files window.
Under Importer settings, make sure that the , character appears as the Column delimiter. Click Next.
Expand the target container so you can see the source and target columns for the table. Change the target table name to be clc_[embed_type]. For example, if this is the file with announcements, the table name would be clc_announcements.
Click Proceed and your table will be generated.

Repeat these steps for all your .csv files. This will give you several tables like the ones shown below.

Generating the Queries and Data to Review

Now all of our data resides in a small and easy to query database of tables. So how do we find the courses that have cross linked content. We can simply query them. I already have a set of queries, let’s go over one of them to find how its generated.


SELECT *
FROM clc_announcement
WHERE course_id != course_content_folder
and (course_id LIKE '222F%' OR course_id LIKE '223F%' OR course_id LIKE '223S%' OR course_id LIKE '223U%' OR course_id LIKE '224F%' OR course_id LIKE '224S%' OR course_id LIKE '224U%')
and (course_content_folder LIKE '213F%' OR course_content_folder LIKE '213S%' OR course_content_folder LIKE '213U%' OR course_content_folder LIKE '214F%' OR course_content_folder LIKE '214S%' OR course_content_folder LIKE '214U%' OR course_content_folder LIKE '214F%' OR course_content_folder LIKE '214S%' OR course_content_folder LIKE '214U%' OR course_content_folder LIKE '215F%' OR course_content_folder LIKE '215S%' OR course_content_folder LIKE '215U%' OR course_content_folder LIKE '216F%' OR course_content_folder LIKE '216S%' OR course_content_folder LIKE '216U%' OR course_content_folder LIKE '217F%' OR course_content_folder LIKE '217S%' OR course_content_folder LIKE '217U%' OR course_content_folder LIKE '218F%' OR course_content_folder LIKE '218S%' OR course_content_folder LIKE '218U%' OR course_content_folder LIKE '219F%' OR course_content_folder LIKE '219S%' OR course_content_folder LIKE '219U%' OR course_content_folder LIKE '220F%' OR course_content_folder LIKE '220S%' OR course_content_folder LIKE '220U%' OR course_content_folder LIKE '221F%' OR course_content_folder LIKE '221S%' OR course_content_folder LIKE '221U%' OR course_content_folder LIKE '222S%' OR course_content_folder LIKE '222U%')

The first three lines of this query should be rather easy to review. Find me all rows in clc_announcement where the course_id value doesn’t match the course_content_folder value. This will find content that is listed in the course_id but the file resides in the course_content_folder of a different course.

The lines after that are a bit crazy. The first AND statement wants to find all the course_ids of courses that are currently active and not scheduled to remove. This means that these courses rely on the content and if any cross-linked content appears within them. It must be saved and not removed.

The next line looks for any courses that are eligible to be removed from the system. If active courses are using cross-linked content from courses scheduled for removal, then it must be retained.

These values will need to be appended to include the past year for both removal and active courses. This is as simple as adding the following

OR course_content_folder LIKE '222F%'

OR course_id LIKE '225S%'

If you are looking for a specific course, you can run the following query.

SELECT *
FROM clc_embedded_content
WHERE course_id != course_content_folder and (course_id LIKE '224F-123345-PSYC-2301-001' OR course_content_folder LIKE '224F-123345-PSYC-2301-001')

This content will allow you to query and find any cross-linked content.

Cleanup, Orphaned Content, Cross Linked

documentation course removal

This post is licensed under CC BY 4.0 by the author.